Genetic_Programming_Theory_and_Practice_XIII

122 S. Gustafson et al.

Table 2 Data Science tool attributes and intersection with GP

    Data Science attribute                   GP existing capability
1   Handle big data                          Good (distributed compute and sampling)
2   Extract knowledge                        Good (inspect solutions)
3   Encode domain knowledge                  Good (code as functions and built-ins)
4   Easy to use                              Weak (lots of parameters, fewer commercial tools)
5   Iterate quickly                          Weak (high compute time)
6   Integrate with big data infrastructure   Good (initial work demonstrates HDFS integration)
7   Good out of box performance              Weak (typically a lot of customization required)
8   Quick prototyping                        Weak (can application developers do it?)

with data management and data featurization tools. That is, given one or more
data files from which to build the model, the data often need to be integrated,
shaped, and cleaned, and any derived values created. This data set is then used
for modeling. Secondly, the problem-specific evaluation method needs to be encoded
to create an end-to-end modeling capability that can then demonstrate incremental
improvement in performance. Sometimes that might mean using an evaluation data set.
Other times it might mean extrapolating on new data coming from a customer. Because
this custom way of evaluating a solution changes, the ease with which it can be
captured is important. We now look at a real-world Data Science activity to see how
well GP met expectations as a Data Science tool.
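
The two requirements above (data preparation before modeling, and a problem-specific evaluation method that is easy to swap) can be illustrated with a minimal Python sketch. It is not taken from the chapter: the column names (temperature, pressure, target), the derived "ratio" feature, and both error measures are hypothetical choices made only for illustration, and the candidate model stands in for an evolved GP solution.

```python
from statistics import mean
from typing import Callable, Dict, List, Optional, Sequence

Row = Dict[str, Optional[float]]
Model = Callable[[Dict[str, float]], float]
Evaluator = Callable[[Model, Sequence[Dict[str, float]]], float]


def prepare(rows: Sequence[Row]) -> List[Dict[str, float]]:
    """Clean the data and create a derived value.

    Rows with a missing value are dropped, and a hypothetical derived
    column "ratio" is added. Assumes temperature is never zero.
    """
    cleaned = []
    for row in rows:
        if any(value is None for value in row.values()):
            continue  # cleaning step: skip incomplete records
        new_row = dict(row)
        new_row["ratio"] = new_row["pressure"] / new_row["temperature"]
        cleaned.append(new_row)
    return cleaned


def mean_abs_error(model: Model, rows: Sequence[Dict[str, float]]) -> float:
    """One possible evaluation: average absolute error on a data set."""
    return mean(abs(model(r) - r["target"]) for r in rows)


def worst_case_error(model: Model, rows: Sequence[Dict[str, float]]) -> float:
    """A different evaluation: the single largest absolute error."""
    return max(abs(model(r) - r["target"]) for r in rows)


def score(model: Model, rows: Sequence[Row], evaluator: Evaluator) -> float:
    """End-to-end step: prepare the data, then apply the chosen evaluator."""
    return evaluator(model, prepare(rows))


# Illustrative data only; the third row is dropped during cleaning.
rows = [
    {"temperature": 2.0, "pressure": 4.0, "target": 2.0},
    {"temperature": 4.0, "pressure": 4.0, "target": 1.5},
    {"temperature": None, "pressure": 1.0, "target": 0.0},
]


def candidate(row: Dict[str, float]) -> float:
    """Stand-in for an evolved expression: predict the derived ratio."""
    return row["ratio"]


mae = score(candidate, rows, mean_abs_error)
worst = score(candidate, rows, worst_case_error)
```

Because the evaluator is passed in as a plain function, changing how a solution is judged means writing one new function rather than touching the data preparation or the search itself.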

3 Case Study: Operations Optimization


Our application area from industry is operations optimization, which is the
improvement of one or more processes given a specific business objective. There are
many different types of data used in an operations optimization problem. First,
there are low-level sensors like temperature, pressure or vibration. Sensors like these
are often measured in multiple places on a machine or in a plant. Secondly, there
are sensors that provide states of inputs, like chemical mixture or composition, and
states of outputs, like the results of a visual inspection system or a non-destructive
testing method. Lastly, there are derived or back-calculated sensors that are often
included in the operations optimization task. These values could come from a
physics-based model or equation, or from some other equation or simulation, and are
then assigned back to the operation as a probable value. Figure 1 highlights these
different input types and how they flow in the operations optimization problem.
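
As a rough illustration of the three input types, the following Python sketch tags each reading as measured, state, or derived, and back-calculates one derived value from a physics-based equation. The chapter does not name a specific equation; the ideal gas law for dry air is used here purely as an assumed example, and the names Reading, derive_density, and with_derived are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Specific gas constant of dry air in J/(kg*K); used only for this example.
R_DRY_AIR = 287.05


@dataclass(frozen=True)
class Reading:
    name: str
    value: float
    kind: str  # "measured", "state", or "derived"


def derive_density(pressure_pa: float, temperature_k: float) -> float:
    """Ideal gas law: density = pressure / (R_specific * temperature)."""
    return pressure_pa / (R_DRY_AIR * temperature_k)


def with_derived(readings: List[Reading]) -> List[Reading]:
    """Return the readings plus one back-calculated, derived reading."""
    by_name = {r.name: r.value for r in readings}
    rho = derive_density(by_name["pressure_pa"], by_name["temperature_k"])
    return readings + [Reading("density_kg_m3", rho, "derived")]


# Illustrative snapshot: two low-level sensors and one output state.
snapshot = [
    Reading("pressure_pa", 101325.0, "measured"),
    Reading("temperature_k", 288.15, "measured"),
    Reading("inspection_pass", 1.0, "state"),
]

augmented = with_derived(snapshot)
```

Keeping the kind of each reading explicit makes it possible to treat derived values differently from direct measurements later on, for example when judging how much to trust them.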
In industrial operations, daily decisions are made regarding various control
settings in order to maintain a targeted flow of product or output. Multiple sensors
from the plant or field are used to understand the current state and predict the future.
Given the various industrial systems encountered, the often extreme environments,
and the failure or drift of sensors, the resulting real-time data is often very noisy