Genetic_Programming_Theory_and_Practice_XIII

122 S. Gustafson et al.

Table 2 Data Science tool attributes and intersection with GP

#  | Data Science attribute                   | GP existing capability
1  | Handle big data                          | Good (distributed compute and sampling)
2  | Extract knowledge                        | Good (inspect solutions)
3  | Encode domain knowledge                  | Good (code as functions and built-ins)
4  | Easy to use                              | Weak (many parameters, fewer commercial tools)
5  | Iterate quickly                          | Weak (high compute time)
6  | Integrate with big data infrastructure   | Good (initial work demonstrates HDFS integration)
7  | Good out-of-the-box performance          | Weak (typically a lot of customization required)
8  | Quick prototyping                        | Weak (can application developers do it?)

with data management and data featurization tools. That is, given one or more data files from which to build the model, the data often needs to be integrated, shaped, and cleaned, and any derived values must be created. The resulting data set is then used for modeling. Second, the problem-specific evaluation method needs to be encoded to create an end-to-end modeling capability that can then demonstrate incremental improvements in performance. Sometimes that means using a separate evaluation data set; other times it means extrapolating to new data coming from a customer. Because this custom way of evaluating a solution varies, the ease with which it can be captured is important. We now look at a real-world Data Science activity to assess how well GP met expectations as a Data Science tool.
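The two steps described above, preparing the modeling data set and encoding a problem-specific evaluation, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the column names, the helpers `prepare`, `rmse_on`, `miss_rate_on`, and `select_best`, and the choice of RMSE and a tolerance-based miss rate as losses are all hypothetical. The point of the sketch is that the search routine only receives an evaluator callable, so switching from a fixed evaluation data set to new customer data, or to a different metric, means swapping the evaluator rather than changing the modeling code.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Illustrative types (assumptions, not from the chapter): a row maps column
# names to readings, and a model maps a row to a predicted target value.
Row = Dict[str, Optional[float]]
Model = Callable[[Row], float]
Evaluator = Callable[[Model], float]  # returns a loss: lower is better


def prepare(sensor_rows: List[Row], lab_rows: List[Row], key: str = "t") -> List[Row]:
    """Integrate two sources on a shared key, drop incomplete rows, and add a
    derived column. Column names 'pressure', 'temperature', and 'p_over_t'
    are hypothetical."""
    lab_by_key = {r[key]: r for r in lab_rows}
    prepared: List[Row] = []
    for r in sensor_rows:
        lab = lab_by_key.get(r[key])
        if lab is None:
            continue  # integrate: keep only keys present in both sources
        merged = {**r, **lab}
        if any(v is None for v in merged.values()):
            continue  # clean: drop rows with missing readings
        merged["p_over_t"] = merged["pressure"] / merged["temperature"]  # derive
        prepared.append(merged)
    return prepared


def rmse_on(rows: Sequence[Row], target: str) -> Evaluator:
    """Evaluator factory: root-mean-square error on a fixed data set."""
    def evaluate(model: Model) -> float:
        sq = [(model(r) - r[target]) ** 2 for r in rows]
        return (sum(sq) / len(sq)) ** 0.5
    return evaluate


def miss_rate_on(rows: Sequence[Row], target: str, tol: float) -> Evaluator:
    """Alternative evaluator: fraction of predictions outside +/- tol."""
    def evaluate(model: Model) -> float:
        misses = sum(abs(model(r) - r[target]) > tol for r in rows)
        return misses / len(rows)
    return evaluate


def select_best(candidates: Sequence[Model], evaluator: Evaluator) -> Model:
    """The search step only sees the evaluator, so changing the evaluation
    method requires no change here."""
    return min(candidates, key=evaluator)


# Usage with toy, made-up readings.
sensor = [
    {"t": 1.0, "pressure": 10.0, "temperature": 5.0},
    {"t": 2.0, "pressure": None, "temperature": 4.0},  # missing reading
    {"t": 3.0, "pressure": 9.0, "temperature": 3.0},   # no lab result
]
lab = [{"t": 1.0, "yield": 2.0}, {"t": 2.0, "yield": 1.0}]
rows = prepare(sensor, lab)  # only t=1.0 survives integration and cleaning
best = select_best([lambda r: 0.0, lambda r: r["p_over_t"]], rmse_on(rows, "yield"))
```

Having every evaluator return a loss (lower is better) keeps `select_best` agnostic to the metric in use; a full GP system would typically call such an evaluator as its fitness function in the same way.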

3 Case Study: Operations Optimization

Our application area from industry is operations optimization: the improvement of one or more processes with respect to a specific business objective. Many different types of data are used in an operations optimization problem. First, there are low-level sensors measuring quantities such as temperature, pressure, or vibration; these are often measured at multiple locations on a machine or in a plant. Second, there are sensors that report the state of inputs, such as chemical mixture or composition, and the state of outputs, such as the results of a visual inspection system or a non-destructive testing method. Lastly, derived or back-calculated sensors are often included in the operations optimization task. These values may come from a physics-based model or equation, or from some other equation or simulation, and are then assigned back to the operation as a probable value. Figure 1 highlights these different input types and how they flow through the operations optimization problem.

In industrial operations, daily decisions are made about various control settings in order to maintain a targeted flow of product or output. Multiple sensors from the plant or field are used to understand the current state and to predict future behavior. Given the variety of industrial systems encountered, the often extreme environments, and the failure or drift of sensors, the resulting real-time data is often very noisy
