Genetic_Programming_Theory_and_Practice_XIII


Using GP for Data Science


Fig. 1 Different input types to the operations optimization problem
[Figure omitted. Diagram labels: Process Input Types (Temperature, Pressure,
Vibration, Chemical Composition, Percentage mix, Output of upstream process);
Inputs from Low Level Sensors; Inferred or Calculated Inputs (Physics based
model, Simulation, Back-calculation); Derived Inputs; "State" of Input;
Machine/Manufacturing Plant/Process; "product"]


and requires significant data processing and cleaning prior to use. Therefore, the
problem at a high level is one of using real-time sensors and control capability
to understand and improve operations. Automated methods that clean up data or
estimate non-measurable attributes help in providing an accurate, real-time view of
the whole system. In Kordon and Smits (2001), the authors use GP to create soft
sensors, or virtual sensors, that augment more expensive sensors.


3.1 The Data Science Challenge at Hand


Our Data Science challenge could be stated as follows: Given historical data of an
operation, is it possible to use data and analytic methods to create accurate sensor
estimators for those time instants when the sensor is offline? A sensor often switches
between being online (available) and offline (unavailable). A secondary challenge was to
identify instances (time periods) when the sensor of interest is drifting away from its
ideal accuracy level or from its prior relationship with other system sensors. Solving
these challenges would give operations managers a consistent, real-time stream
of data that characterizes their operation, enabling accurate and timely optimization
decisions.
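
To make the two challenges concrete, the following is a minimal sketch on synthetic data, not the method used in this chapter. It swaps in an ordinary least-squares linear model where the chapter uses GP, and every name in it (fit_soft_sensor, predict_soft_sensor, drift_flags, the window and threshold values, and the "reference" period assumed to be drift-free) is a hypothetical choice made for illustration only.

```python
import numpy as np

def fit_soft_sensor(X, y, mask):
    """Fit a linear estimator (stand-in for a GP model) on the rows selected by mask."""
    A = np.column_stack([X[mask], np.ones(mask.sum())])  # add intercept column
    coef, *_ = np.linalg.lstsq(A, y[mask], rcond=None)
    return coef

def predict_soft_sensor(coef, X):
    """Estimate the target sensor from the always-available sensors."""
    return np.column_stack([X, np.ones(len(X))]) @ coef

def drift_flags(y, y_hat, online, window=20, threshold=0.5):
    """Flag time steps whose trailing-window mean absolute residual exceeds threshold.

    Only online readings contribute; windows with no online reading are not flagged.
    """
    resid = np.where(online, np.abs(y - y_hat), np.nan)
    flags = np.zeros(len(y), dtype=bool)
    for t in range(window - 1, len(y)):
        w = resid[t - window + 1 : t + 1]
        if not np.all(np.isnan(w)):
            flags[t] = np.nanmean(w) > threshold
    return flags

# Synthetic history: three always-available sensors and one target sensor.
rng = np.random.default_rng(0)
T = 400
X = rng.normal(size=(T, 3))
y_true = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2] + 3.0
y = y_true + rng.normal(scale=0.05, size=T)

online = np.ones(T, dtype=bool)
online[100:150] = False   # target sensor goes offline
y[~online] = np.nan       # no readings while offline
y[300:] += 2.0            # injected drift in the target sensor

# Assumption: the first 250 steps are a trusted, drift-free reference period.
reference = online & (np.arange(T) < 250)
coef = fit_soft_sensor(X, y, reference)
y_hat = predict_soft_sensor(coef, X)

filled = np.where(online, y, y_hat)   # consistent stream: measured or estimated
flags = drift_flags(y, y_hat, online) # periods where the sensor departs from the model
```

Here filled addresses the first challenge by substituting the estimate whenever the sensor is offline, and flags addresses the second by marking periods where the measured value departs from its earlier relationship with the other sensors. In practice the reference period, window length, and threshold would have to be chosen from knowledge of the operation.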
We defined our problem as having sensors $s_1, \ldots, s_{N-1}$ as available, and sensor $s_N$
as partially available, meaning that for portions of time sensor $s_N$ is online, but it
frequently goes offline due to other activities or faults. Our goal was to determine
whether we could build an accurate model using data for sensors $s_1, \ldots, s_{N-1}$ and
