Using GP for Data Science 123
[Figure: Process Input Types. Categories shown: Inputs from Low Level Sensors; "State" of Input; Inferred or Calculated Inputs; Derived Inputs. Example inputs shown: temperature, pressure, vibration, chemical composition, percentage mix, output of upstream process, physics-based model, simulation, back-calculation. Central block: Machine/Manufacturing Plant/Process "product".]
Fig. 1 Different input types to the operations optimization problem
and requires significant data processing and cleaning prior to use. At a high level,
therefore, the problem is one of using real-time sensors and control capability to
understand and improve operations. Automated methods that clean up data or estimate
attributes that cannot be measured directly help provide an accurate, real-time view
of the whole system. For example, Kordon and Smits (2001) use GP to create soft
sensors (also called virtual sensors) that augment more expensive physical sensors.
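To make the soft-sensor idea concrete, here is a minimal sketch that fits a model predicting one target sensor from the always-available sensors, using only the time steps where the target was online, and then fills the offline gaps with the model's estimates. It rests on stated assumptions: ordinary least squares stands in for a GP-evolved symbolic model, offline readings are encoded as `None`, and the names `fit_soft_sensor` and `fill_offline` are hypothetical.

```python
from typing import List, Optional, Sequence

# Sketch only: ordinary least squares is a stand-in for a GP-evolved model (assumption).


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_soft_sensor(inputs: Sequence[Sequence[float]],
                    target: Sequence[Optional[float]]) -> List[float]:
    """Return weights [bias, w1, ...] fitted only on rows where the target was online."""
    rows = [[1.0, *x] for x, y in zip(inputs, target) if y is not None]
    ys = [y for y in target if y is not None]
    k = len(rows[0])
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return _solve(xtx, xty)


def fill_offline(inputs: Sequence[Sequence[float]],
                 target: Sequence[Optional[float]],
                 weights: Sequence[float]) -> List[float]:
    """Keep online readings; replace offline (None) readings with the model estimate."""
    out = []
    for x, y in zip(inputs, target):
        est = weights[0] + sum(w * v for w, v in zip(weights[1:], x))
        out.append(y if y is not None else est)
    return out


# Hypothetical readings: two always-available sensors and a target that goes offline once.
available = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]]
partial = [5.0, 6.0, 11.0, None, 16.0]
weights = fit_soft_sensor(available, partial)
filled = fill_offline(available, partial, weights)
print(filled)
```

Because the toy data were generated from an exact linear relationship, the filled-in value closely matches that relationship; with real plant data, the choice of model form is exactly where a GP-based symbolic regression would differ from this linear stand-in.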
3.1 The Data Science Challenge at Hand
Our Data Science challenge could be stated as follows: Given historical data of an
operation, is it possible to use data and analytic methods to create accurate sensor
estimators for those time instants when the sensor is offline? A sensor often switches
between being online (available) and offline (unavailable). A secondary challenge was
to identify instances (time periods) when the sensor of interest is drifting away from
its ideal accuracy level or from its prior relationship with other system sensors.
Solving these challenges would give operations managers a consistent, real-time
stream of data that characterizes their operation, enabling accurate and timely
optimization decisions.
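The secondary challenge, spotting when a sensor drifts away from its prior relationship with the other sensors, can be sketched by tracking the gap between the sensor's readings and an estimate of them. In this minimal sketch, the estimate is assumed to come from some model fitted on the other sensors during a trusted period, and the hypothetical function `drift_windows` flags non-overlapping windows whose mean absolute residual exceeds a user-chosen threshold. The window length and threshold are illustrative assumptions, not values from the study.

```python
from typing import List, Sequence


def drift_windows(actual: Sequence[float], estimated: Sequence[float],
                  window: int, threshold: float) -> List[int]:
    """Hypothetical helper: return start indices of non-overlapping windows whose
    mean absolute residual (actual minus estimate) exceeds threshold."""
    flagged = []
    for start in range(0, len(actual) - window + 1, window):
        residuals = [abs(a - e) for a, e in zip(actual[start:start + window],
                                                estimated[start:start + window])]
        if sum(residuals) / window > threshold:
            flagged.append(start)
    return flagged


# Illustrative data: the sensor tracks its estimate, then drifts upward.
actual = [10.0, 11.0, 10.0, 11.0, 13.0, 14.0, 15.0, 16.0]
estimated = [10.0, 11.0, 10.0, 11.0, 11.0, 11.0, 11.0, 11.0]
print(drift_windows(actual, estimated, window=4, threshold=1.0))  # → [4]
```

Overlapping (rolling) windows, or a threshold derived from the residual spread observed in a reference period, would be natural refinements; the non-overlapping version keeps the logic easy to follow.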
We defined our problem as having sensors $s_1, \ldots, s_{N-1}$ as available and sensor
$s_N$ as partially available, meaning that sensor $s_N$ is online for portions of time
but frequently goes offline due to other activities or faults. Our goal was to determine
whether we could build an accurate model using data for sensors $s_1, \ldots, s_{N-1}$ and