Fig. 2 Although additional attributes may be used as criteria in the model development, from a
practical standpoint, the candidate model set comes from those models which best balance the
complexity-accuracy trade-off. In this figure, the models denoted as red dots are those lying on
the Pareto front. Although these are nominally optimal, the other models indicated in blue in
the density plot may be of more practical interest due to their model dimensionality or particular
combination of constituent models (Color figure online)

2.7 Model Selection and Ensemble Definition

The deterministic nature of the physical sciences introduces a bias towards THE
model. However, the stochastic nature of evolutionary search will uncover many
“good enough” models which we can judiciously combine to create a trustable
model that will warn if it is exposed to new operating conditions or if the modeled
system has undergone some sort of fundamental change so that the models are no
longer applicable. This can be easily accomplished by selecting models from near
the knee of the Pareto front which have uncorrelated error residuals. Because the
selected ensemble models agree with the development data, they will agree when
near known operating conditions and, because they are diverse, they will diverge
when exposed to new operating regions. This is illustrated in Fig. 6 for the model
set shown in Fig. 5.
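
The selection idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names (select_ensemble, predict_with_trust), the greedy selection strategy, the 0.5 correlation threshold, and the use of the median and standard deviation as the ensemble prediction and divergence measure are all assumptions made for the example. The candidates are assumed to have been pre-filtered to models near the knee of the Pareto front and ordered by preference.

```python
import numpy as np

def select_ensemble(residuals, candidates, max_corr=0.5, size=3):
    """Greedily pick models whose error residuals are weakly correlated.

    residuals  : array of shape (n_models, n_samples), one row per model
    candidates : model indices, assumed pre-filtered and ordered by preference
    max_corr   : hypothetical threshold on absolute residual correlation
    size       : hypothetical maximum ensemble size
    """
    corr = np.corrcoef(residuals)  # rows are treated as variables (models)
    chosen = [candidates[0]]
    for c in candidates[1:]:
        if len(chosen) == size:
            break
        # Accept a model only if it is diverse relative to every chosen model.
        if all(abs(corr[c, k]) < max_corr for k in chosen):
            chosen.append(c)
    return chosen

def predict_with_trust(models, x):
    """Return the ensemble prediction (median) and the spread (std) per input.

    A large spread suggests the inputs lie outside the region where the
    ensemble members were developed, or that the system has changed.
    """
    preds = np.array([m(x) for m in models])
    return np.median(preds, axis=0), preds.std(axis=0)

# Toy residuals: model 1 nearly duplicates model 0, models 2 and 3 are diverse.
demo_residuals = np.array([
    [1.0, -1.0,  1.0, -1.0],
    [1.0, -1.0,  1.0, -1.1],
    [1.0,  1.0, -1.0, -1.0],
    [-1.0, 1.0,  1.0, -1.0],
])
chosen = select_ensemble(demo_residuals, candidates=[0, 1, 2, 3])

# Toy models that agree at x = 1 and diverge away from it.
demo_models = [lambda x: x, lambda x: x**2, lambda x: np.sqrt(x)]
prediction, spread = predict_with_trust(demo_models, np.array([1.0, 4.0]))
```

In this toy setup the near-duplicate model is skipped during selection, and the spread is zero where the ensemble members agree and grows where they diverge, which is the behavior the ensemble relies on to flag unfamiliar operating conditions.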

2.8 Process Optimization Summary

The computational and algorithmic advances over the past two decades allow
symbolic regression to quickly build insightful models from process data. These can then
be deployed as inferential sensors to control and optimize the targeted processes as
well as to provide guidance for operational opportunities and enhancements.

