Genetic_Programming_Theory_and_Practice_XIII


244 S. Stijven et al.


Fig. 1 The left plot illustrates the correlation matrix for the available inputs for a biofuels analysis.
Many of the available process setpoints and measurements, as well as the associated chemical
analysis results in the data, are highly correlated (either positively or negatively), which implies that
we will have potential for variable substitutions in the model development. Note in this case that
the maximum correlation is less than 0.7, which implies a foundation capability of R^2 < 0.5 for a
simplistic single-variable model
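
The R^2 bound in the caption can be checked with a small sketch. It assumes the quoted correlation is the Pearson correlation between a single input and the response, and that the "simplistic single-variable model" is a least-squares straight line, for which R^2 equals the squared correlation. The function names and the toy data are hypothetical and are not taken from the chapter.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def best_single_variable_r2(inputs, response):
    """Return (name, r2) for the input with the largest squared correlation.

    Assumption: for a one-variable linear least-squares fit, R^2 = r^2.
    """
    scores = {name: pearson(col, response) ** 2 for name, col in inputs.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical toy data, for illustration only.
toy_inputs = {"a": [1, 2, 3, 4], "b": [1, -1, 1, -1]}
toy_response = [2, 4, 6, 8]
best_name, best_r2 = best_single_variable_r2(toy_inputs, toy_response)

# A correlation of magnitude 0.7 caps a linear single-variable fit at R^2 = 0.49.
r2_cap = 0.7 ** 2
```

Under these assumptions, a maximum correlation below 0.7 gives a squared correlation below 0.49, which is consistent with the caption's R^2 < 0.5.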


comes in the selection of available function operators, definition of metavariables
and normalization of data ranges. Users care mostly about
having a selection of high-quality models which properly balance the complexity-accuracy
trade-off, since our goal is to have as simple a model as possible, but no
simpler. Such a collection of models is illustrated in Fig. 2.
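
One common way to pick out models that balance the complexity-accuracy trade-off is to keep only the non-dominated (Pareto-optimal) ones. The sketch below is a minimal illustration of that idea, not the authors' tooling; the tuple layout, the complexity and error values, and the model names are all hypothetical.

```python
def pareto_front(models):
    """Return the models not dominated in (complexity, error).

    models: list of (name, complexity, error); lower is better for both.
    A model is dominated if another is no worse on both measures and
    strictly better on at least one.
    """
    front = []
    for name, c, e in models:
        dominated = any(
            (c2 <= c and e2 <= e) and (c2 < c or e2 < e)
            for _, c2, e2 in models
        )
        if not dominated:
            front.append((name, c, e))
    return sorted(front, key=lambda m: m[1])  # order by increasing complexity

# Hypothetical candidate models, for illustration only.
candidates = [
    ("m1", 5, 0.30),
    ("m2", 10, 0.20),
    ("m3", 12, 0.25),  # more complex and less accurate than m2
    ("m4", 20, 0.10),
]
front = pareto_front(candidates)
```

Here m3 drops out because m2 is both simpler and more accurate; the remaining models each offer a different trade-off for the user to choose from.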


2.6 Model Exploration and Insight Development


To paraphrase Bill Worzel, “Symbolic regression is an optionizer as well as
optimizer.” This is illustrated in Figs. 3 and 4. As such, many potential models
are hypothesized, refined (or rejected) and available at the end of the development
stage. Since each of the independent model searches follows its own path through
the search space, we have the raw material to collect considerable insight into
the modeling potential and alternate solutions. An analysis often requires multiple
rounds of model development with each iteration building upon the insights
gathered from the prior rounds to focus the input variable set, tune development
options or simply to evolve additional model forms.
Understanding the number of variables required to achieve a given level of
performance and the modeling potential of inputs is very important. For instance,
even though a particular input does not quite provide as direct a path to a quality
model as another, it may be a measurement that is more easily or robustly achieved
which, from an operational standpoint, would make models containing it rather than
the other input the more desirable.
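
As a rough illustration of the two questions raised above (how many variables a given level of performance requires, and how often each input appears in the developed models), the sketch below summarizes a collection of models. It assumes each model is recorded as a set of input names plus an error value; that representation, the function names, and the data are hypothetical.

```python
from collections import Counter

def variable_presence(models):
    """Fraction of models in which each input variable appears."""
    counts = Counter(v for variables, _ in models for v in set(variables))
    n = len(models)
    return {v: c / n for v, c in counts.items()}

def best_error_by_variable_count(models):
    """Lowest error achieved among models using exactly k distinct inputs."""
    best = {}
    for variables, err in models:
        k = len(set(variables))
        if k not in best or err < best[k]:
            best[k] = err
    return dict(sorted(best.items()))

# Hypothetical model collection: (inputs used, error), for illustration only.
collection = [
    ({"x1", "x2"}, 0.20),
    ({"x1"}, 0.40),
    ({"x3"}, 0.45),
    ({"x1", "x3"}, 0.25),
]
presence = variable_presence(collection)
by_count = best_error_by_variable_count(collection)
```

In this toy collection, x3 alone performs nearly as well as x1 alone, so if x3 were the more easily or robustly measured input, models built on it could be the more desirable choice operationally.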
