Genetic_Programming_Theory_and_Practice_XIII


Prime-Time: Symbolic Regression Takes Its Place in the Real World 243


2.2 Corporate Goal


The generic goal of any symbolic regression exercise is to determine an
input-output relationship and, furthermore, to determine which inputs are most
effective and useful in predicting the targeted output. However, there can be nuances
that motivate studying models other than those with the “best accuracy”.
To illustrate, if new customer requirements demand better cold-weather
performance from a biofuels additive, we might want to simultaneously understand
the chemistry trade-offs involved in achieving the targeted performance while also
identifying the operational process control settings and feedstock characteristics
required to satisfy the customer.
The net conclusion is that insight and operational performance are both important
from the practitioner's viewpoint.


2.3 The Importance of Methodology and Workflow


The total cost of ownership of a model is very important, since both efficacy and
efficiency make or break the ROI of modeling. In a typical industrial analysis, the
actual model development tends to take a relatively small fraction of the human
time expended. As a result, the infrastructure for exploring the available data and
designing the analysis approach, as well as the tools to select models and extract
insight, are critical ingredients of success.


2.4 Data Exploration and Analysis Design


A conventional assumption in many modeling techniques is that the inputs are
independent. This assumption is violated in most real-world problems, as
illustrated in Fig. 1 for a biofuel data set. Although variable
orthogonality can be achieved, for example, by a principal components analysis,
doing so eliminates the interpretability of models beyond the first one or two
principal components. Simply pruning the input set to selected sets of independent
inputs is not an attractive alternative, since doing so imposes a priori constraints that
are not necessary and are often counter-productive.
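
As a minimal sketch of this kind of exploration (not the workflow or tooling used
in the chapter), the Python below flags strongly correlated input pairs and shows
why a principal component is hard to interpret. The input names, the synthetic
data and the 0.8 threshold are all hypothetical, chosen only for illustration.
For two standardized inputs with nonzero correlation r, the first principal
component has loadings of equal magnitude on both inputs, so it cannot be read
as either variable alone.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every input pair with |r| >= threshold."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs


def first_pc_loadings(r):
    """Loadings of the first principal component of two standardized inputs.

    The correlation matrix [[1, r], [r, 1]] has eigenvalues 1 + r and 1 - r;
    the larger one belongs to (1, sign(r)) / sqrt(2). Assumes r != 0.
    """
    s = 1.0 / math.sqrt(2.0)
    return (s, math.copysign(s, r))


# Synthetic stand-in data (hypothetical names, not the biofuel set of Fig. 1).
random.seed(0)
viscosity = [random.gauss(0.0, 1.0) for _ in range(200)]
density = [0.9 * v + 0.2 * random.gauss(0.0, 1.0) for v in viscosity]
temperature = [random.gauss(0.0, 1.0) for _ in range(200)]

inputs = {"viscosity": viscosity, "density": density, "temperature": temperature}
pairs = correlated_pairs(inputs)
for a, b, r in pairs:
    print(f"{a} ~ {b}: r = {r:.2f}, first PC loadings = {first_pc_loadings(r)}")
```

Flagging the correlated pairs rather than removing them keeps every input available
to the search, in line with the point above that pruning imposes unnecessary a
priori constraints.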


2.5 Model Development


The basics of the evolutionary search for symbolic regression models are
straightforward: reward models for accuracy, simplicity and novelty [Smits and
Vladislavleva; Vladislavleva et al.] and let the primordial soup percolate. The art
