Genetic_Programming_Theory_and_Practice_XIII


Prime-Time: Symbolic Regression Takes Its Place in the Real World 253


4.2.2 Step 2: Simulation Model


The FluTE simulator allows varying 38 individual inputs: indicators defining
the influenza in question, the number of seed cases (infected people entering
the population on day 1), and prevention measures such as closing schools and
kindergartens, enforced quarantines, vaccinations (with vaccination fractions and
efficacy), and antiviral medication, together with their influence on infection
probabilities.
Information on how the input space is converted to simulation configurations,
and on how the simulations were executed on a cluster, is still to be added.
The average calculation time of each simulation run was 2 h.
In summary, with 37 knobs to turn and 2 h needed to evaluate one input-response
combination (simulating a couple of million people changing states over
180 days and nights), quick data collection is virtually impossible. Meta-modeling
of such computationally expensive simulators, and the development of interactive
tools to explore what-if scenarios efficiently, is the only solution to prepare for
crisis situations. Once meta-models have been created (with symbolic regression in
this case), they can be evaluated in real time as soon as estimates of the
transmission rate R0 of the attacking infection become available.
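
The cost argument above can be made concrete with a rough calculation. The
sketch below uses the 37 knobs and the 2 h per run quoted in the text; the choice
of only two settings per knob and of strictly serial execution are illustrative
assumptions, not a description of how the simulations were actually scheduled.

```python
# Back-of-the-envelope cost of exhaustively exploring the simulator.
HOURS_PER_RUN = 2    # average run time quoted in the text
N_KNOBS = 37         # number of adjustable inputs quoted in the text
LEVELS_PER_KNOB = 2  # assumption: only a low and a high setting per knob


def full_factorial_runs(n_knobs: int, levels: int) -> int:
    """Number of runs needed to try every combination of settings."""
    return levels ** n_knobs


def serial_years(n_runs: int, hours_per_run: float) -> float:
    """Wall-clock years if the runs are executed one after another."""
    return n_runs * hours_per_run / (24 * 365)


runs = full_factorial_runs(N_KNOBS, LEVELS_PER_KNOB)
years = serial_years(runs, HOURS_PER_RUN)
print(f"{runs:,} runs, about {years:,.0f} years of serial compute")
```

Even this coarsest possible grid is far out of reach of any cluster, which is
why only a sparse sample of the input space can be simulated and a cheap
meta-model has to fill in the rest.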


4.2.3 Step 3: Meta-Modeling with Symbolic Regression


After the simulation runs are completed, we gather the results and create an
input-response data set. Symbolic Regression (SR) applied to this data set will
generate a robust ensemble of meta-models emulating the behavior of the simulator.
We used the SR implementation from the DataModeler package in Mathematica
(Evolved Analytics LLC 2011). The result of a single SR experiment is an ensemble
of tree-based regression models that give a good approximation of the response
variable together with a confidence metric for the prediction.
In Willem et al. (2014) we used fixed time budgets for SR experiments based on
the size and dimensionality of the data sets. Timings are listed in Table 1.
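
The idea of an ensemble that returns both a prediction and a confidence metric
can be illustrated with a minimal sketch. It does not use DataModeler or symbolic
regression: polynomials of different degrees, each fitted with NumPy on a
bootstrap resample, stand in for the evolved models, and the data set, the
function names and the use of the ensemble spread as the confidence metric are
all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical input-response data set: one input x, one response y.
x = rng.uniform(0.0, 1.0, size=200)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.1, size=x.size)


def fit_ensemble(x, y, degrees, rng):
    """Fit one polynomial per degree, each on its own bootstrap resample."""
    models = []
    for degree in degrees:
        idx = rng.integers(0, x.size, size=x.size)  # sample with replacement
        models.append(np.polyfit(x[idx], y[idx], degree))
    return models


def predict(models, x_new):
    """Return the ensemble median and the spread across ensemble members."""
    preds = np.stack([np.polyval(coeffs, x_new) for coeffs in models])
    return np.median(preds, axis=0), preds.std(axis=0)


models = fit_ensemble(x, y, degrees=(3, 4, 5, 6), rng=rng)

# One point inside the sampled range and one well outside it.
x_new = np.array([0.5, 1.5])
prediction, spread = predict(models, x_new)
print("prediction:", prediction)
print("spread:    ", spread)
```

The members agree where training data exist and diverge where they
extrapolate, so a large spread flags predictions that should not be trusted.
Evaluating such an ensemble takes a fraction of a second, in contrast to the
hours needed for one simulation run.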


4.2.4 Step 4: System Understanding


To arrive at a convincing ensemble of symbolic regression models, we selected
ensembles at the “knee” of the Pareto front of non-dominated trade-offs between
model complexity and model error. As the last step, the constants in all ensemble
models were tuned by non-linear optimization. A model ensemble of high quality
and minimal complexity, obtained through an effective SR algorithm, can facilitate
system understanding and focus the research.
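
Selection at the knee of the Pareto front can be sketched as follows. The text
does not say how the knee was located; the heuristic used here (rescale both
objectives to [0, 1] over the front and take the point closest to the ideal
corner) is one common choice and an assumption, as are the function names and
the example numbers. The sketch returns a single model; an ensemble would be
drawn from the models in the neighbourhood of that point.

```python
import numpy as np


def pareto_front(complexity, error):
    """Indices of models not dominated in (complexity, error), both minimized."""
    order = np.lexsort((error, complexity))  # by complexity, ties by error
    front, best_error = [], np.inf
    for i in order:
        # Keep a model only if it beats every simpler model on error.
        if error[i] < best_error:
            front.append(i)
            best_error = error[i]
    return np.array(front)


def knee_index(complexity, error):
    """Front member closest to the ideal corner after rescaling to [0, 1]."""
    front = pareto_front(complexity, error)
    c = complexity[front].astype(float)
    e = error[front].astype(float)
    c_scaled = (c - c.min()) / max(c.max() - c.min(), 1e-12)
    e_scaled = (e - e.min()) / max(e.max() - e.min(), 1e-12)
    return front[np.argmin(np.hypot(c_scaled, e_scaled))]


# Hypothetical candidate models: expression complexity and prediction error.
complexity = np.array([5, 10, 20, 40, 80, 30])
error = np.array([0.50, 0.20, 0.10, 0.08, 0.07, 0.30])

front = pareto_front(complexity, error)
knee = knee_index(complexity, error)
print("front:", front.tolist())
print("knee: complexity", complexity[knee], "error", error[knee])
```

In this example the model with complexity 30 is dominated and dropped, and the
knee falls where additional complexity stops buying a meaningful reduction in
error.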
The main differentiating factors of an effective SR implementation are that
the final ensembles contain only driving variables and that the ensembles are
constructed from maximally different strong SR learners and therefore provide
reliable confidence intervals for their predictions. Besides variable importance,
good final
