254 S. Stijven et al.
Table 1 Symbolic regression settings

Name                                Value
Population size                     1000
Archive size                        100
Crossover rate                      0.9
Mutation rate                       0.1
Population tournaments              5
Primitive functions                 +, −, ×, ÷, x^−1, x^2, −x, √x, log, exp
Time budget, FluTE RUN 1, 2, 3      1000 s
Time budget, FluTE RUN 4            7200 s
Time budget, FluTE RUN 5            3600 s
Independent evolutions, FluTE       8

ensembles also provide dimensionality trade-offs in complexity and accuracy of
models. Another strong benefit of an effective symbolic regression implementation is
its ability to automatically generate hypotheses for meta-variables, that is, low-order
transformations of driver inputs that can potentially linearize the final models and
thereby enable further application of powerful linear and regularized linear learning methods.
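To make the meta-variable idea concrete, here is a minimal Python (NumPy) sketch under stated assumptions. The candidate transformations (squares, logarithms, pairwise products), the helper names `meta_variables` and `ridge`, and the synthetic target are hypothetical illustrations, not the chapter's SR implementation. The target is nonlinear in the raw inputs but exactly linear in a suitable set of meta-variables, so plain ridge regression on the transformed inputs recovers it.

```python
import numpy as np

def meta_variables(X, names):
    """Augment inputs with hypothetical low-order transformations.

    Assumes all inputs are strictly positive so that log() is defined.
    """
    feats, labels = [X], list(names)
    feats.append(X ** 2)
    labels += [f"{v}^2" for v in names]
    feats.append(np.log(X))
    labels += [f"log({v})" for v in names]
    n = X.shape[1]
    for i in range(n):
        for j in range(i + 1, n):
            # pairwise interaction term as an extra column
            feats.append((X[:, i] * X[:, j])[:, None])
            labels.append(f"{names[i]}*{names[j]}")
    return np.hstack(feats), labels

def ridge(Phi, y, lam=1e-6):
    """Closed-form ridge regression on centered data (intercept not penalized)."""
    mu_x, mu_y = Phi.mean(axis=0), y.mean()
    A = Phi - mu_x
    w = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ (y - mu_y))
    return w, mu_y - mu_x @ w

def r2(y, y_hat):
    """Coefficient of determination."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

# Synthetic response that is nonlinear in the raw drivers x1, x2.
rng = np.random.default_rng(0)
X = rng.uniform(0.5, 2.0, size=(200, 2))
y = 3.0 * np.log(X[:, 0]) + 2.0 * X[:, 0] * X[:, 1]

# Linear model on the raw inputs only.
w_raw, b_raw = ridge(X, y)
r2_raw = r2(y, X @ w_raw + b_raw)

# Linear model on the meta-variables.
Phi, labels = meta_variables(X, ["x1", "x2"])
w_meta, b_meta = ridge(Phi, y)
r2_meta = r2(y, Phi @ w_meta + b_meta)
coef = dict(zip(labels, w_meta))

print(f"R^2 raw inputs: {r2_raw:.3f}, R^2 meta-variables: {r2_meta:.6f}")
print({k: round(float(v), 3) for k, v in coef.items()})
```

In an SR-driven workflow the candidate transformations would presumably come from subexpressions that recur in the evolved ensemble rather than from a fixed list as in this sketch.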
The ultimate highlight of SR-enabled system understanding is interactive sensitivity
analysis of the generated ensembles. Interactive exploration, as well as mathematical
optimization, of SR ensembles makes it possible to identify “edge cases” that
might have been overlooked or unanticipated by the domain experts. In addition,
interactive prediction explorers are the only way to present the solutions and enable
what-if scenario exploration for business decision makers (without overburdening them
with mathematical models). Figure 9 illustrates a snapshot of a six-variable prediction
explorer for the clinical attack rate. This and other explorers are publicly available at
http://www.idm.uantwerpen.be.
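A one-at-a-time what-if sweep is the simplest form of such interactive exploration. The Python sketch below is illustrative only: the three closed-form models, the variable names `r0` and `seeds`, and the nominal values are hypothetical stand-ins, not the models behind Figure 9. It varies one input while holding the others fixed, reports the ensemble median, and uses the ensemble spread (maximum minus minimum prediction) as a rough indicator of where the models disagree; taking the argmax of that spread over the grid is a crude stand-in for the mathematical optimization mentioned above.

```python
import numpy as np

# Three hypothetical SR models for an attack-rate-like response;
# illustrative stand-ins, NOT the models behind Figure 9.
ensemble = [
    lambda r0, seeds: 1.0 - np.exp(-1.2 * (r0 - 1.0)),
    lambda r0, seeds: np.tanh(0.9 * (r0 - 1.0)) + 0.001 * seeds,
    lambda r0, seeds: 0.8 * (r0 - 1.0) / (0.2 + (r0 - 1.0)),
]

def sweep(models, nominal, name, grid):
    """One-at-a-time sweep: vary `name` over `grid`, hold the rest at `nominal`.

    Returns the ensemble median and the ensemble spread (max - min) per grid point.
    """
    preds = []
    for value in grid:
        point = {**nominal, name: value}
        preds.append([m(**point) for m in models])
    P = np.array(preds)  # shape: (len(grid), len(models))
    return np.median(P, axis=1), P.max(axis=1) - P.min(axis=1)

grid = np.linspace(1.0, 2.5, 16)
median, spread = sweep(ensemble, {"r0": 1.5, "seeds": 10}, "r0", grid)

# The input value where the models disagree most is a candidate "edge case".
edge_case = grid[np.argmax(spread)]
for r0, med, spr in zip(grid, median, spread):
    print(f"R0={r0:.1f}  median={med:.3f}  spread={spr:.3f}")
print("largest ensemble disagreement at R0 =", round(float(edge_case), 2))
```

An interactive explorer essentially recomputes such sweeps on demand for whichever variable the user moves.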
4.3 Results
4.3.1 Transmission
As stated above, we performed a stepwise exploration of the US-tailored simulation
model for pandemic influenza (FluTE), applied to Seattle and Los Angeles County
(Chao et al. 2010). We first simulated epidemics in the Seattle population using four
basic model parameters: R0, whether individuals can travel, the number of infected
individuals introduced into the population, and whether this seeding occurs only once
(static) or on a daily basis (dynamic). Table 2 summarizes the parameter ranges.
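The experimental setup can be pictured as a grid of configurations over these four parameters, each replicated stochastically. The Python sketch below assumes a full factorial design with made-up levels: the actual ranges are those of Table 2, and this section does not state how configurations were sampled. Parameter names such as `seed_count` are hypothetical and are not FluTE configuration keys. It uses the 20 replicates per configuration reported in this section.

```python
import itertools

# Hypothetical levels for the four parameters; the real ranges are in Table 2.
LEVELS = {
    "R0": [1.2, 1.6, 2.0],
    "travel": [False, True],
    "seed_count": [1, 10, 100],
    "seeding": ["static", "dynamic"],
}
REPLICATES = 20  # stochastic runs per configuration, as reported in the text

def design(levels, replicates):
    """Yield one run specification per (configuration, replicate) pair.

    Assumes a full factorial grid over `levels`; each run gets a distinct,
    reproducible RNG seed so stochastic replicates can be regenerated.
    """
    names = list(levels)
    combos = itertools.product(*(levels[n] for n in names))
    for idx, values in enumerate(combos):
        config = dict(zip(names, values))
        for rep in range(replicates):
            yield {**config, "replicate": rep, "rng_seed": idx * replicates + rep}

runs = list(design(LEVELS, REPLICATES))
print(len(runs), "simulation runs")  # 720 = 36 configurations x 20 replicates
print(runs[0])
```

A full factorial grid grows multiplicatively with each added parameter, which is one reason surrogate models become attractive once more parameters are explored.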
The surrogate models for the cumulative clinical attack rate (AR) were of good
quality (error < 0.001). The cumulative clinical attack rate is the fraction of the
population that became infected. Although each configuration was executed 20 times,
almost no stochastic fadeout was observed. The dichotomous variable indicating
whether people can travel was absent in most surrogate models. Given the inherent
