Genetic_Programming_Theory_and_Practice_XIII

(C. Jardin) #1

Evolving Simple Symbolic Regression Models 5


the Pearson’s correlation coefficientR^2 ), but in addition to the population a separate
archive containing the Pareto front of the best identified models (complexity vs.
accuracy) is maintained. New individuals are created by breeding members of
the Pareto front with the most accurate models of the population and after each
generation the Pareto front is updated. Instead of developing a new algorithm for
multi-objective genetic programming, we have used a well-studied multi-objective
optimization algorithm that has been adapted to the specific needs when performing
symbolic regression.
The nondominated sorting genetic algorithm (NSGA) was proposed by Srinivas
and Deb ( 1994 ) for solving multi-objective optimization problems. However,
its runtime complexity for nondominated sorting is rather high, no elitism is
included in the original NSGA formulation and additionally, a sharing parameter
for maintaining diversity has to be specified. These points of criticism have been
addressed and an improved, faster version called NSGA-II (Deb et al. 2002 ) has
been presented. The major extensions to standard genetic algorithms of NSGA-II
are the use of a nondomination rank and crowding distance for guiding the selection
towards a uniformly spread Pareto-optimal front. Furthermore, elitism is ensured by
combining the parent population and the generated offspring and selecting the best
individuals of this set until the new population is filled. The published version of the
NSGA-II has been reimplemented inC#based on the published source code^1 and
has been integrated in HeuristicLab (Wagner 2009 ).


3.1 Domination of Solutions with Equal Qualities


To use NSGA-II efficiently for solving multi-objective symbolic regression it has to
be adapted to the specifics of multi-objective symbolic regression problems. In the
original version of the algorithm solutions with exactly equal objective values are
treated as nondominated. This poses a problem when solving symbolic regression
problems, because a single-node individual (either a constant value or a variable)
will always have a constant quality and complexity value. Furthermore, individuals
with only one node are the simplest individuals that can be built and are always
included in the Pareto front. Within a few generations of the algorithm, those one-
node individuals account for a huge portion of the Pareto front and the algorithm
is not able to evolve larger or more complex individuals with a better fit to the
presented data. Hence, the domination criterion of NSGA-II has been modified
in order to treat solutions with equal objective values as dominated by the first
individual with those objective values. This has the effect that only the first one-node
solution is included in the Pareto front and results in a better algorithm performance.
The effects of the algorithm adaptations with respect to the domination criterion
are displayed in Fig. 1 , where the minimum, average and maximum symbolic


(^1) http://www.iitk.ac.in/kangal/codes.shtml

Free download pdf