Genetic_Programming_Theory_and_Practice_XIII

Evolving Simple Symbolic Regression Models


4.2.2 Noisy Data


The same algorithm settings as in the previous experiments have been used for
evaluating the performance of the algorithm variants on the five noisy problems.
Again, 50 repetitions have been performed, and the models with the best
performance on the training partition have been extracted and analyzed. The
aggregated information on training and test accuracy, as well as the model
lengths, is shown in Table 6.
In contrast to the previously tested artificial problems, GP with a length limit of
20 performs worse than the other single-objective algorithms. A likely reason is
that the smaller length limit, which gave an advantage on the artificial problems,
restricts the search space too much to allow accurate prediction models to be
evolved.
Due to the noise in the data, the training performance can differ significantly from
the test performance, which is especially apparent on the Housing and Chemical
problems. The Breiman, Friedman, and Tower problems contain enough data that
the effect of the noise is reduced and the difference between the training and
test evaluation is minimal. With the exception of the Friedman problem, multi-
objective symbolic regression with the new complexity measure performs best on all
problems. Especially on the Housing and Chemical problems, the difference between
training and test accuracy is smaller, which might be due to the preference for less
complex functions during model building.
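The trade-off that the multi-objective variants exploit can be illustrated with a minimal Pareto-dominance sketch. The following Python snippet is illustrative only, not the chapter's implementation: each model is scored by a (training error, complexity) pair, both objectives are minimized, and the non-dominated models form the front from which the final model is selected.

```python
# Illustrative sketch: Pareto dominance over (training error, complexity).
# A model dominates another if it is no worse in both objectives and
# strictly better in at least one. All names here are hypothetical.

def dominates(a, b):
    """a and b are (error, complexity) tuples; both objectives are minimized."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(models):
    """Return the non-dominated subset of (error, complexity) pairs."""
    return [m for m in models
            if not any(dominates(o, m) for o in models if o != m)]

# Four hypothetical models: accurate-but-large, slightly worse-but-small, etc.
models = [(0.10, 25), (0.12, 8), (0.10, 30), (0.20, 5)]
print(pareto_front(models))  # [(0.1, 25), (0.12, 8), (0.2, 5)]
```

The dominated model (0.10, 30) is discarded because (0.10, 25) matches its error with lower complexity; the surviving front exposes the accuracy-simplicity trade-off from which a compact model can be chosen.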
The single-objective algorithms always hit the predefined length limit, as was the
case with the results obtained on the artificial problems. The selection pressure
towards small models is highest when using the visitation length or the tree size as
the complexity measure for NSGA-II. Hence, these two algorithm variants produced
the smallest models, whereas NSGA-II with the variable count exhibits no parsimony
pressure at all. NSGA-II with the new complexity measure produces models of
similar or slightly larger size compared to the NSGA-II executions which use the tree
size for complexity calculation.
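The structural complexity measures compared here can be sketched for an expression tree as follows. This is a minimal Python sketch with an assumed `Node` class, not the implementation used in the chapter: tree size counts the nodes, visitation length sums the subtree sizes over all nodes (so deeper, bushier trees are penalized more), and the variable count ignores tree structure entirely.

```python
# Minimal sketch of the tree-based complexity measures; the Node class
# and the "x"-prefix variable-naming convention are assumptions.

class Node:
    def __init__(self, symbol, children=()):
        self.symbol = symbol
        self.children = list(children)

def tree_size(node):
    """Number of nodes in the subtree rooted at this node."""
    return 1 + sum(tree_size(c) for c in node.children)

def visitation_length(node):
    """Sum of subtree sizes over all nodes of the tree."""
    return tree_size(node) + sum(visitation_length(c) for c in node.children)

def variable_count(node):
    """Number of distinct input variables referenced in the tree."""
    variables = set()
    def walk(n):
        if not n.children and str(n.symbol).startswith("x"):
            variables.add(n.symbol)
        for c in n.children:
            walk(c)
    walk(node)
    return len(variables)

# Example expression: x1 * (x2 + 3)
expr = Node("*", [Node("x1"), Node("+", [Node("x2"), Node("3")])])
print(tree_size(expr))          # 5
print(visitation_length(expr))  # 11
print(variable_count(expr))     # 2
```

The example shows why the measures exert different parsimony pressure: the visitation length (11) grows faster than the tree size (5) as trees deepen, while the variable count stays at 2 no matter how large the tree becomes.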
The analysis of the functions in the evolved models, displayed in Table 7, shows
a similar picture to the results on the artificial problems. The simplest models, using
the fewest trigonometric, exponential, and power symbols, have been generated
by NSGA-II Complexity and GP Length 20, with the difference that the models
generated by NSGA-II are more accurate. The largest values in this analysis, which
indicate more complex models, have been obtained by the other single-objective
GP variants and NSGA-II Variables.


5 Conclusion


In this chapter we have investigated the effects of using different complexity
measures for multi-objective genetic programming to solve symbolic regression
problems and compared the results to standard genetic programming. Multi-
