

Table 2  Description of artificial and real-world problems and the number of training and test points

Name       Function                                                      Training points   Test points
F1         F1(x1,...,x5) = x1 + x2 + (x3 - 2)^2 + x4*x5                  100               400
F2         F2(x1,...,x5) = 10*sin(x1) + x2 + x3*x4 + x4*x5               100               400
F3         F3(x1,...,x5) = exp(0.7*x1) + (0.5*x2 + 2)^2 + x3^2 + x4*x5   100               400
F4         F4(x1,...,x4) = log((x1 + x2)^2)                              100               400
F5         F5(x1,...,x4) = (x1 + x2)*(x1 + x2)*(x3 + x4)                 100               400
Breiman    F6(x1,...,x10) =  3 + 3*x2 + 2*x3 + x4   if x1 = 1            5000              5000
                            -3 + 3*x5 + 2*x6 + x7   otherwise
Friedman   F7(x1,...,x10) = 0.1*e^(4*x1) + 4/(1 + e^(-20*x2 + 10))       5000              5000
                            + 3*x3 + 2*x4 + x5 + ε
Housing    F8(x1,...,x13) = ?                                            337               169
Chemical   F9(x1,...,x57) = ?                                            711               355
Tower      F10(x1,...,x25) = ?                                           3136              1863

F1–F5 are newly defined problems that demonstrate the effects of different complexity measures,
whereas the remaining problems have been published previously and used as benchmark problems
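
To make the artificial problems concrete, the following Python sketch implements F1–F7 from Table 2 and draws training and test sets of the listed sizes. It is illustrative only: the excerpt does not specify the input ranges or the distribution of the Friedman noise term ε, so the uniform sampling interval and the treatment of x1 in the Breiman problem as a binary indicator are assumptions.

    import numpy as np

    rng = np.random.default_rng(42)

    def f1(x):
        # F1(x1,...,x5) = x1 + x2 + (x3 - 2)^2 + x4*x5
        return x[0] + x[1] + (x[2] - 2) ** 2 + x[3] * x[4]

    def f2(x):
        # F2(x1,...,x5) = 10*sin(x1) + x2 + x3*x4 + x4*x5
        return 10 * np.sin(x[0]) + x[1] + x[2] * x[3] + x[3] * x[4]

    def f3(x):
        # F3(x1,...,x5) = exp(0.7*x1) + (0.5*x2 + 2)^2 + x3^2 + x4*x5
        return np.exp(0.7 * x[0]) + (0.5 * x[1] + 2) ** 2 + x[2] ** 2 + x[3] * x[4]

    def f4(x):
        # F4(x1,...,x4) = log((x1 + x2)^2)
        return np.log((x[0] + x[1]) ** 2)

    def f5(x):
        # F5(x1,...,x4) = (x1 + x2)*(x1 + x2)*(x3 + x4)
        return (x[0] + x[1]) * (x[0] + x[1]) * (x[2] + x[3])

    def f6_breiman(x):
        # piecewise on x1; x1 is assumed to be a binary indicator here
        return np.where(x[0] == 1,
                        3 + 3 * x[1] + 2 * x[2] + x[3],
                        -3 + 3 * x[4] + 2 * x[5] + x[6])

    def f7_friedman(x):
        # noiseless core of the Friedman function; the distribution of the
        # noise term ε is not given in this excerpt and is omitted here
        return (0.1 * np.exp(4 * x[0])
                + 4 / (1 + np.exp(-20 * x[1] + 10))
                + 3 * x[2] + 2 * x[3] + x[4])

    def sample(f, n_vars, n_points, low=0.0, high=1.0):
        # uniform inputs on an ASSUMED interval; the excerpt of Table 2
        # does not state the actual training/test ranges
        X = rng.uniform(low, high, size=(n_vars, n_points))
        return X, f(X)

    X_train, y_train = sample(f1, 5, 100)  # 100 training points per Table 2
    X_test, y_test = sample(f1, 5, 400)    # 400 test points per Table 2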


4.2 Results


We performed 50 repetitions of each algorithm variant, single-objective with varying
maximum tree lengths and multi-objective with varying complexity measures, on each
of the defined benchmark problems. The results show that multi-objective genetic
programming generates simpler models without worsening the prediction accuracy.
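The bi-objective setting can be illustrated with a minimal sketch: every candidate model is scored by a pair (prediction error, complexity), both minimized, and NSGA-II keeps the mutually non-dominated models as its Pareto front. The numbers and names below are made up for illustration and are not taken from the experiments.

    from typing import List, Tuple

    def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
        # a dominates b if it is no worse in every objective and
        # strictly better in at least one
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    def pareto_front(scores: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
        # keep only the non-dominated (error, complexity) pairs
        return [s for s in scores
                if not any(dominates(o, s) for o in scores if o != s)]

    models = [(0.90, 3), (0.50, 12), (0.48, 40), (0.47, 100), (0.60, 80)]
    print(sorted(pareto_front(models), key=lambda s: s[1]))
    # (0.60, 80) is dominated by (0.48, 40); the remaining four form the front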
Table 3 shows the aggregated results in terms of the median and interquartile
range of the prediction accuracy, on the training and test partitions, of the
best model obtained in each algorithm run. In the case of multi-objective genetic
programming using NSGA-II, the best model is by definition the most complex
one (the last model in the Pareto front). Furthermore, the length of the best models
is reported to give an indication of their complexity, and next to each problem the
minimal expression tree length required to solve it optimally is given.
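Assuming each run yields a Pareto front sorted by increasing complexity, the per-problem entries of Table 3 can be computed as in the following sketch; the selection of the last front member mirrors the rule stated above, while the data and names are placeholders.

    import numpy as np

    def best_of_run(front):
        # front: list of models sorted by increasing complexity;
        # per the selection rule above, the most complex (last) model
        # of the Pareto front is taken as the run's best
        return front[-1]

    def median_iqr(values):
        # aggregate one accuracy value per run into the median and
        # interquartile range, as reported in Table 3
        q1, med, q3 = np.percentile(values, [25, 50, 75])
        return med, q3 - q1

    # placeholder: accuracies of the best model from each of 50 runs
    accuracies = np.random.default_rng(0).normal(0.95, 0.01, size=50)
    med, iqr = median_iqr(accuracies)
    print(f"median = {med:.3f}, IQR = {iqr:.3f}")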
When comparing the training errors, almost all algorithm variants perform
equally well; the exceptions are standard GP with a maximum length of 100 and
NSGA-II with the variable complexity measure. Among the standard GP variants, the
one with the smallest length constraint performs best on both the training and the
test partition. The reason is that, especially when complex function symbols are
allowed in the models, the limitation of the search space helps the algorithm
generate more accurate prediction models, as long as the length constraint remains
high enough to model the data.
The differences among the NSGA-II variants using the new complexity
measure, the tree size, and the visitation length are negligible, in terms of both
the median and the interquartile range, on the later problems. Only on the first
problem does the new complexity measure perform better, especially when comparing
the interquartile ranges.
