Genetic_Programming_Theory_and_Practice_XIII


226 S. Silva et al.


Preliminary experiments have revealed that pruning the best individual of each
generation shifts the distribution of the number of dimensions to lower values (or
prevents it from shifting to higher values so easily) during the evolution, without
harming fitness.


4.5 Elitism


It was mentioned earlier that, in order to explore solutions of different dimensions,
M3GP relies on mutation to add and remove dimensions from the individuals, with a
fairly high probability. It also has to rely on selection to keep the best dimensions in
the population and discard the worst ones. The way to do this is by ensuring some
elitism on the survival of the individuals from one generation to the next. M3GP
does not allow the best individual of any generation to be lost, and always copies it
to the next generation. Let us recall that this individual is already optimized in the
sense that it went through pruning. Preliminary experiments have shown that elitism
is indeed able to improve fitness.
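
The survival rule described above can be illustrated with a minimal sketch. This is not the M3GP implementation. The function name `elitist_next_generation`, the `fitness` and `breed` callables, and the convention that higher fitness is better are all assumptions made for illustration. Pruning of the best individual is assumed to have happened before this step.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def elitist_next_generation(
    population: Sequence[T],
    fitness: Callable[[T], float],
    breed: Callable[[Sequence[T]], T],
) -> List[T]:
    """Build the next generation, always carrying over the current best.

    Hypothetical helper: `fitness` scores an individual (higher is better,
    an assumption of this sketch) and `breed` produces one offspring from
    the current population (selection plus variation, left abstract here).
    """
    # Elitism: the best individual of this generation is copied unchanged,
    # so it can never be lost.
    best = max(population, key=fitness)
    next_gen = [best]
    # The remaining slots are filled with newly bred offspring.
    while len(next_gen) < len(population):
        next_gen.append(breed(population))
    return next_gen


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy individuals: vectors of floats; toy fitness: closeness to the origin.
    pop = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(10)]

    def toy_fitness(ind):
        return -sum(x * x for x in ind)

    def toy_breed(parents):
        # Pick a random parent and perturb it (stand-in for real GP operators).
        return [x + rng.gauss(0, 0.5) for x in rng.choice(parents)]

    for _ in range(20):
        pop = elitist_next_generation(pop, toy_fitness, toy_breed)
    print(max(toy_fitness(ind) for ind in pop))
```

Because the best individual is always copied, the best fitness in the population cannot decrease from one generation to the next, even when the mutation rate is high.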


5 eM3GP: M3GP Ensemble Classifier


M3GP assumes that a single transformation will simplify the classification problem
for all the classes. However, this may not be the case. It may happen that the
optimal data transformation is in fact class dependent, i.e., different data clusters
require transformations that change the geometrical distribution of the data points
in specialized ways.
Another problem with M3GP seems to be the automatically chosen number
of dimensions. In most problems, the number of dimensions used by M3GP is
much larger than what M2GP uses, even when the performance on the test set is
statistically equivalent (see Table 3 in Sect. 7.2). A notable example
is the WAV dataset, where the median test accuracy is almost the same for M2GP
and M3GP (84.9 and 84.3, respectively) but the median number of dimensions used
in the population is quite different (5 and 31, respectively). This suggests that M3GP
may be suffering from bloat at the dimension level.
Finally, like many other classifiers, M3GP appears to suffer from overfitting, and
is negatively affected by class imbalance, two issues that need to be addressed in
real-world scenarios.
To address these issues, we propose an ensemble method called ensemble M3GP,
or simply eM3GP, whereby classification is done using M different transformations,
one for each class in a multiclass problem with M classes. The proposed eM3GP
uses basically the same methods and representation scheme as M3GP, with the
following enhancements.
