Genetic_Programming_Theory_and_Practice_XIII


82 B. Hodjat and H. Shahrzad


[Figure 3 labels: Training Data Segment 1; Training Data Segment 2; Segment 1 Evolution Engines; Segment 2 Evolution Engines; Candidate A originating on segment 1; Candidate A validated on unseen segment 2; Candidate B originating on segment 2; Candidate B validated on segment 1; Evolution Coordinators; Age-layered candidate list]

Fig. 3 Distributed cross-validation


Evolution Engines ensure that candidates being sent down for further validation
are only sent to Evolution Engines with a segment id equal to the current segment of
the candidate. A candidate is said to have completed its validation once it has aged
sufficiently on data samples from all available segments.
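
The routing rule and the completion test described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Candidate fields, the function names, and the MATURITY_AGE threshold are all hypothetical, since the text does not say how "aged sufficiently" is measured.

```python
from dataclasses import dataclass, field

# Hypothetical threshold: number of data samples a candidate must be
# evaluated on within one segment before it counts as "aged sufficiently"
# there. The value is an assumption, not taken from the text.
MATURITY_AGE = 100


@dataclass
class Candidate:
    origin_segment: int       # segment the candidate was evolved on
    current_segment: int      # segment it is currently being evaluated on
    age_by_segment: dict = field(default_factory=dict)  # segment id -> samples seen


def eligible_engines(candidate, engine_segments):
    """Return ids of engines whose segment id equals the candidate's current segment."""
    return [engine_id for engine_id, segment in engine_segments.items()
            if segment == candidate.current_segment]


def validation_complete(candidate, all_segments):
    """True once the candidate has aged sufficiently on every available segment."""
    return all(candidate.age_by_segment.get(segment, 0) >= MATURITY_AGE
               for segment in all_segments)


engines = {"engine-a": 1, "engine-b": 2, "engine-c": 2}
cand = Candidate(origin_segment=1, current_segment=2,
                 age_by_segment={1: 100, 2: 40})
print(eligible_engines(cand, engines))    # engines that serve segment 2
print(validation_complete(cand, [1, 2]))  # segment 2 is still below the threshold
```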
Evolution Engines validating candidates on segments other than the candidate’s
originating segment are barred from bearing offspring. This way, new generations
are not contaminated by data from other segments, which we aim to keep unseen
by them for cross-validation purposes (see Fig. 3).
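
The breeding restriction amounts to a simple gate inside each engine. The sketch below assumes each candidate carries the id of its originating segment; the function names and the action labels are hypothetical and serve only to illustrate the rule.

```python
def may_breed(origin_segment, engine_segment):
    """A candidate may bear offspring only on an engine serving its originating segment."""
    return origin_segment == engine_segment


def engine_actions(origin_segment, engine_segment):
    """List what an engine does with a candidate (action labels are illustrative)."""
    actions = ["evaluate"]            # every engine evaluates on its own segment's data
    if may_breed(origin_segment, engine_segment):
        actions.append("reproduce")   # training role: originating segment only
    return actions


print(engine_actions(origin_segment=1, engine_segment=1))  # training engine
print(engine_actions(origin_segment=1, engine_segment=2))  # validation-only engine
```

Because offspring are only ever produced where the parent originated, no fitness signal from a validation segment can reach the next generation.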
Rather than training on k − 1 and validating on one segment, in nPool, we train on
one and validate on k − 1. This is in order to maintain complete exclusivity between
the training and validation sets. Also, this ensures a more reliable assessment for
the generalization of the candidates by using larger unseen validation sets. It is
important, however, to ensure that the size of each segment is large enough to avoid
over-fitting.
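
To make the contrast with conventional k-fold cross-validation concrete, the sketch below lists which segments play the training and validation roles under each scheme. It assumes k equal-sized segments numbered from 0; the function names are hypothetical.

```python
def kfold_roles(k, held_out):
    """Conventional k-fold: train on k - 1 segments, validate on the held-out one."""
    segments = list(range(k))
    return {"train": [s for s in segments if s != held_out],
            "validate": [held_out]}


def npool_roles(k, origin):
    """nPool-style assignment: train on the origin segment, validate on the other k - 1."""
    segments = list(range(k))
    return {"train": [origin],
            "validate": [s for s in segments if s != origin]}


def unseen_fraction(roles, k):
    """Fraction of the data a candidate never trains on (equal-sized segments assumed)."""
    return len(roles["validate"]) / k


k = 5
print(kfold_roles(k, 0), unseen_fraction(kfold_roles(k, 0), k))
print(npool_roles(k, 0), unseen_fraction(npool_roles(k, 0), k))
```

With k segments, each candidate trains on only 1/k of the data, which is why each individual segment has to be large enough on its own.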
The segments should be mutually exclusive. In the experiments for this paper,
the segments were divided up randomly. However, depending on the application,
the division of the data into segments might require stratification.
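
One way to produce mutually exclusive segments is sketched below, first by a plain random split and then by a stratified variant. The stratified version assumes that stratification means preserving class-label proportions across segments, which is only one possible reading; other applications may need to stratify on a different attribute. All names are hypothetical.

```python
import random
from collections import defaultdict


def random_segments(sample_ids, k, seed=0):
    """Shuffle the sample ids and deal them into k mutually exclusive segments."""
    rng = random.Random(seed)
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def stratified_segments(labels, k, seed=0):
    """Deal each class round-robin so every segment keeps similar class proportions.

    labels maps sample id -> class label (labels must be sortable).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample_id, label in labels.items():
        by_class[label].append(sample_id)
    segments = [[] for _ in range(k)]
    offset = 0  # carry the dealing position across classes to balance segment sizes
    for label in sorted(by_class):
        ids = by_class[label]
        rng.shuffle(ids)
        for i, sample_id in enumerate(ids):
            segments[(offset + i) % k].append(sample_id)
        offset += len(ids)
    return segments


plain = random_segments(range(10), k=3)
labels = {i: ("a" if i < 6 else "b") for i in range(9)}
strat = stratified_segments(labels, k=3)
print([len(segment) for segment in plain])
print([sorted(labels[i] for i in segment) for segment in strat])
```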
