Genetic_Programming_Theory_and_Practice_XIII

82 B. Hodjat and H. Shahrzad

Fig. 3 Distributed cross-validation. Figure labels: Training Data (Segment 1,
Segment 2); Evolution Engines (Segment 1, Segment 2); Candidate A originating on
segment 1, validated on unseen segment 2; Candidate B originating on segment 2,
validated on segment 1; Evolution Coordinators; Age-layered candidate list.

Evolution Engines ensure that candidates being sent down for further validation
are only sent to Evolution Engines with a segment id equal to the current segment of
the candidate. A candidate is said to have completed its validation once it has aged
sufficiently on data samples from all available segments.
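
The routing rule and the completion test described above can be sketched as follows. This is only an illustration under stated assumptions: the `Candidate` record, the `engine_segments` mapping, and the `min_age` threshold are hypothetical names introduced here, and "aged sufficiently" is modeled as a simple per-segment sample count.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """Hypothetical candidate record (not from the source)."""
    origin_segment: int   # segment the candidate was evolved on
    current_segment: int  # segment it should be evaluated on next
    # segment id -> number of data samples the candidate has aged on
    age_by_segment: dict = field(default_factory=dict)


def eligible_engines(candidate, engine_segments):
    """Return ids of engines whose segment id equals the candidate's current segment."""
    return [eid for eid, seg in engine_segments.items()
            if seg == candidate.current_segment]


def validation_complete(candidate, all_segments, min_age):
    """True once the candidate has aged at least min_age samples on every segment.

    min_age is an assumed stand-in for "aged sufficiently".
    """
    return all(candidate.age_by_segment.get(s, 0) >= min_age
               for s in all_segments)


# Usage: candidate A originated on segment 1 and is now due on segment 2.
engine_segments = {"e1": 1, "e2": 2, "e3": 2}
cand_a = Candidate(origin_segment=1, current_segment=2, age_by_segment={1: 500})
targets = eligible_engines(cand_a, engine_segments)
done_before = validation_complete(cand_a, [1, 2], min_age=500)
cand_a.age_by_segment[2] = 500
done_after = validation_complete(cand_a, [1, 2], min_age=500)
```

In this sketch the candidate is sent only to engines `e2` and `e3`, and it counts as validated only after it has reached the threshold on both segments.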
Evolution Engines validating candidates on segments other than the candidate’s
originating segment are barred from bearing offspring. This way, new generations
are not contaminated by data from other segments, which we aim to keep unseen
by them for cross-validation purposes (see Fig. 3).
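
One way to read the reproduction rule is as a filter on an engine's local pool: a candidate may act as a parent only on an engine serving its originating segment. The sketch below assumes that reading; `Candidate`, `may_reproduce`, and `breeding_pool` are hypothetical names, not part of the system described here.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """Hypothetical candidate record (not from the source)."""
    name: str
    origin_segment: int


def may_reproduce(candidate, engine_segment):
    """A candidate may bear offspring only on its originating segment."""
    return candidate.origin_segment == engine_segment


def breeding_pool(local_pool, engine_segment):
    """Keep only the candidates that are allowed to act as parents on this engine."""
    return [c for c in local_pool if may_reproduce(c, engine_segment)]


# Usage: an engine on segment 1 holds a native candidate (A) and a visitor (B)
# that is only being validated there.
pool = [Candidate("A", 1), Candidate("B", 2)]
parents_on_1 = [c.name for c in breeding_pool(pool, engine_segment=1)]
parents_on_2 = [c.name for c in breeding_pool(pool, engine_segment=2)]
```

Candidate B is still evaluated on segment 1, but because it is excluded from the breeding pool there, nothing it learned from segment 1 data can leak into segment 2's lineage.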
Rather than training on k − 1 segments and validating on one, in nPool we train on
one segment and validate on the other k − 1. This is in order to maintain complete exclusivity between
the training and validation sets. Also, this ensures a more reliable assessment for
the generalization of the candidates by using larger unseen validation sets. It is
important, however, to ensure that the size of each segment is large enough to avoid
over-fitting.
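
The difference between the conventional k-fold assignment and the inverted assignment described above can be made concrete with a small sketch. The function names and the zero-based segment ids are assumptions made for illustration.

```python
def kfold_splits(k):
    """Conventional k-fold: train on k - 1 segments, validate on the remaining one."""
    for i in range(k):
        yield [j for j in range(k) if j != i], i


def npool_splits(k):
    """Inverted assignment: train on one segment, validate on the other k - 1."""
    for i in range(k):
        yield i, [j for j in range(k) if j != i]


def unseen_fraction(k):
    """Fraction of the data a candidate never trains on under the inverted
    assignment, assuming equally sized segments."""
    return (k - 1) / k


# Usage with k = 3 segments.
conventional = list(kfold_splits(3))
inverted = list(npool_splits(3))
```

With equally sized segments, a candidate trained under the inverted assignment is validated on (k − 1)/k of the data, which is why each individual segment must still be large enough to train on without over-fitting.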
The segments should be mutually exclusive. In the experiments for this paper,
the segments were divided up randomly. However, depending on the application,
the division of the data into segments might require stratification.
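
A minimal sketch of both ways of dividing the data is given below, using only the Python standard library. The function names, the fixed seed, and the use of a class label as the stratification key are assumptions for illustration; the source does not say how stratification would be done.

```python
import random
from collections import defaultdict


def random_segments(n_samples, k, seed=0):
    """Shuffle sample indices and deal them into k mutually exclusive segments."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def stratified_segments(labels, k, seed=0):
    """Deal indices into k segments so each label is spread as evenly as possible."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for i, y in enumerate(labels):
        by_label[y].append(i)
    segments = [[] for _ in range(k)]
    offset = 0  # running counter keeps total segment sizes balanced across labels
    for members in by_label.values():
        rng.shuffle(members)
        for i in members:
            segments[offset % k].append(i)
            offset += 1
    return segments


# Usage: 9 samples, two classes with a 6:3 imbalance, k = 3 segments.
labels = [0] * 6 + [1] * 3
rand = random_segments(len(labels), 3)
strat = stratified_segments(labels, 3)
per_segment_counts = [
    (sum(labels[i] == 0 for i in seg), sum(labels[i] == 1 for i in seg))
    for seg in strat
]
```

In both cases every sample index lands in exactly one segment, so the segments are mutually exclusive. The stratified variant additionally keeps the 2:1 class ratio inside each segment.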
