Genetic Programming Theory and Practice XIII


80 B. Hodjat and H. Shahrzad


Fig. 1 The EC-Star hub and spoke distribution architecture


the evolutionary cycle. At each new generation, an Evolution Engine submits its
fittest candidates to the server for consideration, typically after a set number
of evaluations (i.e., upon reaching the maturity age). Age is defined as the number
of samples on which a candidate has been evaluated.
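The aging loop on an Evolution Engine can be sketched as follows. This is a minimal illustration, not the EC-Star implementation: `MATURITY_AGE`, `evaluate`, and `run_engine` are hypothetical names, and the toy fitness function stands in for whatever domain evaluation a real run would use.

```python
import random

# Hypothetical constant: samples a candidate must see before submission.
MATURITY_AGE = 100

def evaluate(candidate, sample):
    # Toy fitness contribution: closeness of the candidate's single
    # parameter to the sample value (illustrative only).
    return -abs(candidate["param"] - sample)

def run_engine(population, samples):
    """Age each candidate on the sample stream, incrementing its age per
    sample, then return the mature candidates sorted by fitness for
    submission to the Evolution Coordinator."""
    for cand in population:
        for s in samples:
            cand["fitness"] += evaluate(cand, s)
            cand["age"] += 1
    mature = [c for c in population if c["age"] >= MATURITY_AGE]
    return sorted(mature, key=lambda c: c["fitness"], reverse=True)

random.seed(0)
population = [{"param": random.random(), "age": 0, "fitness": 0.0}
              for _ in range(5)]
samples = [random.random() for _ in range(MATURITY_AGE)]
submitted = run_engine(population, samples)
```

Here every candidate reaches the maturity age after one pass over the sample stream; in practice an Engine would submit only its fittest few.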
The server side, or Evolution Coordinator, maintains a list of the best of the best
candidates so far. EC-Star achieves scale by making copies of candidates at
the server, sending them to Evolution Engines for aging, and merging the aged
results submitted back after aging simultaneously on different Evolution Engines
(see Fig. 2). This also spreads the fitter genetic material.
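The merge step can be illustrated with a small sketch. This is an assumption about how results might be combined, not EC-Star's actual protocol: the record fields and the `merge_aged` helper are hypothetical, and the merge simply accumulates evaluation counts and fitness sums from copies aged on separate Engines.

```python
def merge_aged(master, returned):
    """Merge an aged copy back into the coordinator's master record:
    ages and fitness sums accumulate, so two Engines aging copies of the
    same candidate contribute evaluations independently."""
    master["age"] += returned["age"]
    master["fitness_sum"] += returned["fitness_sum"]
    return master

# The coordinator keeps one master record; two Engines each age a copy.
master = {"id": "cand-42", "age": 0, "fitness_sum": 0.0}
copy_a = {"id": "cand-42", "age": 300, "fitness_sum": 210.0}
copy_b = {"id": "cand-42", "age": 200, "fitness_sum": 150.0}

for aged in (copy_a, copy_b):
    merge_aged(master, aged)

# Mean fitness over all 500 samples seen across both Engines.
mean_fitness = master["fitness_sum"] / master["age"]
```

Because only counts and sums are exchanged, the Engines never need to coordinate with each other, only with the server.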
EC-Star is massively distributable by running each Evolution Engine on a
processing node (e.g., a CPU) with limited bandwidth and occasional availability
(see Hodjat et al. 2014). Typical runs utilize hundreds of thousands of processing
units distributed across thousands of geographically dispersed sites.
In the Evolution Coordinator, only candidates of the same age-range are compared
with one another. This technique is called age-layering, and it was first
introduced by Hornby (2006)—note, however, that the definition of age here is
quite different. In EC-Star, each age-range has a fixed quota, and a 'shadow' of a
candidate that has aged out of an age-layer is retained as a place-holder for filtering
incoming candidates. To balance the load, a farm of Evolution Coordinators is
used, all of which are synchronized over a single age-layered pool of candidates.
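The age-layered pool with quotas and shadows can be sketched as below. This is a simplified model under stated assumptions: the class name, layer bounds, quota value, and the `age_out` hook are all illustrative, and a real coordinator farm would additionally synchronize this state across servers.

```python
class AgeLayeredPool:
    """Illustrative sketch of age-layering: candidates compete only within
    their own age range, each layer has a fixed quota, and a 'shadow' (the
    fitness of a candidate that aged out) filters weaker newcomers."""

    def __init__(self, bounds, quota):
        self.bounds = bounds                  # upper age bound per layer
        self.quota = quota                    # fixed per-layer quota
        self.layers = [[] for _ in bounds]
        self.shadows = [None] * len(bounds)   # place-holder fitnesses

    def _layer_of(self, age):
        for i, bound in enumerate(self.bounds):
            if age < bound:
                return i
        return len(self.bounds) - 1           # top layer is unbounded

    def submit(self, cand):
        i = self._layer_of(cand["age"])
        shadow = self.shadows[i]
        if shadow is not None and cand["fitness"] <= shadow:
            return False                      # filtered by the shadow
        layer = self.layers[i]
        layer.append(cand)
        layer.sort(key=lambda c: c["fitness"], reverse=True)
        del layer[self.quota:]                # enforce the fixed quota
        return cand in layer

    def age_out(self, cand):
        """When a candidate outgrows its layer, retain its fitness as a
        shadow so weaker incoming candidates are rejected."""
        i = self._layer_of(cand["age"])
        if self.shadows[i] is None or cand["fitness"] > self.shadows[i]:
            self.shadows[i] = cand["fitness"]

pool = AgeLayeredPool(bounds=[100, 1000], quota=2)
pool.age_out({"age": 50, "fitness": 0.9})            # leaves a shadow
rejected = pool.submit({"age": 10, "fitness": 0.5})  # below the shadow
accepted = pool.submit({"age": 10, "fitness": 0.95}) # beats the shadow
```

The shadow lets a layer keep raising its bar even after its best candidate has moved on, which is what makes the fixed quotas workable.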
Typically, candidates harvested from the top age-layer of an EC-Star run are
validated on an unseen set, post harvest, in order to ensure generalization. What if
some validation could take place at scale in a distributed manner?
The nPool approach described in this paper is inspired by the well-known k-fold
cross validation technique (see Refaeilzadeh et al. 2009), in which k iterations of
training and validation are performed on k equally sized segments (or folds) of data,
