Genetic Programming Theory and Practice XIII


Behavioral Program Synthesis: Insights and Prospects


Nevertheless, there exists a wider class of behavioral patterns of potentially
greater interest, namely the patterns that are detectable by conventional knowledge
discovery and machine learning (ML) algorithms. Such patterns are pursued by the
method described in the following, termed PANGEA (PAtterN Guided Evolutionary
Algorithms), originally proposed in Krawiec and Swan (2013) and then extended in
Krawiec and O'Reilly (2014). Technically, behavioral patterns are revealed there
by an ML algorithm trained on execution traces. Information about the resulting
classifier is then used to augment the fitness function. By relying on generic ML
tools, this process does not depend on domain knowledge (as human inspection
typically would). Rather, it seeks abstract regularities that can be used to predict
the correct output of a program. If this approach reveals meaningful dependencies
between partial outcomes and the desired output, we may hope to thereby promote
programs with the potential to produce good results in the future, even if the output
they produce at the moment of evaluation is incorrect.
The ML perspective on behavioral program synthesis originates in the observation
that an execution trace bears some similarity to an example in ML. Assuming
the execution record resulting from applying p to all tests is aligned, i.e., the states in
particular traces correspond to each other, the columns of the record can be likened
to attributes in ML. The desired program output y corresponds in this context to the
desired response of a classifier (or regressor, depending on the nature of the task).
And crucially, an ML induction algorithm (inducer for short), given a set of such
examples, can be used to produce a classifier that predicts the desired output of the
program based on the attributes describing execution traces.
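As a minimal illustration of this analogy, an aligned execution record can be laid out as rows of attributes with the desired output y appended as the class label. The trace values and desired outputs below are invented purely for illustration:

```python
# Hypothetical aligned execution record: one row per test,
# one column per (aligned) intermediate execution state.
traces = [
    [0, 1, 1],   # intermediate states observed on test 1
    [1, 0, 1],   # test 2
    [1, 1, 0],   # test 3
]
desired = [1, 0, 0]  # desired program output y for each test

# Each trace becomes an ML example: the states play the role of
# attributes, and the desired output plays the role of the class label.
dataset = [states + [y] for states, y in zip(traces, desired)]
```

An inducer trained on such a dataset would then attempt to predict the last column from the preceding ones.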
The method proceeds in the following steps, exemplified in Fig. 1:



  1. An execution record is built by running the program on the tests.

  2. The execution record is transformed into a conventional ML dataset D.

  3. An ML induction algorithm is applied to D, resulting in a classifier C.

  4. Program evaluation is calculated from the properties of C.
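The four steps above can be sketched end to end as follows. The traced "program", the test suite, and the final scoring formula are all hypothetical stand-ins: scikit-learn's DecisionTreeClassifier serves as the inducer, and the evaluation combines the classifier's training accuracy with its size, one illustrative choice among the properties of C the method could use:

```python
from sklearn.tree import DecisionTreeClassifier

# Step 1: run a (hypothetical) program on the tests, recording
# intermediate states; the program and its traces are invented here.
def run_with_trace(x):
    s1 = x % 2           # intermediate state 1
    s2 = (x // 2) % 2    # intermediate state 2
    return [s1, s2], s1 ^ s2  # (trace, program output)

# Test suite: pairs of (input, desired output).
tests = [(x, (x % 4) in (1, 2)) for x in range(8)]

# Step 2: transform the execution record into an ML dataset D.
X, y = [], []
for inp, desired in tests:
    trace, _output = run_with_trace(inp)
    X.append(trace)           # aligned states -> attributes
    y.append(int(desired))    # desired output -> class label

# Step 3: apply an induction algorithm to D, yielding a classifier C.
clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# Step 4: derive an evaluation from properties of C, here combining
# its accuracy on D with its complexity (node count); this formula
# is an illustrative choice, not the one from the cited papers.
accuracy = clf.score(X, y)
complexity = clf.tree_.node_count
evaluation = accuracy / (1 + complexity)
```

The intuition is that a program whose traces admit a small, accurate classifier exhibits a behavioral pattern predictive of the desired output, and so receives a better evaluation than one whose traces look like noise to the inducer.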


The record built in Step 1, as explained in the example in Sect. 2 (Fig. 1), is
subsequently transformed in Step 2, resulting in the training set labeled 'ML
dataset' in the figure. In this case of simple tree-based GP, the attributes correspond
one-to-one to the columns of the execution record, so the only essential change
is the addition of the program output y as a dependent variable (class label) in
the dataset. The transformation of an execution record into an ML dataset could be
more sophisticated, for instance if states represent compound rather than elementary
data types. A more advanced transformation could also facilitate the discovery of
behavioral patterns, for instance when the representation biases of an ML classifier
prevent it from capturing certain classes of patterns. Yet another motivation is to
allow the discovery of higher-order patterns that are unobservable when each attribute
reflects a single execution state. Though these options deserve future research, here
we focus on tree-based GP and the straightforward, one-to-one transformation of the
columns of the execution record into ML attributes.
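To give one sketch of what a more sophisticated transformation might look like, a compound state, say a list produced mid-execution, could be summarized into several scalar attributes rather than mapped to a single column. The state values and summary features below are hypothetical:

```python
# Hypothetical compound execution state: a list produced mid-execution.
def summarize_state(state):
    """Derive scalar ML attributes from a compound (list-valued) state."""
    if not state:
        return [0, 0.0, 0]
    return [len(state), sum(state) / len(state), max(state)]

# A trace whose second state is compound: it expands into several
# ML attributes (length, mean, maximum) instead of a single column.
trace = [3, [1, 4, 2]]
attributes = [trace[0]] + summarize_state(trace[1])
```

Such a transformation trades the one-to-one correspondence with the execution record for attributes that a conventional attribute-value inducer can actually consume.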
Given the training set D, in Step 3 we train an ML classifier C on it. In Krawiec and
Swan (2013) and Krawiec and O'Reilly (2014), we used the decision tree inducers
