Genetic_Programming_Theory_and_Practice_XIII

Using GP for Data Science 129

[Figure 5: flowchart. Recoverable box labels: New input data with possible “drift”; Existing model; Build and try out updated model; Predict using existing model; Predict using updated predictors; Compare the results of the two models; Compare feature sensitivity (do coefficients differ significantly?); Significantly different: “Publish” updated model; Within set threshold: “Retain” the existing model.]

Fig. 5 Schematic showing criteria for handling sensor “Drift”
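The publish-or-retain decision in Fig. 5 can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the use of mean absolute difference as the comparison metric, the relative tolerance for coefficients, and the feature names are hypothetical, since the figure does not say how "significantly different" is measured.

```python
from statistics import mean


def mean_abs_difference(existing_preds, updated_preds):
    """Average absolute gap between the two models' predictions (assumed metric)."""
    if len(existing_preds) != len(updated_preds):
        raise ValueError("prediction lists must have the same length")
    return mean(abs(e - u) for e, u in zip(existing_preds, updated_preds))


def drift_decision(existing_preds, updated_preds, threshold):
    """Return "publish" if the models disagree by more than `threshold`, else "retain"."""
    gap = mean_abs_difference(existing_preds, updated_preds)
    return "publish" if gap > threshold else "retain"


def shifted_coefficients(old_coefs, new_coefs, rel_tol):
    """List the features whose coefficient changed by more than `rel_tol` (relative)."""
    shifted = []
    for name, old in old_coefs.items():
        new = new_coefs.get(name, 0.0)
        scale = max(abs(old), 1e-12)  # guard against division by zero
        if abs(new - old) / scale > rel_tol:
            shifted.append(name)
    return shifted


# Hypothetical predictions on the same new input data.
small_gap = drift_decision([1.0, 2.0, 3.0], [1.05, 2.0, 2.95], threshold=0.1)
large_gap = drift_decision([1.0, 2.0, 3.0], [2.0, 3.0, 4.0], threshold=0.1)

# Hypothetical feature-sensitivity comparison.
changed = shifted_coefficients(
    {"temp": 1.0, "pressure": 2.0},
    {"temp": 1.02, "pressure": 3.0},
    rel_tol=0.1,
)
```

In practice the threshold would be set from the noise level of the sensor data, and a statistical test could replace the simple gap comparison.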

like selection, evaluation, and recombination. The GP system also contains several
configuration (parameter) files, in which settings such as population size, function set,
initialization method, and selection pressure can be specified. The goal of the
configuration files is to allow customization of the system without modifying
the class files, which would require recompiling the source. As with most
systems for EC, there is a considerable learning curve in understanding how particular
functionality is represented and programmed.
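To make the idea of configuration-driven customization concrete, here is a minimal sketch of reading such settings from a parameter file. It does not reproduce the file format of the GP system used in this chapter: the section name, the key names, and the use of tournament size as a stand-in for selection pressure are all assumptions made for illustration.

```python
import configparser

# Hypothetical parameter file contents; the keys are illustrative only.
PARAMS_TEXT = """
[gp]
population_size = 500
functions = add, sub, mul
init_method = ramped_half_and_half
tournament_size = 7
"""


def load_gp_params(text):
    """Parse GP settings from INI-style text into a plain dictionary."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["gp"]
    return {
        "population_size": section.getint("population_size"),
        "functions": [name.strip() for name in section["functions"].split(",")],
        "init_method": section["init_method"],
        # Larger tournaments imply stronger selection pressure.
        "tournament_size": section.getint("tournament_size"),
    }


params = load_gp_params(PARAMS_TEXT)
```

Changing the population size or the function set then only requires editing the text of the parameter file, not the code that runs the evolutionary loop.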
If we look across existing GP solutions, each has strengths in different
areas: user interfaces; cloud and distributed compute support; integration with
data management and visualization tools such as Mathematica, Matlab, or R;
or advanced GP features such as ensembles (e.g., FlexGP). We chose a
package that was more mature in its advanced features but less mature in its user
experience. This choice suits users with a high degree of expertise
but, as we will see later, has downsides for novice users, for integration with
other systems, and for prototyping. The process used to create a competitive GP solution
for our Data Science task was as follows:

Feature selection as in GBR,

Simplification of mathematical operators,

Increasing the training data size. Initial results showed that GP benefited from
more data.
