

We improved the GBR results through feature selection, by increasing the amount of
training data, and by re-training on a sliding time-window. Once we started to obtain
acceptable results, the default parameters used for GBR were left unchanged. The
Python GBR approach was very quick to build, and we used common open-source
libraries to create the initial Python script. In total, around 400 lines of un-optimized
Python code were written to implement the complete GBR solution.
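As a rough illustration, a minimal sketch of this setup is given below, assuming scikit-learn's GradientBoostingRegressor with its default parameters; the column names, window length, and the DataFrame df are placeholders rather than the actual production code.

# Minimal sketch of the GBR setup described above, assuming scikit-learn.
# Column names, window length, and the DataFrame `df` are hypothetical.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

WINDOW = 10_000                  # number of most recent samples to train on (assumed)
FEATURES = ["s1", "s2", "s3"]    # selected sensor columns (placeholders)
TARGET = "target_sensor"         # sensor value being predicted (placeholder)

def train_on_window(df: pd.DataFrame) -> GradientBoostingRegressor:
    """Re-train a GBR model with default parameters on the latest time window."""
    window = df.sort_values("timestamp").tail(WINDOW)
    model = GradientBoostingRegressor()   # defaults left unchanged, as in the text
    model.fit(window[FEATURES], window[TARGET])
    return model

# model = train_on_window(df)
# predictions = model.predict(df[FEATURES].tail(100))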


3.5 Adapting the Model to Handle Sensor Drift


One effect of implementing a predictive model that relies on time series data as its
input is that the model’s accuracy will tend to degrade over time as sensor values
drift. This is because the underlying relationships between the sensors s1, ..., sN that
were modeled have themselves changed over time. If we do not account for this,
a model that was accurate at a previous point in time will start to make very poor
predictions. The typical way to fix this is to ‘rebuild’ the model by re-estimating the
predictors. This raises two questions:



  1. How do we know when it is time to refresh the existing model?

  2. How frequently should we ‘publish’ a new model to the clients such that it causes
     minimal disruption?

To prevent this error from growing large over time, we ran two models in parallel.
One model was the existing (published) model, and the other was a new one being
evaluated. In addition to comparing the prediction errors, we also compared the
actual predictor values and the variable ranks of the two models. If the updated
model was ‘significantly’ different (above the set threshold), it was accepted as the
new model and published to the clients (Fig. 5).
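The decision logic can be sketched roughly as follows, assuming scikit-learn-style models that expose predict() and feature_importances_ (as GBR does); the error metric, the rank-difference measure, and both thresholds are illustrative assumptions rather than the exact criteria used in the reported system.

# Illustrative sketch of the two-model comparison; metrics and thresholds are assumed.
import numpy as np
from sklearn.metrics import mean_absolute_error

ERROR_THRESHOLD = 0.10   # relative change in prediction error tolerated (assumed)
RANK_THRESHOLD = 2.0     # mean shift in variable rank tolerated (assumed)

def should_publish(published, candidate, X_recent, y_recent) -> bool:
    """Decide whether the candidate model differs enough to replace the published one."""
    err_pub = mean_absolute_error(y_recent, published.predict(X_recent))
    err_new = mean_absolute_error(y_recent, candidate.predict(X_recent))
    error_change = abs(err_pub - err_new) / max(err_pub, 1e-12)

    # Variable ranks derived from each model's feature importances
    # (0 = most important feature).
    rank_pub = np.argsort(np.argsort(-published.feature_importances_))
    rank_new = np.argsort(np.argsort(-candidate.feature_importances_))
    rank_shift = np.abs(rank_pub - rank_new).mean()

    return error_change > ERROR_THRESHOLD or rank_shift > RANK_THRESHOLD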


3.6 Genetic Programming Solution


Genetic Programming is an evolutionary algorithm which provides a heuristic
search alternative to other modeling approaches such as regression (Koza 1992).
A recent special issue provides an up-to-date account of progress and open issues in
the field (O’Neill et al. 2010).
The GP system we used for symbolic regression contains several state-of-the-art
features, such as multi-objective optimization that selects for solution accuracy
and complexity through the maintenance of a Pareto front. The underlying GP system is
the same as described in Veeramachaneni et al. (2015), which adds features
for cloud-based implementation, multiple-regression learning, and factoring, in which
each learner or individual leverages a different set of parameters and data when learning.
The Java-based system contained several classes implementing specific features
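As a rough illustration of the accuracy-versus-complexity selection, the sketch below maintains a Pareto front over (error, complexity) pairs. It is written in Python for brevity and is not the cloud-based Java system of Veeramachaneni et al. (2015); the candidate representation and complexity measure are assumptions.

# Minimal sketch of Pareto-front maintenance over (error, complexity).
from dataclasses import dataclass

@dataclass
class Candidate:
    expression: str     # symbolic expression produced by GP (placeholder representation)
    error: float        # prediction error on training data (lower is better)
    complexity: int     # e.g. number of nodes in the expression tree (lower is better)

def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse on both objectives and better on at least one."""
    return (a.error <= b.error and a.complexity <= b.complexity
            and (a.error < b.error or a.complexity < b.complexity))

def pareto_front(population: list[Candidate]) -> list[Candidate]:
    """Keep only candidates that are not dominated by any other candidate."""
    return [c for c in population
            if not any(dominates(other, c) for other in population if other is not c)]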
