Genetic_Programming_Theory_and_Practice_XIII

(C. Jardin) #1

Using GP for Data Science 131


deviation of confidence bounds using all the predictions. Approximately 240 lines
of scripts were written on top of the GP system to implement the approach.


3.6.1 Visualization


While it is not common to visualize GBR solutions directly, the GP solutions
can be viewed to understand variable relationships and overall system structure.
Because we used an ensemble approach, it was somewhat less clear how the
final solution was calculated, but the clients appreciated seeing clearly how
attributes were used in the ensemble. Because the GP system we used did not
allow easy viewing of the solutions, we created a Matlab script that converted
the parenthesized infix notation from the GP system into a more easily
readable form for the user. The Matlab code uses the Matlab Symbolic Math
Toolbox to perform the following actions:



  • Replace variable names in the equation

  • Format the strings as LaTeX

  • Display the strings as graphs


Our particular GP system accepts only variable names of the form X1,
X2, ..., XN. An example model generated from the training is:
((exp(((((X8+X27)/(X32)^2)*sqrt(X30))/X10)))+((X1+X4))^2)
This makes the equations hard to understand. The first operation performed
with the Matlab Symbolic Math Toolbox is to replace the generic variables with
their proper names by specifying the correct symbol name for each variable. This
is done as a batch process for the entire output population at once. The resulting
output looks something like this:
((exp(((((MotorLoad+TemperatureC)/(TemperatureH)^2)*sqrt(TemperatureF))/
MotorTemperature)))+((MotorSpeed+FlowRate))^2)
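
The renaming step can be illustrated outside Matlab as well. The following is a minimal Python sketch, not the script used in the project: the name mapping is read off the example above, the `*` and `/` operators in the model string are our reading of that example, and the function names `rename_variables` and `rename_population` are hypothetical.

```python
import re

# Mapping from the GP system's generic names to descriptive names,
# taken from the example model in the text.
NAME_MAP = {
    "X1": "MotorSpeed",
    "X4": "FlowRate",
    "X8": "MotorLoad",
    "X10": "MotorTemperature",
    "X27": "TemperatureC",
    "X30": "TemperatureF",
    "X32": "TemperatureH",
}

# Word boundaries keep X1 from matching the prefix of X10.
_VAR = re.compile(r"\bX\d+\b")


def rename_variables(expr, name_map):
    """Replace each Xn token with its descriptive name; unknown tokens stay as-is."""
    return _VAR.sub(lambda m: name_map.get(m.group(0), m.group(0)), expr)


def rename_population(exprs, name_map):
    """Apply the renaming to every model in a population in one batch."""
    return [rename_variables(e, name_map) for e in exprs]


model = "exp((((X8+X27)/(X32)^2)*sqrt(X30))/X10)+(X1+X4)^2"
print(rename_variables(model, NAME_MAP))
```

Matching whole tokens rather than doing plain substring replacement matters here, because a naive replacement of X1 would also corrupt X10.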
In the next step, the built-in Matlab latex() command converts a symbolic
expression into a LaTeX-formatted string as shown below:


$$e^{\frac{\sqrt{\mathrm{TemperatureF}}\,(\mathrm{MotorLoad}+\mathrm{TemperatureC})}{\mathrm{TemperatureH}^{2}\,\mathrm{MotorTemperature}}} + (\mathrm{FlowRate}+\mathrm{MotorSpeed})^{2}$$

The equation can then be displayed by creating a text object that uses the Matlab
LaTeX interpreter and then drawing the figure. The Matlab code to do this is simple
and consists of 33 lines of code.
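
The conversion step described above relies on the Matlab latex() command. As a rough illustration of what such a conversion involves, the sketch below walks a parsed infix expression and emits a LaTeX string using only the Python standard library. It is an assumption-laden stand-in, not the Matlab implementation: the function name `to_latex` is hypothetical, only `+ - * / ^`, names, numbers, and single-argument calls are handled, and chained powers follow Python's right-associative `**` rather than Matlab's left-to-right rule.

```python
import ast


def to_latex(expr):
    """Convert an infix string (with ^ for powers) into a LaTeX string."""
    tree = ast.parse(expr.strip().replace("^", "**"), mode="eval")
    return _emit(tree.body)


def _paren(node, text):
    # Wrap sums and differences so they stay grouped inside products.
    if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Sub)):
        return "(" + text + ")"
    return text


def _emit(node):
    if isinstance(node, ast.Name):
        return r"\mathrm{" + node.id + "}"
    if isinstance(node, ast.Constant):
        return str(node.value)
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and len(node.args) == 1):
        arg = _emit(node.args[0])
        if node.func.id == "sqrt":
            return r"\sqrt{" + arg + "}"
        if node.func.id == "exp":
            return "e^{" + arg + "}"
        return r"\operatorname{" + node.func.id + "}(" + arg + ")"
    if isinstance(node, ast.BinOp):
        left, right = _emit(node.left), _emit(node.right)
        if isinstance(node.op, ast.Add):
            return left + " + " + right
        if isinstance(node.op, ast.Sub):
            return left + " - " + _paren(node.right, right)
        if isinstance(node.op, ast.Mult):
            return _paren(node.left, left) + r" \cdot " + _paren(node.right, right)
        if isinstance(node.op, ast.Div):
            # Fractions group their operands visually, so no parentheses needed.
            return r"\frac{" + left + "}{" + right + "}"
        if isinstance(node.op, ast.Pow):
            simple = isinstance(node.left, (ast.Name, ast.Constant))
            base = left if simple else "(" + left + ")"
            return base + "^{" + right + "}"
    raise ValueError("unsupported expression: " + ast.dump(node))


print(to_latex(
    "exp(sqrt(TemperatureF)*(MotorLoad+TemperatureC)"
    "/(TemperatureH^2*MotorTemperature))+(FlowRate+MotorSpeed)^2"
))
```

The printed string can be passed to any LaTeX-capable renderer; unsupported constructs such as unary minus raise a `ValueError` rather than producing a silently wrong formula.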


3.6.2 Diversity


Diversity measures the variation among the population members. In evolutionary
search, it is typically desirable to keep the population as diverse as possible while
still improving accuracy. Diversity can be analyzed by several methods, for example:



  • Variation between the existing terms in programs across the population. If many
    different programs have recurring terms, then there is less diversity.
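
As an illustration of the term-based measure in the bullet above, the following Python sketch scores a population by the ratio of distinct terms to all terms. This particular ratio, the choice of top-level additive terms as the unit of comparison, and the names `top_level_terms` and `term_diversity` are assumptions made for the example, not the measure used in the project.

```python
from collections import Counter


def top_level_terms(expr):
    """Split an infix expression on '+' signs that are not inside parentheses."""
    terms, depth, current = [], 0, []
    for ch in expr.replace(" ", ""):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "+" and depth == 0:
            terms.append("".join(current))
            current = []
        else:
            current.append(ch)
    terms.append("".join(current))
    return [t for t in terms if t]


def term_diversity(population):
    """Distinct terms divided by all terms; 1.0 means no term recurs."""
    counts = Counter(t for prog in population for t in top_level_terms(prog))
    total = sum(counts.values())
    return len(counts) / total if total else 0.0


population = [
    "X1*X2+sqrt(X3)",
    "sqrt(X3)+X4",
    "X1*X2+X4",
]
print(term_diversity(population))
```

Terms are compared as exact strings, so algebraically equivalent but differently written terms (for example X1*X2 and X2*X1) count as distinct; a canonical ordering step would be needed to treat them as the same.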
