by split, and pruning it from the leaves upward, performed by prune. The node data structure contains a type flag indicating whether it is an internal node or a leaf, pointers to the left and right child, the set of instances that reach that node, the attribute that is used for splitting at that node, and a structure representing the linear model for the node.
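For concreteness, a structure of this kind might be sketched in Python as follows; the field names are illustrative rather than taken from the pseudocode:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    # Type flag: internal node or leaf.
    is_leaf: bool = False
    # Pointers to the left and right children (absent at a leaf).
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    # The set of instances that reach this node.
    instances: List = field(default_factory=list)
    # Attribute (and threshold) used for splitting at this node.
    split_attribute: Optional[int] = None
    split_value: Optional[float] = None
    # Structure representing the node's linear model,
    # e.g., a coefficient vector with an intercept.
    model: Optional[object] = None
```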
The sd function, called at the beginning of the main program and again at the beginning of split, calculates the standard deviation of the class values of a set of instances. Then follows the procedure for obtaining synthetic binary attributes that was described previously. Standard procedures for creating new nodes and printing the final tree are not shown. In split, sizeof returns the number of elements in a set. Missing attribute values are dealt with as described earlier. The SDR is calculated according to the equation at the beginning of the previous subsection. Although not shown in the code, it is set to infinity if splitting on the attribute would create a leaf with fewer than two instances.
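Both calculations are simple enough to sketch in a few lines of Python, assuming each instance exposes a class_value field (the function names are illustrative). Because the value is used here in a maximum-SDR comparison, a disqualified split is assigned minus infinity so that it can never win; the sign of the sentinel depends on how the comparison is oriented:

```python
import math

def sd(instances):
    # Standard deviation of the class values of a set of instances.
    values = [inst.class_value for inst in instances]
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

def sdr(instances, subsets):
    # Standard deviation reduction: sd(T) - sum(|T_i| / |T| * sd(T_i)).
    # Splits that would create a leaf with fewer than two instances are
    # disqualified so that they never win the maximum-SDR comparison.
    if any(len(s) < 2 for s in subsets):
        return float("-inf")
    n = len(instances)
    return sd(instances) - sum(len(s) / n * sd(s) for s in subsets)
```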
In prune, the linearRegression routine recursively descends the subtree collecting attributes, performs a linear regression on the instances at that node as a function of those attributes, and then greedily drops terms if doing so improves the error estimate, as described earlier. Finally, the error function returns

\[
\frac{n+\nu}{n-\nu} \times \frac{\sum_{\text{instances}} \lvert \text{deviation from predicted class value} \rvert}{n},
\]

where $n$ is the number of instances at the node and $\nu$ is the number of parameters in the node's linear model.
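The estimate and the term-dropping loop might be sketched as follows, assuming the numeric attributes collected at the node form a NumPy matrix X with class values y; the function names, and the choice of always dropping the single term whose removal improves the estimate most, are assumptions for illustration:

```python
import numpy as np

def error_estimate(deviations, nu):
    # Compensated error: (n + nu) / (n - nu) times the average of the
    # absolute deviations, where nu counts the model's parameters.
    n = len(deviations)
    if n <= nu:
        return float("inf")
    return (n + nu) / (n - nu) * float(np.mean(np.abs(deviations)))

def greedy_drop_terms(X, y):
    # Fit a least-squares linear model, then repeatedly drop the term
    # whose removal most improves the compensated error estimate.
    def fit(cols):
        A = np.column_stack([X[:, cols], np.ones(len(y))])  # intercept
        w, *_ = np.linalg.lstsq(A, y, rcond=None)
        return w, error_estimate(y - A @ w, len(cols) + 1)

    active = list(range(X.shape[1]))
    w, best = fit(active)
    while active:
        trials = [(c, *fit([d for d in active if d != c])) for c in active]
        c, w_trial, e = min(trials, key=lambda t: t[2])
        if e >= best:
            break  # no single drop improves the estimate
        active.remove(c)
        w, best = w_trial, e
    return active, w, best
```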
Figure 6.16 gives an example of a model tree formed by this algorithm for a
problem with two numeric and two nominal attributes. What is to be predicted
is the rise time of a simulated servo system involving a servo amplifier, motor,
lead screw, and sliding carriage. The nominal attributes play important roles.
Four synthetic binary attributes have been created for each of the five-valued nominal attributes motor and screw, and they are shown in Table 6.1 in terms of the two sets of values to which they correspond. The ordering of these values (D, E, C, B, A for motor and, coincidentally, D, E, C, B, A for screw also) is determined from the training data: the rise time averaged over all examples for which motor = D is less than that averaged over examples for which motor = E, which is less than when motor = C, and so on. It is apparent from the magnitude of the coefficients in Table 6.1 that motor = D versus E, C, B, A plays a leading role in the LM2 model, and motor = D, E versus C, B, A plays a leading role in LM1. Both motor and screw also play minor roles in several of the models.
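The conversion that produces these synthetic binary attributes can be sketched as follows, again assuming each instance exposes values (indexed by attribute) and class_value. On the servo data it would reproduce the ordering D, E, C, B, A and the four indicator pairs of Table 6.1:

```python
from collections import defaultdict

def synthetic_binary_attributes(instances, attribute):
    # Average the class value over the instances showing each nominal
    # value, and order the values by that average.
    sums, counts = defaultdict(float), defaultdict(int)
    for inst in instances:
        v = inst.values[attribute]
        sums[v] += inst.class_value
        counts[v] += 1
    ordering = sorted(sums, key=lambda v: sums[v] / counts[v])
    # A k-valued attribute yields k - 1 binary attributes; the i-th one
    # separates the first i + 1 values in the ordering from the rest,
    # e.g., {D} versus {E, C, B, A}, then {D, E} versus {C, B, A}, etc.
    return ordering, [(set(ordering[:i + 1]), set(ordering[i + 1:]))
                      for i in range(len(ordering) - 1)]
```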
The decision tree shows a three-way split on a numeric attribute. First a binary-splitting tree was generated in the usual way. It turned out that the root and one of its descendants tested the same attribute, pgain, and a simple algorithm was used to merge these two binary tests into the single three-way split shown.
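The merging algorithm itself is not spelled out here. One plausible sketch, assuming every internal node sends instances with attribute < split_value down the left branch and using the Node fields from the earlier sketch, repeatedly splices a child's test into its parent's branch list:

```python
def collapse_split(node):
    # Collapse nested binary tests on the same attribute into a single
    # multiway split, returned as (upper_bound, subtree) branches that
    # are ordered by threshold; the last bound is infinity.
    branches = [(node.split_value, node.left), (float("inf"), node.right)]
    changed = True
    while changed:
        changed = False
        for i, (bound, child) in enumerate(branches):
            if (not child.is_leaf
                    and child.split_attribute == node.split_attribute):
                # Replace this branch by the child's own two branches.
                branches[i:i + 1] = [(child.split_value, child.left),
                                     (bound, child.right)]
                changed = True
                break
    return branches
```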