in Section 3.3, and sometimes the structure can be expressed much more con-
cisely using a set of rules instead of a tree. Can we generate rules for numeric
prediction? Recall the rule learner described in Section 6.2 that uses separate-
and-conquer in conjunction with partial decision trees to extract decision rules
from trees. The same strategy can be applied to model trees to generate deci-
sion lists for numeric prediction.
First build a partial model tree from all the data. Pick one of the leaves
and make it into a rule. Remove the data covered by that leaf; then repeat
the process with the remaining data. The question is how to build a partial
model tree, that is, a tree with unexpanded nodes. This boils down to
deciding which node to expand next. The algorithm of
Figure 6.5 (Section 6.2) picks the node whose entropy for the class attribute is
smallest. For model trees, whose predictions are numeric, simply use the vari-
ance instead. This is based on the same rationale: the lower the variance, the
shallower the subtree and the shorter the rule. The rest of the algorithm stays
the same, with the model tree learner’s split selection method and pruning
strategy replacing the decision tree learner’s. Because the model tree’s leaves are
linear models, the corresponding rules will have linear models on the right-hand
side.
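To make the procedure concrete, here is a minimal sketch in Python. It is a
simplification, not the method itself: the "partial model tree" is reduced to
a single variance-minimizing split with a least-squares linear model at each
leaf, whereas a real implementation would grow a deeper partial tree using
the model tree learner's own split selection and pruning heuristics. All
names (best_split, linear_model, learn_rules, min_cover) are illustrative.

    import numpy as np

    def best_split(X, y):
        """Find the (attribute, threshold) pair minimizing weighted variance."""
        best_a, best_t, best_score = None, None, np.inf
        for a in range(X.shape[1]):
            for t in np.unique(X[:, a])[:-1]:   # every candidate cut point
                left, right = y[X[:, a] <= t], y[X[:, a] > t]
                score = len(left) * left.var() + len(right) * right.var()
                if score < best_score:
                    best_a, best_t, best_score = a, t, score
        return best_a, best_t

    def linear_model(X, y):
        """Fit a least-squares linear model (with intercept) for a leaf."""
        A = np.hstack([X, np.ones((len(X), 1))])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        return coef

    def learn_rules(X, y, min_cover=5):
        """Peel off one rule per iteration; recurse on the uncovered data."""
        rules = []
        while len(y) > 2 * min_cover:
            a, t = best_split(X, y)
            if a is None:                       # no useful split remains
                break
            left = X[:, a] <= t
            # Expand toward low variance: the lower-variance leaf yields
            # the shorter rule with the better-fitting model, so cover it.
            mask = left if y[left].var() <= y[~left].var() else ~left
            side = '<=' if mask is left else '>'
            rules.append((a, side, t, linear_model(X[mask], y[mask])))
            X, y = X[~mask], y[~mask]           # remove covered data; repeat
        if len(y):                              # fallback rule for leftovers
            rules.append(('default', linear_model(X, y)))
        return rules

Each iteration converts the lower-variance leaf into a rule whose right-hand
side is a linear model, then discards the instances that rule covers, which
is exactly the separate-and-conquer pattern described above.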
There is one caveat when using model trees in this fashion to generate rule
sets: the smoothing process that the model tree learner employs. It turns out
that using smoothed model trees does not reduce the error in the final rule set’s
predictions. This may be because smoothing works best for contiguous data, but
the separate-and-conquer scheme removes data covered by previous rules,
leaving holes in the distribution. Smoothing, if it is done at all, must be per-
formed after the rule set has been generated.
Locally weighted linear regression
An alternative approach to numeric prediction is the method of locally weighted
linear regression. With model trees, the tree structure divides the instance space
into regions, and a linear model is found for each of them. In effect, the train-
ing data determines how the instance space is partitioned. Locally weighted
regression, on the other hand, generates local models at prediction time by
giving higher weight to instances in the neighborhood of the particular test
instance. More specifically, it weights the training instances according to
their distance to the test instance and performs a linear regression on the
weighted data. Training instances close to the test instance receive a high
weight; those far away receive a low one. In other words, a linear model is
tailor-made for the particular test instance at hand and used to predict the
instance's
class value.
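As a concrete sketch, the following Python function implements this idea
under one set of assumptions that the description above leaves open: it uses
Euclidean distance and a Gaussian weighting kernel with bandwidth h. The
function name and parameters are illustrative.

    import numpy as np

    def lwlr_predict(X, y, x_test, h=1.0):
        """Predict the class value of x_test from training data (X, y)."""
        # Weight each training instance by its distance to the test
        # instance: nearby instances get weights near 1, distant ones
        # get weights near 0.
        d2 = ((X - x_test) ** 2).sum(axis=1)
        w = np.exp(-d2 / (2 * h ** 2))
        # Solve the weighted least-squares problem by scaling the rows of
        # the design matrix (and the targets) by sqrt(w), then fitting an
        # ordinary linear regression with an intercept term.
        A = np.hstack([X, np.ones((len(X), 1))])
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        # Evaluate the tailor-made linear model at the test instance.
        return np.append(x_test, 1.0) @ coef

The bandwidth h governs how local the model is: a small h makes the
prediction follow the nearest neighbors closely, while a large h gives
nearly uniform weights and so approaches a single global linear regression.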