Following an extensive description of model trees, we briefly explain how to
generate rules from model trees, and then describe another approach to numeric
prediction—locally weighted linear regression. Whereas model trees derive
from the basic divide-and-conquer decision tree methodology, locally weighted
regression is inspired by the instance-based methods for classification that we
described in the previous section. Like instance-based learning, it performs all
“learning” at prediction time. Although locally weighted regression resembles
model trees in that it uses linear regression to fit models locally to particular
areas of instance space, it does so in quite a different way.
Model trees
When a model tree is used to predict the value for a test instance, the tree is
followed down to a leaf in the normal way, using the instance’s attribute values to
make routing decisions at each node. The leaf will contain a linear model based
on some of the attribute values, and this is evaluated for the test instance to
yield a raw predicted value.
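The routing and leaf evaluation just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `LinearModel` and `Node` classes, the binary numeric splits of the form "attribute <= value", and all the numbers in the toy tree are hypothetical and are not taken from any particular model tree implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LinearModel:
    """Linear model: intercept plus a weighted sum of attribute values."""
    intercept: float
    weights: Dict[str, float]  # only the attributes this model actually uses

    def predict(self, instance: Dict[str, float]) -> float:
        return self.intercept + sum(
            w * instance[name] for name, w in self.weights.items()
        )


@dataclass
class Node:
    """Hypothetical model-tree node; a leaf has no split attribute."""
    model: LinearModel
    split_attribute: Optional[str] = None
    split_value: float = 0.0
    left: Optional["Node"] = None   # branch taken when attribute <= split_value
    right: Optional["Node"] = None  # branch taken when attribute > split_value

    def is_leaf(self) -> bool:
        return self.split_attribute is None


def raw_prediction(root: Node, instance: Dict[str, float]) -> float:
    """Route the instance down to a leaf and evaluate that leaf's linear model."""
    node = root
    while not node.is_leaf():
        if instance[node.split_attribute] <= node.split_value:
            node = node.left
        else:
            node = node.right
    return node.model.predict(instance)


# Toy tree with made-up numbers: a single split on attribute "x1" at 5.0.
# The root's own model is not used when computing the raw prediction.
leaf_low = Node(LinearModel(1.0, {"x2": 2.0}))
leaf_high = Node(LinearModel(10.0, {"x1": 0.5, "x2": -1.0}))
tree = Node(LinearModel(4.0, {"x1": 1.0}), "x1", 5.0, leaf_low, leaf_high)

print(raw_prediction(tree, {"x1": 3.0, "x2": 4.0}))  # 1.0 + 2.0*4.0 = 9.0
print(raw_prediction(tree, {"x1": 8.0, "x2": 4.0}))  # 10.0 + 0.5*8.0 - 1.0*4.0 = 10.0
```

Note that each leaf model mentions only some of the attributes, which matches the description above of a linear model based on some of the attribute values.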
Instead of using this raw value directly, however, it turns out to be beneficial
to use a smoothing process to compensate for the sharp discontinuities that will
inevitably occur between adjacent linear models at the leaves of the pruned tree.
This is a particular problem for models constructed from a small number of
training instances. Smoothing can be accomplished by producing linear models
for each internal node, as well as for the leaves, at the time the tree is built. Then,
once the leaf model has been used to obtain the raw predicted value for a test
instance, that value is filtered along the path back to the root, smoothing it at
each node by combining it with the value predicted by the linear model for that
node.
An appropriate smoothing calculation is

$$p' = \frac{np + kq}{n + k},$$

where p' is the prediction passed up to the next higher node, p is the prediction
passed to this node from below, q is the value predicted by the model at this
node, n is the number of training instances that reach the node below, and k is
a smoothing constant. Experiments show that smoothing substantially increases
the accuracy of predictions.
Exactly the same smoothing process can be accomplished by incorporating
the interior models into each leaf model after the tree has been built. Then,
during the prediction process, only the leaf models are used. The disadvantage
is that the leaf models tend to be larger and more difficult to comprehend,
because many coefficients that were previously zero become nonzero when the
interior nodes’ models are incorporated.
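To make the smoothing rule p' = (np + kq)/(n + k) concrete, the following Python sketch applies it along a path from a leaf to the root, and then shows that folding the interior models into the leaf's coefficients gives the same prediction. The function names, the coefficient layout (intercept first, then one coefficient per attribute, with zeros for unused attributes), the value k = 10, and all the numbers are assumptions chosen for illustration only.

```python
from typing import List, Sequence, Tuple


def smooth_step(p: float, q: float, n: int, k: float) -> float:
    """One smoothing step: p' = (n*p + k*q) / (n + k)."""
    return (n * p + k * q) / (n + k)


def smoothed_prediction(
    raw: float, path: Sequence[Tuple[int, float]], k: float
) -> float:
    """Filter a raw leaf prediction back up to the root.

    `path` lists, from the leaf's parent up to the root, pairs (n, q) where
    n is the number of training instances reaching the node below and
    q is the value predicted by this node's own linear model.
    """
    p = raw
    for n, q in path:
        p = smooth_step(p, q, n, k)
    return p


def evaluate(coeffs: Sequence[float], x: Sequence[float]) -> float:
    """Evaluate a linear model stored as [intercept, w1, w2, ...]."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], x))


def fold_into_leaf(
    leaf: Sequence[float], path: Sequence[Tuple[int, Sequence[float]]], k: float
) -> List[float]:
    """Apply the same rule coefficient by coefficient, so that the resulting
    leaf model alone reproduces the smoothed prediction."""
    combined = list(leaf)
    for n, node_coeffs in path:
        combined = [(n * a + k * b) / (n + k) for a, b in zip(combined, node_coeffs)]
    return combined


# Made-up example with two attributes (x1, x2) and smoothing constant k = 10.
k = 10.0
x = [3.0, 4.0]
leaf_model = [1.0, 0.0, 2.0]    # reached by 10 training instances
parent_model = [4.0, 1.0, 0.0]  # its subtree is reached by 30 training instances
root_model = [2.0, 0.0, 1.0]

raw = evaluate(leaf_model, x)                    # 9.0
path_values = [(10, evaluate(parent_model, x)),  # q = 7.0 at the parent
               (30, evaluate(root_model, x))]    # q = 6.0 at the root
print(smoothed_prediction(raw, path_values, k))  # 7.5

folded = fold_into_leaf(leaf_model, [(10, parent_model), (30, root_model)], k)
print(folded)               # [2.375, 0.375, 1.0]
print(evaluate(folded, x))  # 7.5, same as smoothing along the path
```

In this toy example the leaf's coefficient for x1 starts at zero and becomes 0.375 after folding, which illustrates why leaf models that incorporate the interior models tend to be larger and harder to read.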