

nique. It also compares favorably with far more sophisticated ways of enhanc-
ing Naïve Bayes by relaxing its intrinsic independence assumption. Locally
weighted learning only assumes independence within a neighborhood, not
globally in the whole instance space as standard Naïve Bayes does.
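
As a concrete illustration, here is a minimal Python sketch of one way the
idea can be realized for nominal attributes: each training instance is
weighted by a kernel of its distance to the test instance, and Naïve Bayes is
then fitted to the weighted counts. The integer coding of attributes, the
linear kernel, the neighborhood size k, and the Laplace correction are all
illustrative assumptions, not a prescribed recipe.

import numpy as np

def locally_weighted_nb(X, y, x_query, k=50):
    # Distance from the query to every training instance (Euclidean on
    # integer-coded nominal attributes; any distance measure could be used)
    d = np.sqrt(((X - x_query) ** 2).sum(axis=1))
    # Linear kernel: weights fall to zero at the k-th nearest neighbor
    bandwidth = np.sort(d)[min(k, len(d)) - 1] + 1e-12
    w = np.maximum(1.0 - d / bandwidth, 0.0)
    classes = np.unique(y)
    scores = []
    for c in classes:
        wc = w[y == c]  # weights of the instances in class c
        # Weighted, Laplace-corrected class prior
        log_p = np.log((wc.sum() + 1.0) / (w.sum() + len(classes)))
        for j in range(X.shape[1]):
            n_vals = len(np.unique(X[:, j]))  # distinct values of attribute j
            # Weighted count of class-c instances matching the query value
            match = wc[X[y == c, j] == x_query[j]].sum()
            log_p += np.log((match + 1.0) / (wc.sum() + n_vals))
        scores.append(log_p)
    return classes[int(np.argmax(scores))]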
In principle, locally weighted learning can also be applied to decision trees
and other models that are more complex than linear regression and Naïve Bayes.
However, it is less beneficial there, because it is primarily a way of making
simple models more flexible by allowing them to approximate arbitrary targets.
If the underlying learning algorithm can already do that, there is little point
in applying locally weighted learning. Nevertheless, it may improve other
simple models, such as linear support vector machines and logistic regression.
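
The best-known instance of the technique is locally weighted linear
regression, in which a fresh weighted least-squares fit is performed around
each query point. The following sketch assumes a Gaussian kernel with
bandwidth tau; both are illustrative choices rather than the only possibility.

import numpy as np

def lwr_predict(X, y, x_query, tau=1.0):
    # Gaussian weights centered on the query point
    w = np.exp(-((X - x_query) ** 2).sum(axis=1) / (2 * tau ** 2))
    Xa = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    W = np.diag(w)
    # Weighted least squares: beta = (X'WX)^(-1) X'Wy
    beta = np.linalg.solve(Xa.T @ W @ Xa, Xa.T @ W @ y)
    return np.r_[1.0, x_query] @ beta

Because a new model is fitted for every query, locally weighted methods trade
prediction-time computation for the flexibility described above.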


Discussion


Regression trees were introduced in the CART system of Breiman et al. (1984).
CART, for “classification and regression trees,” incorporated a decision tree
inducer for discrete classes much like that of C4.5, which was developed inde-
pendently, and a scheme for inducing regression trees. Many of the techniques
described in the preceding section, such as the method of handling nominal
attributes and the surrogate device for dealing with missing values, were
included in CART. However, model trees did not appear until much more
recently, being first described by Quinlan (1992). Using model trees for gener-
ating rule sets (although not partial trees) has been explored by Hall et al.
(1999).
Model tree induction is not so commonly used as decision tree induction,
partly because comprehensive descriptions (and implementations) of the tech-
nique have become available only recently (Wang and Witten 1997). Neural nets
are more commonly used for predicting numeric quantities, although they
suffer from the disadvantage that the structures they produce are opaque and
cannot be used to help us understand the nature of the solution. Although there
are techniques for producing understandable insights from the structure of
neural networks, the arbitrary nature of the internal representation means that
there may be dramatic variations between networks of identical architecture
trained on the same data. By dividing the function being induced into linear
patches, model trees provide a representation that is reproducible and at least
somewhat comprehensible.
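
As a toy illustration of what these linear patches look like, consider a
hypothetical model tree with a single split; the threshold and coefficients
below are invented purely for illustration.

def model_tree_predict(x):
    # One split, two leaves: the induced function is piecewise linear,
    # with a separate linear model in each patch of the input space
    if x <= 2.5:              # test chosen by the tree inducer
        return 1.0 + 0.5 * x  # linear model at the left leaf
    return 4.0 - 0.2 * x      # linear model at the right leaf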
There are many variations of locally weighted learning. For example, statis-
ticians have considered using locally quadratic models instead of linear ones and
have applied locally weighted logistic regression to classification problems. Also,
many different potential weighting and distance functions can be found in the
literature. Atkeson et al. (1997) have written an excellent survey on locally
weighted learning.
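
To give a flavor of this variety, here are three weighting functions of the
kind that appear in that literature; this is a small representative sample
rather than a catalog from the survey. Each maps a bandwidth-scaled distance
to a weight.

import numpy as np

def gaussian(d):
    return np.exp(-0.5 * d ** 2)

def tricube(d):
    # Zero beyond the bandwidth, smooth inside it (used in LOESS-style fits)
    return np.where(np.abs(d) < 1, (1 - np.abs(d) ** 3) ** 3, 0.0)

def inverse_distance(d, eps=1e-6):
    # eps guards against division by zero at the query point itself
    return 1.0 / (d + eps)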
