To use locally weighted regression, you need to decide on a distance-based
weighting scheme for the training instances. A common choice is to weight the
instances according to the inverse of their Euclidean distance from the test
instance. Another possibility is to use the Euclidean distance in conjunction with
a Gaussian kernel function. However, there is no clear evidence that the choice
of weighting function is critical. More important is the selection of a “smooth-
ing parameter” that is used to scale the distance function—the distance is mul-
tiplied by the inverse of this parameter. If it is set to a small value, only instances
very close to the test instance will receive significant weight; if it is large, more
distant instances will also have a significant impact on the model. One way of
choosing the smoothing parameter is to set it to the distance of the kth-nearest
training instance so that its value becomes smaller as the volume of training
data increases. The best choice of k depends on the amount of noise in the data.
The more noise there is, the more neighbors should be included in the linear
model. Generally, an appropriate smoothing parameter is found using cross-
validation.
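
As an illustration of the scheme just described, here is a minimal sketch in Python with NumPy (the names lwr_predict and k are assumed for illustration and are not from the book). It weights each training instance by a Gaussian kernel of its Euclidean distance to the test instance, sets the smoothing parameter to the distance of the kth-nearest training instance, and solves the resulting weighted least-squares problem:

import numpy as np

def lwr_predict(X, y, x_query, k=20):
    """Locally weighted linear regression at a single query point."""
    dists = np.linalg.norm(X - x_query, axis=1)         # Euclidean distances to the test instance
    h = np.sort(dists)[min(k, len(dists)) - 1]          # smoothing parameter: k-th nearest distance
    h = max(h, 1e-12)                                    # guard against division by zero
    w = np.exp(-0.5 * (dists / h) ** 2)                 # Gaussian kernel weights
    sw = np.sqrt(w)
    A = np.hstack([np.ones((len(X), 1)), X])             # design matrix with intercept column
    beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return np.concatenate(([1.0], x_query)) @ beta

In practice the neighborhood size k, and with it the amount of smoothing, would be chosen by cross-validation as noted above.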
Like model trees, locally weighted linear regression is able to approximate
nonlinear functions. One of its main advantages is that it is ideally suited for
incremental learning: all training is done at prediction time, so new instances
can be added to the training data at any time. However, like other instance-
based methods, it is slow at deriving a prediction for a test instance. First,
the training instances must be scanned to compute their weights; then, a
weighted linear regression is performed on these instances. Also, like other
instance-based methods, locally weighted regression provides little information
about the global structure of the training dataset. Note that if the smoothing
parameter is based on the kth-nearest neighbor and the weighting function
gives zero weight to more distant instances, the kD-trees and ball trees described
in Section 4.7 can be used to speed up the process of finding the relevant
neighbors.
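
For example, assuming SciPy's cKDTree, the neighbor search and the local fit might be combined along the following lines (the function lwr_predict_kdtree is hypothetical, and this sketch simply truncates the weighting at the kth neighbor rather than using a weighting function that is exactly zero beyond it):

import numpy as np
from scipy.spatial import cKDTree

def lwr_predict_kdtree(tree, X, y, x_query, k=20):
    """Local fit restricted to the k nearest neighbors found via a kD-tree."""
    dists, idx = tree.query(x_query, k=k)               # k nearest neighbors and their distances
    h = max(dists[-1], 1e-12)                            # smoothing parameter: k-th nearest distance
    w = np.exp(-0.5 * (dists / h) ** 2)                 # weights for the retained neighbors only
    sw = np.sqrt(w)
    A = np.hstack([np.ones((k, 1)), X[idx]])             # design matrix for the neighbors
    beta, *_ = np.linalg.lstsq(A * sw[:, None], y[idx] * sw, rcond=None)
    return np.concatenate(([1.0], x_query)) @ beta

# The tree is built once from the training data and reused for every prediction:
# tree = cKDTree(X)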
Locally weighted learning is not restricted to linear regression: it can be
applied with any learning technique that can handle weighted instances. In par-
ticular, you can use it for classification. Most algorithms can be easily adapted
to deal with weights. The trick is to realize that (integer) weights can be simu-
lated by creating several copies of the same instance. Whenever the learning
algorithm uses an instance when computing a model, just pretend that it is
accompanied by the appropriate number of identical shadow instances. This
also works if the weight is not an integer. For example, in the Naïve Bayes algo-
rithm described in Section 4.2, multiply the counts derived from an instance by
the instance’s weight, and—voilà—you have a version of Naïve Bayes that can
be used for locally weighted learning.
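
As a minimal sketch of this weighted-count idea (assuming nominal attributes; the function weighted_nb_counts and the data layout are illustrative rather than Weka's implementation), each instance contributes its weight, rather than one, to its class count and to the count of each of its attribute values:

from collections import defaultdict

def weighted_nb_counts(instances, weights):
    """instances: list of (attribute_value_tuple, class_label) pairs."""
    class_count = defaultdict(float)
    attr_count = defaultdict(float)          # key: (class, attribute index, attribute value)
    for (values, cls), w in zip(instances, weights):
        class_count[cls] += w                # each count is multiplied by the instance's weight
        for i, v in enumerate(values):
            attr_count[(cls, i, v)] += w
    return class_count, attr_count

For locally weighted classification, the weights would be computed from the distance to the test instance exactly as in the regression case, and the resulting counts plugged into the usual Naïve Bayes probability estimates of Section 4.2.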
It turns out that locally weighted Naïve Bayes works extremely well in practice, outperforming both Naïve Bayes itself and the k-nearest-neighbor technique.
