
extreme case, when some attributes are completely irrelevant—because all
attributes contribute equally to the distance formula. Fourth, it does not
perform explicit generalization, although we intimated in Section 3.8 (and illustrated in Figure 3.8) that some instance-based learning systems do indeed perform explicit generalization.

Reducing the number of exemplars


The plain nearest-neighbor rule stores a lot of redundant exemplars: it is almost
always completely unnecessary to save all the examples seen so far. A simple
variant is to classify each example with respect to the examples already seen and
to save only ones that are misclassified. We use the term exemplars to refer to the already-seen instances that are used for classification. Discarding correctly classified instances reduces the number of exemplars and proves to be an effective way to prune the exemplar database. Ideally, only a single exemplar is stored
for each important region of the instance space. However, early in the learning
process examples may be discarded that later turn out to be important, possibly leading to some decrease in predictive accuracy. As the number of stored
instances increases, the accuracy of the model improves, and so the system
makes fewer mistakes.
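
To make the idea concrete, here is a minimal sketch of this incremental strategy in Python (the function and variable names are illustrative, not taken from any particular system): each training example is classified using the exemplars stored so far, and is added to the store only if that classification is wrong.

    import math

    def distance(a, b):
        # Euclidean distance between two numeric attribute vectors
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def nearest_class(exemplars, x):
        # exemplars is a list of (attribute_vector, class_label) pairs
        return min(exemplars, key=lambda e: distance(e[0], x))[1]

    def reduce_exemplars(training_stream):
        exemplars = []
        for x, y in training_stream:
            # keep an example only if the stored exemplars misclassify it
            if not exemplars or nearest_class(exemplars, x) != y:
                exemplars.append((x, y))
        return exemplars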
Unfortunately, the strategy of only storing misclassified instances does not
work well in the face of noise. Noisy examples are very likely to be misclassified,
and so the set of stored exemplars tends to accumulate those that are least useful.
This effect is easily observed experimentally. Thus this strategy is only a stepping-stone on the way toward more effective instance-based learners.

Pruning noisy exemplars


Noisy exemplars inevitably lower the performance of any nearest-neighbor
scheme that does not suppress them because they have the effect of repeatedly
misclassifying new instances. There are two ways of dealing with this. One is to
locate, instead of the single nearest neighbor, the k nearest neighbors for some predetermined constant k and assign the majority class to the unknown instance. The only problem here is determining a suitable value of k. Plain nearest-neighbor learning corresponds to k = 1. The more noise, the greater the optimal value of k. One way to proceed is to perform cross-validation tests with
different values and choose the best. Although this is expensive in computation
time, it often yields excellent predictive performance.
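
The following sketch shows one way such a search might be carried out, assuming scikit-learn is available; the candidate values of k and the number of folds are arbitrary illustrative choices.

    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    def choose_k(X, y, candidates=(1, 3, 5, 7, 9), folds=10):
        # estimate the accuracy of k-nearest-neighbor classification by
        # cross-validation for each candidate and return the best k
        scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k),
                                     X, y, cv=folds).mean()
                  for k in candidates}
        return max(scores, key=scores.get)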
A second solution is to monitor the performance of each exemplar that is
stored and discard ones that do not perform well. This can be done by keeping
a record of the number of correct and incorrect classification decisions that each
exemplar makes. Two predetermined thresholds are set on the success ratio.
When an exemplar’s performance drops below the lower one, it is deleted from the exemplar set.
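
As a rough sketch of this bookkeeping (the threshold values, and the assumption that the upper threshold decides whether an exemplar takes part in classification, are illustrative simplifications rather than the exact mechanism of any particular scheme):

    LOWER_THRESHOLD = 0.5    # hypothetical value: below this, delete the exemplar
    UPPER_THRESHOLD = 0.75   # hypothetical value: above this, trust the exemplar

    class Exemplar:
        def __init__(self, attributes, label):
            self.attributes, self.label = attributes, label
            self.correct, self.incorrect = 0, 0

        def success_ratio(self):
            total = self.correct + self.incorrect
            return self.correct / total if total else 0.0

    def record_decision(exemplars, exemplar, was_correct):
        # update the exemplar's record and prune it if it performs poorly
        if was_correct:
            exemplar.correct += 1
        else:
            exemplar.incorrect += 1
        if exemplar.success_ratio() < LOWER_THRESHOLD:
            exemplars.remove(exemplar)

    def is_reliable(exemplar):
        # assumption: only exemplars above the upper threshold are used
        # when classifying new instances
        return exemplar.success_ratio() > UPPER_THRESHOLD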
