boundaries rather than the hard-edged cutoff implied by the k-nearest-neighbor rule, in which any particular example is either “in” or “out” of the decision. With such a measure, given a test instance whose class is unknown, its distance to the set of all training instances in each class in turn is calculated, and the closest class is chosen. It turns out that nominal and numeric attributes can be treated in a uniform manner within this transformation-based approach by defining different transformation sets, and it is even possible to take account of unusual attribute types, such as degrees of arc or days of the week, that are measured on a circular scale.
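To make the procedure concrete, here is a minimal sketch in Java. It is an illustration rather than an algorithm from this book: it substitutes plain Euclidean distance on numeric attributes for the transformation-based measure, takes the distance from an instance to a class to be the minimum distance to any of that class's training instances (one natural reading of “distance to the set”), and all class and method names are invented.

import java.util.List;
import java.util.Map;

public class NearestClassClassifier {

    // Euclidean distance between two numeric attribute vectors
    // (stands in for the transformation-based measure).
    static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Distance from a test instance to a set of training instances,
    // taken here as the minimum over the set's members.
    static double distanceToSet(double[] x, List<double[]> members) {
        double best = Double.POSITIVE_INFINITY;
        for (double[] m : members) {
            best = Math.min(best, euclidean(x, m));
        }
        return best;
    }

    // Choose the class whose training instances lie closest to x.
    static String classify(double[] x, Map<String, List<double[]>> byClass) {
        String bestClass = null;
        double bestDist = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, List<double[]>> e : byClass.entrySet()) {
            double d = distanceToSet(x, e.getValue());
            if (d < bestDist) {
                bestDist = d;
                bestClass = e.getKey();
            }
        }
        return bestClass;
    }
}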

Discussion


Nearest-neighbor methods gained popularity in machine learning through the work of Aha (1992), who showed that, when combined with noisy exemplar pruning and attribute weighting, instance-based learning performs well in comparison with other methods. It is worth noting that although we have described it solely in the context of classification, it applies equally well to numeric prediction problems: predictions can be obtained by combining the predicted values of the k nearest neighbors, weighting them by distance.
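As an illustration of such a prediction scheme, the following sketch averages the target values of the k nearest neighbors using inverse-distance weights. The weighting function 1/d is a common choice but not one the text prescribes; the names are again invented, and ties and other refinements are ignored.

import java.util.Arrays;
import java.util.Comparator;

public class DistanceWeightedKnn {

    static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Predict a numeric value as the inverse-distance-weighted mean
    // of the k nearest training instances' target values.
    static double predict(double[] x, double[][] train, double[] target, int k) {
        // Sort training indices by distance to the test instance.
        Integer[] order = new Integer[train.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble(i -> euclidean(x, train[i])));

        double weighted = 0, totalWeight = 0;
        for (int j = 0; j < Math.min(k, train.length); j++) {
            int i = order[j];
            double d = euclidean(x, train[i]);
            if (d == 0) return target[i]; // exact match: use its value directly
            double w = 1.0 / d;           // inverse-distance weight
            weighted += w * target[i];
            totalWeight += w;
        }
        return weighted / totalWeight;
    }
}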
Viewed in instance space, the standard rule- and tree-based representations are only capable of representing class boundaries that are parallel to the axes defined by the attributes. This is not a handicap for nominal attributes, but it is for numeric ones. Non-axis-parallel class boundaries can only be approximated by covering the region above or below the boundary with several axis-parallel rectangles, the number of rectangles determining the degree of approximation. In contrast, the instance-based method can easily represent arbitrary linear boundaries. Even with just one example of each of two classes, the boundary implied by the nearest-neighbor rule is a straight line of arbitrary orientation, namely the perpendicular bisector of the line joining the examples.
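To see why, write the two exemplars as vectors a and b. A test instance x lies on the decision boundary exactly when it is equidistant from both:

\[
\|x - a\|^2 = \|x - b\|^2 \iff 2(b - a)^{\top} x = \|b\|^2 - \|a\|^2 .
\]

This is a hyperplane with normal b - a, and substituting the midpoint x = (a + b)/2 satisfies it, so the boundary is precisely the perpendicular bisector of the segment joining the two exemplars.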
Plain instance-based learning does not produce explicit knowledge representations except by selecting representative exemplars. However, when combined with exemplar generalization, a set of rules can be obtained that may be compared with those produced by other machine learning schemes. The rules tend to be more conservative because the distance metric, modified to incorporate generalized exemplars, can be used to process examples that do not fall within the rules. This reduces the pressure to produce rules that cover the whole example space or even all of the training examples. On the other hand, the incremental nature of most instance-based learning methods means that rules are formed eagerly, after only part of the training set has been seen; and this inevitably reduces their quality.
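To give one concrete form of such a modified metric, here is a sketch of the standard hyperrectangle treatment (not necessarily the exact formulation any particular scheme uses): a generalized exemplar covers an axis-parallel hyperrectangle, instances inside it are at distance zero, and instances outside it are measured to the nearest face.

public class RectExemplar {
    // A generalized exemplar represented as an axis-parallel
    // hyperrectangle with per-attribute lower and upper bounds.
    final double[] lo, hi;

    RectExemplar(double[] lo, double[] hi) {
        this.lo = lo;
        this.hi = hi;
    }

    // Distance from an instance to the exemplar: zero if the instance
    // falls inside the rectangle, otherwise the Euclidean distance to
    // the nearest face.
    double distance(double[] x) {
        double sum = 0;
        for (int i = 0; i < x.length; i++) {
            // Overshoot beyond the rectangle on attribute i (0 if inside).
            double d = Math.max(0, Math.max(lo[i] - x[i], x[i] - hi[i]));
            sum += d * d;
        }
        return Math.sqrt(sum);
    }
}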
We have not given precise algorithms for variants of instance-based learning that involve generalization because it is not clear what the best way to do generalization is.
