A linear threshold unit is a binary test of whether a linear function is greater or
less than zero, and a linear machine is a set of linear functions, one for each class,
whose values for an unknown example are compared and the largest chosen as its
predicted class. In the distant past, perceptrons fell out of favor on publication
of an influential book that showed they had fundamental limitations (Minsky
and Papert 1969); however, more complex systems of linear functions have
enjoyed a resurgence in recent years in the form of neural networks, described
in Section 6.3. The Winnow algorithms were introduced by Nick Littlestone in
his PhD thesis in 1989 (Littlestone 1988, 1989). Multiresponse linear classifiers
have found a new application recently for an operation called stacking that
combines the output of other learning algorithms, described in Chapter 7 (see
Wolpert 1992). Friedman (1996) describes the technique of pairwise
classification, Fürnkranz (2002) further analyzes it, and Hastie and Tibshirani (1998)
extend it to estimate probabilities using pairwise coupling.
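
To make the opening definitions concrete, here is a minimal sketch of a linear threshold unit and a linear machine in Java. It is illustrative only, with invented names rather than anything from this book or from Weka, and it assumes the weights (one vector per class for the linear machine) and biases have already been learned.

public class LinearModels {

    // Linear threshold unit: predict class 1 if w.x + b > 0, else class 0.
    static int thresholdUnit(double[] w, double b, double[] x) {
        double sum = b;
        for (int i = 0; i < w.length; i++) {
            sum += w[i] * x[i];
        }
        return sum > 0 ? 1 : 0;
    }

    // Linear machine: one linear function per class; the class whose
    // function scores highest on the unknown example is the prediction.
    static int linearMachine(double[][] w, double[] b, double[] x) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < w.length; c++) {
            double score = b[c];
            for (int i = 0; i < x.length; i++) {
                score += w[c][i] * x[i];
            }
            if (score > bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }
}
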
Fix and Hodges (1951) performed the first analysis of the nearest-neighbor
method, and Johns (1961) pioneered its use in classification problems. Cover
and Hart (1967) obtained the classic theoretical result that, for large enough
datasets, its probability of error never exceeds twice the theoretical minimum;
Devroye et al. (1996) showed that k-nearest neighbor is asymptotically optimal
for large k and n with k/n → 0. Nearest-neighbor methods gained popularity in
machine learning through the work of Aha (1992), who showed that instance-
based learning can be combined with noisy exemplar pruning and attribute
weighting and that the resulting methods perform well in comparison with
other learning methods. We take this up again in Chapter 6.
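
In symbols, the Cover and Hart result can be stated as follows (this is its standard textbook form, not a quotation from this chapter). Writing R* for the theoretical minimum (Bayes) error rate and R for the asymptotic error rate of the nearest-neighbor rule on an M-class problem,

\[ R^{*} \;\le\; R \;\le\; R^{*}\left(2 - \frac{M}{M-1}\,R^{*}\right) \;\le\; 2R^{*} \]

The last inequality is the "never exceeds twice the theoretical minimum" guarantee mentioned above.
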
The kD-tree data structure was developed by Friedman et al. (1977). Our
description closely follows an explanation given by Andrew Moore in his PhD
thesis (Moore 1991); Moore, along with Omohundro (1987), pioneered its use in
machine learning. Moore (2000) describes sophisticated ways of constructing
ball trees that perform well even with thousands of attributes. We took our ball
tree example from lecture notes by Alexander Gray of Carnegie Mellon
University. The voting feature intervals method mentioned in the Discussion
subsection at the end of Section 4.7 is described by Demiroz and Guvenir (1997).
The k-means algorithm is a classic technique, and many descriptions and
variations are available (e.g., see Hartigan 1975). The clever use of kD-trees to
speed up k-means clustering, which we chose to illustrate using ball trees
instead, was pioneered by Moore and Pelleg (2000) in their X-means clustering
algorithm. That algorithm also contains some other innovations, described in
Section 6.6.
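
For orientation, the classic k-means loop that these tree-based methods accelerate can be sketched in a few lines of Java. This is a minimal illustrative version, not the book's or Weka's implementation, and the class and method names are invented: assign each point to its nearest center, recompute each center as the mean of its points, and repeat until the assignments stop changing.

import java.util.Random;

public class KMeansSketch {

    static double squaredDistance(double[] a, double[] b) {
        double d = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            d += diff * diff;
        }
        return d;
    }

    // Returns the cluster index assigned to each point.
    static int[] kMeans(double[][] points, int k, int maxIterations) {
        int n = points.length, dim = points[0].length;
        double[][] centers = new double[k][];
        Random rng = new Random(42);
        for (int c = 0; c < k; c++) {               // seed centers with random points
            centers[c] = points[rng.nextInt(n)].clone();
        }
        int[] assignment = new int[n];
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            for (int p = 0; p < n; p++) {           // assignment step
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (squaredDistance(points[p], centers[c])
                            < squaredDistance(points[p], centers[best])) {
                        best = c;
                    }
                }
                if (assignment[p] != best) {
                    assignment[p] = best;
                    changed = true;
                }
            }
            if (!changed && iter > 0) {             // assignments stable: converged
                break;
            }
            double[][] sums = new double[k][dim];   // update step: recompute means
            int[] counts = new int[k];
            for (int p = 0; p < n; p++) {
                counts[assignment[p]]++;
                for (int i = 0; i < dim; i++) {
                    sums[assignment[p]][i] += points[p][i];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] > 0) {                // an empty cluster keeps its old center
                    for (int i = 0; i < dim; i++) {
                        centers[c][i] = sums[c][i] / counts[c];
                    }
                }
            }
        }
        return assignment;
    }
}
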
