Data Mining: Practical Machine Learning Tools and Techniques, Second Edition



Discussion


Support vector machines originated from research in statistical learning theory
(Vapnik 1999), and a good starting point for exploration is a tutorial by Burges
(1998). A general description, including generalization to the case in which the
data is not linearly separable, has been published by Cortes and Vapnik (1995).
We have introduced the standard version of support vector regression:
Schölkopf et al. (1999) present a different version that has one parameter instead
of two. Smola and Schölkopf (2004) provide an extensive tutorial on support
vector regression.
The (voted) kernel perceptron is due to Freund and Schapire (1999). Cristianini
and Shawe-Taylor (2000) provide a nice introduction to support vector
machines and other kernel-based methods, including the optimization theory
underlying the support vector learning algorithms. We have barely skimmed the
surface of these learning schemes, mainly because advanced mathematics lies
just beneath. The idea of using kernels to solve nonlinear problems has been
applied to many algorithms, for example, principal components analysis
(described in Section 7.3). A kernel is essentially a similarity function with
certain mathematical properties, and it is possible to define kernel functions
over all sorts of structures—for example, sets, strings, trees, and probability
distributions. Shawe-Taylor and Cristianini (2004) cover kernel-based learning
in detail.
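
To make the idea of a kernel as a similarity function concrete, here is a minimal sketch (our own Python illustration, not code from the book): the intersection kernel over sets, k(A, B) = |A ∩ B|, which equals the inner product of the sets' indicator vectors without ever building those vectors explicitly.

# A minimal sketch: the intersection kernel over sets is a valid kernel because
# it equals the dot product of indicator vectors in an (implicit) feature space.

def set_kernel(a, b):
    """Intersection kernel over sets: k(A, B) = |A & B|."""
    return len(set(a) & set(b))

def explicit_dot(a, b, universe):
    """Same value computed via the explicit feature map (indicator vectors)."""
    va = [1 if x in a else 0 for x in universe]
    vb = [1 if x in b else 0 for x in universe]
    return sum(p * q for p, q in zip(va, vb))

docs = [{"kernel", "svm", "margin"}, {"kernel", "tree"}, {"margin", "svm"}]
universe = sorted(set().union(*docs))
for a in docs:
    for b in docs:
        assert set_kernel(a, b) == explicit_dot(a, b, universe)
print("intersection kernel matches the explicit inner product")

The same trick underlies kernels on strings, trees, and other structures: the similarity is computed directly on the structures themselves, while the corresponding feature space may be very high-dimensional or only implicit.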
There is extensive literature on neural networks, and Bishop (1995) provides
an excellent introduction to both multilayer perceptrons and RBF networks.
Interest in neural networks appears to have declined since the arrival of support
vector machines, perhaps because the latter generally require fewer parameters
to be tuned to achieve the same (or greater) accuracy. However, multilayer
perceptrons have the advantage that they can learn to ignore irrelevant attributes,
and RBF networks trained using k-means can be viewed as a quick-and-dirty
method for finding a nonlinear classifier.
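
As a rough illustration of that quick-and-dirty recipe, the following hypothetical sketch (written with scikit-learn; the dataset, number of centers, and Gaussian width are arbitrary choices of ours, not the book's) uses k-means to pick the basis-function centers, maps each instance to its Gaussian activations around those centers, and fits an ordinary linear classifier as the output layer.

# A hedged sketch of an RBF network trained the "quick-and-dirty" way:
# k-means supplies the basis-function centers, a fixed Gaussian width turns
# each instance into RBF activations, and a plain linear model does the rest.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Step 1: choose the RBF centers by clustering the training data.
centers = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X_tr).cluster_centers_

def rbf_features(X, centers, width=1.0):
    # Gaussian activation of each instance with respect to each center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

# Step 2: fit a linear output layer on the nonlinear RBF features.
clf = LogisticRegression(max_iter=1000).fit(rbf_features(X_tr, centers), y_tr)
print("test accuracy:", clf.score(rbf_features(X_te, centers), y_te))

Because the centers are fixed before the output layer is trained, only a linear model needs to be fitted, which is what makes the method quick; the price is that the centers are chosen without regard to the class labels.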

6.4 Instance-based learning


In Section 4.7 we saw how the nearest-neighbor rule can be used to implement
a basic form of instance-based learning. There are several practical problems
with this simple method. First, it tends to be slow for large training sets, because
the entire set must be searched for each test instance—unless sophisticated data
structures such as kD-trees or ball trees are used. Second, it performs badly with
noisy data, because the class of a test instance is determined by its single nearest
neighbor without any “averaging” to help eliminate noise. Third, it performs
badly when different attributes affect the outcome to different extents—in the
