Chapter 4 progressed from decision trees to rule induction and continued with association rules, linear models, the nearest-neighbor method of instance-based learning, and clustering. The present chapter develops all these topics except association rules, which have already been covered in adequate detail.
We begin with decision tree induction and work up to a full description of the C4.5 system, a landmark decision tree program that is probably the machine learning workhorse most widely used in practice to date. Next we describe decision rule induction. Despite the simplicity of the idea, inducing decision rules that perform comparably with state-of-the-art decision trees turns out to be quite difficult in practice. Most high-performance rule inducers find an initial rule set and then refine it using a rather complex optimization stage that discards or adjusts individual rules to make them work better together. We describe the ideas that underlie rule learning in the presence of noise, and then go on to cover a scheme that operates by forming partial decision trees, an approach that has been demonstrated to perform as well as other state-of-the-art rule learners yet avoids their complex and ad hoc heuristics. Following this, we take a brief look at how to generate rules with exceptions, which were described in Section 3.5.
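
To make the tree-growing machinery concrete before it is developed in full: C4.5 selects the attribute to split on at each node using the gain ratio, the information gain of the split normalized by the entropy of the split itself. The sketch below (in Python rather than the book's own tools; the toy weather-style data and the function names are purely illustrative) shows the calculation for a single nominal attribute.

    import math
    from collections import Counter

    def entropy(labels):
        # Shannon entropy, in bits, of a sequence of class labels.
        n = len(labels)
        return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

    def gain_ratio(values, labels):
        # Information gain of splitting 'labels' on a nominal attribute
        # whose value for each instance is given in 'values', plus the
        # gain ratio: gain divided by the entropy of the split itself.
        n = len(labels)
        subsets = {}
        for v, y in zip(values, labels):
            subsets.setdefault(v, []).append(y)
        remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
        gain = entropy(labels) - remainder
        split_info = entropy(values)
        return gain, (gain / split_info if split_info > 0 else 0.0)

    outlook = ["sunny", "sunny", "overcast", "rain", "rain", "overcast"]
    play    = ["no",    "no",    "yes",      "yes",  "no",   "yes"]
    print(gain_ratio(outlook, play))   # gain ~0.67 bits, gain ratio ~0.42

ID3, C4.5's predecessor, used the raw gain; dividing by the split information penalizes attributes with many values, which would otherwise look spuriously informative.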
There has been a resurgence of interest in linear models with the introduction of support vector machines, a blend of linear modeling and instance-based learning. Support vector machines select a small number of critical boundary instances called support vectors from each class and build a linear discriminant function that separates them as widely as possible. These systems transcend the limitations of linear boundaries by making it practical to include extra nonlinear terms in the function, making it possible to form quadratic, cubic, and higher-order decision boundaries. The same techniques can be applied to the perceptron described in Section 4.6 to implement complex decision boundaries. An older technique for extending the perceptron is to connect units together into multilayer “neural networks.” All these ideas are described in Section 6.3.
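
These ideas can be illustrated in a few lines (a minimal sketch using scikit-learn, our choice of library rather than anything the book prescribes; the data are synthetic). No straight line separates a class defined by a circle, but a polynomial kernel of degree 2 implicitly supplies the quadratic terms and finds the boundary:

    import numpy as np
    from sklearn.svm import SVC

    # Toy 2-D problem: class membership is decided by a circle, so no
    # linear boundary can separate the classes.
    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(200, 2))
    y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 0.5).astype(int)

    linear = SVC(kernel="linear").fit(X, y)
    quadratic = SVC(kernel="poly", degree=2).fit(X, y)

    print(linear.score(X, y))      # mediocre: the best any line can do
    print(quadratic.score(X, y))   # close to 1.0: a quadratic boundary fits
    print(quadratic.n_support_)    # number of support vectors per class

The kernel computes what the dot product would be in the expanded quadratic feature space without ever constructing that space explicitly, which is what makes the extra nonlinear terms practical.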
The next section of the chapter describes instance-based learners, developing the simple nearest-neighbor method introduced in Section 4.7 and showing some more powerful alternatives that perform explicit generalization. Following that, we extend linear regression for numeric prediction to a more sophisticated procedure that comes up with the tree representation introduced in Section 3.7 and go on to describe locally weighted regression, an instance-based strategy for numeric prediction. Next we return to clustering and review some methods that are more sophisticated than simple k-means, methods that produce hierarchical clusters and probabilistic clusters. Finally, we look at Bayesian networks, a potentially very powerful way of extending the Naïve Bayes method to make it less “naïve” by dealing with datasets that have internal dependencies.
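
As a preview of locally weighted regression (a minimal sketch in Python with NumPy; the function name, the toy data, and the bandwidth parameter tau are our illustrative choices, not the book's): instead of fitting one global line, a fresh linear model is fitted for every query, with each training instance weighted by its closeness to the query point.

    import numpy as np

    def locally_weighted_regression(X, y, x_query, tau=0.3):
        # Weight each training instance by a Gaussian kernel on its
        # distance to the query, then solve the weighted least-squares
        # normal equations for a local linear model.
        Xb = np.column_stack([np.ones(len(X)), X])        # intercept column
        q = np.concatenate([[1.0], np.atleast_1d(x_query)])
        d2 = np.sum((X - x_query) ** 2, axis=1)           # squared distances
        w = np.exp(-d2 / (2 * tau ** 2))                  # tau = kernel width
        W = np.diag(w)
        theta = np.linalg.solve(Xb.T @ W @ Xb, Xb.T @ W @ y)
        return q @ theta

    # A single global line fits this sine curve badly; a local line per
    # query tracks it closely.
    X = np.linspace(0, 3, 60).reshape(-1, 1)
    y = np.sin(2 * X[:, 0]) + 0.1 * np.random.default_rng(1).normal(size=60)
    print(locally_weighted_regression(X, y, np.array([1.5])))  # near sin(3.0)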
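And as a taste of what lies beyond k-means (again a sketch, here using SciPy's agglomerative routines on synthetic data of our own devising): agglomerative methods begin with every instance in its own cluster and repeatedly merge the closest pair, producing a full hierarchy that can be cut at any level rather than a single flat partition.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    # Two tight blobs plus one distant outlier.
    rng = np.random.default_rng(2)
    X = np.vstack([rng.normal(0, 0.3, (20, 2)),
                   rng.normal(3, 0.3, (20, 2)),
                   [[10.0, 10.0]]])

    Z = linkage(X, method="average")   # the complete merge tree
    # Cutting the tree at different levels gives different flat clusterings:
    print(fcluster(Z, t=2, criterion="maxclust"))  # outlier vs. everything else
    print(fcluster(Z, t=3, criterion="maxclust"))  # the two blobs separate

Probabilistic methods such as EM-based mixture models go further still, assigning each instance a membership probability in every cluster rather than a hard assignment.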
