to rule induction and continued with association rules, linear models, the
nearest-neighbor method of instance-based learning, and clustering. The
present chapter develops all these topics except association rules, which have
already been covered in adequate detail.
We begin with decision tree induction and work up to a full description of
the C4.5 system, a landmark decision tree program that is probably the machine
learning workhorse most widely used in practice to date. Next we describe
decision rule induction. Despite the simplicity of the idea, inducing decision rules
that perform comparably with state-of-the-art decision trees turns out to be
quite difficult in practice. Most high-performance rule inducers find an initial
rule set and then refine it using a rather complex optimization stage that
discards or adjusts individual rules to make them work better together. We describe
the ideas that underlie rule learning in the presence of noise, and then go on to
cover a scheme that operates by forming partial decision trees, an approach
that has been demonstrated to perform as well as other state-of-the-art rule
learners yet avoids their complex and ad hoc heuristics. Following this, we take
a brief look at how to generate rules with exceptions, which were described in
Section 3.5.
There has been a resurgence of interest in linear models with the introduction
of support vector machines, a blend of linear modeling and instance-based
learning. Support vector machines select a small number of critical boundary
instances called support vectors from each class and build a linear discriminant
function that separates them as widely as possible. These systems transcend
the limitations of linear boundaries by making it practical to include extra
nonlinear terms in the function, making it possible to form quadratic, cubic,
and higher-order decision boundaries. The same techniques can be applied to
the perceptron described in Section 4.6 to implement complex decision
boundaries. An older technique for extending the perceptron is to connect units
together into multilayer “neural networks.” All these ideas are described in
Section 6.3.
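As a concrete illustration of the maximum-margin idea (a sketch, not part of the book's own presentation), the following Python fragment assumes the scikit-learn library is available: a linear kernel finds the widest-separating discriminant and exposes the support vectors it selected, while switching to a polynomial kernel of degree 2 adds the quadratic terms that produce a curved decision boundary.

    # Illustrative sketch only: support vector machines via scikit-learn.
    import numpy as np
    from sklearn.svm import SVC

    # A tiny two-class dataset: two clusters that a straight line can separate.
    X = np.array([[0.0, 0.0], [0.2, 0.4], [0.4, 0.1],
                  [2.0, 2.2], [2.3, 1.9], [1.8, 2.4]])
    y = np.array([0, 0, 0, 1, 1, 1])

    # Linear kernel: the maximum-margin linear discriminant.
    linear_svm = SVC(kernel="linear", C=1.0).fit(X, y)
    print(linear_svm.support_vectors_)          # the critical boundary instances
    print(linear_svm.predict([[1.0, 1.0]]))     # classify a new point

    # Polynomial kernel of degree 2: extra nonlinear terms give a quadratic boundary.
    quad_svm = SVC(kernel="poly", degree=2, C=1.0).fit(X, y)
    print(quad_svm.predict([[1.0, 1.0]]))

The choice of kernel is the only change needed to move from a linear to a quadratic or higher-order boundary; the underlying margin-maximization procedure stays the same.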
The next section of the chapter describes instance-based learners, developing
the simple nearest-neighbor method introduced in Section 4.7 and showing
some more powerful alternatives that perform explicit generalization. Following
that, we extend linear regression for numeric prediction to a more sophisticated
procedure that comes up with the tree representation introduced in
Section 3.7 and go on to describe locally weighted regression, an instance-based
strategy for numeric prediction. Next we return to clustering and review some
methods that are more sophisticated than simple k-means, methods that
produce hierarchical clusters and probabilistic clusters. Finally, we look at
Bayesian networks, a potentially very powerful way of extending the Naïve Bayes
method to make it less “naïve” by dealing with datasets that have internal
dependencies.