Data Mining: Practical Machine Learning Tools and Techniques, Second Edition

The idea of incremental reduced-error pruning is due to Fürnkranz
and Widmer (1994) and forms the basis for fast and effective rule induction.
The RIPPER rule learner is due to Cohen (1995), although the published
description appears to differ from the implementation in precisely how
the description length (DL) affects the stopping condition. What we have presented here is the basic idea of the algorithm; there are many more details in the implementation.
The whole question of measuring the value of a rule has not yet been satisfactorily resolved. Many different measures have been proposed, some blatantly
heuristic and others based on information-theoretical or probabilistic grounds.
However, there seems to be no consensus on which measure is best.
An extensive theoretical study of various criteria has been performed by
Fürnkranz and Flach (2005).
The rule-learning method based on partial decision trees was developed by
Frank and Witten (1998). It produces rule sets that are as accurate as those generated by C4.5 and more accurate than other fast rule-induction methods.
However, its main advantage over other schemes is not performance but simplicity: by combining the top-down decision tree induction method with separate-and-conquer rule learning, it produces good rule sets without any need for
global optimization.
The procedure for generating rules with exceptions was developed as an
option in the Induct system by Gaines and Compton (1995), who called them
ripple-down rules. In an experiment with a large medical dataset (22,000
instances, 32 attributes, and 60 classes), they found that people can understand
large systems of rules with exceptions more readily than equivalent systems of
regular rules because that is the way that they think about the complex medical
diagnoses that are involved. Richards and Compton (1998) describe the role of ripple-down rules as an alternative to classic knowledge engineering.

6.3 Extending linear models


Section 4.6 described how simple linear models can be used for classification in
situations where all attributes are numeric. Their biggest disadvantage is that
they can only represent linear boundaries between classes, which makes them
too simple for many practical applications. Support vector machines use linear
models to implement nonlinear class boundaries. (Although it is a widely used term, support vector machines is something of a misnomer: these are algorithms, not machines.) How can this be possible? The trick is easy: transform the input using a nonlinear mapping; in other words, transform the instance space into a new space. With a nonlinear mapping, a straight line in the new space doesn’t look straight in the original instance space. A linear model

214 CHAPTER 6| IMPLEMENTATIONS: REAL MACHINE LEARNING SCHEMES

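The mapping idea just described can be illustrated with a minimal sketch. The class boundary below is a circle, which no straight line in the original two-dimensional space can separate; after mapping each instance through a hand-chosen nonlinear feature map, an ordinary linear model (here a simple perceptron, used purely for illustration, not the book's implementation) finds a separating hyperplane in the new space. The particular mapping `phi` and the training data are assumptions made for this example.

```python
def phi(x, y):
    """Nonlinear mapping of a 2-D instance into a 6-D feature space.

    Includes a constant (bias) term plus linear and quadratic features,
    so a hyperplane in this space can represent a circle in the original one.
    """
    return [1.0, x, y, x * x, y * y, x * y]

def perceptron(data, epochs=200):
    """Fit a linear model (perceptron) on the mapped instances."""
    w = [0.0] * 6
    for _ in range(epochs):
        for (x, y), label in data:  # label is +1 or -1
            z = phi(x, y)
            # Misclassified (or on the boundary): nudge the weights.
            if label * sum(wi * zi for wi, zi in zip(w, z)) <= 0:
                w = [wi + label * zi for wi, zi in zip(w, z)]
    return w

def predict(w, x, y):
    return 1 if sum(wi * zi for wi, zi in zip(w, phi(x, y))) > 0 else -1

# Toy dataset: points inside the unit circle are class +1, outside are -1.
points = [(-1.5, 0), (0, -1.5), (1.5, 0), (0, 1.5),      # outside
          (0.3, 0.3), (-0.4, 0.2), (0, 0), (0.5, -0.5)]  # inside
data = [((x, y), 1 if x * x + y * y < 1 else -1) for x, y in points]

w = perceptron(data)
print(all(predict(w, x, y) == lbl for (x, y), lbl in data))  # True
```

The data are linearly separable in the mapped space (the weight vector [1, 0, 0, -1, -1, 0] corresponds to the circle x² + y² = 1), so the perceptron is guaranteed to converge and classify every training instance correctly, even though the boundary it represents in the original space is curved.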