Data Mining: Practical Machine Learning Tools and Techniques, Second Edition

The idea of incremental reduced-error pruning is due to Fürnkranz
and Widmer (1994) and forms the basis for fast and effective rule induction.
The RIPPER rule learner is due to Cohen (1995), although the published
description appears to differ from the implementation in precisely how
the description length (DL) affects the stopping condition. What we have presented here is the basic idea of the algorithm; there are many more details in the implementation.
The whole question of measuring the value of a rule has not yet been satisfactorily resolved. Many different measures have been proposed, some blatantly
heuristic and others based on information-theoretical or probabilistic grounds.
However, there seems to be no consensus on which measure is best.
An extensive theoretical study of various criteria has been performed by
Fürnkranz and Flach (2005).
The rule-learning method based on partial decision trees was developed by
Frank and Witten (1998). It produces rule sets that are as accurate as those generated by C4.5 and more accurate than other fast rule-induction methods.
However, its main advantage over other schemes is not performance but simplicity: by combining the top-down decision tree induction method with separate-and-conquer rule learning, it produces good rule sets without any need for
global optimization.
The procedure for generating rules with exceptions was developed as an
option in the Induct system by Gaines and Compton (1995), who called them
ripple-down rules. In an experiment with a large medical dataset (22,000
instances, 32 attributes, and 60 classes), they found that people can understand
large systems of rules with exceptions more readily than equivalent systems of
regular rules because that is the way that they think about the complex medical
diagnoses that are involved. Richards and Compton (1998) describe the role of ripple-down rules as an alternative to classic knowledge engineering.

6.3 Extending linear models


Section 4.6 described how simple linear models can be used for classification in
situations where all attributes are numeric. Their biggest disadvantage is that
they can only represent linear boundaries between classes, which makes them
too simple for many practical applications. Support vector machines use linear
models to implement nonlinear class boundaries. (Although it is a widely used term, support vector machines is something of a misnomer: these are algorithms, not machines.) How can this be possible? The trick is easy: transform the input using a nonlinear mapping; in other words, transform the instance space into a new space. With a nonlinear mapping, a straight line in the new space doesn’t look straight in the original instance space. A linear model

214 CHAPTER 6| IMPLEMENTATIONS: REAL MACHINE LEARNING SCHEMES

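The mapping idea just described can be illustrated with a minimal sketch. The class boundary below is a circle, which no straight line in the original two-dimensional space can separate; after mapping each instance through a hand-chosen nonlinear feature map, an ordinary linear model (here a simple perceptron, used purely for illustration, not the book's implementation) finds a separating hyperplane in the new space. The particular mapping `phi` and the training data are assumptions made for this example.

```python
def phi(x, y):
    """Nonlinear mapping of a 2-D instance into a 6-D feature space.

    Includes a constant (bias) term plus linear and quadratic features,
    so a hyperplane in this space can represent a circle in the original one.
    """
    return [1.0, x, y, x * x, y * y, x * y]

def perceptron(data, epochs=200):
    """Fit a linear model (perceptron) on the mapped instances."""
    w = [0.0] * 6
    for _ in range(epochs):
        for (x, y), label in data:  # label is +1 or -1
            z = phi(x, y)
            # Misclassified (or on the boundary): nudge the weights.
            if label * sum(wi * zi for wi, zi in zip(w, z)) <= 0:
                w = [wi + label * zi for wi, zi in zip(w, z)]
    return w

def predict(w, x, y):
    return 1 if sum(wi * zi for wi, zi in zip(w, phi(x, y))) > 0 else -1

# Toy dataset: points inside the unit circle are class +1, outside are -1.
points = [(-1.5, 0), (0, -1.5), (1.5, 0), (0, 1.5),      # outside
          (0.3, 0.3), (-0.4, 0.2), (0, 0), (0.5, -0.5)]  # inside
data = [((x, y), 1 if x * x + y * y < 1 else -1) for x, y in points]

w = perceptron(data)
print(all(predict(w, x, y) == lbl for (x, y), lbl in data))  # True
```

The data are linearly separable in the mapped space (the weight vector [1, 0, 0, -1, -1, 0] corresponds to the circle x² + y² = 1), so the perceptron is guaranteed to converge and classify every training instance correctly, even though the boundary it represents in the original space is curved.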