method that builds a full decision tree each time. The gain decreases as more
pruning takes place. For datasets with numeric attributes, the asymptotic time
complexity of the algorithm is the same as building the full decision tree,
because in this case the complexity is dominated by the time required to sort
the attribute values in the first place.
Once a partial tree has been built, a single rule is extracted from it. Each leaf
corresponds to a possible rule, and we seek the “best” leaf among those subtrees
(typically a small minority) that have been expanded into leaves. Experiments
show that it is best to aim for the most general rule by choosing the leaf that
covers the greatest number of instances.
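
This selection step can be sketched as follows, assuming a simple Node class
with an expanded flag and a per-leaf coverage count; the names are illustrative
only and do not reflect the data structures of any particular implementation.

class Node:
    """One node of a partial tree (illustrative sketch, not Weka's code)."""
    def __init__(self, condition=None, children=None, expanded=False, coverage=0):
        self.condition = condition      # test leading to this node, e.g. "outlook = sunny"
        self.children = children or []  # an empty list means this node is a leaf
        self.expanded = expanded        # True if the leaf lies in an expanded subtree
        self.coverage = coverage        # number of training instances reaching this leaf

def best_rule(node, conditions=()):
    """Return (conditions, leaf) for the expanded leaf covering the most instances."""
    if not node.children:
        return (conditions, node) if node.expanded else None
    candidates = []
    for child in node.children:
        found = best_rule(child, conditions + (child.condition,))
        if found is not None:
            candidates.append(found)
    # prefer the most general rule: the leaf that covers the greatest number of instances
    return max(candidates, key=lambda c: c[1].coverage, default=None)

The tuple of conditions along the path forms the rule’s antecedent, and the
class at the chosen leaf becomes its consequent.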
When a dataset contains missing values, they can be dealt with exactly as they
are when building decision trees. If an instance cannot be assigned to any given
branch because of a missing attribute value, it is assigned to each of the branches
with a weight proportional to the number of training instances going down that
branch, normalized by the total number of training instances with known values
at the node. During testing, the same procedure is applied separately to each
rule, thus associating a weight with the application of each rule to the test
instance. That weight is deducted from the instance’s total weight before it is
passed to the next rule in the list. Once the weight has reduced to zero, the
predicted class probabilities are combined into a final classification according to
the weights.
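
A minimal sketch of this test-time weighting follows, assuming each rule is
given as a pair (match_fraction, class_dist), where match_fraction(instance)
returns 1.0 for a full match, 0.0 for no match, and a value in between when
missing attributes send the instance down several branches; this representation
is an assumption made for illustration, not the book’s exact data structures.

def classify(instance, rules, default_dist):
    """Combine weighted class distributions from a decision list.
    rules: list of (match_fraction, class_dist) pairs -- an assumed
    representation for this sketch."""
    combined = [0.0] * len(default_dist)
    weight = 1.0                                  # the instance starts with full weight
    for match_fraction, class_dist in rules:
        fired = weight * match_fraction(instance)
        combined = [c + fired * p for c, p in zip(combined, class_dist)]
        weight -= fired                           # deduct before passing to the next rule
        if weight <= 0.0:
            break
    # any remaining weight falls through to the default classification
    combined = [c + max(weight, 0.0) * p for c, p in zip(combined, default_dist)]
    return combined.index(max(combined))          # class with the largest combined weight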
This yields a simple but surprisingly effective method for learning decision
lists for noisy data. Its main advantage over other comprehensive
rule-generation schemes is simplicity, because other methods require a complex
global optimization stage to achieve the same level of performance.

Rules with exceptions


In Section 3.5 we learned that a natural extension of rules is to allow them to
have exceptions, and exceptions to the exceptions, and so on—indeed the whole
rule set can be considered as exceptions to a default classification rule that is
used when no other rules apply. The method of generating a “good” rule, using
one of the measures described in the previous section, provides exactly the
mechanism needed to generate rules with exceptions.
First, a default class is selected for the top-level rule: it is natural to use the
class that occurs most frequently in the training data. Then, a rule is found
pertaining to any class other than the default one. Of all such rules it is natural to
seek the one with the most discriminatory power, for example, the one with the
best evaluation on a test set. Suppose this rule has the form

if <condition> then class = <new class>
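
These two steps might be sketched as follows; the (condition, new_class) rule
representation and the accuracy-on-covered-instances score are assumptions made
for illustration, not a prescribed evaluation measure.

from collections import Counter

def default_class(labels):
    """Top-level rule: predict the class that occurs most frequently."""
    return Counter(labels).most_common(1)[0][0]

def best_exception_rule(candidate_rules, instances, labels, default):
    """Among rules predicting any class other than the default, pick the one
    with the best evaluation on a test set (here, accuracy on the instances
    it covers -- an illustrative choice of score)."""
    def score(rule):
        condition, new_class = rule               # condition(instance) -> bool
        covered = [y for x, y in zip(instances, labels) if condition(x)]
        return sum(1 for y in covered if y == new_class) / len(covered) if covered else 0.0
    return max((r for r in candidate_rules if r[1] != default), key=score)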
