Data Mining: Practical Machine Learning Tools and Techniques, Second Edition




Using global optimization


In general, rules generated using incremental reduced-error pruning in this manner seem to perform quite well, particularly on large datasets. However, it has been found that a worthwhile performance advantage can be obtained by performing a global optimization step on the set of rules induced. The motivation is to increase the accuracy of the rule set by revising or replacing individual rules. Experiments show that both the size and the performance of rule sets are significantly improved by postinduction optimization. On the other hand, the process itself is rather complex.
To give an idea of how elaborate (and heuristic) industrial-strength rule learners become, Figure 6.4 shows an algorithm called RIPPER, an acronym for repeated incremental pruning to produce error reduction. Classes are examined in order of increasing size, and an initial set of rules for the class is generated using incremental reduced-error pruning. An extra stopping condition is introduced that depends on the description length of the examples and rule set. The description length DL is a complex formula that takes into account the number of bits needed to send a set of examples with respect to a set of rules, the number of bits required to send a rule with k conditions, and the number of bits needed to send the integer k, times an arbitrary factor of 50% to compensate for possible redundancy in the attributes. Having produced a rule set for the class, each rule is reconsidered and two variants produced, again using reduced-error pruning, but at this stage instances covered by other rules for the class are removed from the pruning set, and the success rate on the remaining instances is used as the pruning criterion. If one of the two variants yields a better

Initialize E to the instance set
Split E into Grow and Prune in the ratio 2:1
For each class C for which Grow and Prune both contain an instance
  Use the basic covering algorithm to create the best perfect rule for class C
  Calculate the worth w(R) for the rule on Prune, and of the rule with the
    final condition omitted w(R-)
  While w(R-) > w(R), remove the final condition from the rule and repeat the
    previous step
From the rules generated, select the one with the largest w(R)
Print the rule
Remove the instances covered by the rule from E
Continue

Figure 6.3 Algorithm for forming rules by incremental reduced-error pruning.
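The pseudocode in Figure 6.3 leaves the worth measure w(R) and the covering step unspecified. As a rough illustration only, the following minimal Python sketch fills those gaps with simple assumptions: w(R) is taken to be the rule's success rate on the pruning set, rules are conjunctions of attribute = value tests grown greedily, and rules are collected into a list rather than printed. All function names are hypothetical, not part of any actual implementation.

```python
# Minimal sketch of incremental reduced-error pruning (Figure 6.3).
# Instances are (attribute-dict, class-label) pairs. Assumption: the
# worth w(R) of a rule is its success rate on the pruning set.

def covers(rule, inst):
    """True if every (attribute, value) condition in the rule holds."""
    return all(inst[a] == v for a, v in rule)

def worth(rule, cls, prune):
    """w(R): fraction of prune-set instances covered by the rule
    that actually belong to class cls (assumed success-rate measure)."""
    covered = [c for inst, c in prune if covers(rule, inst)]
    return (sum(1 for c in covered if c == cls) / len(covered)) if covered else 0.0

def grow_rule(cls, grow):
    """Basic covering: greedily add the attribute=value test with the
    best accuracy until the rule is perfect on the growing set or no
    test improves it."""
    rule, attrs = [], list(grow[0][0])
    while True:
        pos = [x for x in grow if covers(rule, x[0])]
        if pos and all(c == cls for _, c in pos):
            return rule                       # perfect on the growing set
        best, best_acc = None, -1.0
        for a in attrs:
            if any(a == ra for ra, _ in rule):
                continue                      # one test per attribute
            for v in {inst[a] for inst, _ in grow}:
                cand = rule + [(a, v)]
                cov = [c for inst, c in grow if covers(cand, inst)]
                if not cov:
                    continue
                acc = sum(1 for c in cov if c == cls) / len(cov)
                if acc > best_acc:
                    best, best_acc = cand, acc
        if best is None:
            return rule                       # no test improves the rule
        rule = best

def reduced_error_pruning(instances):
    E, rules = list(instances), []
    while E:
        split = 2 * len(E) // 3               # split Grow : Prune = 2 : 1
        grow, prune = E[:split], E[split:]
        candidates = []
        # classes represented in both Grow and Prune
        for cls in {c for _, c in grow} & {c for _, c in prune}:
            rule = grow_rule(cls, grow)
            # while w(R-) > w(R), drop the final condition
            while rule and worth(rule[:-1], cls, prune) > worth(rule, cls, prune):
                rule = rule[:-1]
            candidates.append((worth(rule, cls, prune), rule, cls))
        if not candidates:
            break
        w, rule, cls = max(candidates, key=lambda t: t[0])
        rules.append((rule, cls))             # "print" step: collect the rule
        remaining = [x for x in E if not covers(rule, x[0])]
        if len(remaining) == len(E):          # no progress: stop
            break
        E = remaining
    return rules
```

On a toy weather-style dataset this yields rules such as "windy = false → yes"; a real implementation such as RIPPER adds the description-length stopping condition and the global optimization pass described in the text.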
