greatly reduced. In practical implementations, we can use an ad hoc test to guard against splitting on such a useless attribute.

Unfortunately, in some situations the gain ratio modification overcompensates and can lead to preferring an attribute just because its intrinsic information is much lower than that for the other attributes. A standard fix is to choose the attribute that maximizes the gain ratio, provided that the information gain for that attribute is at least as great as the average information gain for all the attributes examined.
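
As a minimal sketch of this fix (illustrative only, not the code of any particular system), assume the information gain and gain ratio of every candidate attribute have already been computed:

    class AttributeSelection {
        // Maximize the gain ratio, but only over attributes whose
        // information gain is at least the average across all candidates.
        static int selectAttribute(double[] gains, double[] ratios) {
            double avgGain = 0.0;
            for (double g : gains) avgGain += g;
            avgGain /= gains.length;

            int best = -1;
            for (int i = 0; i < gains.length; i++) {
                if (gains[i] >= avgGain
                        && (best == -1 || ratios[i] > ratios[best])) {
                    best = i;
                }
            }
            return best; // attribute with maximum gain ratio among those
                         // with at-least-average information gain
        }
    }

Because the attribute with the maximum information gain is always at least average, the method always finds a candidate when at least one attribute is supplied.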

Discussion


The divide-and-conquer approach to decision tree induction, sometimes called top-down induction of decision trees, was developed and refined over many years by J. Ross Quinlan of the University of Sydney, Australia. Although others have worked on similar methods, Quinlan's research has always been at the very forefront of decision tree induction. The method that has been described using the information gain criterion is essentially the same as the one known as ID3. The use of the gain ratio was one of many improvements made to ID3 over several years; Quinlan described it as robust under a wide variety of circumstances. Although a robust and practical solution, it sacrifices some of the elegance and clean theoretical motivation of the information gain criterion.

A series of improvements to ID3 culminated in a practical and influential system for decision tree induction called C4.5. These improvements include methods for dealing with numeric attributes, missing values, and noisy data, as well as for generating rules from trees; they are described in Section 6.1.

4.4 Covering algorithms: Constructing rules


As we have seen, decision tree algorithms are based on a divide-and-conquer approach to the classification problem. They work from the top down, seeking at each stage an attribute to split on that best separates the classes, and then recursively processing the subproblems that result from the split. This strategy generates a decision tree, which can if necessary be converted into a set of classification rules, although the conversion is not trivial if it is to produce effective rules.
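
In its simplest form, though, the correspondence between a tree and a rule set is direct: each leaf yields one rule whose antecedent conjoins the tests along the path from the root. The following sketch illustrates only that naive reading; the Node class and the example tree are hypothetical, and the effective conversion discussed in Section 6.1 goes well beyond it.

    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.Deque;
    import java.util.List;

    // Hypothetical tree node: 'condition' is the test on the branch
    // leading into the node (null at the root); a leaf carries a class
    // 'label' and has no children.
    class Node {
        String condition;
        String label;
        List<Node> children = new ArrayList<>();
    }

    class TreeToRules {
        // Emit one rule per leaf: the conjunction of conditions on the
        // root-to-leaf path implies the leaf's class.
        static void emitRules(Node node, Deque<String> path) {
            if (node.condition != null) path.addLast(node.condition);
            if (node.children.isEmpty()) {
                System.out.println("IF " + String.join(" AND ", path)
                        + " THEN class = " + node.label);
            } else {
                for (Node child : node.children) emitRules(child, path);
            }
            if (node.condition != null) path.removeLast();
        }

        public static void main(String[] args) {
            Node root = new Node();
            Node sunny = new Node(), other = new Node();
            sunny.condition = "outlook = sunny";  sunny.label = "no";
            other.condition = "outlook != sunny"; other.label = "yes";
            root.children.add(sunny);
            root.children.add(other);
            emitRules(root, new ArrayDeque<>()); // prints two rules
        }
    }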
An alternative approach is to take each class in turn and seek a way of covering all instances in it, at the same time excluding instances not in the class. This is called a covering approach because at each stage you identify a rule that "covers" some of the instances. By its very nature, this covering approach leads to a set of rules rather than to a decision tree.
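
A schematic version of that loop (placeholder names throughout; this is not any specific published algorithm) makes the structure explicit. Rules are modeled as predicates over instances, and the rule-construction step, here an abstract findRule parameter, is assumed to return a rule that covers at least one still-uncovered instance of the class while excluding instances of other classes:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.BiFunction;
    import java.util.function.Function;
    import java.util.function.Predicate;

    class Covering {
        // For each class in turn, add rules until every instance of that
        // class is covered, removing covered instances as we go.
        static <I> List<Predicate<I>> cover(
                List<I> data, List<String> classes,
                Function<I, String> classOf,
                BiFunction<List<I>, String, Predicate<I>> findRule) {
            List<Predicate<I>> rules = new ArrayList<>();
            for (String cls : classes) {
                List<I> uncovered = new ArrayList<>();
                for (I inst : data)
                    if (classOf.apply(inst).equals(cls)) uncovered.add(inst);
                while (!uncovered.isEmpty()) {
                    Predicate<I> rule = findRule.apply(uncovered, cls);
                    rules.add(rule);
                    uncovered.removeIf(rule); // drop newly covered instances
                }
            }
            return rules;
        }
    }

How a good rule is actually constructed, that is, what findRule should do, is the subject of the rest of this section.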
The covering method can readily be visualized in a two-dimensional space
of instances as shown in Figure 4.6(a). We first make a rule covering the a’s. For