Data Mining: Practical Machine Learning Tools and Techniques, Second Edition

(Brent) #1

4.4 COVERING ALGORITHMS: CONSTRUCTING RULES 107


Again, one ais erroneously covered by these rules. If it were necessary to exclude
it, more tests would have to be added to the second rule, and additional rules
would need to be introduced to cover the b’s that these new tests exclude.


Rules versus trees


A top-down divide-and-conquer algorithm operates on the same data in a
manner that is, at least superficially, quite similar to a covering algorithm. It
might first split the dataset using the xattribute and would probably end up
splitting it at the same place,x=1.2. However, whereas the covering algorithm
is concerned only with covering a single class, the division would take both
classes into account, because divide-and-conquer algorithms create a single
concept description that applies to all classes. The second split might also be at
the same place,y=2.6, leading to the decision tree in Figure 4.6(b). This tree
corresponds exactly to the set of rules, and in this case there is no difference in
effect between the covering and the divide-and-conquer algorithms.
But in many situations there isa difference between rules and trees in terms
of the perspicuity of the representation. For example, when we described the
replicated subtree problem in Section 3.3, we noted that rules can be symmet-
ric whereas trees must select one attribute to split on first, and this can lead to
trees that are much larger than an equivalent set of rules. Another difference is
that, in the multiclass case, a decision tree split takes all classes into account,
trying to maximize the purity of the split, whereas the rule-generating method
concentrates on one class at a time, disregarding what happens to the other
classes.


A simple covering algorithm


Covering algorithms operate by adding tests to the rule that is under construc-
tion, always striving to create a rule with maximum accuracy. In contrast, divide-
and-conquer algorithms operate by adding tests to the tree that is under
construction, always striving to maximize the separation among the classes.
Each of these involves finding an attribute to split on. But the criterion for the
best attribute is different in each case. Whereas divide-and-conquer algorithms
such as ID3 choose an attribute to maximize the information gain, the cover-
ing algorithm we will describe chooses an attribute–value pair to maximize the
probability of the desired classification.
Figure 4.7 gives a picture of the situation, showing the space containing all
the instances, a partially constructed rule, and the same rule after a new term
has been added. The new term restricts the coverage of the rule: the idea is to
include as many instances of the desired class as possible and exclude as many
instances of other classes as possible. Suppose the new rule will cover a total of
tinstances, of which pare positive examples of the class and t-pare in other

Free download pdf