Data Mining: Practical Machine Learning Tools and Techniques, Second Edition


6.2 CLASSIFICATION RULES 207


description length, it replaces the rule. Next we reactivate the original building
phase to mop up any newly uncovered instances of the class. A final check is
made to ensure that each rule contributes to the reduction of description length,
before proceeding to generate rules for the next class.
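The two description-length checks described above can be sketched as follows. This is only an illustration of the control flow, assuming a hypothetical `description_length(rules, data)` function; the actual MDL measure is defined in the text.

```python
# Sketch of RIPPER-style revision checks, assuming a hypothetical
# description_length(rules, data) function (the real measure is MDL-based).
def revise(rules, index, replacement, data, description_length):
    """Swap in the replacement rule only if it shortens the description length."""
    revised = list(rules)
    revised[index] = replacement
    if description_length(revised, data) < description_length(rules, data):
        return revised
    return rules

def final_check(rules, data, description_length):
    """Drop any rule whose removal does not increase the description length."""
    kept = list(rules)
    for rule in list(kept):
        without = [r for r in kept if r is not rule]
        if description_length(without, data) <= description_length(kept, data):
            kept = without
    return kept
```

With a toy measure that simply counts rules, `revise` keeps the original set when the replacement brings no improvement, and `final_check` prunes rules that do not pay for themselves.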

Obtaining rules from partial decision trees


There is an alternative approach to rule induction that avoids global optimization but nevertheless produces accurate, compact rule sets. The method combines the divide-and-conquer strategy for decision tree learning with the
separate-and-conquer one for rule learning. It adopts the separate-and-conquer
strategy in that it builds a rule, removes the instances it covers, and continues
creating rules recursively for the remaining instances until none are left.
However, it differs from the standard approach in the way that each rule is
created. In essence, to make a single rule, a pruned decision tree is built for the
current set of instances, the leaf with the largest coverage is made into a rule,
and the tree is discarded.
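The loop just described can be sketched in a few lines. As a stand-in for building a full pruned decision tree, this toy version grows only a one-level test per round and keeps its purest, largest leaf; the real method grows and prunes a proper tree each time. The data layout (attribute dictionaries paired with class labels) is hypothetical.

```python
# Minimal, runnable sketch of separate-and-conquer rule learning where
# each rule comes from the best leaf of a (here, drastically simplified)
# tree. A one-level test stands in for the pruned decision tree.
from collections import Counter

def best_rule(instances):
    """Pick the (attribute, value) test whose covered subset is purest,
    breaking ties by coverage; return ((attribute, value), predicted_class)."""
    best = None
    for attr in instances[0][0]:
        for value in {x[attr] for x, _ in instances}:
            covered = [cls for x, cls in instances if x[attr] == value]
            cls, hits = Counter(covered).most_common(1)[0]
            key = (hits / len(covered), len(covered))  # (purity, coverage)
            if best is None or key > best[0]:
                best = (key, (attr, value), cls)
    return best[1], best[2]

def learn_rule_set(instances):
    rules = []
    while instances:
        test, cls = best_rule(instances)  # "build a tree, keep the best leaf"
        rules.append((test, cls))
        # separate-and-conquer: remove the instances the new rule covers
        instances = [(x, c) for x, c in instances if x[test[0]] != test[1]]
    return rules
```

Each iteration discards everything but one rule, exactly as the text describes; only the simplification of the tree-building step differs.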
The prospect of repeatedly building decision trees only to discard most of
them is not as bizarre as it first seems. Using a pruned tree to obtain a rule
instead of building it incrementally by adding conjunctions one at a time avoids
a tendency to overprune that is a characteristic problem of the basic separate-
and-conquer rule learner. Using the separate-and-conquer methodology in con-
junction with decision trees adds flexibility and speed. It is indeed wasteful to
build a full decision tree just to obtain a single rule, but the process can be accel-
erated significantly without sacrificing the preceding advantages.
The key idea is to build a partial decision tree instead of a fully explored one.
A partial decision tree is an ordinary decision tree that contains branches to

(b)

DL: description length (see text)
G = p[log(p/t) − log(P/T)]; information gain for this rule
W = (p + 1)/(t + 2); worth of this rule
A = (p + n′)/T; accuracy of this rule
p = number of positive examples covered by this rule (true positives)
n = number of negative examples covered by this rule (false positives)
t = p + n; total number of examples covered by this rule
n′ = N − n; number of negative examples not covered by this rule (true negatives)
P = number of positive examples of this class
N = number of negative examples of this class
T = P + N; total number of examples

Figure 6.4 (continued)
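The metrics in Figure 6.4 follow directly from the four counts p, n, P, and N. The helper below computes them; the logarithm base only rescales G (base 2 is used here), and the example counts in the usage note are made up for illustration.

```python
# The rule metrics of Figure 6.4(b), computed from the counts defined there.
from math import log2

def rule_metrics(p, n, P, N):
    """p: positives covered (true positives); n: negatives covered
    (false positives); P, N: all positive/negative examples."""
    t = p + n            # total examples covered by the rule
    n_prime = N - n      # negatives not covered (true negatives)
    T = P + N            # total number of examples
    G = p * (log2(p / t) - log2(P / T))  # information gain
    W = (p + 1) / (t + 2)                # worth of the rule
    A = (p + n_prime) / T                # accuracy of the rule
    return G, W, A
```

For instance, a rule covering 10 of 20 positives and 2 of 80 negatives gets worth W = 11/14 and accuracy A = 0.88.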
