
be found that covers this instance, in which case the class in question is predicted, or no such rule is found, in which case the class is not predicted.
Now return to the overall algorithm. Each class is considered in turn, and rules are generated that distinguish instances in that class from the others. No ordering is implied between the rules for one class and those for another. Consequently, the rules that are produced can be executed independently of order. As described in Section 3.3, order-independent rules seem to provide more modularity by each acting as an independent nugget of "knowledge," but they suffer from the disadvantage that it is not clear what to do when conflicting rules apply. With rules generated in this way, a test example may receive multiple classifications; that is, rules for different classes may each cover it. Other test examples may receive no classification at all. A simple strategy to force a decision in these ambiguous cases is to choose, from the classifications that are predicted, the one with the most training examples or, if no classification is predicted, to choose the category with the most training examples overall. These difficulties do not occur with decision lists because they are meant to be interpreted in order and execution stops as soon as one rule applies: the addition of a default rule at the end ensures that any test instance receives a classification. It is possible to generate good decision lists for the multiclass case using a slightly different method, as we shall see in Section 6.2.
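
To make the conflict-resolution strategy concrete, here is a minimal sketch in Python. It is an illustration rather than any particular system's implementation: the representation of a rule as a (conditions, class) pair and the helper names matches, classify, and class_counts are assumptions made for this example.

    def matches(conditions, instance):
        # A rule's conditions hold if every attribute has the required value.
        return all(instance.get(a) == v for a, v in conditions.items())

    def classify(rules, instance, class_counts):
        # Collect every class predicted by a rule that covers the instance.
        predicted = {cls for conditions, cls in rules
                     if matches(conditions, instance)}
        if predicted:
            # Conflict: choose the predicted class with the most training examples.
            return max(predicted, key=lambda c: class_counts[c])
        # No rule fires: fall back on the most populous class overall.
        return max(class_counts, key=class_counts.get)

For example, with the weather data's class counts of nine yes and five no instances, an instance covered by a no rule and a yes rule would be classified yes, because yes has more training examples.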
Methods such as PRISM can be described as separate-and-conquer algorithms: you identify a rule that covers many instances in the class (and excludes ones not in the class), separate out the covered instances because they are already taken care of by the rule, and continue the process on those that are left. This contrasts nicely with the divide-and-conquer approach of decision trees. The separate step greatly increases the efficiency of the method because the instance set continually shrinks as the operation proceeds.
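
The overall control structure is the same whatever measure the inner loop uses to build a single rule. The following sketch shows it for one class; it is an illustration, not Weka's code, and it assumes instances are dictionaries with a "class" entry and that learn_one_rule is a hypothetical helper, such as PRISM's inner loop, that returns the conditions of one rule guaranteed to cover at least one remaining instance of the class.

    def matches(conditions, instance):
        return all(instance.get(a) == v for a, v in conditions.items())

    def separate_and_conquer(instances, cls, learn_one_rule):
        rules = []
        remaining = list(instances)
        # Conquer: keep building rules while uncovered instances of cls remain.
        while any(x["class"] == cls for x in remaining):
            conditions = learn_one_rule(remaining, cls)
            rules.append((conditions, cls))
            # Separate: discard the instances the new rule covers,
            # so each iteration works on a strictly smaller set.
            remaining = [x for x in remaining if not matches(conditions, x)]
        return rules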

4.5 Mining association rules


Association rules are like classification rules. You could find them in the same way, by executing a divide-and-conquer rule-induction procedure for each possible expression that could occur on the right-hand side of the rule. But not only might any attribute occur on the right-hand side with any possible value; a single association rule often predicts the value of more than one attribute. To find such rules, you would have to execute the rule-induction procedure once for every possible combination of attributes, with every possible combination of values, on the right-hand side. That would result in an enormous number of association rules, which would then have to be pruned down on the basis of their coverage (the number of instances that they predict correctly) and their accuracy (the same number expressed as a proportion of the number of instances to which the rule applies).
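
To make these two pruning criteria concrete, the following sketch scores a single candidate rule; representing each side of the rule as an attribute-to-value dictionary is an assumption made for illustration only.

    def coverage_and_accuracy(instances, antecedent, consequent):
        # Instances to which the rule applies: the left-hand side holds.
        applies = [x for x in instances
                   if all(x.get(a) == v for a, v in antecedent.items())]
        # Instances the rule predicts correctly: the right-hand side holds too.
        correct = [x for x in applies
                   if all(x.get(a) == v for a, v in consequent.items())]
        coverage = len(correct)
        accuracy = len(correct) / len(applies) if applies else 0.0
        return coverage, accuracy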

