reached a lower bound of 10%, whichever occurs first. (These default values can be changed.) There are four alternative metrics for ranking rules: Confidence, which is the proportion of the examples covered by the premise that are also covered by the consequent (called accuracy in Section 4.5); Lift, which is determined by dividing the confidence by the support (called coverage in Section 4.5); Leverage, which is the proportion of additional examples covered by both the premise and the consequent beyond those expected if the premise and consequent were statistically independent; and Conviction, a measure defined by Brin et al. (1997). You can also specify a significance level, and rules will be tested for significance at this level.
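
As a minimal sketch of how these four metrics follow from coverage counts (the counts below are illustrative, not taken from any dataset in the book):

// Sketch: computing the four rule-ranking metrics for a rule
// premise => consequent from raw coverage counts (illustrative values).
public class RuleMetrics {
    public static void main(String[] args) {
        double total = 1000;      // total number of instances
        double premise = 400;     // instances covered by the premise
        double consequent = 500;  // instances covered by the consequent
        double both = 300;        // instances covered by premise AND consequent

        double pPremise = premise / total;
        double pConsequent = consequent / total;
        double pBoth = both / total;

        // Confidence: proportion of premise instances also covered
        // by the consequent.
        double confidence = pBoth / pPremise;              // 0.75
        // Lift: confidence divided by the consequent's coverage;
        // values above 1 indicate a positive association.
        double lift = confidence / pConsequent;            // 1.5
        // Leverage: joint coverage beyond what statistical
        // independence would predict.
        double leverage = pBoth - pPremise * pConsequent;  // 0.1
        // Conviction (Brin et al. 1997): frequency of counterexamples
        // expected under independence, relative to their observed frequency.
        double conviction = pPremise * (1 - pConsequent)
                / (pPremise - pBoth);                      // 2.0

        System.out.printf("confidence=%.3f lift=%.3f leverage=%.3f conviction=%.3f%n",
                confidence, lift, leverage, conviction);
    }
}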
PredictiveApriori combines confidence and support into a single measure of predictive accuracy (Scheffer 2001) and finds the best n association rules in order. Internally, the algorithm successively increases the support threshold, because the value of predictive accuracy depends on it. Tertius finds rules according to a confirmation measure (Flach and Lachiche 1999), seeking rules with multiple conditions in the consequent, like Apriori, but differing in that these conditions are OR'd together, not ANDed. It can be set to find rules that predict a single condition or a predetermined attribute (i.e., classification rules). One parameter determines whether negation is allowed in the antecedent, the consequent, or both; others give the number of rules sought, the minimum degree of confirmation, the minimum coverage, the maximum proportion of counterinstances, and the maximum rule size. Missing values can match any value, never match, or be significant and possibly appear in rules.
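
Both schemes can also be invoked programmatically rather than from the Explorer. A minimal sketch, assuming the standard buildAssociations() interface of the weka.associations package (the data file and rule count here are illustrative):

// Sketch: running PredictiveApriori and Tertius from Java code.
import weka.associations.PredictiveApriori;
import weka.associations.Tertius;
import weka.core.Instances;

public class RuleLearners {
    public static void main(String[] args) throws Exception {
        Instances data = new Instances(new java.io.BufferedReader(
                new java.io.FileReader("weather.nominal.arff")));

        PredictiveApriori pa = new PredictiveApriori();
        pa.setNumRules(20);               // find the best n rules in order
        pa.buildAssociations(data);
        System.out.println(pa);           // print the ranked rules

        Tertius tertius = new Tertius();  // default confirmation-based search
        tertius.buildAssociations(data);
        System.out.println(tertius);
    }
}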

10.8 Attribute selection


Figure 10.21 shows that part of Weka’s attribute selection panel where you
specify the attribute evaluator and search method; Table 10.9 and Table 10.10
list the choices. Attribute selection is normally done by searching the space of
attribute subsets, evaluating each one (Section 7.1). This is achieved by com-
bining one of the four attribute subset evaluators in Table 10.9 with one of the
seven search methods in Table 10.10. A potentially faster but less accurate
approach is to evaluate the attributes individually and sort them, discarding attributes that fall below a chosen cutoff point.
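
Both styles of selection can be reproduced in code. A minimal sketch using the weka.attributeSelection package; the pairing of CfsSubsetEval with BestFirst, and of InfoGainAttributeEval with the Ranker, is just one plausible choice among the evaluators and search methods of Tables 10.9 and 10.10:

// Sketch: subset search versus individual ranking for attribute selection.
import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.BestFirst;
import weka.attributeSelection.CfsSubsetEval;
import weka.attributeSelection.InfoGainAttributeEval;
import weka.attributeSelection.Ranker;
import weka.core.Instances;

public class SelectAttributes {
    public static void main(String[] args) throws Exception {
        Instances data = new Instances(new java.io.BufferedReader(
                new java.io.FileReader("weather.nominal.arff")));
        data.setClassIndex(data.numAttributes() - 1);

        // Search the space of attribute subsets: a subset evaluator
        // combined with a search method.
        AttributeSelection subsetSearch = new AttributeSelection();
        subsetSearch.setEvaluator(new CfsSubsetEval());
        subsetSearch.setSearch(new BestFirst());
        subsetSearch.SelectAttributes(data);
        System.out.println(java.util.Arrays.toString(
                subsetSearch.selectedAttributes()));

        // Faster alternative: evaluate each attribute individually and rank.
        AttributeSelection ranking = new AttributeSelection();
        ranking.setEvaluator(new InfoGainAttributeEval());
        ranking.setSearch(new Ranker());
        ranking.SelectAttributes(data);
        System.out.println(java.util.Arrays.toString(
                ranking.selectedAttributes()));
    }
}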



Figure 10.21 Attribute selection: specifying an evaluator and a search method.