reached a lower bound of 10%, whichever occurs first. (These default values can be changed.) There are four alternative metrics for ranking rules: Confidence, which is the proportion of the examples covered by the premise that are also covered by the consequent (called accuracy in Section 4.5); Lift, which is determined by dividing the confidence by the support (called coverage in Section 4.5); Leverage, which is the proportion of additional examples covered by both the premise and the consequent beyond those expected if the premise and consequent were statistically independent; and Conviction, a measure defined by Brin et al. (1997). You can also specify a significance level, and rules will be tested for significance at this level.
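
As a minimal sketch of how these four metrics follow from coverage counts (the counts below are illustrative, not taken from any dataset in the book):

// Sketch: computing the four rule-ranking metrics for a rule
// premise => consequent from raw coverage counts (illustrative values).
public class RuleMetrics {
    public static void main(String[] args) {
        double total = 1000;      // total number of instances
        double premise = 400;     // instances covered by the premise
        double consequent = 500;  // instances covered by the consequent
        double both = 300;        // instances covered by premise AND consequent

        double pPremise = premise / total;
        double pConsequent = consequent / total;
        double pBoth = both / total;

        // Confidence: proportion of premise instances also covered
        // by the consequent.
        double confidence = pBoth / pPremise;              // 0.75
        // Lift: confidence divided by the consequent's coverage;
        // values above 1 indicate a positive association.
        double lift = confidence / pConsequent;            // 1.5
        // Leverage: joint coverage beyond what statistical
        // independence would predict.
        double leverage = pBoth - pPremise * pConsequent;  // 0.1
        // Conviction (Brin et al. 1997): frequency of counterexamples
        // expected under independence, relative to their observed frequency.
        double conviction = pPremise * (1 - pConsequent)
                / (pPremise - pBoth);                      // 2.0

        System.out.printf("confidence=%.3f lift=%.3f leverage=%.3f conviction=%.3f%n",
                confidence, lift, leverage, conviction);
    }
}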
PredictiveApriori combines confidence and support into a single measure of predictive accuracy (Scheffer 2001) and finds the best n association rules in order. Internally, the algorithm successively increases the support threshold, because the value of predictive accuracy depends on it. Tertius finds rules according to a confirmation measure (Flach and Lachiche 1999), seeking rules with multiple conditions in the consequent, like Apriori, but differing in that these conditions are OR'd together, not ANDed. It can be set to find rules that predict a single condition or a predetermined attribute (i.e., classification rules). One parameter determines whether negation is allowed in the antecedent, the consequent, or both; others give the number of rules sought, the minimum degree of confirmation, the minimum coverage, the maximum proportion of counterinstances, and the maximum rule size. Missing values can match any value, never match, or be significant and possibly appear in rules.
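
Both schemes can also be invoked programmatically rather than from the Explorer. A minimal sketch, assuming the standard buildAssociations() interface of the weka.associations package (the data file and rule count here are illustrative):

// Sketch: running PredictiveApriori and Tertius from Java code.
import weka.associations.PredictiveApriori;
import weka.associations.Tertius;
import weka.core.Instances;

public class RuleLearners {
    public static void main(String[] args) throws Exception {
        Instances data = new Instances(new java.io.BufferedReader(
                new java.io.FileReader("weather.nominal.arff")));

        PredictiveApriori pa = new PredictiveApriori();
        pa.setNumRules(20);               // find the best n rules in order
        pa.buildAssociations(data);
        System.out.println(pa);           // print the ranked rules

        Tertius tertius = new Tertius();  // default confirmation-based search
        tertius.buildAssociations(data);
        System.out.println(tertius);
    }
}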

10.8 Attribute selection


Figure 10.21 shows that part of Weka’s attribute selection panel where you
specify the attribute evaluator and search method; Table 10.9 and Table 10.10
list the choices. Attribute selection is normally done by searching the space of
attribute subsets, evaluating each one (Section 7.1). This is achieved by com-
bining one of the four attribute subset evaluators in Table 10.9 with one of the
seven search methods in Table 10.10. A potentially faster but less accurate
approach is to evaluate the attributes individually and sort them, discarding attributes that fall below a chosen cutoff point.
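
Both styles of selection can be reproduced in code. A minimal sketch using the weka.attributeSelection package; the pairing of CfsSubsetEval with BestFirst, and of InfoGainAttributeEval with the Ranker, is just one plausible choice among the evaluators and search methods of Tables 10.9 and 10.10:

// Sketch: subset search versus individual ranking for attribute selection.
import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.BestFirst;
import weka.attributeSelection.CfsSubsetEval;
import weka.attributeSelection.InfoGainAttributeEval;
import weka.attributeSelection.Ranker;
import weka.core.Instances;

public class SelectAttributes {
    public static void main(String[] args) throws Exception {
        Instances data = new Instances(new java.io.BufferedReader(
                new java.io.FileReader("weather.nominal.arff")));
        data.setClassIndex(data.numAttributes() - 1);

        // Search the space of attribute subsets: a subset evaluator
        // combined with a search method.
        AttributeSelection subsetSearch = new AttributeSelection();
        subsetSearch.setEvaluator(new CfsSubsetEval());
        subsetSearch.setSearch(new BestFirst());
        subsetSearch.SelectAttributes(data);
        System.out.println(java.util.Arrays.toString(
                subsetSearch.selectedAttributes()));

        // Faster alternative: evaluate each attribute individually and rank.
        AttributeSelection ranking = new AttributeSelection();
        ranking.setEvaluator(new InfoGainAttributeEval());
        ranking.setSearch(new Ranker());
        ranking.SelectAttributes(data);
        System.out.println(java.util.Arrays.toString(
                ranking.selectedAttributes()));
    }
}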



Figure 10.21 Attribute selection: specifying an evaluator and a search method.