Data Mining: Practical Machine Learning Tools and Techniques, Second Edition

method, classes to clusters evaluation, compares how well the chosen clusters
match a preassigned class in the data. You select an attribute (which must be
nominal) that represents the “true” class. Having clustered the data, Weka
determines the majority class in each cluster and prints a confusion matrix
showing how many errors there would be if the clusters were used instead of
the true class. If your dataset has a class attribute, you can ignore it during
clustering by selecting it from a pull-down list of attributes, and see how well the
clusters correspond to actual class values. Finally, you can choose whether or
not to store the clusters for visualization. The only reason not to do so is to
conserve space. As with classifiers, you visualize the results by right-clicking on the
result list, which allows you to view two-dimensional scatter plots like the one
in Figure 10.6(b). If you have chosen classes to clusters evaluation, the class
assignment errors are shown. For the Cobweb clustering scheme, you can also
visualize the tree.
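
The classes to clusters evaluation described above can be sketched in a few lines of Python. This is a minimal illustration of the idea as the text states it (majority class per cluster, then a count of disagreements), not Weka's own code, whose mapping of classes to clusters may differ in detail. The function name classes_to_clusters and the toy cluster assignments are hypothetical.

```python
from collections import Counter, defaultdict

def classes_to_clusters(cluster_ids, true_classes):
    """Tabulate clusters against true classes and count the errors.

    Returns (counts, majority, errors) where counts[cluster][cls] is the
    confusion matrix, majority[cluster] is the most frequent class in that
    cluster, and errors is the number of instances whose true class differs
    from their cluster's majority class.
    """
    counts = defaultdict(Counter)
    for cluster, cls in zip(cluster_ids, true_classes):
        counts[cluster][cls] += 1

    # Label each cluster with its majority class.
    majority = {c: cnt.most_common(1)[0][0] for c, cnt in counts.items()}

    # Every instance outside its cluster's majority class is an error.
    errors = sum(sum(cnt.values()) - cnt[majority[c]] for c, cnt in counts.items())
    return counts, majority, errors

# Hypothetical toy input: seven instances assigned to two clusters.
clusters = [0, 0, 0, 1, 1, 1, 1]
classes = ["yes", "yes", "no", "no", "no", "yes", "no"]
counts, majority, errors = classes_to_clusters(clusters, classes)
print(majority, errors)
```

In this toy input, cluster 0 is labeled yes and cluster 1 is labeled no, and one instance in each cluster disagrees with its label, giving two errors in total.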
The Associate panel is simpler than Classify or Cluster. Weka contains only
three algorithms for determining association rules and no methods for
evaluating such rules. Figure 10.15 shows the output from the Apriori program for
association rules (described in Section 4.5) on the nominal version of the
weather data. Despite the simplicity of the data, several rules are found. The
number before the arrow is the number of instances for which the antecedent
is true; that after the arrow is the number of instances in which the consequent
is true also; and the confidence (in parentheses) is the ratio between the two.
Ten rules are found by default: you can ask for more by using the object editor
to change numRules.
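
To make the two counts and the confidence concrete, the following Python sketch recomputes them for a single rule. It assumes the standard 14-instance nominal weather data, which is not reproduced in this section, and it only checks a rule that is already given; it does not search for rules as Apriori does. The helper names matches and rule_counts are hypothetical.

```python
# Nominal weather data (assumed: the usual 14-instance version of the dataset).
COLUMNS = ("outlook", "temperature", "humidity", "windy", "play")
ROWS = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]
DATA = [dict(zip(COLUMNS, row)) for row in ROWS]

def matches(instance, conditions):
    """True if the instance satisfies every attribute=value condition."""
    return all(instance[attr] == value for attr, value in conditions.items())

def rule_counts(data, antecedent, consequent):
    """Return (antecedent count, antecedent-and-consequent count, confidence)."""
    n_antecedent = sum(1 for inst in data if matches(inst, antecedent))
    n_both = sum(
        1 for inst in data
        if matches(inst, antecedent) and matches(inst, consequent)
    )
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return n_antecedent, n_both, confidence

# Rule 1 of Figure 10.15: outlook=overcast ==> play=yes
print(rule_counts(DATA, {"outlook": "overcast"}, {"play": "yes"}))  # (4, 4, 1.0)
```

A rule with confidence below 1, such as humidity=high ==> play=no, gives counts of 7 and 4 on this data and a confidence of 4/7, which is why it does not appear among the ten rules shown.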

Attribute selection

The Select attributes panel gives access to several methods for attribute selection.
As explained in Section 7.1, this involves an attribute evaluator and a search

392 CHAPTER 10 | THE EXPLORER



  1. outlook=overcast 4 ==> play=yes 4 conf:(1)
  2. temperature=cool 4 ==> humidity=normal 4 conf:(1)
  3. humidity=normal windy=FALSE 4 ==> play=yes 4 conf:(1)
  4. outlook=sunny play=no 3 ==> humidity=high 3 conf:(1)
  5. outlook=sunny humidity=high 3 ==> play=no 3 conf:(1)
  6. outlook=rainy play=yes 3 ==> windy=FALSE 3 conf:(1)
  7. outlook=rainy windy=FALSE 3 ==> play=yes 3 conf:(1)
  8. temperature=cool play=yes 3 ==> humidity=normal 3 conf:(1)
  9. outlook=sunny temperature=hot 2 ==> humidity=high 2 conf:(1)
  10. temperature=hot play=no 2 ==> outlook=sunny 2 conf:(1)


Figure 10.15 Output from the Apriori program for association rules.