Data Mining: Practical Machine Learning Tools and Techniques, Second Edition


10.2 EXPLORING THE EXPLORER


Classify panel and choose the classifier AdaBoostM1 from the meta section of the
hierarchical menu. When you configure this classifier by clicking it, the object
editor shown in Figure 10.14 appears. This has its own classifier field, which we
set to DecisionStump (as shown). This method could itself be configured by
clicking (except that DecisionStump happens to have no editable properties).
Click OK to return to the main Classify panel and Start to try out boosting
decision stumps up to 10 times. It turns out that this mislabels only 7 of the 150
instances in the Iris data—good performance considering the rudimentary
nature of decision stumps and the rather small number of boosting iterations.
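What AdaBoostM1 does behind Weka's object editor is easy to sketch. The following is a minimal, self-contained Python illustration of boosting decision stumps—not Weka's Java implementation—and the toy dataset in it is invented for the example:

```python
import math

def stump_predict(feature, threshold, polarity, x):
    # A decision stump: threshold one attribute, output +1 or -1.
    return polarity if x[feature] >= threshold else -polarity

def train_stump(X, y, w):
    # Exhaustively pick the (feature, threshold, polarity) with the
    # lowest *weighted* training error.
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for pol in (+1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(f, t, pol, xi) != yi)
                if err < best_err:
                    best_err, best = err, (f, t, pol)
    return best, best_err

def adaboost(X, y, n_rounds=10):
    # AdaBoost.M1 for two classes coded as +1 / -1.
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        (f, t, pol), err = train_stump(X, y, w)
        err = max(err, 1e-10)
        if err >= 0.5:          # no better than chance: stop boosting
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, t, pol))
        # Reweight: misclassified instances gain weight, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(f, t, pol, xi))
             for xi, yi, wi in zip(X, y, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def predict(ensemble, x):
    # Weighted vote of all stumps.
    s = sum(a * stump_predict(f, t, pol, x) for a, f, t, pol in ensemble)
    return 1 if s >= 0 else -1

# Invented toy data: a 1-D "interval" concept no single stump can capture.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [-1, 1, 1, -1]
ensemble = adaboost(X, y, n_rounds=3)
preds = [predict(ensemble, x) for x in X]  # all four points classified correctly
```

The point of the example is the reweighting step: each round the stump learner is forced to concentrate on the instances its predecessors got wrong, which is why even a learner as weak as a stump can yield a strong ensemble.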


Clustering and association rules

Use the Cluster and Associate panels to invoke clustering algorithms (Section
6.6) and methods for finding association rules (Section 4.5). When clustering,
Weka shows the number of clusters and how many instances each cluster
contains. For some algorithms the number of clusters can be specified by setting
a parameter in the object editor. For probabilistic clustering methods, Weka
measures the log-likelihood of the clusters on the training data: the larger this
quantity, the better the model fits the data. Increasing the number of clusters
normally increases the likelihood, but may overfit.
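The log-likelihood comparison can be made concrete with a small sketch. This is pure Python on made-up one-dimensional data, illustrating the idea rather than Weka's own probabilistic clusterers: for clearly bimodal data, a two-Gaussian model assigns the training instances a higher log-likelihood than a single Gaussian.

```python
import math

def gauss_logpdf(x, mu, var):
    # Log density of a normal distribution N(mu, var) at x.
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def fit_gaussian(points):
    # Maximum-likelihood mean and variance (variance floored for stability).
    mu = sum(points) / len(points)
    var = sum((p - mu) ** 2 for p in points) / len(points)
    return mu, max(var, 1e-6)

def mixture_loglik(data, components):
    # Total log-likelihood of the data under a mixture of (weight, mu, var).
    total = 0.0
    for x in data:
        density = sum(w * math.exp(gauss_logpdf(x, mu, var))
                      for w, mu, var in components)
        total += math.log(density)
    return total

# Invented bimodal data: two tight groups, around 1 and around 5.
data = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2]

# One cluster: a single Gaussian fitted to everything.
mu, var = fit_gaussian(data)
ll1 = mixture_loglik(data, [(1.0, mu, var)])

# Two clusters: split the data and fit a Gaussian to each half.
lo, hi = data[:3], data[3:]
ll2 = mixture_loglik(data, [(0.5, *fit_gaussian(lo)),
                            (0.5, *fit_gaussian(hi))])

# ll2 > ll1: the two-cluster model fits the training data better, which is
# exactly the quantity Weka reports -- though, as noted above, piling on
# more clusters can overfit.
```

Here the clusters are fixed by hand; a real probabilistic clusterer such as EM would instead find the component weights, means, and variances that maximize this same quantity.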
The controls on the Cluster panel are similar to those for Classify. You can
specify some of the same evaluation methods—use training set, supplied test
set, and percentage split (the last two are used with the log-likelihood). A further


Figure 10.14 Configuring a metalearner for boosting decision stumps.
