To evaluate the same classifier on a new batch of test data, you load it back using -l instead of rebuilding it. If the classifier can be updated incrementally, you can provide both a training file and an input file, and Weka will load the classifier and update it with the given training instances.
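For example, assuming a J4.8 model has previously been saved to a file called j48.model with the -d option (the file names here are illustrative, and the class name is the one used by the Weka version this edition describes), a command along these lines evaluates it on a fresh batch of test data:

   java weka.classifiers.trees.J48 -l j48.model -T new_batch.arff

Because no training file is supplied, the saved model is applied unchanged; J4.8 itself is not incrementally updateable, so adding a training file updates the model only for schemes that support it.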
If you wish only to assess the performance of a learning scheme, use -o to suppress output of the model. Use -i to see the performance measures of precision, recall, and F-measure (Section 5.7). Use -k to compute information-theoretical measures from the probabilities derived by a learning scheme (Section 5.6).
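To illustrate, the following command (training file name illustrative) trains J4.8, suppresses the printout of the decision tree with -o, and reports both sets of statistics:

   java weka.classifiers.trees.J48 -t training.arff -o -i -k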
Weka users often want to know which class values the learning scheme actually predicts for each test instance. The -p option prints each test instance's number, its class, the confidence of the scheme's prediction, and the predicted class value. It also outputs attribute values for each instance and must be followed by a specification of the range (e.g., 1–2); use 0 if you don't want any attribute values. You can also output the cumulative margin distribution for the training data, which shows how the distribution of the margin measure (Section 7.5, page 324) changes with the number of boosting iterations. Finally, you can output the classifier's source representation, and a graphical representation if the classifier can produce one.
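For instance, the following command (file names illustrative) prints a prediction line for each instance in the test file; the trailing 0 suppresses the attribute values, and a range such as 1-2 would print those attributes alongside each prediction:

   java weka.classifiers.trees.J48 -t training.arff -T test.arff -p 0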

Scheme-specific options


Table 13.2 shows the options specific to J4.8. You can force the algorithm to use the unpruned tree instead of the pruned one. You can suppress subtree raising, which increases efficiency. You can set the confidence threshold for pruning and the minimum number of instances permissible at any leaf; both parameters were described in Section 6.1 (page 199). As well as C4.5's standard pruning procedure, reduced-error pruning can be performed with -R, using the number of folds specified by -N. A sample invocation follows the table.

Table 13.2 Scheme-specific options for the J4.8 decision tree learner.

Option  Function
-U      Use unpruned tree
-C      Specify confidence threshold for pruning
-M      Specify minimum number of instances in any leaf
-R      Use reduced-error pruning
-N      Specify number of folds for reduced-error pruning; use one fold as pruning set
-B      Use binary splits only
-S      Don't perform subtree raising
-L      Retain instance information
-A      Smooth the probability estimates using Laplace smoothing
-Q      Seed for shuffling data
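As an illustration, the following commands (training file name illustrative) combine these scheme-specific options with the general -t option. The first builds an unpruned tree; the second prunes more aggressively by lowering the confidence threshold below its default of 0.25 and requiring at least five instances at each leaf:

   java weka.classifiers.trees.J48 -t training.arff -U
   java weka.classifiers.trees.J48 -t training.arff -C 0.1 -M 5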
