To evaluate the same classifier on a new batch of test data, you load it back using -l instead of rebuilding it. If the classifier can be updated incrementally, you can provide both a training file and an input file, and Weka will load the classifier and update it with the given training instances.
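For example, assuming a J4.8 model has previously been saved to a file called j48.model with the -d option (the file names here are illustrative, and the class name is the one used by the Weka version this edition describes), a command along these lines evaluates it on a fresh batch of test data:

   java weka.classifiers.trees.J48 -l j48.model -T new_batch.arff

Because no training file is supplied, the saved model is applied unchanged; J4.8 itself is not incrementally updateable, so adding a training file updates the model only for schemes that support it.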
If you wish only to assess the performance of a learning scheme, use -o to suppress output of the model. Use -i to see the performance measures of precision, recall, and F-measure (Section 5.7). Use -k to compute information-theoretical measures from the probabilities derived by a learning scheme (Section 5.6).
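To illustrate, the following command (training file name illustrative) trains J4.8, suppresses the printout of the decision tree with -o, and reports both sets of statistics:

   java weka.classifiers.trees.J48 -t training.arff -o -i -k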
Weka users often want to know which class values the learning scheme actually predicts for each test instance. The -p option prints each test instance's number, its class, the confidence of the scheme's prediction, and the predicted class value. It also outputs attribute values for each instance and must be followed by a specification of the range (e.g., 1–2); use 0 if you don't want any attribute values. You can also output the cumulative margin distribution for the training data, which shows how the distribution of the margin measure (Section 7.5, page 324) changes with the number of boosting iterations. Finally, you can output the classifier's source representation, and a graphical representation if the classifier can produce one.
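For instance, the following command (file names illustrative) prints a prediction line for each instance in the test file; the trailing 0 suppresses the attribute values, and a range such as 1-2 would print those attributes alongside each prediction:

   java weka.classifiers.trees.J48 -t training.arff -T test.arff -p 0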

Scheme-specific options


Table 13.2 shows the options specific to J4.8. You can force the algorithm to use the unpruned tree instead of the pruned one. You can suppress subtree raising, which increases efficiency. You can set the confidence threshold for pruning and the minimum number of instances permissible at any leaf; both parameters were described in Section 6.1 (page 199). As well as C4.5's standard pruning procedure, reduced-error pruning can be performed with -R, using the number of folds specified by -N. A sample invocation follows the table.

Table 13.2 Scheme-specific options for the J4.8 decision tree learner.

Option  Function
-U      Use unpruned tree
-C      Specify confidence threshold for pruning
-M      Specify minimum number of instances in any leaf
-R      Use reduced-error pruning
-N      Specify number of folds for reduced-error pruning; use one fold as pruning set
-B      Use binary splits only
-S      Don't perform subtree raising
-L      Retain instance information
-A      Smooth the probability estimates using Laplace smoothing
-Q      Seed for shuffling data
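As an illustration, the following commands (training file name illustrative) combine these scheme-specific options with the general -t option. The first builds an unpruned tree; the second prunes more aggressively by lowering the confidence threshold below its default of 0.25 and requiring at least five instances at each leaf:

   java weka.classifiers.trees.J48 -t training.arff -U
   java weka.classifiers.trees.J48 -t training.arff -C 0.1 -M 5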
