(Section 6.1, page 190). It deals with missing values by splitting instances into
pieces, as C4.5 does. You can set the minimum number of instances per leaf,
maximum tree depth (useful when boosting trees), minimum proportion of
training set variance for a split (numeric classes only), and number of folds for
pruning.
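These parameters can also be set outside the Explorer through WEKA's Java interface. The following is a minimal sketch for the tree learner being described (REPTree); the option flags -M, -L, -V, and -N are assumptions drawn from the scheme's command-line help and may differ between releases, and weather.arff simply stands in for your own ARFF file.

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.REPTree;
import weka.core.Instances;
import weka.core.Utils;

public class REPTreeDemo {
  public static void main(String[] args) throws Exception {
    // Load an ARFF file (placeholder name) and use the last attribute as the class.
    Instances data = new Instances(new BufferedReader(new FileReader("weather.arff")));
    data.setClassIndex(data.numAttributes() - 1);

    REPTree tree = new REPTree();
    // Assumed flags: -M minimum instances per leaf, -L maximum tree depth,
    // -V minimum proportion of variance for a split (numeric classes only),
    // -N number of folds for reduced-error pruning.
    tree.setOptions(Utils.splitOptions("-M 2 -L 5 -V 0.001 -N 3"));

    // Ten-fold cross-validation, as the Explorer would report it.
    Evaluation eval = new Evaluation(data);
    eval.crossValidateModel(tree, data, 10, new Random(1));
    System.out.println(eval.toSummaryString());
  }
}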
NBTree is a hybrid between decision trees and Naïve Bayes. It creates trees
whose leaves are Naïve Bayes classifiers for the instances that reach the leaf. When
constructing the tree, cross-validation is used to decide whether a node should
be split further or a Naïve Bayes model should be used instead (Kohavi 1996).
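Programmatically, NBTree is used like any other classifier, because the cross-validation-based choice between splitting a node and fitting a Naïve Bayes model is entirely internal. A minimal sketch, again with weather.arff as a placeholder dataset:

import java.io.BufferedReader;
import java.io.FileReader;
import weka.classifiers.trees.NBTree;
import weka.core.Instances;

public class NBTreeDemo {
  public static void main(String[] args) throws Exception {
    Instances data = new Instances(new BufferedReader(new FileReader("weather.arff")));
    data.setClassIndex(data.numAttributes() - 1);

    NBTree nbt = new NBTree();   // no scheme-specific options to set
    nbt.buildClassifier(data);
    // The printed model is a decision tree whose leaves are Naive Bayes models.
    System.out.println(nbt);
  }
}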
M5P is the model tree learner described in Section 6.5. LMT builds logistic
model trees (Section 7.5, page 331). LMT can deal with binary and multiclass
target variables, numeric and nominal attributes, and missing values. When
fitting the logistic regression functions at a node, it uses cross-validation just
once to determine how many iterations to run, and employs the same number
throughout the tree instead of cross-validating at every node. This heuristic
(which you can switch off) improves the run time considerably, with little effect
on accuracy. Alternatively, you can set the number of boosting iterations to be
used throughout the tree. Normally, it is the misclassification error that cross-
validation minimizes, but the root mean-squared error of the probabilities can
be chosen instead. The splitting criterion can be based on C4.5’s information
gain (the default) or on the LogitBoost residuals, striving to improve the purity
of the residuals.
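In code, these choices correspond to option flags of the LMT class. The sketch below is only illustrative: the flags shown (-P for minimizing the root mean-squared error of the probabilities, -R for splitting on the LogitBoost residuals, -I for a fixed number of boosting iterations, and -C for cross-validating at every node) are assumptions based on the scheme's command-line help and may vary between releases.

import java.io.BufferedReader;
import java.io.FileReader;
import weka.classifiers.trees.LMT;
import weka.core.Instances;
import weka.core.Utils;

public class LMTDemo {
  public static void main(String[] args) throws Exception {
    Instances data = new Instances(new BufferedReader(new FileReader("weather.arff")));
    data.setClassIndex(data.numAttributes() - 1);

    LMT lmt = new LMT();
    // Assumed flags: -P minimize the RMSE of the probabilities rather than the
    // misclassification error; -R split on LogitBoost residuals instead of
    // information gain. Adding -C would switch off the run-time heuristic and
    // cross-validate at every node; -I <n> would fix the number of iterations.
    lmt.setOptions(Utils.splitOptions("-P -R"));
    lmt.buildClassifier(data);
    System.out.println(lmt);
  }
}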
ADTree builds an alternating decision tree using boosting (Section 7.5, pages
329–331) and is optimized for two-class problems. The number of boosting
iterations is a parameter that can be tuned to suit the dataset and the desired
complexity–accuracy tradeoff. Each iteration adds three nodes to the tree (one
split node and two prediction nodes) unless nodes can be merged. The default
search method is exhaustive search (Expand all paths); the others are heuristics and
are much faster. You can determine whether to save instance data for visualization.
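A corresponding sketch for ADTree follows; the flags -B (number of boosting iterations) and -E (search type, with -3 meaning expand all paths) are assumptions based on the scheme's command-line help, and the class location may differ in later WEKA releases.

import java.io.BufferedReader;
import java.io.FileReader;
import weka.classifiers.trees.ADTree;
import weka.core.Instances;
import weka.core.Utils;

public class ADTreeDemo {
  public static void main(String[] args) throws Exception {
    // ADTree is optimized for two-class problems, so use a binary dataset.
    Instances data = new Instances(new BufferedReader(new FileReader("weather.arff")));
    data.setClassIndex(data.numAttributes() - 1);

    ADTree adt = new ADTree();
    // Assumed flags: -B boosting iterations, -E search type (-3 = expand all paths).
    adt.setOptions(Utils.splitOptions("-B 10 -E -3"));
    adt.buildClassifier(data);
    System.out.println(adt);  // each iteration adds one split node and two prediction nodes
  }
}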

Rules


Table 10.5 shows many methods for generating rules. DecisionTable builds a
decision table majority classifier (Section 7.1, page 295). It evaluates feature
subsets using best-first search and can use cross-validation for evaluation
(Kohavi 1995b). An option uses the nearest-neighbor method to determine the
class for each instance that is not covered by a decision table entry, instead of
the table’s global majority, based on the same set of features. OneR is the 1R
classifier (Section 4.1) with one parameter: the minimum bucket size for
discretization. ConjunctiveRule learns a single rule that predicts either a numeric
or a nominal class value. Uncovered test instances are assigned the default class
