

Boosting


AdaBoostM1 implements the algorithm described in Section 7.5 (page 321;
Figure 7.7). It can be accelerated by specifying a threshold for weight pruning.
AdaBoostM1 resamples if the base classifier cannot handle weighted instances
(you can also force resampling anyway). MultiBoostAB combines boosting with
a variant of bagging to prevent overfitting (Webb 2000).
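For readers who drive Weka from Java code rather than the Explorer, the fragment below is a minimal sketch of the AdaBoostM1 options just described. It assumes the standard Weka 3 class and setter names (AdaBoostM1, setWeightThreshold, setUseResampling, setNumIterations) and a placeholder ARFF file; exact names and defaults may differ slightly between Weka releases.

import java.io.BufferedReader;
import java.io.FileReader;
import weka.classifiers.meta.AdaBoostM1;
import weka.classifiers.trees.DecisionStump;
import weka.core.Instances;

public class AdaBoostSketch {
  public static void main(String[] args) throws Exception {
    // Placeholder dataset; substitute any ARFF file with a nominal class.
    Instances data = new Instances(new BufferedReader(new FileReader("weather.nominal.arff")));
    data.setClassIndex(data.numAttributes() - 1);

    AdaBoostM1 booster = new AdaBoostM1();
    booster.setClassifier(new DecisionStump()); // weak base learner
    booster.setNumIterations(10);               // number of boosting rounds
    booster.setWeightThreshold(90);             // weight pruning: train on only the highest-weighted 90% of the weight mass
    booster.setUseResampling(false);            // set to true to force resampling instead of reweighting
    booster.buildClassifier(data);
    System.out.println(booster);
  }
}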
Whereas boosting only applies to nominal classes, AdditiveRegression enhances
the performance of a regression learner (Section 7.5, page 325). There are
two parameters: shrinkage, which governs the learning rate, and the maximum
number of models to generate. If the latter is infinite, work continues until the
error stops decreasing.
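As an illustration only (not taken from the book), AdditiveRegression's two parameters might be set from Java as follows; the setter names setShrinkage and setNumIterations are assumptions based on the usual Weka conventions.

import java.io.BufferedReader;
import java.io.FileReader;
import weka.classifiers.meta.AdditiveRegression;
import weka.classifiers.trees.DecisionStump;
import weka.core.Instances;

public class AdditiveRegressionSketch {
  public static void main(String[] args) throws Exception {
    // Placeholder dataset; use any ARFF file with a numeric class.
    Instances data = new Instances(new BufferedReader(new FileReader("cpu.arff")));
    data.setClassIndex(data.numAttributes() - 1);

    AdditiveRegression ar = new AdditiveRegression();
    ar.setClassifier(new DecisionStump()); // base regression learner
    ar.setShrinkage(0.5);                  // learning rate: smaller values take more cautious steps
    ar.setNumIterations(20);               // maximum number of models to generate
    ar.buildClassifier(data);
    System.out.println(ar);
  }
}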
Decorate builds ensembles of diverse classifiers by using specially constructed
artificial training examples. This technique is claimed to consistently improve
on the base classifier and on the bagging and random forest metalearners
(Melville and Mooney 2005).^6 It outperforms boosting on small training sets
and rivals it on larger ones. One parameter is the number of artificial examples
to use as a proportion of the training data. Another is the desired number of
classifiers in the ensemble, although execution may terminate prematurely
because the number of iterations can also be capped. Larger ensembles usually
produce more accurate models but have greater training time and model
complexity.
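A hedged sketch of these Decorate parameters in Java follows; the class and setter names (Decorate, setDesiredSize, setArtificialSize) are assumptions about the Weka 3 implementation and may not match every release exactly.

import java.io.BufferedReader;
import java.io.FileReader;
import weka.classifiers.meta.Decorate;
import weka.classifiers.trees.J48;
import weka.core.Instances;

public class DecorateSketch {
  public static void main(String[] args) throws Exception {
    Instances data = new Instances(new BufferedReader(new FileReader("iris.arff"))); // placeholder file
    data.setClassIndex(data.numAttributes() - 1);

    Decorate decorate = new Decorate();
    decorate.setClassifier(new J48());  // base classifier to diversify
    decorate.setDesiredSize(15);        // desired number of ensemble members
    decorate.setNumIterations(50);      // cap on iterations, so the ensemble may end up smaller
    decorate.setArtificialSize(1.0);    // artificial examples as a proportion of the training data
    decorate.buildClassifier(data);
    System.out.println(decorate);
  }
}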
LogitBoost performs additive logistic regression (Section 7.5, page 327). Like
AdaBoostM1, it can be accelerated by specifying a threshold for weight pruning.
The appropriate number of iterations can be determined using internal
cross-validation; there is a shrinkage parameter that can be tuned to prevent
overfitting; and you can choose resampling instead of reweighting (a
configuration sketch is given below). RacedIncrementalLogitBoost learns by
racing LogitBoosted committees, and operates incrementally by processing the
data in batches (pages 347–348), making it useful for large datasets (Frank et al.
2002). Each committee member is learned from a different batch. The batch size
starts at a given minimum and repeatedly doubles until it reaches a preset
maximum. Resampling is used if the base classifier cannot handle weighted
instances (you can also force resampling anyway). Log-likelihood pruning can be
used within each committee: this discards new committee members if they
decrease the log-likelihood based on the validation data. You can determine how
many instances to hold out for validation. The validation data is also used to
determine which committee to retain when training terminates.
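The following Java fragment sketches the LogitBoost options mentioned above (weight pruning, shrinkage, and resampling). It is illustrative only: the setter names follow the usual Weka 3 conventions and the dataset file is a placeholder.

import java.io.BufferedReader;
import java.io.FileReader;
import weka.classifiers.meta.LogitBoost;
import weka.classifiers.trees.DecisionStump;
import weka.core.Instances;

public class LogitBoostSketch {
  public static void main(String[] args) throws Exception {
    Instances data = new Instances(new BufferedReader(new FileReader("credit-g.arff"))); // placeholder file
    data.setClassIndex(data.numAttributes() - 1);

    LogitBoost lb = new LogitBoost();
    lb.setClassifier(new DecisionStump()); // regression-style base learner fitted at each stage
    lb.setNumIterations(10);               // could instead be chosen by internal cross-validation
    lb.setWeightThreshold(90);             // weight pruning threshold to speed up training
    lb.setShrinkage(0.5);                  // shrink each stage's contribution to help prevent overfitting
    lb.setUseResampling(false);            // set to true to resample rather than reweight
    lb.buildClassifier(data);
    System.out.println(lb);
  }
}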



(^6) The random forest scheme was mentioned on page 407. It is really a metalearner, but Weka
includes it among the decision tree methods because it is hardwired to a particular classifier,
RandomTree.
