

Boosting


AdaBoostM1 implements the algorithm described in Section 7.5 (page 321;
Figure 7.7). It can be accelerated by specifying a threshold for weight pruning.
AdaBoostM1 resamples if the base classifier cannot handle weighted instances
(you can also force resampling anyway). MultiBoostAB combines boosting with
a variant of bagging to prevent overfitting (Webb 2000).
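For readers who drive Weka from Java code rather than the Explorer, the fragment below is a minimal sketch of the AdaBoostM1 options just described. It assumes the standard Weka 3 class and setter names (AdaBoostM1, setWeightThreshold, setUseResampling, setNumIterations) and a placeholder ARFF file; exact names and defaults may differ slightly between Weka releases.

import java.io.BufferedReader;
import java.io.FileReader;
import weka.classifiers.meta.AdaBoostM1;
import weka.classifiers.trees.DecisionStump;
import weka.core.Instances;

public class AdaBoostSketch {
  public static void main(String[] args) throws Exception {
    // Placeholder dataset; substitute any ARFF file with a nominal class.
    Instances data = new Instances(new BufferedReader(new FileReader("weather.nominal.arff")));
    data.setClassIndex(data.numAttributes() - 1);

    AdaBoostM1 booster = new AdaBoostM1();
    booster.setClassifier(new DecisionStump()); // weak base learner
    booster.setNumIterations(10);               // number of boosting rounds
    booster.setWeightThreshold(90);             // weight pruning: train on only the highest-weighted 90% of the weight mass
    booster.setUseResampling(false);            // set to true to force resampling instead of reweighting
    booster.buildClassifier(data);
    System.out.println(booster);
  }
}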
Whereas boosting only applies to nominal classes, AdditiveRegression enhances
the performance of a regression learner (Section 7.5, page 325). There are
two parameters: shrinkage, which governs the learning rate, and the maximum
number of models to generate. If the latter is infinite, work continues until the
error stops decreasing.
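As an illustration only (not taken from the book), AdditiveRegression's two parameters might be set from Java as follows; the setter names setShrinkage and setNumIterations are assumptions based on the usual Weka conventions.

import java.io.BufferedReader;
import java.io.FileReader;
import weka.classifiers.meta.AdditiveRegression;
import weka.classifiers.trees.DecisionStump;
import weka.core.Instances;

public class AdditiveRegressionSketch {
  public static void main(String[] args) throws Exception {
    // Placeholder dataset; use any ARFF file with a numeric class.
    Instances data = new Instances(new BufferedReader(new FileReader("cpu.arff")));
    data.setClassIndex(data.numAttributes() - 1);

    AdditiveRegression ar = new AdditiveRegression();
    ar.setClassifier(new DecisionStump()); // base regression learner
    ar.setShrinkage(0.5);                  // learning rate: smaller values take more cautious steps
    ar.setNumIterations(20);               // maximum number of models to generate
    ar.buildClassifier(data);
    System.out.println(ar);
  }
}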
Decorate builds ensembles of diverse classifiers by using specially constructed
artificial training examples. This technique is claimed to consistently improve
on the base classifier and on the bagging and random forest metalearners
(Melville and Mooney 2005).^6 It outperforms boosting on small training sets
and rivals it on larger ones. One parameter is the number of artificial examples
to use as a proportion of the training data. Another is the desired number of
classifiers in the ensemble, although execution may terminate prematurely
because the number of iterations can also be capped. Larger ensembles usually
produce more accurate models but have greater training time and model
complexity.
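A hedged sketch of these Decorate parameters in Java follows; the class and setter names (Decorate, setDesiredSize, setArtificialSize) are assumptions about the Weka 3 implementation and may not match every release exactly.

import java.io.BufferedReader;
import java.io.FileReader;
import weka.classifiers.meta.Decorate;
import weka.classifiers.trees.J48;
import weka.core.Instances;

public class DecorateSketch {
  public static void main(String[] args) throws Exception {
    Instances data = new Instances(new BufferedReader(new FileReader("iris.arff"))); // placeholder file
    data.setClassIndex(data.numAttributes() - 1);

    Decorate decorate = new Decorate();
    decorate.setClassifier(new J48());  // base classifier to diversify
    decorate.setDesiredSize(15);        // desired number of ensemble members
    decorate.setNumIterations(50);      // cap on iterations, so the ensemble may end up smaller
    decorate.setArtificialSize(1.0);    // artificial examples as a proportion of the training data
    decorate.buildClassifier(data);
    System.out.println(decorate);
  }
}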
LogitBoost performs additive logistic regression (Section 7.5, page 327). Like
AdaBoostM1, it can be accelerated by specifying a threshold for weight pruning.
The appropriate number of iterations can be determined using internal
cross-validation; there is a shrinkage parameter that can be tuned to prevent
overfitting; and you can choose resampling instead of reweighting (a
configuration sketch is given below). RacedIncrementalLogitBoost learns by
racing LogitBoosted committees, and operates incrementally by processing the
data in batches (pages 347–348), making it useful for large datasets (Frank et al.
2002). Each committee member is learned from a different batch. The batch size
starts at a given minimum and repeatedly doubles until it reaches a preset
maximum. Resampling is used if the base classifier cannot handle weighted
instances (you can also force resampling anyway). Log-likelihood pruning can be
used within each committee: this discards new committee members if they
decrease the log-likelihood based on the validation data. You can determine how
many instances to hold out for validation. The validation data is also used to
determine which committee to retain when training terminates.
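The following Java fragment sketches the LogitBoost options mentioned above (weight pruning, shrinkage, and resampling). It is illustrative only: the setter names follow the usual Weka 3 conventions and the dataset file is a placeholder.

import java.io.BufferedReader;
import java.io.FileReader;
import weka.classifiers.meta.LogitBoost;
import weka.classifiers.trees.DecisionStump;
import weka.core.Instances;

public class LogitBoostSketch {
  public static void main(String[] args) throws Exception {
    Instances data = new Instances(new BufferedReader(new FileReader("credit-g.arff"))); // placeholder file
    data.setClassIndex(data.numAttributes() - 1);

    LogitBoost lb = new LogitBoost();
    lb.setClassifier(new DecisionStump()); // regression-style base learner fitted at each stage
    lb.setNumIterations(10);               // could instead be chosen by internal cross-validation
    lb.setWeightThreshold(90);             // weight pruning threshold to speed up training
    lb.setShrinkage(0.5);                  // shrink each stage's contribution to help prevent overfitting
    lb.setUseResampling(false);            // set to true to resample rather than reweight
    lb.buildClassifier(data);
    System.out.println(lb);
  }
}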



(^6) The random forest scheme was mentioned on page 407. It is really a metalearner, but Weka
includes it among the decision tree methods because it is hardwired to a particular classifier,
RandomTree.
