

reduces the expected value of the mean-squared error. (As we mentioned earlier,
the analogous result is not true for classification.)


Bagging with costs


Bagging helps most if the underlying learning method is unstable in that small
changes in the input data can lead to quite different classifiers. Indeed, it can
help to increase the diversity in the ensemble of classifiers by making the
learning method as unstable as possible. For example, when bagging decision
trees, which are already unstable, better performance is often achieved by
switching pruning off, which makes them even more unstable. Another improvement
can be obtained by changing the way that predictions are combined for
classification. As originally formulated, bagging uses voting. But when the
models can output probability estimates and not just plain classifications, it
makes intuitive sense to average these probabilities instead. Not only does this
often improve classification slightly, but the bagged classifier also generates
probability estimates—ones that are often more accurate than those produced by
the individual models. Implementations of bagging commonly use this method of
combining predictions.
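
To make this combination scheme concrete, here is a minimal sketch of bagging
with probability averaging in Python. The choice of scikit-learn, its unpruned
decision trees as the unstable base learner, and the helper names below are
illustrative assumptions rather than the book's Weka-based implementation; the
sketch also assumes numeric NumPy arrays and that every class occurs in every
bootstrap sample so that the probability columns line up.

import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

def bag_models(X, y, t=50, base=DecisionTreeClassifier(), seed=0):
    # Build t models, each trained on a bootstrap sample of the data.
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(t):
        idx = rng.integers(0, n, size=n)   # sample n instances with replacement
        models.append(clone(base).fit(X[idx], y[idx]))
    return models

def bagged_proba(models, X):
    # Average the class-probability estimates of the individual models.
    return np.mean([m.predict_proba(X) for m in models], axis=0)

def bagged_predict(models, X):
    # Predict the class whose averaged probability is highest.
    return models[0].classes_[np.argmax(bagged_proba(models, X), axis=1)]

Replacing the averaged probabilities with a majority vote over the individual
models' predictions recovers the original voting formulation shown in
Figure 7.7.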
In Section 5.7 we showed how to make a classifier cost sensitive by minimizing
the expected cost of predictions. Accurate probability estimates are necessary
because they are used to obtain the expected cost of each prediction. Bagging
is a prime candidate for cost-sensitive classification because it produces very
accurate probability estimates from decision trees and other powerful, yet
unstable, classifiers. However, a disadvantage is that bagged classifiers are
hard to analyze.
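
As a brief illustration of how such probability estimates feed into
cost-sensitive prediction in the spirit of Section 5.7, the sketch below picks
the class with the lowest expected cost. The cost-matrix convention (rows index
the actual class, columns the predicted class) and the example numbers are
assumptions chosen for illustration.

import numpy as np

def min_expected_cost_predict(proba, cost):
    # proba: (n_instances, n_classes) probability estimates, e.g. from bagging.
    # cost:  (n_classes, n_classes) matrix; cost[i, j] is the cost of
    #        predicting class j when the true class is i.
    expected = proba @ cost             # expected[:, j] = sum_i p(i) * cost[i, j]
    return np.argmin(expected, axis=1)  # class index with lowest expected cost

# Two classes where missing class 1 is ten times as costly as the opposite error.
cost = np.array([[0.0, 1.0],
                 [10.0, 0.0]])
proba = np.array([[0.85, 0.15]])        # a bagged probability estimate
print(min_expected_cost_predict(proba, cost))   # -> [1], even though p(0) > p(1)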
A method called MetaCost combines the predictive benefits of bagging with a
comprehensible model for cost-sensitive prediction. It builds an ensemble
classifier using bagging and uses it to relabel the training data by giving
every training instance the prediction that minimizes the expected cost, based
on the ensemble's probability estimates.

model generation
Let n be the number of instances in the training data.
For each of t iterations:
    Sample n instances with replacement from training data.
    Apply the learning algorithm to the sample.
    Store the resulting model.

classification
For each of the t models:
    Predict class of instance using model.
Return class that has been predicted most often.

Figure 7.7 Algorithm for bagging.