of iterations needed to arrive at a good additive model. Reducing the multiplier
effectively damps down the learning process, increasing the chance of stopping
at just the right moment—but also increasing run time.
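
To see the multiplier's effect in code, here is a minimal sketch of forward stagewise additive regression with a shrinkage multiplier. It uses scikit-learn regression trees as the base models; the library, the depth limit, and the parameter name shrinkage are illustrative choices, not part of the original description.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def additive_regression(X, y, iterations=100, shrinkage=0.5):
        # Forward stagewise additive regression: each new model is fitted
        # to the residual errors left by the ensemble built so far.
        models = []
        residual = np.asarray(y, dtype=float).copy()
        for _ in range(iterations):
            tree = DecisionTreeRegressor(max_depth=3)
            tree.fit(X, residual)
            models.append(tree)
            # Multiplying each model's contribution by shrinkage < 1 damps
            # the learning process: more iterations are needed, but the
            # ensemble is less likely to overshoot a good additive model.
            residual -= shrinkage * tree.predict(X)
        return models

    def predict(models, X, shrinkage=0.5):
        # The same multiplier must be applied when forming predictions.
        return shrinkage * sum(model.predict(X) for model in models)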

Additive logistic regression


Additive regression can also be applied to classification just as linear regression
can. But we know from Section 4.6 that logistic regression outperforms linear
regression for classification. It turns out that a similar adaptation can be made
to additive models by modifying the forward stagewise modeling method to
perform additive logistic regression. Use the logit transform to translate the
probability estimation problem into a regression problem, as we did in Section
4.6, and solve the regression task using an ensemble of models—for example,
regression trees—just as for additive regression. At each stage, add the model
that maximizes the probability of the data given the ensemble classifier.
Suppose fj is the jth regression model in the ensemble and fj(a) is its prediction for instance a. Assuming a two-class problem, use the additive model Σfj(a) to obtain a probability estimate for the first class:

    p(1 | a) = 1 / (1 + e^(-Σfj(a)))

This closely resembles the expression used in Section 4.6 (page 121), except that here it is abbreviated by using vector notation for the instance a, and the original weighted sum of attribute values is replaced by a sum of arbitrarily complex regression models f.
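
In code, this probability estimate is just the logistic function applied to the ensemble's summed predictions. A minimal sketch, assuming each fitted model exposes a predict method as scikit-learn regressors do:

    import numpy as np

    def prob_first_class(models, X):
        # p(1 | a) = 1 / (1 + e^(-sum of fj(a))); with no models yet,
        # the sum is zero and the estimate is 0.5 for every instance.
        F = sum(m.predict(X) for m in models) if models else np.zeros(len(X))
        return 1.0 / (1.0 + np.exp(-F))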
Figure 7.9 shows the two-class version of the LogitBoost algorithm, which performs additive logistic regression and generates the individual models fj. Here, yi is 1 for an instance in the first class and 0 for an instance in the second. In each iteration this algorithm fits a regression model fj to a weighted version of the data.

model generation
For j = 1 to t iterations:
  For each instance a[i]:
    Set the target value for the regression to
      z[i] = (y[i] – p(1 | a[i])) / [p(1 | a[i]) (1 – p(1 | a[i]))]
    Set the weight w[i] of instance a[i] to p(1 | a[i]) (1 – p(1 | a[i]))
  Fit a regression model f[j] to the data with class values z[i] and weights w[i].
classification
Predict first class if p(1 | a) > 0.5, otherwise predict second class.

Figure 7.9 Algorithm for additive logistic regression.
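
The following sketch implements Figure 7.9 in Python, again with scikit-learn regression trees standing in for the models f[j]. The depth limit and the clipping of the weights away from zero (to keep the targets z[i] finite when p(1 | a[i]) approaches 0 or 1) are implementation choices, not part of the figure.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def _prob(models, X):
        # p(1 | a) = 1 / (1 + e^(-sum of fj(a))); 0.5 everywhere at the start.
        F = sum(m.predict(X) for m in models) if models else np.zeros(len(X))
        return 1.0 / (1.0 + np.exp(-F))

    def logitboost(X, y, t=50):
        # Two-class LogitBoost: y[i] is 1 for the first class, 0 for the second.
        models = []
        for _ in range(t):
            p = _prob(models, X)                     # current p(1 | a[i])
            w = np.clip(p * (1 - p), 1e-10, None)    # instance weights w[i]
            z = (y - p) / w                          # regression targets z[i]
            f = DecisionTreeRegressor(max_depth=2)
            f.fit(X, z, sample_weight=w)             # fit f[j] to values z[i], weights w[i]
            models.append(f)
        return models

    def classify(models, X):
        # Predict the first class if p(1 | a) > 0.5, otherwise the second.
        return (_prob(models, X) > 0.5).astype(int)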
