10.2 AdaBoost

a distribution over the examples in $S$, denoted $D^{(t)}$. That is, $D^{(t)} \in \mathbb{R}_+^m$ and $\sum_{i=1}^m D_i^{(t)} = 1$. Then, the booster passes the distribution $D^{(t)}$ and the sample $S$ to the weak learner. (That way, the weak learner can construct i.i.d. examples according to $D^{(t)}$ and $f$.) The weak learner is assumed to return a "weak" hypothesis, $h_t$, whose error,
\[
\epsilon_t \stackrel{\text{def}}{=} L_{D^{(t)}}(h_t) \stackrel{\text{def}}{=} \sum_{i=1}^m D_i^{(t)} \, \mathbb{1}_{[h_t(x_i) \neq y_i]},
\]
is at most $\frac{1}{2} - \gamma$ (of course, there is a probability of at most $\delta$ that the weak learner fails). Then, AdaBoost assigns a weight for $h_t$ as follows: $w_t = \frac{1}{2}\log\left(\frac{1}{\epsilon_t} - 1\right)$. That is, the weight of $h_t$ is inversely proportional to the error of $h_t$.
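To make this concrete, here are a few illustrative values of the weighting rule (the log is the natural logarithm; the error values are chosen purely for illustration):
\[
\epsilon_t = 0.25 \;\Rightarrow\; w_t = \tfrac{1}{2}\log 3 \approx 0.55, \qquad
\epsilon_t = 0.1 \;\Rightarrow\; w_t = \tfrac{1}{2}\log 9 \approx 1.10, \qquad
\epsilon_t = 0.5 \;\Rightarrow\; w_t = 0.
\]
In particular, a hypothesis that is barely better than a random guess contributes almost nothing to the final classifier.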
At the end of the round, AdaBoost updates the distribution so that examples on which $h_t$ errs will get a higher probability mass while examples on which $h_t$ is correct will get a lower probability mass. Intuitively, this will force the weak learner to focus on the problematic examples in the next round. The output of the AdaBoost algorithm is a "strong" classifier that is based on a weighted sum of all the weak hypotheses. The pseudocode of AdaBoost is presented in the following.


AdaBoost

input:
    training set $S = (x_1, y_1), \ldots, (x_m, y_m)$
    weak learner WL
    number of rounds $T$
initialize $D^{(1)} = \left(\frac{1}{m}, \ldots, \frac{1}{m}\right)$.
for $t = 1, \ldots, T$:
    invoke weak learner $h_t = \mathrm{WL}(D^{(t)}, S)$
    compute $\epsilon_t = \sum_{i=1}^m D_i^{(t)} \, \mathbb{1}_{[y_i \neq h_t(x_i)]}$
    let $w_t = \frac{1}{2}\log\left(\frac{1}{\epsilon_t} - 1\right)$
    update $D_i^{(t+1)} = \frac{D_i^{(t)} \exp(-w_t y_i h_t(x_i))}{\sum_{j=1}^m D_j^{(t)} \exp(-w_t y_j h_t(x_j))}$ for all $i = 1, \ldots, m$
output the hypothesis $h_s(x) = \mathrm{sign}\left(\sum_{t=1}^T w_t h_t(x)\right)$.
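Since the pseudocode leaves the weak learner abstract, a minimal runnable sketch in Python/NumPy is given below, assuming decision stumps as the weak hypothesis class. The names stump_weak_learner and adaboost, the exhaustive stump search, and the clipping of $\epsilon_t$ are illustrative choices, not part of the text.

    import numpy as np

    def stump_weak_learner(D, X, y):
        """Hypothetical weak learner (not from the text): exhaustively searches
        decision stumps h(x) = b * sign(x[j] - theta) and returns the one with
        the smallest weighted error sum_i D_i * 1[h(x_i) != y_i]."""
        m, d = X.shape
        best, best_err = None, np.inf
        for j in range(d):
            for theta in np.unique(X[:, j]):
                for b in (1.0, -1.0):
                    pred = b * np.where(X[:, j] >= theta, 1.0, -1.0)
                    err = np.sum(D * (pred != y))
                    if err < best_err:
                        best, best_err = (j, theta, b), err
        j, theta, b = best
        return lambda Z: b * np.where(Z[:, j] >= theta, 1.0, -1.0)

    def adaboost(X, y, weak_learner, T):
        """AdaBoost as in the pseudocode above; labels y must be in {-1, +1}."""
        m = X.shape[0]
        D = np.full(m, 1.0 / m)                  # D^(1) = (1/m, ..., 1/m)
        hs, ws = [], []
        for t in range(T):
            h = weak_learner(D, X, y)            # h_t = WL(D^(t), S)
            pred = h(X)
            eps = np.sum(D * (pred != y))        # epsilon_t
            eps = np.clip(eps, 1e-10, 1 - 1e-10) # guard: theory assumes 0 < eps_t <= 1/2 - gamma
            w = 0.5 * np.log(1.0 / eps - 1.0)    # w_t = (1/2) log(1/eps_t - 1)
            D = D * np.exp(-w * y * pred)        # upweight mistakes, downweight hits
            D = D / D.sum()                      # renormalize to a distribution
            hs.append(h)
            ws.append(w)
        # h_s(x) = sign(sum_t w_t h_t(x))
        return lambda Z: np.sign(sum(w * h(Z) for w, h in zip(ws, hs)))

    # Toy usage on a synthetic sample with labels in {-1, +1}.
    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 2))
        y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)
        h_s = adaboost(X, y, stump_weak_learner, T=20)
        print("training error:", np.mean(h_s(X) != y))

Note that since $y_i h_t(x_i) = -1$ exactly on the mistakes of $h_t$, the update multiplies the mass of misclassified examples by $e^{w_t}$ and of correctly classified ones by $e^{-w_t}$, matching the reweighting step in the pseudocode.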

The following theorem shows that the training error of the output hypothesis
decreases exponentially fast with the number of boosting rounds.


theorem 10.2  Let $S$ be a training set and assume that at each iteration of AdaBoost, the weak learner returns a hypothesis for which $\epsilon_t \leq \frac{1}{2} - \gamma$. Then, the training error of the output hypothesis of AdaBoost is at most
\[
L_S(h_s) = \frac{1}{m} \sum_{i=1}^m \mathbb{1}_{[h_s(x_i) \neq y_i]} \leq \exp(-2\gamma^2 T).
\]
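Read in reverse, the bound prescribes how many rounds suffice for a target training error $\epsilon$ (the numbers below are illustrative):
\[
\exp(-2\gamma^2 T) \leq \epsilon \quad\Longleftrightarrow\quad T \geq \frac{\log(1/\epsilon)}{2\gamma^2}.
\]
For instance, with an edge of $\gamma = 0.1$, a training error below $0.01$ is guaranteed once $T \geq \log(100)/0.02 \approx 231$.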
Proof  For each $t$, denote $f_t = \sum_{p \leq t} w_p h_p$. Therefore, the output of AdaBoost is $f_T$.