Understanding Machine Learning: From Theory to Algorithms


Boosting


For each $t$, let $f_t = \sum_{p \le t} w_p h_p$, so that the output of AdaBoost is $f_T$. In addition, denote

$$Z_t = \frac{1}{m} \sum_{i=1}^{m} e^{-y_i f_t(x_i)}.$$
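The quantity $Z_t$ is simply the empirical exponential loss of the current ensemble, and it is straightforward to evaluate. The following is a minimal NumPy sketch (not from the book); the arrays `y` and `f_vals` are hypothetical stand-ins for labels in $\{\pm 1\}$ and the ensemble values $f_t(x_i)$.

```python
import numpy as np

def exp_loss(y, f_vals):
    """Z_t = (1/m) * sum_i exp(-y_i * f_t(x_i)), for labels y in {-1, +1}."""
    return np.mean(np.exp(-y * f_vals))

def zero_one_loss(y, f_vals):
    """Empirical 0-1 loss of the classifier sign(f_t)."""
    return np.mean(np.sign(f_vals) != y)

# Hypothetical example: five points, two misclassified by f_t.
y = np.array([1, -1, 1, 1, -1])
f_vals = np.array([0.8, -0.3, -0.2, 1.5, 0.4])

# Pointwise, 1[sign(f(x)) != y] <= exp(-y f(x)), so the 0-1 loss
# is always bounded by Z_t (here 0.4 <= ~0.83).
assert zero_one_loss(y, f_vals) <= exp_loss(y, f_vals)
```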

Note that for any hypothesis we have $\mathbb{1}_{[h(x) \neq y]} \le e^{-y h(x)}$. Therefore, $L_S(f_T) \le Z_T$, so it suffices to show that $Z_T \le e^{-2\gamma^2 T}$. To upper bound $Z_T$ we rewrite it as

$$Z_T = \frac{Z_T}{Z_0} = \frac{Z_T}{Z_{T-1}} \cdot \frac{Z_{T-1}}{Z_{T-2}} \cdots \frac{Z_2}{Z_1} \cdot \frac{Z_1}{Z_0}, \tag{10.2}$$

where we used the fact that $Z_0 = 1$ because $f_0 \equiv 0$. Therefore, it suffices to show that for every round $t$,

$$\frac{Z_{t+1}}{Z_t} \le e^{-2\gamma^2}. \tag{10.3}$$
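Chaining Equation (10.3) through the telescoping product (10.2) gives $Z_T \le \left(e^{-2\gamma^2}\right)^T = e^{-2\gamma^2 T}$, which is exactly the bound we are after. As an illustrative numeric check (the values of $\gamma$ and $T$ are made up, not from the text), an edge of $\gamma = 0.1$ sustained over $T = 500$ rounds already gives

$$Z_T \le e^{-2 (0.1)^2 \cdot 500} = e^{-10} \approx 4.5 \times 10^{-5},$$

so a weak learner only slightly better than random guessing drives the training error down exponentially fast.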


To do so, we first note that, using a simple inductive argument, for all $t$ and $i$,

$$D^{(t+1)}_i = \frac{e^{-y_i f_t(x_i)}}{\sum_{j=1}^{m} e^{-y_j f_t(x_j)}}.$$
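This closed form can be checked numerically against AdaBoost's multiplicative update, under which $D^{(t+1)}_i \propto D^{(t)}_i e^{-y_i w_t h_t(x_i)}$. Below is a small sketch in the same hedged spirit as before; the labels, weak-learner outputs, and weights are arbitrary made-up values.

```python
import numpy as np

rng = np.random.default_rng(0)
m, T = 6, 4
y = rng.choice([-1, 1], size=m)         # labels y_i
H = rng.choice([-1, 1], size=(T, m))    # H[t, i] = h_{t+1}(x_i)
w = rng.uniform(0.1, 1.0, size=T)       # w[t]   = w_{t+1}

# Run the multiplicative distribution update.
D = np.full(m, 1.0 / m)                 # D^{(1)} is uniform (f_0 = 0)
f = np.zeros(m)                         # f_t(x_i), starting from f_0 = 0
for t in range(T):
    f += w[t] * H[t]                    # f_{t+1} = f_t + w_{t+1} h_{t+1}
    D = D * np.exp(-y * w[t] * H[t])
    D /= D.sum()                        # renormalize

# Closed form from the inductive argument.
closed = np.exp(-y * f)
closed /= closed.sum()
assert np.allclose(D, closed)
```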

Hence,

$$\begin{aligned}
\frac{Z_{t+1}}{Z_t} &= \frac{\sum_{i=1}^{m} e^{-y_i f_{t+1}(x_i)}}{\sum_{j=1}^{m} e^{-y_j f_t(x_j)}}
= \frac{\sum_{i=1}^{m} e^{-y_i f_t(x_i)}\, e^{-y_i w_{t+1} h_{t+1}(x_i)}}{\sum_{j=1}^{m} e^{-y_j f_t(x_j)}} \\
&= \sum_{i=1}^{m} D^{(t+1)}_i e^{-y_i w_{t+1} h_{t+1}(x_i)} \\
&= e^{-w_{t+1}} \sum_{i : y_i h_{t+1}(x_i) = 1} D^{(t+1)}_i \;+\; e^{w_{t+1}} \sum_{i : y_i h_{t+1}(x_i) = -1} D^{(t+1)}_i \\
&= e^{-w_{t+1}} (1 - \epsilon_{t+1}) + e^{w_{t+1}} \epsilon_{t+1} \\
&= \frac{1}{\sqrt{1/\epsilon_{t+1} - 1}}\, (1 - \epsilon_{t+1}) + \sqrt{1/\epsilon_{t+1} - 1}\; \epsilon_{t+1} \\
&= \sqrt{\frac{\epsilon_{t+1}}{1 - \epsilon_{t+1}}}\, (1 - \epsilon_{t+1}) + \sqrt{\frac{1 - \epsilon_{t+1}}{\epsilon_{t+1}}}\; \epsilon_{t+1}
\;=\; 2\sqrt{\epsilon_{t+1}(1 - \epsilon_{t+1})},
\end{aligned}$$

where $\epsilon_{t+1}$ is the weighted error of $h_{t+1}$ with respect to $D^{(t+1)}$, and the fifth equality plugs in AdaBoost's weight $w_{t+1} = \frac{1}{2}\log\left(\frac{1}{\epsilon_{t+1}} - 1\right)$, so that $e^{w_{t+1}} = \sqrt{1/\epsilon_{t+1} - 1}$.
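The substitution $e^{w_{t+1}} = \sqrt{1/\epsilon_{t+1} - 1}$ and the resulting identity are easy to confirm numerically; a minimal check (the value of `eps` is arbitrary):

```python
import numpy as np

eps = 0.3                                # any weighted error in (0, 1/2)
w = 0.5 * np.log(1.0 / eps - 1.0)        # AdaBoost's weight choice
lhs = np.exp(-w) * (1 - eps) + np.exp(w) * eps
rhs = 2 * np.sqrt(eps * (1 - eps))
assert np.isclose(lhs, rhs)              # both equal about 0.9165
```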

By our assumption, $\epsilon_{t+1} \le \frac{1}{2} - \gamma$. Since the function $g(a) = a(1-a)$ is monotonically increasing in $[0, 1/2]$, we obtain that

$$2\sqrt{\epsilon_{t+1}(1 - \epsilon_{t+1})} \le 2\sqrt{\left(\frac{1}{2} - \gamma\right)\left(\frac{1}{2} + \gamma\right)} = \sqrt{1 - 4\gamma^2}.$$

Finally, using the inequality $1 - a \le e^{-a}$ with $a = 4\gamma^2$, we get $\sqrt{1 - 4\gamma^2} \le e^{-4\gamma^2/2} = e^{-2\gamma^2}$, which establishes Equation (10.3) and concludes the proof.
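For concreteness, with the illustrative value $\gamma = 0.1$ (made up, not from the text):

$$\sqrt{1 - 4\gamma^2} = \sqrt{0.96} \approx 0.9798 \;\le\; e^{-2\gamma^2} = e^{-0.02} \approx 0.9802,$$

so every round contracts the surrogate loss by a constant factor strictly below one.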