Understanding Machine Learning: From Theory to Algorithms


Feature Selection and Generation


Let $D$ be the distribution over $[m]$ defined by

$$D_i = \frac{\exp(-y_i f_w(x_i))}{Z},$$

where $Z$ is a normalization factor that ensures that $D$ is a probability
vector. Show that

$$\frac{\partial R(w)}{\partial w_j} = -\sum_{i=1}^{m} D_i\, y_i\, h_j(x_i).$$
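The gradient identity can be sanity-checked numerically. The sketch below makes three assumptions that are not stated on this page:

- $R(w) = \log \sum_{i=1}^m \exp(-y_i f_w(x_i))$, which is the definition consistent with $D$ and the claimed gradient.
- The values $h_j(x_i)$ are stored in a matrix `H` with `H[i][j]` $= h_j(x_i)$.
- The data is a small random instance.

The helper names (`margins`, `risk`, `distribution`, `analytic_grad`, `numeric_grad`) are hypothetical. The sketch compares the closed-form gradient against a central finite difference.

```python
import math
import random

def margins(w, H, y):
    # y_i * f_w(x_i), with f_w(x_i) = sum_j w_j * h_j(x_i) and H[i][j] = h_j(x_i).
    return [yi * sum(wj * hij for wj, hij in zip(w, Hi)) for Hi, yi in zip(H, y)]

def risk(w, H, y):
    # Assumed definition: R(w) = log(sum_i exp(-y_i f_w(x_i))).
    return math.log(sum(math.exp(-s) for s in margins(w, H, y)))

def distribution(w, H, y):
    # D_i = exp(-y_i f_w(x_i)) / Z, with Z chosen so that sum_i D_i = 1.
    scores = [math.exp(-s) for s in margins(w, H, y)]
    Z = sum(scores)
    return [s / Z for s in scores]

def analytic_grad(w, H, y, j):
    # Claimed identity: dR/dw_j = -sum_i D_i * y_i * h_j(x_i).
    D = distribution(w, H, y)
    return -sum(Di * yi * Hi[j] for Di, yi, Hi in zip(D, y, H))

def numeric_grad(w, H, y, j, step=1e-6):
    # Central finite difference of R along coordinate j.
    wp, wm = list(w), list(w)
    wp[j] += step
    wm[j] -= step
    return (risk(wp, H, y) - risk(wm, H, y)) / (2 * step)

# Hypothetical random instance: m points, d base hypotheses with values in {-1, +1}.
random.seed(0)
m, d = 8, 3
H = [[random.choice([-1, 1]) for _ in range(d)] for _ in range(m)]
y = [random.choice([-1, 1]) for _ in range(m)]
w = [random.uniform(-1, 1) for _ in range(d)]

for j in range(d):
    print(j, analytic_grad(w, H, y, j), numeric_grad(w, H, y, j))
```

The two columns should agree up to finite-difference error. This check does not depend on the values being $\pm 1$; it only relies on the assumed form of $R$.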

Furthermore, denoting $\epsilon_j = \sum_{i=1}^{m} D_i\, \mathbb{1}[h_j(x_i) \neq y_i]$, show that

$$\frac{\partial R(w)}{\partial w_j} = 2\epsilon_j - 1.$$
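Assuming that each $h_j$ and each label $y_i$ take values in $\{\pm 1\}$, one way to go from the first identity to the second is the following. The product $y_i h_j(x_i)$ equals $+1$ on correctly classified points and $-1$ otherwise. Substituting this into the gradient and using $\sum_i D_i = 1$ gives:

```latex
y_i h_j(x_i) = 1 - 2\,\mathbb{1}[h_j(x_i) \neq y_i]
\quad\Longrightarrow\quad
\frac{\partial R(w)}{\partial w_j}
  = -\sum_{i=1}^{m} D_i \bigl(1 - 2\,\mathbb{1}[h_j(x_i) \neq y_i]\bigr)
  = -\sum_{i=1}^{m} D_i + 2\epsilon_j
  = 2\epsilon_j - 1 .
```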

Conclude that if $\epsilon_j \le 1/2 - \gamma$ then

$$\left|\frac{\partial R(w)}{\partial w_j}\right| \ge \gamma/2.$$
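Assuming $\gamma \ge 0$, the conclusion follows in one line from $\partial R(w)/\partial w_j = 2\epsilon_j - 1$. The same argument actually yields the constant $2\gamma$, which implies the stated $\gamma/2$:

```latex
\epsilon_j \le \tfrac{1}{2} - \gamma
\;\Longrightarrow\;
\frac{\partial R(w)}{\partial w_j} = 2\epsilon_j - 1 \le -2\gamma
\;\Longrightarrow\;
\left|\frac{\partial R(w)}{\partial w_j}\right| = 1 - 2\epsilon_j \ge 2\gamma \ge \gamma/2 .
```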


  • Show that the update of AdaBoost guarantees $R(w^{(t+1)}) - R(w^{(t)}) \le
    \log\left(\sqrt{1 - 4\gamma^2}\right)$. Hint: Use the proof of Theorem 10.2.
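The bound can be illustrated on a toy instance. This is a minimal sketch built on three assumptions:

- $R(w) = \log \sum_i \exp(-y_i f_w(x_i))$.
- All $h_j, y_i \in \{\pm 1\}$.
- Each AdaBoost step picks the coordinate $j$ with the smallest weighted error $\epsilon_j$ and adds AdaBoost's weight $\frac12 \log((1-\epsilon_j)/\epsilon_j)$ to $w_j$.

The data and helper names are hypothetical. Under these assumptions, $R(w^{(t+1)}) - R(w^{(t)}) = \log\bigl((1-\epsilon)e^{-\alpha} + \epsilon e^{\alpha}\bigr) = \log\bigl(2\sqrt{\epsilon(1-\epsilon)}\bigr)$. This equals $\log\sqrt{1-4\gamma^2}$ when $\gamma = 1/2 - \epsilon$, and is at most that value for any $0 \le \gamma \le 1/2 - \epsilon$.

```python
import math

def margins(w, H, y):
    # y_i * f_w(x_i), with f_w(x_i) = sum_j w_j * h_j(x_i) and H[i][j] = h_j(x_i).
    return [yi * sum(wj * hij for wj, hij in zip(w, Hi)) for Hi, yi in zip(H, y)]

def risk(w, H, y):
    # Assumed definition: R(w) = log(sum_i exp(-y_i f_w(x_i))).
    return math.log(sum(math.exp(-s) for s in margins(w, H, y)))

def weighted_errors(w, H, y):
    # D_i = exp(-y_i f_w(x_i)) / Z, then eps_j = sum_i D_i * 1[h_j(x_i) != y_i].
    scores = [math.exp(-s) for s in margins(w, H, y)]
    Z = sum(scores)
    D = [s / Z for s in scores]
    return [sum(Di for Di, Hi, yi in zip(D, H, y) if Hi[j] != yi)
            for j in range(len(w))]

def adaboost_step(w, H, y):
    # Assumed update: choose the coordinate with the smallest weighted error and
    # add AdaBoost's weight 0.5 * log((1 - eps) / eps) to it (needs 0 < eps < 1).
    eps = weighted_errors(w, H, y)
    j = min(range(len(w)), key=lambda k: eps[k])
    w_new = list(w)
    w_new[j] += 0.5 * math.log((1 - eps[j]) / eps[j])
    return w_new, eps[j]

# Hypothetical toy data: 8 labeled points, 3 base hypotheses (one column each).
y = [1, 1, 1, 1, -1, -1, -1, -1]
cols = [[1, 1, 1, -1, -1, -1, -1, 1],
        [1, -1, 1, 1, -1, 1, 1, -1],
        [-1, 1, -1, 1, 1, -1, 1, -1]]
H = [list(row) for row in zip(*cols)]  # H[i][j] = h_j(x_i)

w = [0.0, 0.0, 0.0]
history = []
for t in range(3):
    w_next, eps = adaboost_step(w, H, y)
    gamma = 0.5 - eps  # the largest gamma with eps <= 1/2 - gamma
    delta = risk(w_next, H, y) - risk(w, H, y)
    bound = math.log(math.sqrt(1 - 4 * gamma ** 2))
    history.append((eps, delta, bound))
    print(f"t={t}: eps={eps:.4f}  R change={delta:.6f}  bound={bound:.6f}")
    w = w_next
```

With $\gamma$ set to exactly $1/2 - \epsilon$ the bound is met with equality. A smaller $\gamma$ only loosens it.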