Understanding Machine Learning: From Theory to Algorithms

372 Feature Selection and Generation

Let $D$ be the distribution over $[m]$ defined by

$$D_i = \frac{\exp(-y_i f_w(x_i))}{Z},$$

where $Z$ is a normalization factor that ensures that $D$ is a probability vector. Show that

$$\frac{\partial R(w)}{\partial w_j} = -\sum_{i=1}^m D_i\, y_i\, h_j(x_i).$$
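The gradient formula can be sanity-checked numerically against central finite differences. This sketch makes three assumptions. First, $R(w) = \log\big(\frac{1}{m}\sum_{i=1}^m \exp(-y_i f_w(x_i))\big)$, the log of the average exponential loss; the $1/m$ factor only shifts $R$ by a constant and does not change the gradient. Second, every $h_j$ takes values in $\{\pm 1\}$. Third, the hypothesis outputs are stored in a matrix `H` with `H[i, j]` $= h_j(x_i)$. The helper names (`risk`, `distribution`, `grad_formula`, `grad_numeric`) and the random data are hypothetical.

```python
import numpy as np

def risk(w, H, y):
    # Assumed objective: R(w) = log( (1/m) * sum_i exp(-y_i * f_w(x_i)) ),
    # where H[i, j] = h_j(x_i) in {-1, +1}, so f_w(x_i) = (H @ w)[i].
    return np.log(np.mean(np.exp(-y * (H @ w))))

def distribution(w, H, y):
    # D_i = exp(-y_i f_w(x_i)) / Z, with Z chosen so that D sums to 1.
    u = np.exp(-y * (H @ w))
    return u / u.sum()

def grad_formula(w, H, y):
    # Closed form from the exercise: dR/dw_j = -sum_i D_i y_i h_j(x_i).
    D = distribution(w, H, y)
    return -(D * y) @ H

def grad_numeric(w, H, y, eps=1e-6):
    # Central finite differences, one coordinate at a time.
    g = np.zeros_like(w)
    for j in range(len(w)):
        e = np.zeros_like(w)
        e[j] = eps
        g[j] = (risk(w + e, H, y) - risk(w - e, H, y)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
m, d = 20, 5
H = rng.choice([-1.0, 1.0], size=(m, d))  # hypothetical +-1 hypothesis outputs
y = rng.choice([-1.0, 1.0], size=m)       # hypothetical +-1 labels
w = rng.normal(size=d)

# Largest coordinate-wise gap between the closed form and finite differences.
print(np.max(np.abs(grad_formula(w, H, y) - grad_numeric(w, H, y))))
```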

Furthermore, denoting $\epsilon_j = \sum_{i=1}^m D_i\, \mathbb{1}[h_j(x_i) \neq y_i]$, show that

$$\frac{\partial R(w)}{\partial w_j} = 2\epsilon_j - 1.$$
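The key step is a one-line rewrite. It assumes, as in the AdaBoost setting, that each $h_j$ and each label $y_i$ take values in $\{\pm 1\}$. Under that assumption $y_i h_j(x_i) = 1 - 2\,\mathbb{1}[h_j(x_i) \neq y_i]$, and $\sum_i D_i = 1$:

```latex
\frac{\partial R(w)}{\partial w_j}
  = -\sum_{i=1}^m D_i\, y_i h_j(x_i)
  = -\sum_{i=1}^m D_i \bigl(1 - 2\,\mathbb{1}[h_j(x_i) \neq y_i]\bigr)
  = -1 + 2\epsilon_j
  = 2\epsilon_j - 1 .
```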

Conclude that if $\epsilon_j \le 1/2 - \gamma$, then

$$\left|\frac{\partial R(w)}{\partial w_j}\right| \ge \gamma/2.$$
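For $\gamma \ge 0$, substituting $\epsilon_j \le 1/2 - \gamma$ into the previous identity gives a slightly stronger bound, which in particular implies the stated one:

```latex
\frac{\partial R(w)}{\partial w_j} = 2\epsilon_j - 1
  \le 2\left(\tfrac{1}{2} - \gamma\right) - 1 = -2\gamma \le 0
\quad\Longrightarrow\quad
\left|\frac{\partial R(w)}{\partial w_j}\right| \ge 2\gamma \ge \frac{\gamma}{2}.
```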

Show that the update of AdaBoost guarantees

$$R(w^{(t+1)}) - R(w^{(t)}) \le \log\left(\sqrt{1 - 4\gamma^2}\right).$$

Hint: Use the proof of Theorem 10.2.
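The following minimal sketch runs a few AdaBoost rounds as coordinate descent on $R(w)$ and compares each round's change in $R$ with the bound. It assumes the same $R(w)$ as above (log of the average exponential loss) and $\pm 1$-valued hypotheses. It also assumes AdaBoost's step size $\alpha = \frac{1}{2}\log\frac{1-\epsilon}{\epsilon}$ on the coordinate with the smallest weighted error. The weak learners are hypothetical: noisy copies of the labels, plus their negations, so that the best weighted error never exceeds $1/2$. The key fact is that one step multiplies $\sum_i \exp(-y_i f_w(x_i))$ by $(1-\epsilon)e^{-\alpha} + \epsilon e^{\alpha} = 2\sqrt{\epsilon(1-\epsilon)}$.

```python
import numpy as np

def risk(w, H, y):
    # Assumed objective: R(w) = log( (1/m) * sum_i exp(-y_i f_w(x_i)) ).
    return np.log(np.mean(np.exp(-y * (H @ w))))

def adaboost_step(w, H, y):
    # Weights D_i proportional to exp(-y_i f_w(x_i)), as in the exercise.
    u = np.exp(-y * (H @ w))
    D = u / u.sum()
    # eps_j = sum_i D_i * 1[h_j(x_i) != y_i] for every column j.
    errors = D @ (H != y[:, None]).astype(float)
    j = int(np.argmin(errors))
    eps = float(errors[j])
    # AdaBoost's step size, applied to the chosen coordinate only.
    alpha = 0.5 * np.log((1.0 - eps) / eps)
    w_next = w.copy()
    w_next[j] += alpha
    return w_next, eps

rng = np.random.default_rng(1)
m, k = 40, 6
y = rng.choice([-1.0, 1.0], size=m)
# Hypothetical weak learners: copies of y with exactly 12 labels flipped,
# plus their negations (so min_j eps_j <= 1/2 for every distribution D).
cols = []
for _ in range(k):
    h = y.copy()
    h[rng.choice(m, size=12, replace=False)] *= -1
    cols.append(h)
H = np.column_stack(cols + [-c for c in cols])

w = np.zeros(H.shape[1])
history = []  # (eps_t, R(w^(t+1)) - R(w^(t)), log(sqrt(1 - 4 gamma_t^2)))
for _ in range(10):
    w_next, eps = adaboost_step(w, H, y)
    gamma = 0.5 - eps
    decrease = risk(w_next, H, y) - risk(w, H, y)
    bound = np.log(np.sqrt(1.0 - 4.0 * gamma**2))
    history.append((eps, decrease, bound))
    w = w_next

for eps, decrease, bound in history:
    print(f"eps={eps:.4f}  change={decrease:.6f}  bound={bound:.6f}")
```

With $\gamma_t = 1/2 - \epsilon_t$ the change and the bound should agree up to floating-point rounding, because $2\sqrt{\epsilon_t(1-\epsilon_t)} = \sqrt{1 - 4\gamma_t^2}$. For any $\gamma$ with $\epsilon_t \le 1/2 - \gamma$ (so $\gamma \le \gamma_t$), the right-hand side $\log\sqrt{1-4\gamma^2}$ is only larger, which gives the inequality the exercise asks for.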