372 Feature Selection and Generation
Let $D$ be the distribution over $[m]$ defined by
$$D_i = \frac{\exp(-y_i f_w(x_i))}{Z},$$
where $Z$ is a normalization factor that ensures that $D$ is a probability vector. Show that
$$\frac{\partial R(w)}{\partial w_j} = -\sum_{i=1}^m D_i\, y_i\, h_j(x_i).$$
Furthermore, denoting $\epsilon_j = \sum_{i=1}^m D_i\, \mathbb{1}[h_j(x_i) \neq y_i]$, show that
$$\frac{\partial R(w)}{\partial w_j} = 2\epsilon_j - 1.$$
Conclude that if $\epsilon_j \le 1/2 - \gamma$ then
$$\left|\frac{\partial R(w)}{\partial w_j}\right| \ge \gamma/2.$$
- Show that the update of AdaBoost guarantees $R(w^{(t+1)}) - R(w^{(t)}) \le \log\left(\sqrt{1 - 4\gamma^2}\right)$. Hint: Use the proof of Theorem 10.2.
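The claims in this exercise can be sanity-checked numerically. The sketch below is not a proof, and it rests on assumptions that are not stated in this excerpt. It takes $R(w) = \log\sum_{i=1}^m \exp(-y_i f_w(x_i))$ with $f_w(x) = \sum_j w_j h_j(x)$. It stores the base hypotheses in a matrix `H` with `H[i, j]` $= h_j(x_i) \in \{\pm 1\}$, using a dictionary closed under negation so that some $\epsilon_j \le 1/2$. It models one AdaBoost round as adding $\frac{1}{2}\log\frac{1-\epsilon_j}{\epsilon_j}$ to the coordinate with the smallest weighted error. All helper names (`risk`, `grad`, `weighted_errors`, `adaboost_step`) are hypothetical. The script compares the closed-form gradient against finite differences, checks $\partial R/\partial w_j = 2\epsilon_j - 1$ (which relies on $y_i h_j(x_i) = 1 - 2\,\mathbb{1}[h_j(x_i) \neq y_i]$ and $\sum_i D_i = 1$), and measures the change in $R$ after one step.

```python
import numpy as np

# Assumed setup (not from the text): H[i, j] = h_j(x_i) in {-1, +1}, y_i in {-1, +1},
# R(w) = log(sum_i exp(-y_i f_w(x_i))) with f_w(x) = sum_j w_j h_j(x).

def risk(w, H, y):
    return np.log(np.sum(np.exp(-y * (H @ w))))

def distribution(w, H, y):
    """D_i = exp(-y_i f_w(x_i)) / Z."""
    weights = np.exp(-y * (H @ w))
    return weights / weights.sum()

def grad(w, H, y):
    """Closed form from the exercise: dR/dw_j = -sum_i D_i y_i h_j(x_i)."""
    return -(distribution(w, H, y) * y) @ H

def weighted_errors(w, H, y):
    """eps_j = sum_i D_i 1[h_j(x_i) != y_i], computed for every j at once."""
    mistakes = (H != y[:, None]).astype(float)
    return distribution(w, H, y) @ mistakes

def numeric_grad(w, H, y, step=1e-6):
    """Central finite differences, used only as an independent check."""
    g = np.zeros_like(w)
    for j in range(len(w)):
        e = np.zeros_like(w)
        e[j] = step
        g[j] = (risk(w + e, H, y) - risk(w - e, H, y)) / (2 * step)
    return g

def adaboost_step(w, H, y):
    """One AdaBoost round read as a coordinate step on R(w).

    Picks the j with the smallest eps_j and adds 0.5 * log((1 - eps_j) / eps_j)
    to w_j. Assumes 0 < eps_j < 1 for the chosen j.
    """
    errors = weighted_errors(w, H, y)
    j = int(np.argmin(errors))
    eps = errors[j]
    w_new = w.copy()
    w_new[j] += 0.5 * np.log((1 - eps) / eps)
    return w_new, eps

# Hypothetical random data; each hypothesis appears together with its negation,
# so the smallest weighted error is at most 1/2.
rng = np.random.default_rng(0)
H0 = rng.choice([-1.0, 1.0], size=(40, 6))
H = np.hstack([H0, -H0])
y = rng.choice([-1.0, 1.0], size=40)
w = rng.normal(size=H.shape[1])

gap_grad = np.max(np.abs(grad(w, H, y) - numeric_grad(w, H, y)))
gap_eps = np.max(np.abs(grad(w, H, y) - (2 * weighted_errors(w, H, y) - 1)))

w_next, eps = adaboost_step(w, H, y)
gamma = 0.5 - eps                      # edge of the chosen hypothesis
decrease = risk(w_next, H, y) - risk(w, H, y)
bound = np.log(np.sqrt(1 - 4 * gamma**2))
print(gap_grad, gap_eps, decrease, bound)
```

Here $\gamma$ is taken as exactly $1/2 - \epsilon_j$ for the chosen hypothesis. The one-step change then equals $\log\bigl(2\sqrt{\epsilon_j(1-\epsilon_j)}\bigr) = \log\sqrt{1 - 4\gamma^2}$, so the bound holds with equality. Any smaller guaranteed edge gives a looser bound. A constant factor such as $1/m$ inside the logarithm would change neither the gradient nor this difference.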