372 Feature Selection and Generation
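- Let $D$ be the distribution over $[m]$ defined by $D_i = \frac{\exp(-y_i f_w(x_i))}{Z}$, where $Z$ is a normalization factor that ensures that $D$ is a probability vector. Show that
  $$\frac{\partial R(w)}{\partial w_j} = -\sum_{i=1}^m D_i y_i h_j(x_i).$$
- Furthermore, denoting $\epsilon_j = \sum_{i=1}^m D_i \mathbb{1}[h_j(x_i) \neq y_i]$, show that
  $$\frac{\partial R(w)}{\partial w_j} = 2\epsilon_j - 1.$$
- Conclude that if $\epsilon_j \le 1/2 - \gamma$, then $\left|\frac{\partial R(w)}{\partial w_j}\right| \ge \gamma/2$.
- Show that the update of AdaBoost guarantees
  $$R(w^{(t+1)}) - R(w^{(t)}) \le \log\left(\sqrt{1 - 4\gamma^2}\right).$$
  Hint: Use the proof of Theorem 10.2.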
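The identities in this exercise can be sanity-checked numerically before proving them. The sketch below assumes, as in the exercise setup, that $R(w) = \log \sum_{i=1}^m \exp(-y_i f_w(x_i))$ with $f_w(x) = \sum_j w_j h_j(x)$ and base hypotheses $h_j$ taking values in $\{\pm 1\}$, and that the AdaBoost step adds $\frac{1}{2}\log(1/\epsilon_j - 1)$ to the chosen coordinate. The random matrix `H` (with `H[i, j]` standing for $h_j(x_i)$), the labels, and the helper names `risk`, `grad_formula`, `weighted_errors`, and `adaboost_step` are hypothetical; each feature's negation is also included (an assumption) so that the best coordinate always has $\epsilon_j \le 1/2$.

```python
import numpy as np

# Hypothetical toy data: H[i, j] plays the role of h_j(x_i) in {-1, +1}.
rng = np.random.default_rng(0)
m, d = 50, 8
H_base = rng.choice([-1.0, 1.0], size=(m, d))
H = np.hstack([H_base, -H_base])  # assumed: class closed under negation
y = rng.choice([-1.0, 1.0], size=m)


def risk(w):
    """R(w) = log sum_i exp(-y_i f_w(x_i)), where f_w(x_i) = (H @ w)_i."""
    return np.log(np.sum(np.exp(-y * (H @ w))))


def distribution(w):
    """D_i = exp(-y_i f_w(x_i)) / Z."""
    weights = np.exp(-y * (H @ w))
    return weights / weights.sum()


def grad_formula(w):
    """Claimed gradient: dR/dw_j = -sum_i D_i y_i h_j(x_i)."""
    return -(distribution(w) * y) @ H


def grad_numeric(w, h=1e-6):
    """Central finite differences of R, one coordinate at a time."""
    g = np.zeros_like(w)
    for j in range(len(w)):
        e = np.zeros_like(w)
        e[j] = h
        g[j] = (risk(w + e) - risk(w - e)) / (2 * h)
    return g


def weighted_errors(w):
    """eps_j = sum_i D_i 1[h_j(x_i) != y_i] for every coordinate j."""
    mistakes = (H != y[:, None]).astype(float)
    return distribution(w) @ mistakes


def adaboost_step(w):
    """Update the coordinate with the smallest eps_j by 0.5 * log(1/eps_j - 1)."""
    eps = weighted_errors(w)
    j = int(np.argmin(eps))
    w_new = w.copy()
    w_new[j] += 0.5 * np.log(1.0 / eps[j] - 1.0)
    return w_new, eps[j]


w = np.zeros(H.shape[1])
for t in range(5):
    g = grad_formula(w)
    assert np.allclose(g, grad_numeric(w), atol=1e-6)      # first identity
    assert np.allclose(g, 2 * weighted_errors(w) - 1)      # dR/dw_j = 2 eps_j - 1
    w_next, eps_j = adaboost_step(w)
    gamma = 0.5 - eps_j
    decrease = risk(w_next) - risk(w)
    # The step multiplies Z by 2 sqrt(eps_j (1 - eps_j)) <= sqrt(1 - 4 gamma^2).
    assert np.isclose(decrease, np.log(2 * np.sqrt(eps_j * (1 - eps_j))))
    assert decrease <= np.log(np.sqrt(1 - 4 * gamma**2)) + 1e-12
    w = w_next
print("all checks passed")
```

The check relies on $y_i h_j(x_i) = 1 - 2\,\mathbb{1}[h_j(x_i) \neq y_i]$ for $\pm 1$-valued hypotheses, which is exactly what links the first two identities.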