21.2 Online Classification in the Unrealizable Case

Summing this inequality over $t$ we get
\[
\log(Z_{T+1}) - \log(Z_1)
= \sum_{t=1}^{T} \log\frac{Z_{t+1}}{Z_t}
\le -\eta \sum_{t=1}^{T} \langle \mathbf{w}^{(t)}, \mathbf{v}_t \rangle + \frac{T\eta^2}{2}. \tag{21.3}
\]
Next, we lower bound $Z_{T+1}$. For each $i$, we can rewrite $\tilde{w}^{(T+1)}_{i} = e^{-\eta \sum_{t} v_{t,i}}$ and we get that
\[
\log Z_{T+1}
= \log\Bigl(\sum_{i} e^{-\eta \sum_{t} v_{t,i}}\Bigr)
\ge \log\Bigl(\max_{i} e^{-\eta \sum_{t} v_{t,i}}\Bigr)
= -\eta \min_{i} \sum_{t} v_{t,i}.
\]
Combining the preceding with Equation (21.3) and using the fact that $\log(Z_1) = \log(d)$ we get that
\[
-\eta \min_{i} \sum_{t} v_{t,i} - \log(d)
\le -\eta \sum_{t=1}^{T} \langle \mathbf{w}^{(t)}, \mathbf{v}_t \rangle + \frac{T\eta^2}{2},
\]
which can be rearranged as follows:
\[
\sum_{t=1}^{T} \langle \mathbf{w}^{(t)}, \mathbf{v}_t \rangle - \min_{i} \sum_{t} v_{t,i}
\le \frac{\log(d)}{\eta} + \frac{\eta T}{2}.
\]
Plugging the value of $\eta = \sqrt{2\log(d)/T}$ into the equation balances the two terms, since then $\log(d)/\eta = \eta T/2 = \sqrt{T\log(d)/2}$, yielding the bound $\sqrt{2\log(d)\,T}$ and concluding our proof.
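To make the bound of Theorem 21.11 concrete, here is a minimal Python sketch of the Weighted-Majority (exponentially weighted average) forecaster; the random cost matrix, horizon $T$, number of experts $d$, and seed are illustrative assumptions, not from the text:

```python
import math
import random

def weighted_majority(costs, eta):
    """Exponentially weighted forecaster on a T x d cost matrix.

    costs[t][i] is the cost v_{t,i} in [0, 1] of expert i at round t.
    Returns the cumulative cost sum_t <w^(t), v_t>.
    """
    d = len(costs[0])
    w = [1.0 / d] * d  # w^(1) is uniform (Z_1 = d before normalizing)
    total = 0.0
    for v in costs:
        total += sum(wi * vi for wi, vi in zip(w, v))  # loss <w^(t), v_t>
        # multiplicative update: w~_i <- w_i * exp(-eta * v_{t,i})
        tilde = [wi * math.exp(-eta * vi) for wi, vi in zip(w, v)]
        Z = sum(tilde)
        w = [x / Z for x in tilde]  # renormalize to a probability vector
    return total

# Illustrative experiment: random costs, eta = sqrt(2 log(d) / T).
random.seed(0)
T, d = 1000, 10
costs = [[random.random() for _ in range(d)] for _ in range(T)]
eta = math.sqrt(2 * math.log(d) / T)
alg_cost = weighted_majority(costs, eta)
best_expert = min(sum(costs[t][i] for t in range(T)) for i in range(d))
regret = alg_cost - best_expert
bound = math.sqrt(2 * math.log(d) * T)
```

On any cost sequence with entries in $[0,1]$, the measured `regret` must respect the theorem's bound of $\sqrt{2\log(d)\,T}$.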
Proof of Theorem 21.10
Equipped with the Weighted-Majority algorithm and Theorem 21.11, we are ready to prove Theorem 21.10. We start with the simpler case, in which $\mathcal{H}$ is a finite class, and let us write $\mathcal{H} = \{h_1, \ldots, h_d\}$. In this case, we can refer to each hypothesis, $h_i$, as an expert, whose advice is to predict $h_i(x_t)$, and whose cost is $v_{t,i} = |h_i(x_t) - y_t|$. The prediction of the algorithm will therefore be $p_t = \sum_{i} w^{(t)}_{i} h_i(x_t) \in [0,1]$, and the loss is
\[
|p_t - y_t|
= \Bigl|\sum_{i=1}^{d} w^{(t)}_{i} h_i(x_t) - y_t\Bigr|
= \Bigl|\sum_{i=1}^{d} w^{(t)}_{i} \bigl(h_i(x_t) - y_t\bigr)\Bigr|,
\]
where the second equality uses $\sum_{i} w^{(t)}_{i} = 1$.
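This identity can be sanity-checked numerically; the weights and predictions below are made up for illustration (the argument for why the absolute value passes inside the sum is given next):

```python
import random

random.seed(1)
d = 5
w = [random.random() for _ in range(d)]
s = sum(w)
w = [x / s for x in w]                   # a probability vector w^(t)
h = [random.random() for _ in range(d)]  # predictions h_i(x_t) in [0, 1]

results = {}
for y in (0, 1):  # binary label y_t
    lhs = abs(sum(wi * hi for wi, hi in zip(w, h)) - y)  # |p_t - y_t|
    rhs = sum(wi * abs(hi - y) for wi, hi in zip(w, h))  # <w^(t), v_t>
    results[y] = (lhs, rhs)
```

For both possible labels the two quantities agree, because all the terms $h_i(x_t) - y_t$ share a sign.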
Now, if $y_t = 1$, then for all $i$, $h_i(x_t) - y_t \le 0$. Therefore, the above equals $\sum_{i} w^{(t)}_{i} |h_i(x_t) - y_t|$. If $y_t = 0$ then for all $i$, $h_i(x_t) - y_t \ge 0$, and the above also equals $\sum_{i} w^{(t)}_{i} |h_i(x_t) - y_t|$. All in all, we have shown that
\[
|p_t - y_t| = \sum_{i=1}^{d} w^{(t)}_{i} |h_i(x_t) - y_t| = \langle \mathbf{w}^{(t)}, \mathbf{v}_t \rangle.
\]
Furthermore, for each $i$, $\sum_{t} v_{t,i}$ is exactly the number of mistakes hypothesis $h_i$ makes. Applying Theorem 21.11 we obtain