21.2 Online Classification in the Unrealizable Case
Summing this inequality over $t$ we get
$$\log(Z_{T+1}) - \log(Z_1) \;=\; \sum_{t=1}^{T} \log\frac{Z_{t+1}}{Z_t} \;\le\; -\eta \sum_{t=1}^{T} \langle \mathbf{w}^{(t)}, \mathbf{v}_t \rangle + \frac{T\eta^2}{2}. \qquad (21.3)$$
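Each summand above is the per-round bound $\log(Z_{t+1}/Z_t) \le -\eta\langle \mathbf{w}^{(t)}, \mathbf{v}_t\rangle + \eta^2/2$, which can be sanity-checked numerically. A minimal sketch, using random weights on the simplex and random costs in $[0,1]$ (the values are illustrative, not produced by the algorithm):

```python
import math
import random

random.seed(0)
d, eta = 5, 0.3
for _ in range(1000):
    w = [random.random() for _ in range(d)]
    total = sum(w)
    w = [x / total for x in w]               # w^(t): a point on the probability simplex
    v = [random.random() for _ in range(d)]  # cost vector v_t in [0, 1]^d
    # Z_{t+1} / Z_t = sum_i w_i^(t) * exp(-eta * v_{t,i})
    lhs = math.log(sum(wi * math.exp(-eta * vi) for wi, vi in zip(w, v)))
    rhs = -eta * sum(wi * vi for wi, vi in zip(w, v)) + eta ** 2 / 2
    assert lhs <= rhs + 1e-12
```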
Next, we lower bound $Z_{T+1}$. For each $i$, we can rewrite $\tilde{w}^{(T+1)}_i = e^{-\eta \sum_t v_{t,i}}$, and we get that
$$\log Z_{T+1} \;=\; \log\left(\sum_i e^{-\eta \sum_t v_{t,i}}\right) \;\ge\; \log\left(\max_i \, e^{-\eta \sum_t v_{t,i}}\right) \;=\; -\eta \min_i \sum_t v_{t,i}.$$
Combining the preceding with Equation (21.3) and using the fact that $\log(Z_1) = \log(d)$, we get that
$$-\eta \min_i \sum_t v_{t,i} - \log(d) \;\le\; -\eta \sum_{t=1}^{T} \langle \mathbf{w}^{(t)}, \mathbf{v}_t \rangle + \frac{T\eta^2}{2},$$
which can be rearranged as follows:
$$\sum_{t=1}^{T} \langle \mathbf{w}^{(t)}, \mathbf{v}_t \rangle - \min_i \sum_t v_{t,i} \;\le\; \frac{\log(d)}{\eta} + \frac{\eta T}{2}.$$
Plugging the value of $\eta$ into the equation concludes our proof.
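For concreteness, with the step size $\eta = \sqrt{2\log(d)/T}$ (the value assumed in the statement of Theorem 21.11), the right-hand side evaluates to
$$\frac{\log(d)}{\eta} + \frac{\eta T}{2}
\;=\; \sqrt{\frac{T\log(d)}{2}} + \sqrt{\frac{T\log(d)}{2}}
\;=\; \sqrt{2\,T\log(d)}.$$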
Proof of Theorem 21.10
Equipped with the Weighted-Majority algorithm and Theorem 21.11, we are ready to prove Theorem 21.10. We start with the simpler case, in which $\mathcal{H}$ is a finite class, and let us write $\mathcal{H} = \{h_1, \ldots, h_d\}$. In this case, we can refer to each hypothesis, $h_i$, as an expert, whose advice is to predict $h_i(x_t)$, and whose cost is $v_{t,i} = |h_i(x_t) - y_t|$. The prediction of the algorithm will therefore be $p_t = \sum_i w^{(t)}_i h_i(x_t) \in [0,1]$, and the loss is
$$|p_t - y_t| \;=\; \left| \sum_{i=1}^{d} w^{(t)}_i h_i(x_t) - y_t \right| \;=\; \left| \sum_{i=1}^{d} w^{(t)}_i \bigl(h_i(x_t) - y_t\bigr) \right|,$$
where the second equality uses the fact that $\sum_i w^{(t)}_i = 1$.
Now, if $y_t = 1$, then for all $i$, $h_i(x_t) - y_t \le 0$. Therefore, the above equals $\sum_i w^{(t)}_i |h_i(x_t) - y_t|$. If $y_t = 0$, then for all $i$, $h_i(x_t) - y_t \ge 0$, and the above also equals $\sum_i w^{(t)}_i |h_i(x_t) - y_t|$. All in all, we have shown that
$$|p_t - y_t| \;=\; \sum_{i=1}^{d} w^{(t)}_i |h_i(x_t) - y_t| \;=\; \langle \mathbf{w}^{(t)}, \mathbf{v}_t \rangle.$$
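The reduction above can be sketched in code. A minimal illustration, assuming binary expert advice; the function name, the random data, and the step size $\eta = \sqrt{2\log(d)/T}$ are illustrative assumptions:

```python
import math
import random

def weighted_majority(advice, labels, eta):
    """Weighted-Majority over d experts; returns the per-round losses |p_t - y_t|.

    advice[t][i] is expert i's prediction h_i(x_t) in [0, 1]."""
    d = len(advice[0])
    w_tilde = [1.0] * d                             # unnormalized weights, initially (1, ..., 1)
    losses = []
    for a_t, y in zip(advice, labels):
        Z = sum(w_tilde)
        w = [x / Z for x in w_tilde]                # w^(t) = w~^(t) / Z_t
        p = sum(wi * ai for wi, ai in zip(w, a_t))  # prediction p_t in [0, 1]
        v = [abs(ai - y) for ai in a_t]             # cost vector v_t
        # the identity shown above: |p_t - y_t| = <w^(t), v_t>
        assert abs(abs(p - y) - sum(wi * vi for wi, vi in zip(w, v))) < 1e-9
        losses.append(abs(p - y))
        w_tilde = [x * math.exp(-eta * vi) for x, vi in zip(w_tilde, v)]
    return losses

# illustrative run on random binary advice and labels
random.seed(1)
T, d = 200, 8
advice = [[random.randint(0, 1) for _ in range(d)] for _ in range(T)]
labels = [random.randint(0, 1) for _ in range(T)]
losses = weighted_majority(advice, labels, eta=math.sqrt(2 * math.log(d) / T))
```

By Theorem 21.11, the cumulative loss of this run exceeds the mistake count of the best expert by at most $\sqrt{2\log(d)\,T}$, for any sequence of advice and labels.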
Furthermore, for each $i$, $\sum_t v_{t,i}$ is exactly the number of mistakes hypothesis $h_i$ makes. Applying Theorem 21.11 we obtain