21.2 Online Classification in the Unrealizable Case

Summing the inequality $\log\frac{Z_{t+1}}{Z_t} \le -\eta\,\langle \mathbf{w}^{(t)}, \mathbf{v}_t \rangle + \frac{\eta^2}{2}$ over $t$ we get

\[
\log(Z_{T+1}) - \log(Z_1) = \sum_{t=1}^{T} \log\frac{Z_{t+1}}{Z_t} \;\le\; -\eta \sum_{t=1}^{T} \langle \mathbf{w}^{(t)}, \mathbf{v}_t \rangle + \frac{T\eta^2}{2}. \tag{21.3}
\]

Next, we lower bound $Z_{T+1}$. For each $i$, we can rewrite $\tilde{w}^{(T+1)}_i = e^{-\eta \sum_t v_{t,i}}$, and, since a sum of positive terms is at least its largest term, we get that

\[
\log Z_{T+1} = \log\left(\sum_i e^{-\eta \sum_t v_{t,i}}\right) \;\ge\; \log\left(\max_i e^{-\eta \sum_t v_{t,i}}\right) = -\eta \min_i \sum_t v_{t,i}.
\]

Combining the preceding with Equation (21.3), and using the fact that $\log(Z_1) = \log(d)$, we get that

\[
-\eta \min_i \sum_t v_{t,i} - \log(d) \;\le\; -\eta \sum_{t=1}^{T} \langle \mathbf{w}^{(t)}, \mathbf{v}_t \rangle + \frac{T\eta^2}{2},
\]

which can be rearranged as follows:


\[
\sum_{t=1}^{T} \langle \mathbf{w}^{(t)}, \mathbf{v}_t \rangle - \min_i \sum_t v_{t,i} \;\le\; \frac{\log(d)}{\eta} + \frac{\eta T}{2}.
\]

Plugging the value of $\eta$ into this inequality concludes our proof.
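
To make the last step concrete: the value of $\eta$ is fixed in the statement of Theorem 21.11, which is not reproduced in this excerpt; assuming the standard choice $\eta = \sqrt{2\log(d)/T}$, which balances the two terms on the right-hand side, the bound evaluates to

\[
\frac{\log(d)}{\eta} + \frac{\eta T}{2}
= \sqrt{\frac{T\log(d)}{2}} + \sqrt{\frac{T\log(d)}{2}}
= \sqrt{2\,T\log(d)}.
\]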


Proof of Theorem 21.10


Equipped with the Weighted-Majority algorithm and Theorem 21.11, we are ready to prove Theorem 21.10. We start with the simpler case, in which $\mathcal{H}$ is a finite class, and let us write $\mathcal{H} = \{h_1, \ldots, h_d\}$. In this case, we can refer to each hypothesis, $h_i$, as an expert, whose advice is to predict $h_i(x_t)$, and whose cost is $v_{t,i} = |h_i(x_t) - y_t|$. The prediction of the algorithm will therefore be $p_t = \sum_i w^{(t)}_i h_i(x_t) \in [0,1]$, and the loss is

\[
|p_t - y_t| = \left|\sum_{i=1}^{d} w^{(t)}_i h_i(x_t) - y_t\right| = \left|\sum_{i=1}^{d} w^{(t)}_i \bigl(h_i(x_t) - y_t\bigr)\right|,
\]

where the second equality uses the fact that the weights satisfy $\sum_i w^{(t)}_i = 1$.

Now, if $y_t = 1$, then for all $i$, $h_i(x_t) - y_t \le 0$. Therefore, the above equals $\sum_i w^{(t)}_i |h_i(x_t) - y_t|$. If $y_t = 0$, then for all $i$, $h_i(x_t) - y_t \ge 0$, and the above also equals $\sum_i w^{(t)}_i |h_i(x_t) - y_t|$. All in all, we have shown that

\[
|p_t - y_t| = \sum_{i=1}^{d} w^{(t)}_i \,|h_i(x_t) - y_t| = \langle \mathbf{w}^{(t)}, \mathbf{v}_t \rangle.
\]

Furthermore, for each $i$, $\sum_t v_{t,i}$ is exactly the number of mistakes hypothesis $h_i$ makes. Applying Theorem 21.11 we obtain

\[
\sum_{t=1}^{T} |p_t - y_t| - \min_i \sum_{t=1}^{T} |h_i(x_t) - y_t| \;\le\; \sqrt{2\log(d)\,T}.
\]
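
As an illustration of the reduction just described, here is a minimal Python sketch of running Weighted-Majority over a finite class, treating each hypothesis as an expert with cost $v_{t,i} = |h_i(x_t) - y_t|$. The multiplicative update $\tilde{w}^{(t+1)}_i = \tilde{w}^{(t)}_i e^{-\eta v_{t,i}}$ and the uniform initialization (so that $Z_1 = d$) are read off the facts used in the proof; the learning rate $\eta = \sqrt{2\log(d)/T}$ is the assumed choice from Theorem 21.11, and the function name, hypotheses, and data stream are placeholders, not the book's notation.

```python
import math

def weighted_majority(hypotheses, stream, T):
    """Sketch of Weighted-Majority over a finite class, one expert per hypothesis.

    hypotheses: list of callables h_i(x) -> [0, 1]
    stream:     iterable of (x_t, y_t) pairs with y_t in {0, 1}
    T:          number of rounds (used to set the learning rate)
    """
    d = len(hypotheses)
    eta = math.sqrt(2 * math.log(d) / T)  # assumed value of eta from Theorem 21.11
    w_tilde = [1.0] * d                   # uniform start, so Z_1 = d
    total_loss = 0.0

    for t, (x, y) in enumerate(stream):
        if t >= T:
            break
        Z = sum(w_tilde)
        w = [wt / Z for wt in w_tilde]                   # normalized weights w^(t)
        advice = [h(x) for h in hypotheses]              # expert advice h_i(x_t)
        p = sum(wi * a for wi, a in zip(w, advice))      # prediction p_t in [0, 1]
        total_loss += abs(p - y)                         # loss |p_t - y_t| = <w^(t), v_t>
        v = [abs(a - y) for a in advice]                 # costs v_{t,i} = |h_i(x_t) - y_t|
        w_tilde = [wt * math.exp(-eta * vi)              # multiplicative update
                   for wt, vi in zip(w_tilde, v)]
    return total_loss

# Toy usage: two constant "experts" on a short binary stream.
hs = [lambda x: 0.0, lambda x: 1.0]
data = [(None, 1), (None, 1), (None, 0), (None, 1)]
print(weighted_majority(hs, data, T=len(data)))
```

On this toy stream the cumulative loss stays within $\sqrt{2\log(2)\,T}$ of the better constant expert, matching the regret bound above.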
