21.2 Online Classification in the Unrealizable Case

Summing the inequality $\log\frac{Z_{t+1}}{Z_t} \le -\eta\,\langle \mathbf{w}^{(t)}, \mathbf{v}_t \rangle + \frac{\eta^2}{2}$ over $t$ we get

\[
\log(Z_{T+1}) - \log(Z_1) = \sum_{t=1}^{T} \log\frac{Z_{t+1}}{Z_t} \;\le\; -\eta \sum_{t=1}^{T} \langle \mathbf{w}^{(t)}, \mathbf{v}_t \rangle + \frac{T\eta^2}{2}. \tag{21.3}
\]

Next, we lower bound $Z_{T+1}$. For each $i$, we can rewrite $\tilde{w}^{(T+1)}_i = e^{-\eta \sum_t v_{t,i}}$, and, since a sum of positive terms is at least its largest term, we get that

\[
\log Z_{T+1} = \log\left(\sum_i e^{-\eta \sum_t v_{t,i}}\right) \;\ge\; \log\left(\max_i e^{-\eta \sum_t v_{t,i}}\right) = -\eta \min_i \sum_t v_{t,i}.
\]

Combining the preceding with Equation (21.3), and using the fact that $\log(Z_1) = \log(d)$, we get that

\[
-\eta \min_i \sum_t v_{t,i} - \log(d) \;\le\; -\eta \sum_{t=1}^{T} \langle \mathbf{w}^{(t)}, \mathbf{v}_t \rangle + \frac{T\eta^2}{2},
\]

which can be rearranged as follows:


\[
\sum_{t=1}^{T} \langle \mathbf{w}^{(t)}, \mathbf{v}_t \rangle - \min_i \sum_t v_{t,i} \;\le\; \frac{\log(d)}{\eta} + \frac{\eta T}{2}.
\]

Plugging the value of $\eta$ into this inequality concludes our proof.
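
To make the last step concrete: the value of $\eta$ is fixed in the statement of Theorem 21.11, which is not reproduced in this excerpt; assuming the standard choice $\eta = \sqrt{2\log(d)/T}$, which balances the two terms on the right-hand side, the bound evaluates to

\[
\frac{\log(d)}{\eta} + \frac{\eta T}{2}
= \sqrt{\frac{T\log(d)}{2}} + \sqrt{\frac{T\log(d)}{2}}
= \sqrt{2\,T\log(d)}.
\]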


Proof of Theorem 21.10


Equipped with the Weighted-Majority algorithm and Theorem 21.11, we are ready to prove Theorem 21.10. We start with the simpler case, in which $\mathcal{H}$ is a finite class, and let us write $\mathcal{H} = \{h_1, \ldots, h_d\}$. In this case, we can refer to each hypothesis, $h_i$, as an expert, whose advice is to predict $h_i(x_t)$, and whose cost is $v_{t,i} = |h_i(x_t) - y_t|$. The prediction of the algorithm will therefore be $p_t = \sum_i w^{(t)}_i h_i(x_t) \in [0,1]$, and the loss is

\[
|p_t - y_t| = \left|\sum_{i=1}^{d} w^{(t)}_i h_i(x_t) - y_t\right| = \left|\sum_{i=1}^{d} w^{(t)}_i \bigl(h_i(x_t) - y_t\bigr)\right|,
\]

where the second equality uses the fact that the weights satisfy $\sum_i w^{(t)}_i = 1$.

Now, if $y_t = 1$, then for all $i$, $h_i(x_t) - y_t \le 0$. Therefore, the above equals $\sum_i w^{(t)}_i |h_i(x_t) - y_t|$. If $y_t = 0$, then for all $i$, $h_i(x_t) - y_t \ge 0$, and the above also equals $\sum_i w^{(t)}_i |h_i(x_t) - y_t|$. All in all, we have shown that

\[
|p_t - y_t| = \sum_{i=1}^{d} w^{(t)}_i \,|h_i(x_t) - y_t| = \langle \mathbf{w}^{(t)}, \mathbf{v}_t \rangle.
\]

Furthermore, for each $i$, $\sum_t v_{t,i}$ is exactly the number of mistakes hypothesis $h_i$ makes. Applying Theorem 21.11 we obtain

\[
\sum_{t=1}^{T} |p_t - y_t| - \min_i \sum_{t=1}^{T} |h_i(x_t) - y_t| \;\le\; \sqrt{2\log(d)\,T}.
\]
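
As an illustration of the reduction just described, here is a minimal Python sketch of running Weighted-Majority over a finite class, treating each hypothesis as an expert with cost $v_{t,i} = |h_i(x_t) - y_t|$. The multiplicative update $\tilde{w}^{(t+1)}_i = \tilde{w}^{(t)}_i e^{-\eta v_{t,i}}$ and the uniform initialization (so that $Z_1 = d$) are read off the facts used in the proof; the learning rate $\eta = \sqrt{2\log(d)/T}$ is the assumed choice from Theorem 21.11, and the function name, hypotheses, and data stream are placeholders, not the book's notation.

```python
import math

def weighted_majority(hypotheses, stream, T):
    """Sketch of Weighted-Majority over a finite class, one expert per hypothesis.

    hypotheses: list of callables h_i(x) -> [0, 1]
    stream:     iterable of (x_t, y_t) pairs with y_t in {0, 1}
    T:          number of rounds (used to set the learning rate)
    """
    d = len(hypotheses)
    eta = math.sqrt(2 * math.log(d) / T)  # assumed value of eta from Theorem 21.11
    w_tilde = [1.0] * d                   # uniform start, so Z_1 = d
    total_loss = 0.0

    for t, (x, y) in enumerate(stream):
        if t >= T:
            break
        Z = sum(w_tilde)
        w = [wt / Z for wt in w_tilde]                   # normalized weights w^(t)
        advice = [h(x) for h in hypotheses]              # expert advice h_i(x_t)
        p = sum(wi * a for wi, a in zip(w, advice))      # prediction p_t in [0, 1]
        total_loss += abs(p - y)                         # loss |p_t - y_t| = <w^(t), v_t>
        v = [abs(a - y) for a in advice]                 # costs v_{t,i} = |h_i(x_t) - y_t|
        w_tilde = [wt * math.exp(-eta * vi)              # multiplicative update
                   for wt, vi in zip(w_tilde, v)]
    return total_loss

# Toy usage: two constant "experts" on a short binary stream.
hs = [lambda x: 0.0, lambda x: 1.0]
data = [(None, 1), (None, 1), (None, 0), (None, 1)]
print(weighted_majority(hs, data, T=len(data)))
```

On this toy stream the cumulative loss stays within $\sqrt{2\log(2)\,T}$ of the better constant expert, matching the regret bound above.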
