Understanding Machine Learning: From Theory to Algorithms

21.1 Online Classification in the Realizable Case

Lemma 21.7  SOA enjoys the mistake bound M_SOA(H) ≤ Ldim(H).


Proof  It suffices to prove that whenever the algorithm makes a prediction mistake we have Ldim(V_{t+1}) ≤ Ldim(V_t) − 1. We prove this claim by assuming the contrary, that is, Ldim(V_{t+1}) = Ldim(V_t). If this holds true, then the definition of p_t implies that Ldim(V_t^{(r)}) = Ldim(V_t) for both r = 1 and r = 0. But then we can place x_t at the root, attach a shattered tree of depth Ldim(V_t) for V_t^{(0)} as one subtree and a shattered tree of the same depth for V_t^{(1)} as the other, and thereby obtain a shattered tree of depth Ldim(V_t) + 1 for the class V_t, which leads to the desired contradiction.
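The proof suggests a direct implementation for a finite hypothesis class over a finite domain. The sketch below is an illustration, not the book's pseudocode: hypotheses are encoded as 0/1 tuples indexed by domain points, and `ldim`, `soa_predict`, and `run_soa` are hypothetical names. The `ldim` helper uses the shattered-tree recursion underlying the argument: a shattered tree of depth k + 1 rooted at x exists exactly when both restrictions admit shattered trees of depth k.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def ldim(H):
    """Littlestone dimension of a finite class H (a frozenset of 0/1 tuples),
    via the recursion Ldim(H) = max over x of 1 + min(Ldim(H0), Ldim(H1)),
    maximizing only over points x that split H into two nonempty parts."""
    if len(H) <= 1:
        return 0
    best = 0
    for x in range(len(next(iter(H)))):
        H0 = frozenset(h for h in H if h[x] == 0)
        H1 = frozenset(h for h in H if h[x] == 1)
        if H0 and H1:
            best = max(best, 1 + min(ldim(H0), ldim(H1)))
    return best

def soa_predict(V, x):
    """SOA rule: predict the label r for which the version space
    V^{(r)} = {h in V : h(x) = r} has the larger Littlestone dimension."""
    V0 = frozenset(h for h in V if h[x] == 0)
    V1 = frozenset(h for h in V if h[x] == 1)
    return 1 if ldim(V1) >= ldim(V0) else 0

def run_soa(H, sequence):
    """Run SOA on a realizable sequence of (x_t, y_t) pairs; return the
    number of mistakes, keeping only consistent hypotheses each round."""
    V, mistakes = frozenset(H), 0
    for x, y in sequence:
        if soa_predict(V, x) != y:
            mistakes += 1
        V = frozenset(h for h in V if h[x] == y)
    return mistakes

# 8 thresholds h_k(i) = 1[i >= k] over 7 points; here Ldim(H) = 3.
H = frozenset(tuple(1 if i >= k else 0 for i in range(7)) for k in range(8))
target = tuple(1 if i >= 3 else 0 for i in range(7))
seq = [(i, target[i]) for i in range(7)]
print(ldim(H), run_soa(H, seq))
```

On the threshold class above, Lemma 21.7 says SOA makes at most Ldim(H) = 3 mistakes on any realizable sequence, regardless of the order in which the points are presented.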


Combining Lemma 21.7 and Lemma 21.6 we obtain:

Corollary 21.8  Let H be any hypothesis class. Then, the standard optimal algorithm enjoys the mistake bound M_SOA(H) = Ldim(H), and no other algorithm A can have M_A(H) < Ldim(H).
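The lower-bound half of the corollary can also be simulated: an adversary that repeatedly presents a point attaining the maximum in the Ldim recursion, and always answers the opposite of the learner's prediction, forces at least Ldim(H) mistakes on any deterministic learner. The following is an illustrative sketch (hypotheses as 0/1 tuples over a finite domain; `ldim` and `force_mistakes` are hypothetical names):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def ldim(H):
    """Littlestone dimension of a finite class H (frozenset of 0/1 tuples):
    Ldim(H) = max over splitting points x of 1 + min(Ldim(H0), Ldim(H1))."""
    if len(H) <= 1:
        return 0
    best = 0
    for x in range(len(next(iter(H)))):
        H0 = frozenset(h for h in H if h[x] == 0)
        H1 = frozenset(h for h in H if h[x] == 1)
        if H0 and H1:
            best = max(best, 1 + min(ldim(H0), ldim(H1)))
    return best

def force_mistakes(predict, H):
    """Adversary for the lower bound: present a point attaining the max in
    the ldim recursion, answer the opposite of the prediction, and restrict
    the class to that answer.  Every answer stays realizable, every round
    costs the learner a mistake, and Ldim drops by at most 1 per round, so
    any deterministic learner makes at least Ldim(H) mistakes."""
    V, mistakes = frozenset(H), 0
    while True:
        best_x, best_val = None, 0
        for x in range(len(next(iter(V)))):
            V0 = frozenset(h for h in V if h[x] == 0)
            V1 = frozenset(h for h in V if h[x] == 1)
            if V0 and V1:
                val = 1 + min(ldim(V0), ldim(V1))
                if val > best_val:
                    best_x, best_val = x, val
        if best_x is None:          # no splitting point left: Ldim(V) = 0
            return mistakes
        y = 1 - predict(V, best_x)  # contradict the learner's prediction
        mistakes += 1
        V = frozenset(h for h in V if h[best_x] == y)

# 8 thresholds h_k(i) = 1[i >= k] over 7 points; Ldim = 3, so at least 3
# mistakes are forced on any learner, e.g. one that always predicts 0.
H = frozenset(tuple(1 if i >= k else 0 for i in range(7)) for k in range(8))
print(force_mistakes(lambda V, x: 0, H))
```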


Comparison to VC Dimension


In the PAC learning model, learnability is characterized by the VC dimension of the class H. Recall that the VC dimension of a class H is the maximal number d such that there are instances x_1, ..., x_d that are shattered by H. That is, for any sequence of labels (y_1, ..., y_d) ∈ {0, 1}^d there exists a hypothesis h ∈ H that gives exactly this sequence of labels. The following theorem relates the VC dimension to the Littlestone dimension.


Theorem 21.9  For any class H, VCdim(H) ≤ Ldim(H), and there are classes for which strict inequality holds. Furthermore, the gap can be arbitrarily large.


Proof  We first prove that VCdim(H) ≤ Ldim(H). Suppose VCdim(H) = d and let x_1, ..., x_d be a shattered set. We now construct a complete binary tree of instances v_1, ..., v_{2^d − 1}, where all nodes at depth i are set to be x_i; see the following illustration:


                 x_1
              /       \
           x_2         x_2
          /   \       /   \
        x_3   x_3   x_3   x_3

Now, the definition of a shattered set clearly implies that we have obtained a valid shattered tree of depth d, and we conclude that VCdim(H) ≤ Ldim(H). To show that the gap can be arbitrarily large, simply note that the class given in Example 21.4 has a VC dimension of 1 whereas its Littlestone dimension is infinite.
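A finite-domain analogue of this gap can be checked directly. The sketch below is an illustration under stated assumptions (hypotheses encoded as 0/1 tuples over n domain points; `ldim` and `vcdim` are hypothetical brute-force helpers): for the threshold class h_k(i) = 1 iff i >= k, the VC dimension is 1 for every n, while the Littlestone dimension grows like log2(n + 1), so the gap grows without bound.

```python
from functools import lru_cache
from itertools import combinations

@lru_cache(maxsize=None)
def ldim(H):
    """Littlestone dimension of a finite class H (frozenset of 0/1 tuples):
    Ldim(H) = max over splitting points x of 1 + min(Ldim(H0), Ldim(H1))."""
    if len(H) <= 1:
        return 0
    best = 0
    for x in range(len(next(iter(H)))):
        H0 = frozenset(h for h in H if h[x] == 0)
        H1 = frozenset(h for h in H if h[x] == 1)
        if H0 and H1:
            best = max(best, 1 + min(ldim(H0), ldim(H1)))
    return best

def vcdim(H, n):
    """VC dimension by brute force: the largest d such that some d-subset S
    of the n domain points is shattered, i.e. restricting H to S realizes
    all 2^d label patterns."""
    best = 0
    for d in range(1, n + 1):
        if not any(len({tuple(h[i] for i in S) for h in H}) == 2 ** d
                   for S in combinations(range(n), d)):
            break  # shattering is monotone: no larger set is shattered
        best = d
    return best

for n in (3, 7, 15):
    thresholds = frozenset(tuple(1 if i >= k else 0 for i in range(n))
                           for k in range(n + 1))
    # VCdim stays 1 (the pattern (1, 0) on a pair i < j is never realized),
    # while Ldim = 2, 3, 4 for these n.
    print(n, vcdim(thresholds, n), ldim(thresholds))
```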
