Understanding Machine Learning: From Theory to Algorithms

$x = j$. Then, it is easy to show that $\mathrm{Ldim}(\mathcal{H}) = 1$ while $|\mathcal{H}| = d$ can be arbitrarily large. Therefore, this example shows that $\mathrm{Ldim}(\mathcal{H})$ can be significantly smaller than $\log_2(|\mathcal{H}|)$.
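
To make this concrete, the Littlestone dimension of a small finite class can be computed by brute force from its recursive structure: $\mathrm{Ldim}(V) \ge d + 1$ iff some instance $x$ splits $V$ into two nonempty subclasses, each with $\mathrm{Ldim}$ at least $d$. The Python sketch below (a hypothetical helper, exponential time, suitable for toy classes only) implements this recursion and verifies the example above:

    def ldim(V, X):
        """Littlestone dimension of a finite class V over a finite domain X,
        via brute-force recursion (exponential; for illustration only)."""
        if not V:
            return -1  # convention: the empty class shatters no tree
        best = 0
        for x in X:
            V0 = [h for h in V if h(x) == 0]
            V1 = [h for h in V if h(x) == 1]
            if V0 and V1:  # x can serve as the root of a shattered tree
                best = max(best, 1 + min(ldim(V0, X), ldim(V1, X)))
        return best

    # The class {h_j : h_j(x) = 1 iff x = j} over X = {0, ..., d-1}:
    d = 5
    X = list(range(d))
    H = [(lambda j: (lambda x: int(x == j)))(j) for j in range(d)]
    print(ldim(H, X))  # prints 1, even though |H| = d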

Example 21.4 Let $\mathcal{X} = [0,1]$ and $\mathcal{H} = \{x \mapsto \mathbb{1}_{[x < a]} : a \in [0,1]\}$; namely, $\mathcal{H}$ is the class of thresholds on the interval $[0,1]$. Then, $\mathrm{Ldim}(\mathcal{H}) = \infty$. To see this, consider the tree
[Figure: a complete binary tree with root labeled 1/2, its children labeled 1/4 and 3/4, and its grandchildren labeled 1/8, 3/8, 5/8, 7/8; the labels at depth $k$ are the odd multiples of $2^{-(k+1)}$.]

This tree is shattered by $\mathcal{H}$: every root-to-leaf sequence of labels is realized by some threshold $a \in [0,1]$, since each answer narrows the set of consistent thresholds to a subinterval of positive length. And, because of the density of the reals, this construction can be made arbitrarily deep.
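
Concretely, the tree encodes an adversary that defeats any deterministic online learner: it presents the midpoint of the interval of still-consistent thresholds and answers with the opposite of the learner's prediction. A hypothetical Python sketch (the learner interface predict/update is assumed for illustration):

    def force_mistakes(learner, T):
        """Force T mistakes on the threshold class h_a(x) = 1[x < a]."""
        lo, hi = 0.0, 1.0          # thresholds in (lo, hi] stay consistent
        for _ in range(T):
            x = (lo + hi) / 2      # midpoint: the next node of the tree
            p = learner.predict(x)
            y = 1 - p              # answer opposite to the prediction
            if y == 1:
                lo = x             # label 1 means x < a, so a > x
            else:
                hi = x             # label 0 means a <= x
            learner.update(x, y)
        return T                   # the learner erred on every round

Since the surviving interval always has positive length, some threshold in $\mathcal{H}$ is consistent with all the labels, so no finite mistake bound is possible.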
Lemma 21.6 states that $\mathrm{Ldim}(\mathcal{H})$ lower bounds the mistake bound of any algorithm. Interestingly, there is a standard algorithm whose mistake bound matches this lower bound. The algorithm is similar to the Halving algorithm. Recall that the prediction of Halving is made according to a majority vote of the hypotheses which are consistent with previous examples. We denoted this set by $V_t$. Put another way, Halving partitions $V_t$ into two sets: $V_t^+ = \{h \in V_t : h(x_t) = 1\}$ and $V_t^- = \{h \in V_t : h(x_t) = 0\}$. It then predicts according to the larger of the two groups. The rationale behind this prediction is that whenever Halving makes a mistake it ends up with $|V_{t+1}| \le \frac{1}{2}|V_t|$.
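
For comparison, here is a minimal Python sketch of Halving over a finite class, assuming hypotheses are represented as callables mapping instances to $\{0,1\}$ (a hypothetical representation, not from the text):

    def halving_predict(V, x):
        """Majority vote of the current version space V on instance x."""
        ones = sum(h(x) for h in V)
        return 1 if 2 * ones >= len(V) else 0   # break ties toward 1

    def halving_update(V, x, y):
        """Keep only the hypotheses consistent with the revealed label."""
        return [h for h in V if h(x) == y]

On every mistake the discarded side is the (weak) majority, so $|V|$ at least halves, which gives the familiar $\log_2(|\mathcal{H}|)$ mistake bound in the realizable case.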
The optimal algorithm we present in the following uses the same idea, but instead of predicting according to the larger class, it predicts according to the class with the larger $\mathrm{Ldim}$.

Standard Optimal Algorithm (SOA)

input: A hypothesis class $\mathcal{H}$
initialize: $V_1 = \mathcal{H}$
for $t = 1, 2, \ldots$
  receive $x_t$
  for $r \in \{0, 1\}$ let $V_t^{(r)} = \{h \in V_t : h(x_t) = r\}$
  predict $p_t = \operatorname{argmax}_{r \in \{0,1\}} \mathrm{Ldim}(V_t^{(r)})$
    (in case of a tie predict $p_t = 1$)
  receive true label $y_t$
  update $V_{t+1} = \{h \in V_t : h(x_t) = y_t\}$
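
A minimal executable sketch of SOA for a finite class, reusing the brute-force ldim helper from the earlier sketch (exponential time; an illustration under those assumptions, not a practical implementation):

    class SOA:
        """Standard Optimal Algorithm over a finite hypothesis class."""
        def __init__(self, hypotheses, domain):
            self.V = list(hypotheses)   # current version space V_t
            self.X = list(domain)
        def predict(self, x):
            V1 = [h for h in self.V if h(x) == 1]
            V0 = [h for h in self.V if h(x) == 0]
            # predict the label whose restriction has the larger Ldim,
            # breaking ties toward 1, exactly as in the pseudocode
            return 1 if ldim(V1, self.X) >= ldim(V0, self.X) else 0
        def update(self, x, y):
            self.V = [h for h in self.V if h(x) == y]

On every mistake the surviving side is the one with the (weakly) smaller $\mathrm{Ldim}$, so the $\mathrm{Ldim}$ of the version space strictly decreases; hence SOA makes at most $\mathrm{Ldim}(\mathcal{H})$ mistakes in the realizable case.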

The following lemma formally establishes the optimality of the preceding algorithm.