Understanding Machine Learning: From Theory to Algorithms

(Jeff_L) #1
10.3 Linear Combinations of Base Hypotheses 139

From this example we obtain thatL(HDS1,T) can shatter any set ofT+ 1
instances inR; hence the VC-dimension ofL(HDS1,T) is at leastT+1. Therefore,
T is a parameter that can control the bias-complexity tradeoff: EnlargingT
yields a more expressive hypothesis class but on the other hand might increase
the estimation error. In the next subsection we formally upper bound the VC-
dimension ofL(B,T) for any base classB.

10.3.1 The VC-Dimension ofL(B,T)


The following lemma tells us that the VC-dimension ofL(B,T) is upper bounded
byO ̃(VCdim(B)T) (theO ̃notation ignores constants and logarithmic factors).

lemma10.3 LetBbe a base class and letL(B,T)be as defined in Equa-
tion (10.4). Assume that bothTandVCdim(B)are at least 3. Then,

VCdim(L(B,T))≤T(VCdim(B) + 1) (3 log(T(VCdim(B) + 1)) + 2).

Proof Denoted = VCdim(B). LetC ={x 1 ,...,xm}be a set that is shat-
tered byL(B,T). Each labeling ofCbyh∈L(B,T) is obtained by first choos-
ingh 1 ,...,hT∈Band then applying a halfspace hypothesis over the vector
(h 1 (x),...,hT(x)). By Sauer’s lemma, there are at most (em/d)ddifferent di-
chotomies (i.e., labelings) induced byBoverC. Therefore, we need to choose
Thypotheses, out of at most (em/d)ddifferent hypotheses. There are at most
(em/d)dTways to do it. Next, for each such choice, we apply a linear predictor,
which yields at most (em/T)T dichotomies. Therefore, the overall number of
dichotomies we can construct is upper bounded by

(em/d)dT(em/T)T≤m(d+1)T,

where we used the assumption that bothdandTare at least 3. Since we assume
thatCis shattered, we must have that the preceding is at least 2m, which yields

2 m≤m(d+1)T.

Therefore,

m≤log(m)

(d+ 1)T
log(2)

.

Lemma A.1 in Chapter A tells us that a necessary condition for the above to
hold is that

m≤ 2

(d+ 1)T
log(2)

log

(d+ 1)T
log(2)

≤(d+ 1)T(3 log((d+ 1)T) + 2),

which concludes our proof.

In Exercise 4 we show that for some base classes,B, it also holds that VCdim(L(B,T))≥
Ω(VCdim(B)T).
Free download pdf