Understanding Machine Learning: From Theory to Algorithms


The VC-Dimension


To ensure that the preceding is at most $\epsilon$ we need that

$$m \;\geq\; \frac{2d\log(m)}{(\delta\epsilon)^2} + \frac{2d\log(2e/d)}{(\delta\epsilon)^2}.$$

Standard algebraic manipulations (see Lemma A.2 in Appendix A) show that a
sufficient condition for the preceding to hold is that

$$m \;\geq\; 4\,\frac{2d}{(\delta\epsilon)^2}\log\!\left(\frac{2d}{(\delta\epsilon)^2}\right) + \frac{4d\log(2e/d)}{(\delta\epsilon)^2}.$$
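For a concrete feel for this bound, here is a small numeric sketch (an illustration, not from the text) that evaluates the sufficient condition above; the function name `sufficient_sample_size` and the example values $d=3$, $\epsilon=0.1$, $\delta=0.1$ are hypothetical choices.

```python
import math

# Numeric sketch (illustrative) of the sufficient sample size displayed above:
#   m >= 4 * (2d / (delta*eps)^2) * log(2d / (delta*eps)^2)
#        + 4d * log(2e/d) / (delta*eps)^2,
# with natural logarithms throughout.
def sufficient_sample_size(d, eps, delta):
    a = 2 * d / (delta * eps) ** 2
    m = 4 * a * math.log(a) + 4 * d * math.log(2 * math.e / d) / (delta * eps) ** 2
    return math.ceil(m)

# Example with hypothetical values: VC-dimension 3, accuracy 0.1, confidence 0.9.
print(sufficient_sample_size(d=3, eps=0.1, delta=0.1))  # roughly 2.7 million
```

The weak $1/\delta^2$ dependence visible here is exactly the looseness that Remark 6.4 below points out; Theorem 6.8 replaces it with a $\log(1/\delta)$ dependence.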

Remark 6.4 The upper bound on $m^{\mathrm{UC}}_{\mathcal{H}}$ we derived in the proof of Theorem 6.7
is not the tightest possible. A tighter analysis that yields the bounds given in
Theorem 6.8 can be found in Chapter 28.

Proof of Theorem 6.11 *


We will start by showing that

$$\mathop{\mathbb{E}}_{S\sim\mathcal{D}^m}\left[\,\sup_{h\in\mathcal{H}} |L_{\mathcal{D}}(h)-L_S(h)|\,\right] \;\leq\; \frac{4+\sqrt{\log(\tau_{\mathcal{H}}(2m))}}{\sqrt{2m}}. \tag{6.4}$$

Since the random variable $\sup_{h\in\mathcal{H}}|L_{\mathcal{D}}(h)-L_S(h)|$ is nonnegative, the proof of
the theorem follows directly from the preceding using Markov's inequality (see
Section B.1).
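Spelling out that step (a standard manipulation, added here for completeness): applying Markov's inequality $\mathbb{P}[X \geq a] \leq \mathbb{E}[X]/a$ to the nonnegative variable $X = \sup_{h\in\mathcal{H}}|L_{\mathcal{D}}(h)-L_S(h)|$ with $a = \mathbb{E}[X]/\delta$ gives that, with probability at least $1-\delta$ over $S\sim\mathcal{D}^m$,

$$\sup_{h\in\mathcal{H}}|L_{\mathcal{D}}(h)-L_S(h)| \;\leq\; \frac{1}{\delta}\,\mathop{\mathbb{E}}_{S\sim\mathcal{D}^m}\left[\,\sup_{h\in\mathcal{H}}|L_{\mathcal{D}}(h)-L_S(h)|\,\right] \;\leq\; \frac{4+\sqrt{\log(\tau_{\mathcal{H}}(2m))}}{\delta\sqrt{2m}},$$

which is the bound stated in Theorem 6.11.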
To bound the left-hand side of Equation (6.4) we first note that for every
$h\in\mathcal{H}$, we can rewrite $L_{\mathcal{D}}(h) = \mathbb{E}_{S'\sim\mathcal{D}^m}[L_{S'}(h)]$, where $S' = z'_1,\ldots,z'_m$ is an
additional i.i.d. sample. Therefore,

$$\mathop{\mathbb{E}}_{S\sim\mathcal{D}^m}\left[\,\sup_{h\in\mathcal{H}} |L_{\mathcal{D}}(h)-L_S(h)|\,\right] \;=\; \mathop{\mathbb{E}}_{S\sim\mathcal{D}^m}\left[\,\sup_{h\in\mathcal{H}} \left|\mathop{\mathbb{E}}_{S'\sim\mathcal{D}^m}[L_{S'}(h)] - L_S(h)\right|\,\right].$$
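The rewriting $L_{\mathcal{D}}(h) = \mathbb{E}_{S'\sim\mathcal{D}^m}[L_{S'}(h)]$ used above is just linearity of expectation, spelled out here for completeness from the definitions of $L_{\mathcal{D}}$ and $L_{S'}$:

$$\mathop{\mathbb{E}}_{S'\sim\mathcal{D}^m}[L_{S'}(h)] \;=\; \mathop{\mathbb{E}}_{S'\sim\mathcal{D}^m}\left[\frac{1}{m}\sum_{i=1}^{m}\ell(h,z'_i)\right] \;=\; \frac{1}{m}\sum_{i=1}^{m}\mathop{\mathbb{E}}_{z'_i\sim\mathcal{D}}[\ell(h,z'_i)] \;=\; L_{\mathcal{D}}(h).$$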

A generalization of the triangle inequality yields
$$\left|\mathop{\mathbb{E}}_{S'\sim\mathcal{D}^m}[L_{S'}(h)-L_S(h)]\right| \;\leq\; \mathop{\mathbb{E}}_{S'\sim\mathcal{D}^m}\left|L_{S'}(h)-L_S(h)\right|,$$

and the fact that the supremum of an expectation is smaller than the expectation of the supremum yields

$$\sup_{h\in\mathcal{H}}\, \mathop{\mathbb{E}}_{S'\sim\mathcal{D}^m}\left|L_{S'}(h)-L_S(h)\right| \;\leq\; \mathop{\mathbb{E}}_{S'\sim\mathcal{D}^m}\,\sup_{h\in\mathcal{H}} \left|L_{S'}(h)-L_S(h)\right|.$$

Formally, the previous two inequalities follow from Jensen's inequality. Combining the two, we obtain

$$\begin{aligned}
\mathop{\mathbb{E}}_{S\sim\mathcal{D}^m}\left[\,\sup_{h\in\mathcal{H}} |L_{\mathcal{D}}(h)-L_S(h)|\,\right]
&\;\leq\; \mathop{\mathbb{E}}_{S,S'\sim\mathcal{D}^m}\left[\,\sup_{h\in\mathcal{H}} |L_{S'}(h)-L_S(h)|\,\right] \\
&\;=\; \mathop{\mathbb{E}}_{S,S'\sim\mathcal{D}^m}\left[\,\sup_{h\in\mathcal{H}} \frac{1}{m}\left|\sum_{i=1}^{m}\bigl(\ell(h,z'_i)-\ell(h,z_i)\bigr)\right|\,\right].
\end{aligned} \tag{6.5}$$
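To make the symmetrized quantity in Equation (6.5) concrete, the following Monte Carlo sketch estimates $\mathbb{E}_{S,S'\sim\mathcal{D}^m}[\sup_{h\in\mathcal{H}}|L_{S'}(h)-L_S(h)|]$ for a toy setup; the class of threshold functions, the uniform distribution with noiseless labels, and all parameter values are illustrative assumptions, not from the text.

```python
import random

# Monte Carlo estimate of the symmetrized quantity in Equation (6.5),
#   E_{S,S'~D^m} [ sup_{h in H} (1/m) | sum_i (l(h,z'_i) - l(h,z_i)) | ],
# for a toy setup (all choices below are illustrative assumptions):
# H = threshold functions h_t(x) = 1[x >= t] over a grid of 21 thresholds,
# D = uniform on [0,1] with noiseless labels y = 1[x >= 0.5], 0-1 loss.
random.seed(0)
thresholds = [i / 20 for i in range(21)]
m, trials = 50, 2000

def zero_one_loss(t, x, y):
    # 0-1 loss of the threshold hypothesis h_t on the example (x, y).
    return int((x >= t) != y)

def sample(m):
    # Draw m i.i.d. labeled examples from the toy distribution.
    return [(x, int(x >= 0.5)) for x in (random.random() for _ in range(m))]

total = 0.0
for _ in range(trials):
    S, S2 = sample(m), sample(m)
    # sup over h in H of |L_{S'}(h) - L_S(h)| for this draw of (S, S').
    total += max(
        abs(sum(zero_one_loss(t, x2, y2) - zero_one_loss(t, x1, y1)
                for (x1, y1), (x2, y2) in zip(S, S2))) / m
        for t in thresholds
    )
print("estimated E[sup_h |L_{S'}(h) - L_S(h)|] ~", total / trials)
```

Rerunning this with larger $m$ shows the estimate decaying roughly like $1/\sqrt{m}$ (up to logarithmic factors), consistent with the $\sqrt{\log(\tau_{\mathcal{H}}(2m))}/\sqrt{2m}$ rate in Equation (6.4).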