76 The VC-Dimension
To ensure that the preceding is at mostwe need that
m≥
2 dlog(m)
(δ)^2
+
2 dlog(2e/d)
(δ)^2
.
Standard algebraic manipulations (see Lemma A.2 in Appendix A) show that a
sufficient condition for the preceding to hold is that
m≥ 4
2 d
(δ)^2
log
(
2 d
(δ)^2
)
+
4 dlog(2e/d)
(δ)^2
.
Remark 6.4 The upper bound onmUCH we derived in the proof Theorem 6.7
is not the tightest possible. A tighter analysis that yields the bounds given in
Theorem 6.8 can be found in Chapter 28.
Proof of Theorem 6.11 *
We will start by showing that
E
S∼Dm
[
sup
h∈H
|LD(h)−LS(h)|
]
≤
4 +
√
log(τH(2m))
√
2 m
. (6.4)
Since the random variable suph∈H|LD(h)−LS(h)|is nonnegative, the proof of
the theorem follows directly from the preceding using Markov’s inequality (see
Section B.1).
To bound the left-hand side of Equation (6.4) we first note that for every
h∈ H, we can rewriteLD(h) =ES′∼Dm[LS′(h)], whereS′=z 1 ′,...,zm′ is an
additional i.i.d. sample. Therefore,
S∼DEm
[
sup
h∈H
|LD(h)−LS(h)|
]
=S∼DEm
[
sup
h∈H
∣∣
∣S′∼DEmLS′(h)−LS(h)
∣∣
∣
]
.
A generalization of the triangle inequality yields
∣∣
∣S′∼DEm[LS′(h)−LS(h)]
∣∣
∣≤S′∼DEm|LS′(h)−LS(h)|,
and the fact that supermum of expectation is smaller than expectation of supre-
mum yields
sup
h∈H
S′∼DEm|LS′(h)−LS(h)|≤S′∼DEmsup
h∈H
|LS′(h)−LS(h)|.
Formally, the previous two inequalities follow from Jensen’s inequality. Combin-
ing all we obtain
S∼DEm
[
sup
h∈H
|LD(h)−LS(h)|
]
≤S,S′E∼Dm
[
sup
h∈H
|LS′(h)−LS(h)|
]
=S,S′E∼Dm
[
sup
h∈H
1
m
∣∣
∣∣
∣
∑m
i=1
(`(h,zi′)−`(h,zi))