Understanding Machine Learning: From Theory to Algorithms


The VC-Dimension


To ensure that the preceding is at most $\epsilon$ we need that

$$m \;\geq\; \frac{2d\log(m)}{(\delta\epsilon)^2} + \frac{2d\log(2e/d)}{(\delta\epsilon)^2}.$$

Standard algebraic manipulations (see Lemma A.2 in Appendix A) show that a
sufficient condition for the preceding to hold is that

$$m \;\geq\; 4\,\frac{2d}{(\delta\epsilon)^2}\log\!\left(\frac{2d}{(\delta\epsilon)^2}\right) + \frac{4d\log(2e/d)}{(\delta\epsilon)^2}.$$
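For a concrete feel for this bound, here is a small numeric sketch (an illustration, not from the text) that evaluates the sufficient condition above; the function name `sufficient_sample_size` and the example values $d=3$, $\epsilon=0.1$, $\delta=0.1$ are hypothetical choices.

```python
import math

# Numeric sketch (illustrative) of the sufficient sample size displayed above:
#   m >= 4 * (2d / (delta*eps)^2) * log(2d / (delta*eps)^2)
#        + 4d * log(2e/d) / (delta*eps)^2,
# with natural logarithms throughout.
def sufficient_sample_size(d, eps, delta):
    a = 2 * d / (delta * eps) ** 2
    m = 4 * a * math.log(a) + 4 * d * math.log(2 * math.e / d) / (delta * eps) ** 2
    return math.ceil(m)

# Example with hypothetical values: VC-dimension 3, accuracy 0.1, confidence 0.9.
print(sufficient_sample_size(d=3, eps=0.1, delta=0.1))  # roughly 2.7 million
```

The weak $1/\delta^2$ dependence visible here is exactly the looseness that Remark 6.4 below points out; Theorem 6.8 replaces it with a $\log(1/\delta)$ dependence.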

Remark 6.4 The upper bound on $m^{\mathrm{UC}}_{\mathcal{H}}$ we derived in the proof of Theorem 6.7
is not the tightest possible. A tighter analysis that yields the bounds given in
Theorem 6.8 can be found in Chapter 28.

Proof of Theorem 6.11 *


We will start by showing that

$$\mathop{\mathbb{E}}_{S\sim\mathcal{D}^m}\left[\,\sup_{h\in\mathcal{H}} |L_{\mathcal{D}}(h)-L_S(h)|\,\right] \;\leq\; \frac{4+\sqrt{\log(\tau_{\mathcal{H}}(2m))}}{\sqrt{2m}}. \tag{6.4}$$

Since the random variable $\sup_{h\in\mathcal{H}}|L_{\mathcal{D}}(h)-L_S(h)|$ is nonnegative, the proof of
the theorem follows directly from the preceding using Markov's inequality (see
Section B.1).
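Spelling out that step (a standard manipulation, added here for completeness): applying Markov's inequality $\mathbb{P}[X \geq a] \leq \mathbb{E}[X]/a$ to the nonnegative variable $X = \sup_{h\in\mathcal{H}}|L_{\mathcal{D}}(h)-L_S(h)|$ with $a = \mathbb{E}[X]/\delta$ gives that, with probability at least $1-\delta$ over $S\sim\mathcal{D}^m$,

$$\sup_{h\in\mathcal{H}}|L_{\mathcal{D}}(h)-L_S(h)| \;\leq\; \frac{1}{\delta}\,\mathop{\mathbb{E}}_{S\sim\mathcal{D}^m}\left[\,\sup_{h\in\mathcal{H}}|L_{\mathcal{D}}(h)-L_S(h)|\,\right] \;\leq\; \frac{4+\sqrt{\log(\tau_{\mathcal{H}}(2m))}}{\delta\sqrt{2m}},$$

which is the bound stated in Theorem 6.11.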
To bound the left-hand side of Equation (6.4) we first note that for every
$h\in\mathcal{H}$, we can rewrite $L_{\mathcal{D}}(h) = \mathbb{E}_{S'\sim\mathcal{D}^m}[L_{S'}(h)]$, where $S' = z'_1,\ldots,z'_m$ is an
additional i.i.d. sample. Therefore,

$$\mathop{\mathbb{E}}_{S\sim\mathcal{D}^m}\left[\,\sup_{h\in\mathcal{H}} |L_{\mathcal{D}}(h)-L_S(h)|\,\right] \;=\; \mathop{\mathbb{E}}_{S\sim\mathcal{D}^m}\left[\,\sup_{h\in\mathcal{H}} \left|\mathop{\mathbb{E}}_{S'\sim\mathcal{D}^m}[L_{S'}(h)] - L_S(h)\right|\,\right].$$
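The rewriting $L_{\mathcal{D}}(h) = \mathbb{E}_{S'\sim\mathcal{D}^m}[L_{S'}(h)]$ used above is just linearity of expectation, spelled out here for completeness from the definitions of $L_{\mathcal{D}}$ and $L_{S'}$:

$$\mathop{\mathbb{E}}_{S'\sim\mathcal{D}^m}[L_{S'}(h)] \;=\; \mathop{\mathbb{E}}_{S'\sim\mathcal{D}^m}\left[\frac{1}{m}\sum_{i=1}^{m}\ell(h,z'_i)\right] \;=\; \frac{1}{m}\sum_{i=1}^{m}\mathop{\mathbb{E}}_{z'_i\sim\mathcal{D}}[\ell(h,z'_i)] \;=\; L_{\mathcal{D}}(h).$$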

A generalization of the triangle inequality yields
$$\left|\mathop{\mathbb{E}}_{S'\sim\mathcal{D}^m}[L_{S'}(h)-L_S(h)]\right| \;\leq\; \mathop{\mathbb{E}}_{S'\sim\mathcal{D}^m}\left|L_{S'}(h)-L_S(h)\right|,$$

and the fact that the supremum of an expectation is smaller than the expectation of the supremum yields

$$\sup_{h\in\mathcal{H}}\, \mathop{\mathbb{E}}_{S'\sim\mathcal{D}^m}\left|L_{S'}(h)-L_S(h)\right| \;\leq\; \mathop{\mathbb{E}}_{S'\sim\mathcal{D}^m}\,\sup_{h\in\mathcal{H}} \left|L_{S'}(h)-L_S(h)\right|.$$

Formally, the previous two inequalities follow from Jensen's inequality. Combining the two, we obtain

$$\begin{aligned}
\mathop{\mathbb{E}}_{S\sim\mathcal{D}^m}\left[\,\sup_{h\in\mathcal{H}} |L_{\mathcal{D}}(h)-L_S(h)|\,\right]
&\;\leq\; \mathop{\mathbb{E}}_{S,S'\sim\mathcal{D}^m}\left[\,\sup_{h\in\mathcal{H}} |L_{S'}(h)-L_S(h)|\,\right] \\
&\;=\; \mathop{\mathbb{E}}_{S,S'\sim\mathcal{D}^m}\left[\,\sup_{h\in\mathcal{H}} \frac{1}{m}\left|\sum_{i=1}^{m}\bigl(\ell(h,z'_i)-\ell(h,z_i)\bigr)\right|\,\right].
\end{aligned} \tag{6.5}$$
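To make the symmetrized quantity in Equation (6.5) concrete, the following Monte Carlo sketch estimates $\mathbb{E}_{S,S'\sim\mathcal{D}^m}[\sup_{h\in\mathcal{H}}|L_{S'}(h)-L_S(h)|]$ for a toy setup; the class of threshold functions, the uniform distribution with noiseless labels, and all parameter values are illustrative assumptions, not from the text.

```python
import random

# Monte Carlo estimate of the symmetrized quantity in Equation (6.5),
#   E_{S,S'~D^m} [ sup_{h in H} (1/m) | sum_i (l(h,z'_i) - l(h,z_i)) | ],
# for a toy setup (all choices below are illustrative assumptions):
# H = threshold functions h_t(x) = 1[x >= t] over a grid of 21 thresholds,
# D = uniform on [0,1] with noiseless labels y = 1[x >= 0.5], 0-1 loss.
random.seed(0)
thresholds = [i / 20 for i in range(21)]
m, trials = 50, 2000

def zero_one_loss(t, x, y):
    # 0-1 loss of the threshold hypothesis h_t on the example (x, y).
    return int((x >= t) != y)

def sample(m):
    # Draw m i.i.d. labeled examples from the toy distribution.
    return [(x, int(x >= 0.5)) for x in (random.random() for _ in range(m))]

total = 0.0
for _ in range(trials):
    S, S2 = sample(m), sample(m)
    # sup over h in H of |L_{S'}(h) - L_S(h)| for this draw of (S, S').
    total += max(
        abs(sum(zero_one_loss(t, x2, y2) - zero_one_loss(t, x1, y1)
                for (x1, y1), (x2, y2) in zip(S, S2))) / m
        for t in thresholds
    )
print("estimated E[sup_h |L_{S'}(h) - L_S(h)|] ~", total / trials)
```

Rerunning this with larger $m$ shows the estimate decaying roughly like $1/\sqrt{m}$ (up to logarithmic factors), consistent with the $\sqrt{\log(\tau_{\mathcal{H}}(2m))}/\sqrt{2m}$ rate in Equation (6.4).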