Understanding Machine Learning: From Theory to Algorithms

6.5 Proof of Theorem 6.7

Next, define $H' \subseteq H$ to be
$$H' = \{h \in H : \exists h' \in H \ \text{s.t.}\ (1-h'(c_1), h'(c_2), \ldots, h'(c_m)) = (h(c_1), h(c_2), \ldots, h(c_m))\},$$

namely, $H'$ contains pairs of hypotheses that agree on $C'$ and differ on $c_1$. Using
this definition, it is clear that if $H'$ shatters a set $B \subseteq C'$ then it also shatters
the set $B \cup \{c_1\}$, and vice versa. Combining this with the fact that $Y_1 = H'_{C'}$ and
using the inductive assumption (now applied on $H'$ and $C'$) we obtain that
$$|Y_1| = |H'_{C'}| \le |\{B \subseteq C' : H' \text{ shatters } B\}| = |\{B \subseteq C' : H' \text{ shatters } B \cup \{c_1\}\}|$$
$$= |\{B \subseteq C : c_1 \in B \wedge H' \text{ shatters } B\}| \le |\{B \subseteq C : c_1 \in B \wedge H \text{ shatters } B\}|.$$
Overall, we have shown that
$$|H_C| = |Y_0| + |Y_1| \le |\{B \subseteq C : c_1 \notin B \wedge H \text{ shatters } B\}| + |\{B \subseteq C : c_1 \in B \wedge H \text{ shatters } B\}| = |\{B \subseteq C : H \text{ shatters } B\}|,$$
which concludes our proof.
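As a numeric sanity check, the inequality just proved, $|H_C| \le |\{B \subseteq C : H \text{ shatters } B\}|$, can be verified by brute force on a small class. The sketch below is illustrative and not from the text: the class of threshold classifiers and the five-point domain are arbitrary choices.

```python
# Brute-force check of |H_C| <= |{B subset of C : H shatters B}| (Sauer's lemma
# in the form just proved) for a hypothetical example class: threshold
# classifiers h_t(x) = 1[x >= t] restricted to C = {1,...,5}.
from itertools import combinations

C = [1, 2, 3, 4, 5]
# H_C: the set of distinct label vectors H induces on C.
H_C = {tuple(1 if x >= t else 0 for x in C) for t in range(0, 7)}

def shatters(B):
    """H shatters B iff the restrictions of H to B realize all 2^|B| patterns."""
    patterns = {tuple(h[C.index(x)] for x in B) for h in H_C}
    return len(patterns) == 2 ** len(B)

shattered = [B for r in range(len(C) + 1)
             for B in combinations(C, r) if shatters(B)]
# Thresholds have VC-dimension 1: only the empty set and singletons are shattered.
assert len(H_C) <= len(shattered)
print(len(H_C), len(shattered))
```

For this class the inequality holds with equality (six label vectors, six shattered subsets), which matches the fact that thresholds attain Sauer's bound.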

6.5.2 Uniform Convergence for Classes of Small Effective Size


In this section we prove that if $H$ has small effective size then it enjoys the
uniform convergence property. Formally,

Theorem 6.11 Let $H$ be a class and let $\tau_H$ be its growth function. Then, for
every $D$ and every $\delta \in (0,1)$, with probability of at least $1-\delta$ over the choice of
$S \sim D^m$ we have
$$|L_D(h) - L_S(h)| \le \frac{4 + \sqrt{\log(\tau_H(2m))}}{\delta\sqrt{2m}}.$$
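To get a feel for the magnitude of this bound, the sketch below (illustrative, not from the text) evaluates the right-hand side with $\log(\tau_H(2m))$ replaced by $d\log(2em/d)$, the bound from Sauer's lemma used later in this section; the parameter values are arbitrary.

```python
# Evaluate Theorem 6.11's deviation bound, substituting Sauer's lemma bound
# tau_H(2m) <= (2em/d)^d (valid for m > d) for the growth function.
import math

def uc_bound(m, d, delta):
    """(4 + sqrt(d*log(2em/d))) / (delta * sqrt(2m))."""
    log_tau = d * math.log(2 * math.e * m / d)  # log of Sauer's bound on tau_H(2m)
    return (4 + math.sqrt(log_tau)) / (delta * math.sqrt(2 * m))

# The bound decays roughly like sqrt(log(m)/m) as the sample size m grows.
print(uc_bound(1_000, 3, 0.05))
print(uc_bound(100_000, 3, 0.05))
```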

Before proving the theorem, let us first conclude the proof of Theorem 6.7.
Proof of Theorem 6.7 It suffices to prove that if the VC-dimension is finite then
the uniform convergence property holds. We will prove that

$$m_H^{UC}(\epsilon,\delta) \le 4\,\frac{16d}{(\delta\epsilon)^2}\log\!\left(\frac{16d}{(\delta\epsilon)^2}\right) + \frac{16d\log(2e/d)}{(\delta\epsilon)^2}.$$
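The sketch below (illustrative, not from the text; parameter values are arbitrary) evaluates this sample-complexity upper bound to show how it scales as $\epsilon$ shrinks.

```python
# Evaluate the upper bound on m_H^UC(eps, delta):
#   4 * (16d/(delta*eps)^2) * log(16d/(delta*eps)^2) + 16*d*log(2e/d)/(delta*eps)^2.
import math

def m_uc_upper(eps, delta, d):
    """Sample-complexity upper bound from the proof of Theorem 6.7."""
    a = 16 * d / (delta * eps) ** 2
    return 4 * a * math.log(a) + 16 * d * math.log(2 * math.e / d) / (delta * eps) ** 2

# Halving eps roughly quadruples the bound (up to the log factor).
for eps in (0.2, 0.1, 0.05):
    print(eps, math.ceil(m_uc_upper(eps, 0.1, 3)))
```

Note the $1/(\delta\epsilon)^2$ dependence: the $\delta$-dependence of this particular proof is loose, as the text's surrounding remarks discuss.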

From Sauer's lemma we have that for $m > d$, $\tau_H(2m) \le (2em/d)^d$. Combining
this with Theorem 6.11 we obtain that with probability of at least $1-\delta$,
$$|L_S(h) - L_D(h)| \le \frac{4 + \sqrt{d\log(2em/d)}}{\delta\sqrt{2m}}.$$

For simplicity assume that $\sqrt{d\log(2em/d)} \ge 4$; hence,
$$|L_S(h) - L_D(h)| \le \frac{1}{\delta}\sqrt{\frac{2d\log(2em/d)}{m}}.$$
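The simplification step rests on the elementary fact that $x \ge 4$ implies $4 + x \le 2x$: writing $x = \sqrt{d\log(2em/d)}$, the numerator of the previous bound at most doubles, and $2x/(\delta\sqrt{2m}) = (1/\delta)\sqrt{2d\log(2em/d)/m}$. A brute-force check of this step (illustrative, not from the text; parameter values are arbitrary):

```python
# Verify that the simplified bound dominates the exact one whenever the
# "for simplicity" assumption sqrt(d*log(2em/d)) >= 4 holds.
import math

def exact_bound(m, d, delta):
    """Bound before simplification: (4 + sqrt(d*log(2em/d))) / (delta*sqrt(2m))."""
    x = math.sqrt(d * math.log(2 * math.e * m / d))
    return (4 + x) / (delta * math.sqrt(2 * m))

def simplified_bound(m, d, delta):
    """Simplified bound: (1/delta) * sqrt(2*d*log(2em/d)/m)."""
    return (1 / delta) * math.sqrt(2 * d * math.log(2 * math.e * m / d) / m)

for m in (100, 1_000, 10_000):
    d, delta = 5, 0.1
    x = math.sqrt(d * math.log(2 * math.e * m / d))
    if x >= 4:  # the "for simplicity" assumption
        assert exact_bound(m, d, delta) <= simplified_bound(m, d, delta)
print("simplification verified")
```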