Understanding Machine Learning: From Theory to Algorithms

6.5 Proof of Theorem 6.7

Next, define $H' \subseteq H$ to be
$$H' = \{h \in H : \exists h' \in H \ \text{s.t.}\ (1-h'(c_1), h'(c_2), \ldots, h'(c_m)) = (h(c_1), h(c_2), \ldots, h(c_m))\},$$

namely, $H'$ contains pairs of hypotheses that agree on $C'$ and differ on $c_1$. Using
this definition, it is clear that if $H'$ shatters a set $B \subseteq C'$ then it also shatters
the set $B \cup \{c_1\}$, and vice versa. Combining this with the fact that $Y_1 = H'_{C'}$ and
using the inductive assumption (now applied on $H'$ and $C'$) we obtain that
$$|Y_1| = |H'_{C'}| \le |\{B \subseteq C' : H' \text{ shatters } B\}| = |\{B \subseteq C' : H' \text{ shatters } B \cup \{c_1\}\}|$$
$$= |\{B \subseteq C : c_1 \in B \wedge H' \text{ shatters } B\}| \le |\{B \subseteq C : c_1 \in B \wedge H \text{ shatters } B\}|.$$
Overall, we have shown that
$$|H_C| = |Y_0| + |Y_1| \le |\{B \subseteq C : c_1 \notin B \wedge H \text{ shatters } B\}| + |\{B \subseteq C : c_1 \in B \wedge H \text{ shatters } B\}| = |\{B \subseteq C : H \text{ shatters } B\}|,$$
which concludes our proof.
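As a numeric sanity check, the inequality just proved, $|H_C| \le |\{B \subseteq C : H \text{ shatters } B\}|$, can be verified by brute force on a small class. The sketch below is illustrative and not from the text: the class of threshold classifiers and the five-point domain are arbitrary choices.

```python
# Brute-force check of |H_C| <= |{B subset of C : H shatters B}| (Sauer's lemma
# in the form just proved) for a hypothetical example class: threshold
# classifiers h_t(x) = 1[x >= t] restricted to C = {1,...,5}.
from itertools import combinations

C = [1, 2, 3, 4, 5]
# H_C: the set of distinct label vectors H induces on C.
H_C = {tuple(1 if x >= t else 0 for x in C) for t in range(0, 7)}

def shatters(B):
    """H shatters B iff the restrictions of H to B realize all 2^|B| patterns."""
    patterns = {tuple(h[C.index(x)] for x in B) for h in H_C}
    return len(patterns) == 2 ** len(B)

shattered = [B for r in range(len(C) + 1)
             for B in combinations(C, r) if shatters(B)]
# Thresholds have VC-dimension 1: only the empty set and singletons are shattered.
assert len(H_C) <= len(shattered)
print(len(H_C), len(shattered))
```

For this class the inequality holds with equality (six label vectors, six shattered subsets), which matches the fact that thresholds attain Sauer's bound.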

6.5.2 Uniform Convergence for Classes of Small Effective Size


In this section we prove that if $H$ has small effective size then it enjoys the
uniform convergence property. Formally,

Theorem 6.11 Let $H$ be a class and let $\tau_H$ be its growth function. Then, for
every $D$ and every $\delta \in (0,1)$, with probability of at least $1-\delta$ over the choice of
$S \sim D^m$ we have
$$|L_D(h) - L_S(h)| \le \frac{4 + \sqrt{\log(\tau_H(2m))}}{\delta\sqrt{2m}}.$$
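To get a feel for the magnitude of this bound, the sketch below (illustrative, not from the text) evaluates the right-hand side with $\log(\tau_H(2m))$ replaced by $d\log(2em/d)$, the bound from Sauer's lemma used later in this section; the parameter values are arbitrary.

```python
# Evaluate Theorem 6.11's deviation bound, substituting Sauer's lemma bound
# tau_H(2m) <= (2em/d)^d (valid for m > d) for the growth function.
import math

def uc_bound(m, d, delta):
    """(4 + sqrt(d*log(2em/d))) / (delta * sqrt(2m))."""
    log_tau = d * math.log(2 * math.e * m / d)  # log of Sauer's bound on tau_H(2m)
    return (4 + math.sqrt(log_tau)) / (delta * math.sqrt(2 * m))

# The bound decays roughly like sqrt(log(m)/m) as the sample size m grows.
print(uc_bound(1_000, 3, 0.05))
print(uc_bound(100_000, 3, 0.05))
```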

Before proving the theorem, let us first conclude the proof of Theorem 6.7.
Proof of Theorem 6.7 It suffices to prove that if the VC-dimension is finite then
the uniform convergence property holds. We will prove that

$$m_H^{UC}(\epsilon,\delta) \le 4\,\frac{16d}{(\delta\epsilon)^2}\log\!\left(\frac{16d}{(\delta\epsilon)^2}\right) + \frac{16d\log(2e/d)}{(\delta\epsilon)^2}.$$
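The sketch below (illustrative, not from the text; parameter values are arbitrary) evaluates this sample-complexity upper bound to show how it scales as $\epsilon$ shrinks.

```python
# Evaluate the upper bound on m_H^UC(eps, delta):
#   4 * (16d/(delta*eps)^2) * log(16d/(delta*eps)^2) + 16*d*log(2e/d)/(delta*eps)^2.
import math

def m_uc_upper(eps, delta, d):
    """Sample-complexity upper bound from the proof of Theorem 6.7."""
    a = 16 * d / (delta * eps) ** 2
    return 4 * a * math.log(a) + 16 * d * math.log(2 * math.e / d) / (delta * eps) ** 2

# Halving eps roughly quadruples the bound (up to the log factor).
for eps in (0.2, 0.1, 0.05):
    print(eps, math.ceil(m_uc_upper(eps, 0.1, 3)))
```

Note the $1/(\delta\epsilon)^2$ dependence: the $\delta$-dependence of this particular proof is loose, as the text's surrounding remarks discuss.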

From Sauer's lemma we have that for $m > d$, $\tau_H(2m) \le (2em/d)^d$. Combining
this with Theorem 6.11 we obtain that with probability of at least $1-\delta$,
$$|L_S(h) - L_D(h)| \le \frac{4 + \sqrt{d\log(2em/d)}}{\delta\sqrt{2m}}.$$

For simplicity assume that $\sqrt{d\log(2em/d)} \ge 4$; hence,
$$|L_S(h) - L_D(h)| \le \frac{1}{\delta}\sqrt{\frac{2d\log(2em/d)}{m}}.$$
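The simplification step rests on the elementary fact that $x \ge 4$ implies $4 + x \le 2x$: writing $x = \sqrt{d\log(2em/d)}$, the numerator of the previous bound at most doubles, and $2x/(\delta\sqrt{2m}) = (1/\delta)\sqrt{2d\log(2em/d)/m}$. A brute-force check of this step (illustrative, not from the text; parameter values are arbitrary):

```python
# Verify that the simplified bound dominates the exact one whenever the
# "for simplicity" assumption sqrt(d*log(2em/d)) >= 4 holds.
import math

def exact_bound(m, d, delta):
    """Bound before simplification: (4 + sqrt(d*log(2em/d))) / (delta*sqrt(2m))."""
    x = math.sqrt(d * math.log(2 * math.e * m / d))
    return (4 + x) / (delta * math.sqrt(2 * m))

def simplified_bound(m, d, delta):
    """Simplified bound: (1/delta) * sqrt(2*d*log(2em/d)/m)."""
    return (1 / delta) * math.sqrt(2 * d * math.log(2 * math.e * m / d) / m)

for m in (100, 1_000, 10_000):
    d, delta = 5, 0.1
    x = math.sqrt(d * math.log(2 * math.e * m / d))
    if x >= 4:  # the "for simplicity" assumption
        assert exact_bound(m, d, delta) <= simplified_bound(m, d, delta)
print("simplification verified")
```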