Proof of the Fundamental Theorem of Learning Theory


As long as $m < \frac{d}{8\rho^2}$, this term would be larger than $\rho/4$.
In summary, we have shown that if $m < \frac{d}{8\rho^2}$ then for any algorithm there exists a distribution such that

$$\mathop{\mathbb{E}}_{S \sim \mathcal{D}^m}\left[ L_{\mathcal{D}}(A(S)) - \min_{h \in \mathcal{H}} L_{\mathcal{D}}(h) \right] \;\geq\; \rho/4.$$

Finally, let $\Delta = \frac{1}{\rho}\left(L_{\mathcal{D}}(A(S)) - \min_{h \in \mathcal{H}} L_{\mathcal{D}}(h)\right)$ and note that $\Delta \in [0,1]$ (see Equation (28.5)). Therefore, using Lemma B.1, we get that

$$\mathbb{P}\left[L_{\mathcal{D}}(A(S)) - \min_{h \in \mathcal{H}} L_{\mathcal{D}}(h) > \epsilon\right] \;=\; \mathbb{P}\left[\Delta > \frac{\epsilon}{\rho}\right] \;\geq\; \mathbb{E}[\Delta] - \frac{\epsilon}{\rho} \;\geq\; \frac{1}{4} - \frac{\epsilon}{\rho}.$$

Choosing $\rho = 8\epsilon$ we conclude that if $m < \frac{d}{512\epsilon^2}$, then with probability of at least $1/8$ we will have $L_{\mathcal{D}}(A(S)) - \min_{h \in \mathcal{H}} L_{\mathcal{D}}(h) \geq \epsilon$.
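For concreteness, the arithmetic behind this last substitution is

$$\rho = 8\epsilon \;\Rightarrow\; \frac{1}{4} - \frac{\epsilon}{\rho} = \frac{1}{4} - \frac{1}{8} = \frac{1}{8}, \qquad \frac{d}{8\rho^2} = \frac{d}{8\,(8\epsilon)^2} = \frac{d}{512\,\epsilon^2}.$$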

28.3 The Upper Bound for the Realizable Case


Here we prove that there exists $C$ such that $\mathcal{H}$ is PAC learnable with sample complexity

$$m_{\mathcal{H}}(\epsilon,\delta) \;\leq\; C\,\frac{d\ln(1/\epsilon) + \ln(1/\delta)}{\epsilon}.$$

We do so by showing that for $m \geq C\,\frac{d\ln(1/\epsilon)+\ln(1/\delta)}{\epsilon}$, $\mathcal{H}$ is learnable using the ERM rule. We prove this claim based on the notion of $\epsilon$-nets.

Definition 28.2 ($\epsilon$-net)  Let $\mathcal{X}$ be a domain. $S \subset \mathcal{X}$ is an $\epsilon$-net for $\mathcal{H} \subset 2^{\mathcal{X}}$ with respect to a distribution $\mathcal{D}$ over $\mathcal{X}$ if

$$\forall h \in \mathcal{H}: \quad \mathcal{D}(h) \geq \epsilon \;\Rightarrow\; h \cap S \neq \emptyset.$$
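To make the definition concrete, here is a minimal sketch (my own illustration, not from the text) for the class of closed intervals $[a,b] \subseteq [0,1]$ under the uniform distribution, where $\mathcal{D}([a,b]) = b-a$. For this class, a sample fails to be an $\epsilon$-net exactly when an interval of mass at least $\epsilon$ fits in a gap between consecutive sample points (or before the first point, or after the last); the function name is_eps_net is hypothetical.

import random

def is_eps_net(sample, eps):
    # eps-net check for intervals [a, b] in [0, 1] under the uniform
    # distribution. An interval of mass >= eps avoiding every sample
    # point must fit inside a gap between consecutive sorted points
    # (including the boundary gaps at 0 and 1), so it suffices to
    # compare the largest gap with eps (boundary cases treated loosely).
    pts = sorted(sample)
    gaps = [pts[0]] + [b - a for a, b in zip(pts, pts[1:])] + [1.0 - pts[-1]]
    return max(gaps) <= eps

random.seed(0)
S = [random.random() for _ in range(200)]
print(is_eps_net(S, eps=0.1))  # True with high probability at this sample size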

Theorem 28.3  Let $\mathcal{H} \subset 2^{\mathcal{X}}$ with $\mathrm{VCdim}(\mathcal{H}) = d$. Fix $\epsilon \in (0,1)$, $\delta \in (0,1/4)$ and let

$$m \;\geq\; \frac{8}{\epsilon}\left( 2d \log\!\left(\frac{16e}{\epsilon}\right) + \log\!\left(\frac{2}{\delta}\right) \right).$$

Then, with probability of at least $1-\delta$ over a choice of $S \sim \mathcal{D}^m$ we have that $S$ is an $\epsilon$-net for $\mathcal{H}$.
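As a quick numerical illustration (my own, not part of the text), the bound is easy to evaluate for concrete parameters; I read $\log$ as the natural logarithm here, which only affects the constant.

import math

def eps_net_sample_bound(d, eps, delta):
    # m >= (8/eps) * (2d*log(16e/eps) + log(2/delta)) from Theorem 28.3;
    # log is taken as the natural logarithm (a different base only
    # rescales the constant).
    return math.ceil((8.0 / eps) * (2 * d * math.log(16 * math.e / eps)
                                    + math.log(2.0 / delta)))

print(eps_net_sample_bound(d=10, eps=0.1, delta=0.05))  # on the order of 10^4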

Proof  Let

$$B = \left\{ S \subset \mathcal{X} : |S| = m,\; \exists h \in \mathcal{H},\; \mathcal{D}(h) \geq \epsilon,\; h \cap S = \emptyset \right\}$$

be the set of sets which are not $\epsilon$-nets. We need to bound $\mathbb{P}[S \in B]$. Define

$$B' = \left\{ (S,T) \subset \mathcal{X} : |S| = |T| = m,\; \exists h \in \mathcal{H},\; \mathcal{D}(h) \geq \epsilon,\; h \cap S = \emptyset,\; |T \cap h| > \frac{\epsilon m}{2} \right\}.$$
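The condition $|T \cap h| > \frac{\epsilon m}{2}$ is the ghost-sample half of the standard double-sampling argument. Here is a quick Monte Carlo check (my own illustration) of the binomial fact that makes $B'$ useful: when $\mathcal{D}(h) \geq \epsilon$ and $m \geq 8/\epsilon$, a fresh sample $T \sim \mathcal{D}^m$ satisfies $|T \cap h| > \epsilon m/2$ with probability at least $1/2$ (by a Chernoff bound).

import random

# |T ∩ h| is Binomial(m, D(h)); with D(h) = eps and m = 8/eps (the
# hardest case allowed), estimate P[|T ∩ h| > eps*m/2].
random.seed(1)
eps, m, trials = 0.1, 80, 100_000
hits = sum(
    sum(random.random() < eps for _ in range(m)) > eps * m / 2
    for _ in range(trials)
)
print(hits / trials)  # comfortably above 1/2; Chernoff gives >= 1 - exp(-eps*m/8)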