Proof of the Fundamental Theorem of Learning Theory


As long as $m < \frac{d}{8\rho^2}$, this term would be larger than $\rho/4$.
In summary, we have shown that if $m < \frac{d}{8\rho^2}$ then for any algorithm there exists a distribution such that

$$\mathop{\mathbb{E}}_{S \sim \mathcal{D}^m}\left[ L_{\mathcal{D}}(A(S)) - \min_{h \in \mathcal{H}} L_{\mathcal{D}}(h) \right] \;\geq\; \rho/4.$$

Finally, let $\Delta = \frac{1}{\rho}\left(L_{\mathcal{D}}(A(S)) - \min_{h \in \mathcal{H}} L_{\mathcal{D}}(h)\right)$ and note that $\Delta \in [0,1]$ (see Equation (28.5)). Therefore, using Lemma B.1, we get that

$$\mathbb{P}\left[L_{\mathcal{D}}(A(S)) - \min_{h \in \mathcal{H}} L_{\mathcal{D}}(h) > \epsilon\right] \;=\; \mathbb{P}\left[\Delta > \frac{\epsilon}{\rho}\right] \;\geq\; \mathbb{E}[\Delta] - \frac{\epsilon}{\rho} \;\geq\; \frac{1}{4} - \frac{\epsilon}{\rho}.$$

Choosing $\rho = 8\epsilon$ we conclude that if $m < \frac{d}{512\epsilon^2}$, then with probability of at least $1/8$ we will have $L_{\mathcal{D}}(A(S)) - \min_{h \in \mathcal{H}} L_{\mathcal{D}}(h) \geq \epsilon$.
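For concreteness, the arithmetic behind this last substitution is

$$\rho = 8\epsilon \;\Rightarrow\; \frac{1}{4} - \frac{\epsilon}{\rho} = \frac{1}{4} - \frac{1}{8} = \frac{1}{8}, \qquad \frac{d}{8\rho^2} = \frac{d}{8\,(8\epsilon)^2} = \frac{d}{512\,\epsilon^2}.$$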

28.3 The Upper Bound for the Realizable Case


Here we prove that there exists $C$ such that $\mathcal{H}$ is PAC learnable with sample complexity

$$m_{\mathcal{H}}(\epsilon,\delta) \;\leq\; C\,\frac{d\ln(1/\epsilon) + \ln(1/\delta)}{\epsilon}.$$

We do so by showing that for $m \geq C\,\frac{d\ln(1/\epsilon)+\ln(1/\delta)}{\epsilon}$, $\mathcal{H}$ is learnable using the ERM rule. We prove this claim based on the notion of $\epsilon$-nets.

Definition 28.2 ($\epsilon$-net)  Let $\mathcal{X}$ be a domain. $S \subset \mathcal{X}$ is an $\epsilon$-net for $\mathcal{H} \subset 2^{\mathcal{X}}$ with respect to a distribution $\mathcal{D}$ over $\mathcal{X}$ if

$$\forall h \in \mathcal{H}: \quad \mathcal{D}(h) \geq \epsilon \;\Rightarrow\; h \cap S \neq \emptyset.$$
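To make the definition concrete, here is a minimal sketch (my own illustration, not from the text) for the class of closed intervals $[a,b] \subseteq [0,1]$ under the uniform distribution, where $\mathcal{D}([a,b]) = b-a$. For this class, a sample fails to be an $\epsilon$-net exactly when an interval of mass at least $\epsilon$ fits in a gap between consecutive sample points (or before the first point, or after the last); the function name is_eps_net is hypothetical.

import random

def is_eps_net(sample, eps):
    # eps-net check for intervals [a, b] in [0, 1] under the uniform
    # distribution. An interval of mass >= eps avoiding every sample
    # point must fit inside a gap between consecutive sorted points
    # (including the boundary gaps at 0 and 1), so it suffices to
    # compare the largest gap with eps (boundary cases treated loosely).
    pts = sorted(sample)
    gaps = [pts[0]] + [b - a for a, b in zip(pts, pts[1:])] + [1.0 - pts[-1]]
    return max(gaps) <= eps

random.seed(0)
S = [random.random() for _ in range(200)]
print(is_eps_net(S, eps=0.1))  # True with high probability at this sample size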

Theorem 28.3  Let $\mathcal{H} \subset 2^{\mathcal{X}}$ with $\mathrm{VCdim}(\mathcal{H}) = d$. Fix $\epsilon \in (0,1)$, $\delta \in (0,1/4)$ and let

$$m \;\geq\; \frac{8}{\epsilon}\left( 2d \log\!\left(\frac{16e}{\epsilon}\right) + \log\!\left(\frac{2}{\delta}\right) \right).$$

Then, with probability of at least $1-\delta$ over a choice of $S \sim \mathcal{D}^m$ we have that $S$ is an $\epsilon$-net for $\mathcal{H}$.
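As a quick numerical illustration (my own, not part of the text), the bound is easy to evaluate for concrete parameters; I read $\log$ as the natural logarithm here, which only affects the constant.

import math

def eps_net_sample_bound(d, eps, delta):
    # m >= (8/eps) * (2d*log(16e/eps) + log(2/delta)) from Theorem 28.3;
    # log is taken as the natural logarithm (a different base only
    # rescales the constant).
    return math.ceil((8.0 / eps) * (2 * d * math.log(16 * math.e / eps)
                                    + math.log(2.0 / delta)))

print(eps_net_sample_bound(d=10, eps=0.1, delta=0.05))  # on the order of 10^4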

Proof  Let

$$B = \left\{ S \subset \mathcal{X} : |S| = m,\; \exists h \in \mathcal{H},\; \mathcal{D}(h) \geq \epsilon,\; h \cap S = \emptyset \right\}$$

be the set of sets which are not $\epsilon$-nets. We need to bound $\mathbb{P}[S \in B]$. Define

$$B' = \left\{ (S,T) \subset \mathcal{X} : |S| = |T| = m,\; \exists h \in \mathcal{H},\; \mathcal{D}(h) \geq \epsilon,\; h \cap S = \emptyset,\; |T \cap h| > \frac{\epsilon m}{2} \right\}.$$
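The condition $|T \cap h| > \frac{\epsilon m}{2}$ is the ghost-sample half of the standard double-sampling argument. Here is a quick Monte Carlo check (my own illustration) of the binomial fact that makes $B'$ useful: when $\mathcal{D}(h) \geq \epsilon$ and $m \geq 8/\epsilon$, a fresh sample $T \sim \mathcal{D}^m$ satisfies $|T \cap h| > \epsilon m/2$ with probability at least $1/2$ (by a Chernoff bound).

import random

# |T ∩ h| is Binomial(m, D(h)); with D(h) = eps and m = 8/eps (the
# hardest case allowed), estimate P[|T ∩ h| > eps*m/2].
random.seed(1)
eps, m, trials = 0.1, 80, 100_000
hits = sum(
    sum(random.random() < eps for _ in range(m)) > eps * m / 2
    for _ in range(trials)
)
print(hits / trials)  # comfortably above 1/2; Chernoff gives >= 1 - exp(-eps*m/8)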