398 Proof of the Fundamental Theorem of Learning Theory
As long as $m < \frac{d}{8\rho^2}$, this term would be larger than $\rho/4$. In summary, we have shown that if $m < \frac{d}{8\rho^2}$ then for any algorithm there exists a distribution such that
$$\mathop{\mathbb{E}}_{S\sim\mathcal{D}^m}\left[ L_{\mathcal{D}}(A(S)) - \min_{h\in\mathcal{H}} L_{\mathcal{D}}(h) \right] \ge \frac{\rho}{4}.$$
Finally, let $\Delta = \frac{1}{\rho}\left( L_{\mathcal{D}}(A(S)) - \min_{h\in\mathcal{H}} L_{\mathcal{D}}(h) \right)$ and note that $\Delta \in [0,1]$ (see Equation (28.5)). Therefore, using Lemma B.1, we get that
$$\mathbb{P}\left[ L_{\mathcal{D}}(A(S)) - \min_{h\in\mathcal{H}} L_{\mathcal{D}}(h) > \epsilon \right] = \mathbb{P}\!\left[ \Delta > \frac{\epsilon}{\rho} \right] \ge \mathbb{E}[\Delta] - \frac{\epsilon}{\rho} \ge \frac{1}{4} - \frac{\epsilon}{\rho}.$$
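The last inequality in the chain invokes Lemma B.1: for any random variable $\Delta$ taking values in $[0,1]$ and any $a \in (0,1)$, $\mathbb{P}[\Delta > a] \ge \mathbb{E}[\Delta] - a$. A quick numeric sketch (the function names and the sampled distribution are our illustration, not the book's) checks this empirically and also verifies the arithmetic behind the choice of $\rho$ made next:

```python
import random

def lemma_b1_holds(samples, a):
    """P[Z > a] >= E[Z] - a for a [0,1]-valued Z, checked on the empirical
    distribution of `samples` (the lemma holds for any such distribution)."""
    mean = sum(samples) / len(samples)
    return sum(1 for z in samples if z > a) / len(samples) >= mean - a

random.seed(0)
zs = [random.random() ** 2 for _ in range(100_000)]  # an arbitrary [0,1]-valued variable
assert all(lemma_b1_holds(zs, a) for a in (0.1, 0.25, 0.5, 0.9))

# Arithmetic behind the choice rho = 8*eps made next: the failure-probability
# bound 1/4 - eps/rho becomes 1/8, and m < d/(8*rho**2) becomes m < d/(512*eps**2).
eps = 0.0625          # exactly representable in binary, so the comparisons are exact
rho = 8 * eps
assert 0.25 - eps / rho == 0.125
assert 10 / (8 * rho ** 2) == 10 / (512 * eps ** 2)   # d = 10 as an example
```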
Choosing $\rho = 8\epsilon$ we conclude that if $m < \frac{d}{512\epsilon^2}$, then with probability of at least $1/8$ we will have $L_{\mathcal{D}}(A(S)) - \min_{h\in\mathcal{H}} L_{\mathcal{D}}(h) \ge \epsilon$.

28.3 The Upper Bound for the Realizable Case
Here we prove that there exists $C$ such that $\mathcal{H}$ is PAC learnable with sample complexity
$$m_{\mathcal{H}}(\epsilon,\delta) \le C\,\frac{d\ln(1/\epsilon) + \ln(1/\delta)}{\epsilon}.$$
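The shape of this bound is easy to explore numerically. A minimal sketch (the constant $C$ is left unspecified by the claim; the function name is ours and we take $C = 1$ only to see the growth in $1/\epsilon$):

```python
from math import log

def pac_sample_bound(d, eps, delta, C=1.0):
    """C * (d*ln(1/eps) + ln(1/delta)) / eps, with C a placeholder constant."""
    return C * (d * log(1 / eps) + log(1 / delta)) / eps

# Halving eps more than doubles the bound, reflecting (1/eps)*ln(1/eps) growth.
assert pac_sample_bound(3, 0.05, 0.1) > 2 * pac_sample_bound(3, 0.1, 0.1)
```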
We do so by showing that for $m \ge C\,\frac{d\ln(1/\epsilon) + \ln(1/\delta)}{\epsilon}$, $\mathcal{H}$ is learnable using the ERM rule. We prove this claim based on the notion of $\epsilon$-nets.

Definition 28.2 ($\epsilon$-net) Let $\mathcal{X}$ be a domain. $S \subset \mathcal{X}$ is an $\epsilon$-net for $\mathcal{H} \subset 2^{\mathcal{X}}$ with respect to a distribution $\mathcal{D}$ over $\mathcal{X}$ if
$$\forall h \in \mathcal{H}: \quad \mathcal{D}(h) \ge \epsilon \;\Rightarrow\; h \cap S \ne \emptyset.$$

Theorem 28.3 Let $\mathcal{H} \subset 2^{\mathcal{X}}$ with $\mathrm{VCdim}(\mathcal{H}) = d$. Fix $\epsilon \in (0,1)$, $\delta \in (0,1/4)$
and let
$$m \ge \frac{8}{\epsilon}\left( 2d \log\!\left(\frac{16e}{\epsilon}\right) + \log\!\left(\frac{2}{\delta}\right) \right).$$
Then, with probability of at least $1-\delta$ over a choice of $S \sim \mathcal{D}^m$, we have that $S$ is an $\epsilon$-net for $\mathcal{H}$.

Proof Let $B = \{ S \subset \mathcal{X} : |S| = m,\ \exists h \in \mathcal{H},\ \mathcal{D}(h) \ge \epsilon,\ h \cap S = \emptyset \}$ be the set of sets which are not $\epsilon$-nets. We need to bound $\mathbb{P}[S \in B]$. Define
$$B' = \left\{ (S,T) \subset \mathcal{X} : |S| = |T| = m,\ \exists h \in \mathcal{H},\ \mathcal{D}(h) \ge \epsilon,\ h \cap S = \emptyset,\ |T \cap h| > \frac{\epsilon m}{2} \right\}.$$
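To make the objects above concrete, here is a toy sketch (function names, the interval class, and the natural-logarithm reading of $\log$ are our assumptions, not the book's): it evaluates the sample-size bound of Theorem 28.3 and checks the defining property of an $\epsilon$-net, i.e. non-membership in $B$, on a small finite domain.

```python
from math import ceil, e, log

def eps_net_sample_size(d, eps, delta):
    """Evaluate Theorem 28.3's bound (8/eps)*(2d*log(16e/eps) + log(2/delta)).
    We read log as the natural logarithm; the statement leaves the base implicit."""
    return ceil((8 / eps) * (2 * d * log(16 * e / eps) + log(2 / delta)))

def is_eps_net(sample, hypotheses, dist, eps):
    """Definition 28.2 on a finite domain: True iff every h with D(h) >= eps
    intersects `sample`.  Each h is a subset of X; dist maps x -> D({x})."""
    s = set(sample)
    return all(h & s for h in hypotheses
               if sum(dist[x] for x in h) >= eps)

# Toy domain X = {0,...,9}, uniform distribution, H = all discrete intervals
# (a class of VC dimension 2).  Any interval of >= 3 points has mass >= 0.3.
dist = {x: 0.1 for x in range(10)}
H = [frozenset(range(i, j)) for i in range(10) for j in range(i + 1, 11)]
assert is_eps_net({2, 5, 8}, H, dist, eps=0.3)   # hits every interval of mass >= 0.3
assert not is_eps_net({0, 9}, H, dist, eps=0.3)  # misses, e.g., {3, 4, 5}
m = eps_net_sample_size(d=2, eps=0.3, delta=0.1)  # the theorem's (loose) m here
```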