As long as $m < \frac{d}{8\rho^2}$, this term would be larger than $\rho/4$.
In summary, we have shown that if $m < \frac{d}{8\rho^2}$ then for any algorithm there
exists a distribution such that
\[
\mathop{\mathbb{E}}_{S\sim\mathcal{D}^m}\left[ L_{\mathcal{D}}(A(S)) - \min_{h\in\mathcal{H}} L_{\mathcal{D}}(h) \right] \ge \rho/4 .
\]
Finally, let $\Delta = \frac{1}{\rho}\left(L_{\mathcal{D}}(A(S)) - \min_{h\in\mathcal{H}} L_{\mathcal{D}}(h)\right)$ and note that $\Delta \in [0,1]$ (see
Equation (28.5)). Therefore, using Lemma B.1, we get that
\[
\mathbb{P}\left[ L_{\mathcal{D}}(A(S)) - \min_{h\in\mathcal{H}} L_{\mathcal{D}}(h) > \epsilon \right]
= \mathbb{P}\left[ \Delta > \frac{\epsilon}{\rho} \right]
\ge \mathbb{E}[\Delta] - \frac{\epsilon}{\rho}
\ge \frac{1}{4} - \frac{\epsilon}{\rho} .
\]
Choosing $\rho = 8\epsilon$ we conclude that if $m < \frac{d}{512\epsilon^2}$, then with probability of at least
$1/8$ we will have $L_{\mathcal{D}}(A(S)) - \min_{h\in\mathcal{H}} L_{\mathcal{D}}(h) \ge \epsilon$.
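Spelling out the arithmetic behind this choice of $\rho$:
\[
\frac{d}{8\rho^2} = \frac{d}{8\,(8\epsilon)^2} = \frac{d}{512\epsilon^2},
\qquad
\frac{1}{4} - \frac{\epsilon}{\rho} = \frac{1}{4} - \frac{\epsilon}{8\epsilon} = \frac{1}{4} - \frac{1}{8} = \frac{1}{8} .
\]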
28.3 The Upper Bound for the Realizable Case
Here we prove that there exists $C$ such that $\mathcal{H}$ is PAC learnable with sample
complexity
\[
m_{\mathcal{H}}(\epsilon,\delta) \le C\,\frac{d\ln(1/\epsilon) + \ln(1/\delta)}{\epsilon} .
\]
We do so by showing that for $m \ge C\,\frac{d\ln(1/\epsilon) + \ln(1/\delta)}{\epsilon}$, $\mathcal{H}$ is learnable using the
ERM rule. We prove this claim based on the notion of $\epsilon$-nets.
Definition 28.2 ($\epsilon$-net)  Let $\mathcal{X}$ be a domain. $S \subset \mathcal{X}$ is an $\epsilon$-net for $\mathcal{H} \subset 2^{\mathcal{X}}$
with respect to a distribution $\mathcal{D}$ over $\mathcal{X}$ if
\[
\forall h \in \mathcal{H}: \quad \mathcal{D}(h) \ge \epsilon \;\Rightarrow\; h \cap S \neq \emptyset .
\]
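As a concrete (purely illustrative) aid to the definition, the following minimal Python sketch checks the $\epsilon$-net condition over a finite domain, with hypotheses represented as subsets and $\mathcal{D}$ as an explicit probability table; the function and variable names are hypothetical, not from the text.

# Minimal sketch (illustrative): check whether a sample S is an eps-net for a
# finite hypothesis class H over a finite domain, following Definition 28.2.
def is_eps_net(sample, hypotheses, domain_probs, eps):
    """Return True iff every h with D(h) >= eps intersects the sample."""
    sample = set(sample)
    for h in hypotheses:                          # each h is a subset of the domain
        mass = sum(domain_probs[x] for x in h)    # D(h)
        if mass >= eps and not (h & sample):      # D(h) >= eps but h meets S nowhere
            return False
    return True

# Toy usage: uniform D on {0,1,2,3}, H = all "intervals" {i,...,j}
domain = [0, 1, 2, 3]
domain_probs = {x: 0.25 for x in domain}
hypotheses = [set(range(i, j + 1)) for i in domain for j in domain if i <= j]
print(is_eps_net({0, 2}, hypotheses, domain_probs, eps=0.5))  # True
print(is_eps_net({0}, hypotheses, domain_probs, eps=0.5))     # False ({2,3} is missed)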
Theorem 28.3  Let $\mathcal{H} \subset 2^{\mathcal{X}}$ with $\mathrm{VCdim}(\mathcal{H}) = d$. Fix $\epsilon \in (0,1)$, $\delta \in (0,1/4)$
and let
\[
m \ge \frac{8}{\epsilon}\left( 2d \log\!\left(\frac{16e}{\epsilon}\right) + \log\!\left(\frac{2}{\delta}\right) \right) .
\]
Then, with probability of at least $1-\delta$ over a choice of $S \sim \mathcal{D}^m$ we have that $S$
is an $\epsilon$-net for $\mathcal{H}$.
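For a rough sense of the numbers involved, here is a small Python sketch evaluating the bound of Theorem 28.3; the parameter values are illustrative only, and taking the natural logarithm is an assumption, since this excerpt does not restate the intended base of the log.

import math

def eps_net_sample_bound(d, eps, delta, log=math.log):
    """Sample size from Theorem 28.3: (8/eps)(2d log(16e/eps) + log(2/delta)).
    The base of log is left as a parameter (natural log by default)."""
    return (8.0 / eps) * (2 * d * log(16 * math.e / eps) + log(2.0 / delta))

# Illustrative values only (not from the text)
for d, eps, delta in [(1, 0.1, 0.01), (10, 0.1, 0.01), (10, 0.01, 0.01)]:
    print(d, eps, delta, math.ceil(eps_net_sample_bound(d, eps, delta)))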
Proof  Let
\[
B = \left\{ S \subset \mathcal{X} : |S| = m,\ \exists h \in \mathcal{H},\ \mathcal{D}(h) \ge \epsilon,\ h \cap S = \emptyset \right\}
\]
be the set of sets which are not $\epsilon$-nets. We need to bound $\mathbb{P}[S \in B]$. Define
\[
B' = \left\{ (S,T) \subset \mathcal{X} : |S| = |T| = m,\ \exists h \in \mathcal{H},\ \mathcal{D}(h) \ge \epsilon,\ h \cap S = \emptyset,\ |T \cap h| > \tfrac{\epsilon m}{2} \right\}.
\]