Proof of the Fundamental Theorem of Learning Theory
$\mathbb{E}_{A\sim D^{2m}}[f(A,A_j)]$. Since this holds for any $j$, it also holds for the expectation of $j$ chosen at random from $J$. In particular, it holds for the function $f(A,A') = \sum_{h\in H_A}\mathbf{1}_{[|h\cap A'|=0]}\,\mathbf{1}_{[|h\cap A|\ge\alpha]}$. We therefore obtain that
\[
\mathbb{P}[A\in B'] \;\le\; \mathbb{E}_{A\sim D^{2m}}\,\mathbb{E}_{j\sim J}\sum_{h\in H_A}\mathbf{1}_{[|h\cap A_j|=0]}\,\mathbf{1}_{[|h\cap A|\ge\alpha]}
\;=\; \mathbb{E}_{A\sim D^{2m}}\sum_{h\in H_A}\mathbf{1}_{[|h\cap A|\ge\alpha]}\;\mathbb{E}_{j\sim J}\,\mathbf{1}_{[|h\cap A_j|=0]}.
\]
Now, fix some $A$ s.t. $|h\cap A|\ge\alpha$. Then, $\mathbb{E}_{j}\,\mathbf{1}_{[|h\cap A_j|=0]}$ is the probability that, when choosing $m$ balls from a bag with at least $\alpha$ red balls, we will never choose a red ball. This probability is at most
\[
\left(1-\frac{\alpha}{2m}\right)^{m} = \left(1-\frac{\epsilon}{4}\right)^{m} \le e^{-\epsilon m/4}.
\]
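As a quick numerical sanity check of this step (not part of the proof), the following sketch estimates by simulation the probability of never drawing a red ball when $m$ of $2m$ balls are drawn without replacement, with $\alpha=\epsilon m/2$ of them red, and compares it with the bound $(1-\alpha/(2m))^m$; the values of $m$ and $\epsilon$ are arbitrary illustrative choices.

```python
# Monte Carlo sanity check of the ball-drawing bound used above (illustrative values only).
import random

def no_red_probability(m, alpha, trials=200_000):
    """Estimate P[no red ball when drawing m balls without replacement
    from a bag of 2m balls, alpha of which are red]."""
    bag = [1] * alpha + [0] * (2 * m - alpha)  # 1 = red, 0 = not red
    misses = sum(1 for _ in range(trials) if sum(random.sample(bag, m)) == 0)
    return misses / trials

if __name__ == "__main__":
    m, eps = 20, 0.4                      # arbitrary illustrative values
    alpha = int(eps * m / 2)              # alpha = eps*m/2, as in the proof
    est = no_red_probability(m, alpha)
    bound = (1 - alpha / (2 * m)) ** m    # (1 - eps/4)^m
    print(f"empirical ~ {est:.4f}, bound = {bound:.4f}")  # empirical should not exceed the bound
```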
We therefore get that
\[
\mathbb{P}[A\in B'] \;\le\; \mathbb{E}_{A\sim D^{2m}}\sum_{h\in H_A} e^{-\epsilon m/4} \;\le\; e^{-\epsilon m/4}\,\mathbb{E}_{A\sim D^{2m}}|H_A|.
\]
Using the definition of the growth function, $|H_A|\le\tau_H(2m)$ for every $A$, and we conclude the proof of Claim 2.
Completing the Proof: By Sauer's lemma we know that $\tau_H(2m)\le(2em/d)^d$. Combining this with the two claims we obtain that
\[
\mathbb{P}[S\in B] \;\le\; 2\,(2em/d)^d\,e^{-\epsilon m/4}.
\]
We would like the right-hand side of the inequality to be at most $\delta$; that is,
\[
2\,(2em/d)^d\,e^{-\epsilon m/4} \;\le\; \delta.
\]
Rearranging, we obtain the requirement
\[
m \;\ge\; \frac{4}{\epsilon}\bigl(d\log(2em/d)+\log(2/\delta)\bigr) \;=\; \frac{4d}{\epsilon}\log(m)+\frac{4}{\epsilon}\bigl(d\log(2e/d)+\log(2/\delta)\bigr).
\]
Using Lemma A.2 (for $a\ge 1$ and $b>0$, $x\ge 4a\log(2a)+2b$ implies $x\ge a\log(x)+b$), applied with $a=4d/\epsilon$ and $b=\frac{4}{\epsilon}\bigl(d\log(2e/d)+\log(2/\delta)\bigr)$, a sufficient condition for the preceding to hold is that
\[
m \;\ge\; \frac{16d}{\epsilon}\log\!\left(\frac{8d}{\epsilon}\right)+\frac{8}{\epsilon}\bigl(d\log(2e/d)+\log(2/\delta)\bigr).
\]
A sufficient condition for this is that
\[
m \;\ge\; \frac{16d}{\epsilon}\log\!\left(\frac{8d}{\epsilon}\right)+\frac{16}{\epsilon}\left(d\log(2e/d)+\tfrac{1}{2}\log(2/\delta)\right)
\;=\; \frac{16d}{\epsilon}\log\!\left(\frac{8d}{\epsilon}\cdot\frac{2e}{d}\right)+\frac{8}{\epsilon}\log(2/\delta)
\;=\; \frac{8}{\epsilon}\left(2d\log\!\left(\frac{16e}{\epsilon}\right)+\log\frac{2}{\delta}\right),
\]
and this concludes our proof.
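As a final sanity check (again not part of the proof), the sketch below plugs the derived sample size $m=\frac{8}{\epsilon}\bigl(2d\log(16e/\epsilon)+\log(2/\delta)\bigr)$ back into the right-hand side $2(2em/d)^d e^{-\epsilon m/4}$ and confirms numerically that it does not exceed $\delta$; the $(\epsilon,\delta,d)$ triples are arbitrary illustrative choices.

```python
# Numerical check: the derived sample size makes 2*(2em/d)^d * exp(-eps*m/4) at most delta.
import math

def sample_size(eps, delta, d):
    """m = (8/eps) * (2*d*log(16e/eps) + log(2/delta)), the bound derived above."""
    return (8 / eps) * (2 * d * math.log(16 * math.e / eps) + math.log(2 / delta))

def failure_bound(m, eps, d):
    """Right-hand side 2*(2em/d)^d * exp(-eps*m/4) obtained from the two claims and Sauer's lemma."""
    return 2 * (2 * math.e * m / d) ** d * math.exp(-eps * m / 4)

if __name__ == "__main__":
    for eps, delta, d in [(0.2, 0.1, 3), (0.1, 0.05, 5), (0.05, 0.01, 10)]:  # illustrative values
        m = sample_size(eps, delta, d)
        print(f"eps={eps}, delta={delta}, d={d}: m={m:.0f}, "
              f"bound={failure_bound(m, eps, d):.2e} (<= delta={delta})")
```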