Proof of the Fundamental Theorem of Learning Theory
$\mathbb{E}_{A \sim \mathcal{D}^{2m}}[f(A, A_j)]$. Since this holds for any $j$ it also holds for the expectation over $j$ chosen at random from $J$. In particular, it holds for the function $f(A, A') = \sum_{h \in H_A} \mathbb{1}_{[|h \cap A'| = 0]}\, \mathbb{1}_{[|h \cap A| \ge \alpha]}$. We therefore obtain that
\[
\mathbb{P}[A \in B'] \;\le\; \mathbb{E}_{A \sim \mathcal{D}^{2m}}\, \mathbb{E}_{j \sim J} \sum_{h \in H_A} \mathbb{1}_{[|h \cap A_j| = 0]}\, \mathbb{1}_{[|h \cap A| \ge \alpha]}
\;=\; \mathbb{E}_{A \sim \mathcal{D}^{2m}} \sum_{h \in H_A} \mathbb{1}_{[|h \cap A| \ge \alpha]}\, \mathbb{E}_{j \sim J}\, \mathbb{1}_{[|h \cap A_j| = 0]}.
\]
Now, fix some $A$ such that $|h \cap A| \ge \alpha$. Then, $\mathbb{E}_{j}\, \mathbb{1}_{[|h \cap A_j| = 0]}$ is the probability that, when choosing $m$ balls from a bag of $2m$ balls of which at least $\alpha$ are red, we never choose a red ball. Recalling that $\alpha = \epsilon m/2$, this probability is at most
\[
(1 - \alpha/(2m))^m = (1 - \epsilon/4)^m \le e^{-\epsilon m/4}.
\]
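The last step uses the elementary inequality $1 - x \le e^{-x}$. A quick numerical spot check of the resulting bound $(1-\epsilon/4)^m \le e^{-\epsilon m/4}$, on an illustrative grid of $(\epsilon, m)$ values (not taken from the text):

```python
import math

# Spot check of (1 - eps/4)^m <= exp(-eps*m/4), which follows from 1 - x <= e^{-x}.
# The grid of (eps, m) values below is illustrative only.
for eps in (0.01, 0.1, 0.5, 1.0):
    for m in (10, 100, 1000):
        lhs = (1 - eps / 4) ** m
        rhs = math.exp(-eps * m / 4)
        assert lhs <= rhs, (eps, m, lhs, rhs)
print("bound holds on all tested (eps, m) pairs")
```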
We therefore get that
\[
\mathbb{P}[A \in B'] \;\le\; \mathbb{E}_{A \sim \mathcal{D}^{2m}} \sum_{h \in H_A} e^{-\epsilon m/4} \;\le\; e^{-\epsilon m/4}\, \mathbb{E}_{A \sim \mathcal{D}^{2m}} |H_A|.
\]
Using the definition of the growth function, we conclude the proof of Claim 2.
Completing the Proof: By Sauer's lemma we know that $\tau_H(2m) \le (2em/d)^d$. Combining this with the two claims we obtain that
\[
\mathbb{P}[S \in B] \;\le\; 2\,(2em/d)^d\, e^{-\epsilon m/4}.
\]
We would like the right-hand side of the inequality to be at most $\delta$; that is,
\[
2\,(2em/d)^d\, e^{-\epsilon m/4} \le \delta.
\]
Rearranging, we obtain the requirement
\[
m \;\ge\; \frac{4}{\epsilon}\bigl(d\log(2em/d) + \log(2/\delta)\bigr)
\;=\; \frac{4d}{\epsilon}\log(m) + \frac{4}{\epsilon}\bigl(d\log(2e/d) + \log(2/\delta)\bigr).
\]
Using Lemma A.2 (which states that, for $a \ge 1$ and $b > 0$, $m \ge 4a\log(2a) + 2b$ implies $m \ge a\log(m) + b$), applied with $a = 4d/\epsilon$ and $b = \frac{4}{\epsilon}\bigl(d\log(2e/d) + \log(2/\delta)\bigr)$, a sufficient condition for the preceding to hold is that
\[
m \;\ge\; \frac{16d}{\epsilon}\log\!\left(\frac{8d}{\epsilon}\right) + \frac{8}{\epsilon}\bigl(d\log(2e/d) + \log(2/\delta)\bigr).
\]
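The Lemma A.2 step can be spot-checked numerically: with $a = 4d/\epsilon$ and $b = \frac{4}{\epsilon}(d\log(2e/d) + \log(2/\delta))$, the value $m_0 = 4a\log(2a) + 2b$ should satisfy $m_0 \ge a\log(m_0) + b$. The parameter values below are illustrative only, chosen so that $b > 0$ as the lemma requires (here, as in the text, $\log$ is the natural logarithm):

```python
import math

# Spot check of the Lemma A.2 application:
#   a = 4d/eps,  b = (4/eps) * (d*log(2e/d) + log(2/delta)),
#   m0 = 4a*log(2a) + 2b  should satisfy  m0 >= a*log(m0) + b.
# Parameter values are illustrative, not from the text; log is natural log.
delta = 0.05
for d in (3, 5):
    for eps in (0.05, 0.1):
        a = 4 * d / eps
        b = (4 / eps) * (d * math.log(2 * math.e / d) + math.log(2 / delta))
        assert b > 0  # precondition of the lemma for these parameters
        m0 = 4 * a * math.log(2 * a) + 2 * b
        assert m0 >= a * math.log(m0) + b, (d, eps, m0)
print("Lemma A.2 application verified on all tested parameters")
```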
A sufficient condition for this is that
\[
\begin{aligned}
m \;&\ge\; \frac{16d}{\epsilon}\log\!\left(\frac{8d}{\epsilon}\right) + \frac{16}{\epsilon}\left(d\log(2e/d) + \frac{1}{2}\log(2/\delta)\right) \\
&=\; \frac{16d}{\epsilon}\log\!\left(\frac{8d}{\epsilon}\cdot\frac{2e}{d}\right) + \frac{8}{\epsilon}\log(2/\delta) \\
&=\; \frac{8}{\epsilon}\left(2d\log\!\left(\frac{16e}{\epsilon}\right) + \log\!\left(\frac{2}{\delta}\right)\right),
\end{aligned}
\]
and this concludes our proof.
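As a final numerical sanity check, plugging the derived sample size $m = \frac{8}{\epsilon}\bigl(2d\log(16e/\epsilon) + \log(2/\delta)\bigr)$ back into $2(2em/d)^d e^{-\epsilon m/4}$ should yield a value of at most $\delta$. The parameter values below are illustrative, not from the text:

```python
import math

# Check that m = (8/eps) * (2d*log(16e/eps) + log(2/delta)) makes the
# failure-probability bound 2 * (2em/d)^d * exp(-eps*m/4) at most delta.
# Parameter values are illustrative; log is the natural logarithm.
delta = 0.05
for d in (3, 5, 10):
    for eps in (0.05, 0.1):
        m = (8 / eps) * (2 * d * math.log(16 * math.e / eps) + math.log(2 / delta))
        bound = 2 * (2 * math.e * m / d) ** d * math.exp(-eps * m / 4)
        assert bound <= delta, (d, eps, bound)
print("sample-complexity bound verified on all tested parameters")
```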