Understanding Machine Learning: From Theory to Algorithms


Proof of the Fundamental Theorem of Learning Theory


$\mathbb{E}_{A\sim \mathcal{D}^{2m}}[f(A,A_j)]$. Since this holds for any $j$, it also holds in expectation over $j$ chosen at random from $J$. In particular, it holds for the function
\[
f(A,A') \;=\; \sum_{h\in \mathcal{H}_A} \mathbb{1}_{[|h\cap A'|=0]}\,\mathbb{1}_{[|h\cap A|\ge \alpha]}.
\]
We therefore obtain that

\[
\mathbb{P}[A\in B']
\;\le\; \mathbb{E}_{A\sim \mathcal{D}^{2m}}\, \mathbb{E}_{j\sim J} \sum_{h\in \mathcal{H}_A} \mathbb{1}_{[|h\cap A_j|=0]}\, \mathbb{1}_{[|h\cap A|\ge \alpha]}
\;=\; \mathbb{E}_{A\sim \mathcal{D}^{2m}} \sum_{h\in \mathcal{H}_A} \mathbb{1}_{[|h\cap A|\ge \alpha]}\; \mathbb{E}_{j\sim J}\, \mathbb{1}_{[|h\cap A_j|=0]}.
\]
Now, fix some $A$ and some $h\in \mathcal{H}_A$ such that $|h\cap A|\ge \alpha$. Then, $\mathbb{E}_{j}\,\mathbb{1}_{[|h\cap A_j|=0]}$ is the probability that, when choosing $m$ balls without replacement from a bag of $2m$ balls of which at least $\alpha$ are red, we never choose a red ball. This probability is at most
\[
(1-\alpha/(2m))^m \;=\; (1-\epsilon/4)^m \;\le\; e^{-\epsilon m/4},
\]
where the equality uses $\alpha = \epsilon m/2$.
We therefore get that
\[
\mathbb{P}[A\in B'] \;\le\; \mathbb{E}_{A\sim \mathcal{D}^{2m}} \sum_{h\in \mathcal{H}_A} e^{-\epsilon m/4} \;\le\; e^{-\epsilon m/4}\, \mathbb{E}_{A\sim \mathcal{D}^{2m}} |\mathcal{H}_A|.
\]
Using the definition of the growth function, we conclude the proof of Claim 2.
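
As a numerical aside (a sanity check of our own, not part of the original argument), the ball-drawing bound used above can be checked exactly: under sampling without replacement, the probability of never drawing a red ball has a closed product form. The following Python sketch compares it against $(1-\alpha/(2m))^m$ and $e^{-\epsilon m/4}$; the grid of values for $\epsilon$ and $m$ is an arbitrary choice.

```python
# Sanity check (not from the text): drawing m balls without replacement from
# a bag of 2m balls, at least alpha of which are red, the probability of never
# seeing a red ball is prod_{i=0}^{m-1} (2m - alpha - i) / (2m - i); each
# factor is at most 1 - alpha/(2m), which yields the bounds used in the proof.
import math

def no_red_prob(m: int, alpha: int) -> float:
    """Exact probability of never drawing a red ball (alpha red among 2m)."""
    p = 1.0
    for i in range(m):
        p *= (2 * m - alpha - i) / (2 * m - i)
    return p

for eps in (0.1, 0.2, 0.5):
    for m in (40, 200, 1000):
        alpha = math.ceil(eps * m / 2)          # alpha = eps*m/2, rounded up
        exact = no_red_prob(m, alpha)
        per_draw = (1 - alpha / (2 * m)) ** m   # (1 - alpha/(2m))^m
        expo = math.exp(-eps * m / 4)           # e^{-eps*m/4}
        assert exact <= per_draw <= expo
        print(f"eps={eps}, m={m}: {exact:.3e} <= {per_draw:.3e} <= {expo:.3e}")
```
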
Completing the Proof: By Sauer's lemma we know that $\tau_{\mathcal{H}}(2m) \le (2em/d)^d$. Combining this with the two claims we obtain that
\[
\mathbb{P}[S\in B] \;\le\; 2\,(2em/d)^d\, e^{-\epsilon m/4}.
\]

We would like the right-hand side of the inequality to be at most $\delta$; that is,

\[
2\,(2em/d)^d\, e^{-\epsilon m/4} \;\le\; \delta.
\]
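
Before rearranging, it may help to see how quickly the left-hand side decays: the exponential factor eventually dominates the polynomial factor from Sauer's lemma. The short script below (an illustration of our own; the values of $d$ and $\epsilon$ are arbitrary) evaluates the bound for increasing $m$.

```python
# Illustration (not from the text): 2*(2em/d)^d * exp(-eps*m/4) tends to 0
# as m grows, since exponential decay beats polynomial growth in m.
import math

def failure_bound(m: int, d: int, eps: float) -> float:
    return 2 * (2 * math.e * m / d) ** d * math.exp(-eps * m / 4)

d, eps = 5, 0.1
for m in (100, 1000, 5000, 10000):
    print(f"m={m:>6}: bound = {failure_bound(m, d, eps):.3e}")
```
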

Taking logarithms of both sides and rearranging, we obtain the requirement

\[
m \;\ge\; \frac{4}{\epsilon}\bigl(d\log(2em/d) + \log(2/\delta)\bigr)
\;=\; \frac{4d}{\epsilon}\log(m) + \frac{4}{\epsilon}\bigl(d\log(2e/d) + \log(2/\delta)\bigr).
\]

Using Lemma A.2 (which states that $x \ge 4a\log(2a) + 2b$ implies $x \ge a\log(x) + b$) with $a = 4d/\epsilon$ and $b = \frac{4}{\epsilon}\bigl(d\log(2e/d) + \log(2/\delta)\bigr)$, a sufficient condition for the preceding to hold is that

\[
m \;\ge\; \frac{16d}{\epsilon}\log\!\Bigl(\frac{8d}{\epsilon}\Bigr) + \frac{8}{\epsilon}\Bigl(d\log(2e/d) + \log(2/\delta)\Bigr).
\]
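
Since Lemma A.2 is only quoted here, a quick numeric spot-check of the implication as stated above may be reassuring; the values of $a$ and $b$ below are arbitrary, and we test at the smallest $x$ the premise permits.

```python
# Spot-check (not from the text) of the quoted Lemma A.2: if
# x >= 4a*log(2a) + 2b, then x >= a*log(x) + b (natural logarithms).
import math

for a in (2.0, 40.0, 400.0):
    for b in (1.0, 30.0):
        x = 4 * a * math.log(2 * a) + 2 * b   # boundary case of the premise
        assert x >= a * math.log(x) + b
        print(f"a={a}, b={b}: x={x:.1f} >= a*log(x)+b = {a * math.log(x) + b:.1f}")
```
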

A sufficient condition for the preceding requirement is, in turn, that

\[
\begin{aligned}
m \;\ge\;& \frac{16d}{\epsilon}\log\!\Bigl(\frac{8d}{\epsilon}\Bigr) + \frac{16}{\epsilon}\Bigl(d\log(2e/d) + \tfrac{1}{2}\log(2/\delta)\Bigr)\\
=\;& \frac{16d}{\epsilon}\log\!\Bigl(\frac{8d}{\epsilon}\cdot\frac{2e}{d}\Bigr) + \frac{8}{\epsilon}\log(2/\delta)\\
=\;& \frac{8}{\epsilon}\Bigl(2d\log\!\Bigl(\frac{16e}{\epsilon}\Bigr) + \log\!\Bigl(\frac{2}{\delta}\Bigr)\Bigr),
\end{aligned}
\]
and this concludes our proof.
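
As a closing sanity check (a sketch of our own, not from the text), one can evaluate the sample size given by the final bound and confirm numerically that it drives the failure probability $2(2em/d)^d e^{-\epsilon m/4}$ below $\delta$; the grid of $(d, \epsilon, \delta)$ values is arbitrary.

```python
# Sketch (not from the text): evaluate the final sufficient sample size
#   m >= (8/eps) * (2*d*log(16e/eps) + log(2/delta))
# and confirm it makes 2*(2em/d)^d * exp(-eps*m/4) <= delta.
import math

def sufficient_m(d: int, eps: float, delta: float) -> int:
    return math.ceil((8 / eps) * (2 * d * math.log(16 * math.e / eps)
                                  + math.log(2 / delta)))

for d in (1, 3, 10):
    for eps in (0.1, 0.05):
        for delta in (0.05, 0.01):
            m = sufficient_m(d, eps, delta)
            fail = 2 * (2 * math.e * m / d) ** d * math.exp(-eps * m / 4)
            assert fail <= delta
            print(f"d={d}, eps={eps}, delta={delta}: m={m}, bound={fail:.2e}")
```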