Understanding Machine Learning: From Theory to Algorithms


Proof of the Fundamental Theorem of Learning Theory


$\mathbb{E}_{A\sim \mathcal{D}^{2m}}[f(A,A_j)]$. Since this holds for any $j$, it also holds in expectation over $j$ chosen at random from $J$. In particular, it holds for the function
\[
f(A,A') \;=\; \sum_{h\in \mathcal{H}_A} \mathbb{1}_{[|h\cap A'|=0]}\,\mathbb{1}_{[|h\cap A|\ge \alpha]}.
\]
We therefore obtain that

\[
\mathbb{P}[A\in B']
\;\le\; \mathbb{E}_{A\sim \mathcal{D}^{2m}}\, \mathbb{E}_{j\sim J} \sum_{h\in \mathcal{H}_A} \mathbb{1}_{[|h\cap A_j|=0]}\, \mathbb{1}_{[|h\cap A|\ge \alpha]}
\;=\; \mathbb{E}_{A\sim \mathcal{D}^{2m}} \sum_{h\in \mathcal{H}_A} \mathbb{1}_{[|h\cap A|\ge \alpha]}\; \mathbb{E}_{j\sim J}\, \mathbb{1}_{[|h\cap A_j|=0]}.
\]
Now, fix some $A$ and some $h\in \mathcal{H}_A$ such that $|h\cap A|\ge \alpha$. Then, $\mathbb{E}_{j}\,\mathbb{1}_{[|h\cap A_j|=0]}$ is the probability that, when choosing $m$ balls without replacement from a bag of $2m$ balls of which at least $\alpha$ are red, we never choose a red ball. This probability is at most
\[
(1-\alpha/(2m))^m \;=\; (1-\epsilon/4)^m \;\le\; e^{-\epsilon m/4},
\]
where the equality uses $\alpha = \epsilon m/2$.
We therefore get that
\[
\mathbb{P}[A\in B'] \;\le\; \mathbb{E}_{A\sim \mathcal{D}^{2m}} \sum_{h\in \mathcal{H}_A} e^{-\epsilon m/4} \;\le\; e^{-\epsilon m/4}\, \mathbb{E}_{A\sim \mathcal{D}^{2m}} |\mathcal{H}_A|.
\]
Using the definition of the growth function, we conclude the proof of Claim 2.
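
As a numerical aside (a sanity check of our own, not part of the original argument), the ball-drawing bound used above can be checked exactly: under sampling without replacement, the probability of never drawing a red ball has a closed product form. The following Python sketch compares it against $(1-\alpha/(2m))^m$ and $e^{-\epsilon m/4}$; the grid of values for $\epsilon$ and $m$ is an arbitrary choice.

```python
# Sanity check (not from the text): drawing m balls without replacement from
# a bag of 2m balls, at least alpha of which are red, the probability of never
# seeing a red ball is prod_{i=0}^{m-1} (2m - alpha - i) / (2m - i); each
# factor is at most 1 - alpha/(2m), which yields the bounds used in the proof.
import math

def no_red_prob(m: int, alpha: int) -> float:
    """Exact probability of never drawing a red ball (alpha red among 2m)."""
    p = 1.0
    for i in range(m):
        p *= (2 * m - alpha - i) / (2 * m - i)
    return p

for eps in (0.1, 0.2, 0.5):
    for m in (40, 200, 1000):
        alpha = math.ceil(eps * m / 2)          # alpha = eps*m/2, rounded up
        exact = no_red_prob(m, alpha)
        per_draw = (1 - alpha / (2 * m)) ** m   # (1 - alpha/(2m))^m
        expo = math.exp(-eps * m / 4)           # e^{-eps*m/4}
        assert exact <= per_draw <= expo
        print(f"eps={eps}, m={m}: {exact:.3e} <= {per_draw:.3e} <= {expo:.3e}")
```
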
Completing the Proof: By Sauer's lemma we know that $\tau_{\mathcal{H}}(2m) \le (2em/d)^d$. Combining this with the two claims we obtain that
\[
\mathbb{P}[S\in B] \;\le\; 2\,(2em/d)^d\, e^{-\epsilon m/4}.
\]

We would like the right-hand side of the inequality to be at most $\delta$; that is,

\[
2\,(2em/d)^d\, e^{-\epsilon m/4} \;\le\; \delta.
\]
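
Before rearranging, it may help to see how quickly the left-hand side decays: the exponential factor eventually dominates the polynomial factor from Sauer's lemma. The short script below (an illustration of our own; the values of $d$ and $\epsilon$ are arbitrary) evaluates the bound for increasing $m$.

```python
# Illustration (not from the text): 2*(2em/d)^d * exp(-eps*m/4) tends to 0
# as m grows, since exponential decay beats polynomial growth in m.
import math

def failure_bound(m: int, d: int, eps: float) -> float:
    return 2 * (2 * math.e * m / d) ** d * math.exp(-eps * m / 4)

d, eps = 5, 0.1
for m in (100, 1000, 5000, 10000):
    print(f"m={m:>6}: bound = {failure_bound(m, d, eps):.3e}")
```
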

Taking logarithms of both sides and rearranging, we obtain the requirement

\[
m \;\ge\; \frac{4}{\epsilon}\bigl(d\log(2em/d) + \log(2/\delta)\bigr)
\;=\; \frac{4d}{\epsilon}\log(m) + \frac{4}{\epsilon}\bigl(d\log(2e/d) + \log(2/\delta)\bigr).
\]

Using Lemma A.2 (which states that $x \ge 4a\log(2a) + 2b$ implies $x \ge a\log(x) + b$) with $a = 4d/\epsilon$ and $b = \frac{4}{\epsilon}\bigl(d\log(2e/d) + \log(2/\delta)\bigr)$, a sufficient condition for the preceding to hold is that

\[
m \;\ge\; \frac{16d}{\epsilon}\log\!\Bigl(\frac{8d}{\epsilon}\Bigr) + \frac{8}{\epsilon}\Bigl(d\log(2e/d) + \log(2/\delta)\Bigr).
\]
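
Since Lemma A.2 is only quoted here, a quick numeric spot-check of the implication as stated above may be reassuring; the values of $a$ and $b$ below are arbitrary, and we test at the smallest $x$ the premise permits.

```python
# Spot-check (not from the text) of the quoted Lemma A.2: if
# x >= 4a*log(2a) + 2b, then x >= a*log(x) + b (natural logarithms).
import math

for a in (2.0, 40.0, 400.0):
    for b in (1.0, 30.0):
        x = 4 * a * math.log(2 * a) + 2 * b   # boundary case of the premise
        assert x >= a * math.log(x) + b
        print(f"a={a}, b={b}: x={x:.1f} >= a*log(x)+b = {a * math.log(x) + b:.1f}")
```
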

A sufficient condition for the preceding requirement is, in turn, that

\[
\begin{aligned}
m \;\ge\;& \frac{16d}{\epsilon}\log\!\Bigl(\frac{8d}{\epsilon}\Bigr) + \frac{16}{\epsilon}\Bigl(d\log(2e/d) + \tfrac{1}{2}\log(2/\delta)\Bigr)\\
=\;& \frac{16d}{\epsilon}\log\!\Bigl(\frac{8d}{\epsilon}\cdot\frac{2e}{d}\Bigr) + \frac{8}{\epsilon}\log(2/\delta)\\
=\;& \frac{8}{\epsilon}\Bigl(2d\log\!\Bigl(\frac{16e}{\epsilon}\Bigr) + \log\!\Bigl(\frac{2}{\delta}\Bigr)\Bigr),
\end{aligned}
\]
and this concludes our proof.
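
As a closing sanity check (a sketch of our own, not from the text), one can evaluate the sample size given by the final bound and confirm numerically that it drives the failure probability $2(2em/d)^d e^{-\epsilon m/4}$ below $\delta$; the grid of $(d, \epsilon, \delta)$ values is arbitrary.

```python
# Sketch (not from the text): evaluate the final sufficient sample size
#   m >= (8/eps) * (2*d*log(16e/eps) + log(2/delta))
# and confirm it makes 2*(2em/d)^d * exp(-eps*m/4) <= delta.
import math

def sufficient_m(d: int, eps: float, delta: float) -> int:
    return math.ceil((8 / eps) * (2 * d * math.log(16 * math.e / eps)
                                  + math.log(2 / delta)))

for d in (1, 3, 10):
    for eps in (0.1, 0.05):
        for delta in (0.05, 0.01):
            m = sufficient_m(d, eps, delta)
            fail = 2 * (2 * math.e * m / d) ** d * math.exp(-eps * m / 4)
            assert fail <= delta
            print(f"d={d}, eps={eps}, delta={delta}: m={m}, bound={fail:.2e}")
```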