28 Proof of the Fundamental Theorem of Learning Theory
In this chapter we prove Theorem 6.8 from Chapter 6. We remind the reader of the conditions of the theorem, which will hold throughout this chapter: H is a hypothesis class of functions from a domain X to {0,1}, the loss function is the 0-1 loss, and VCdim(H) = d < ∞.
We shall prove the upper bound for both the realizable and agnostic cases
and shall prove the lower bound for the agnostic case. The lower bound for the
realizable case is left as an exercise.
28.1 The Upper Bound for the Agnostic Case
For the upper bound we need to prove that there exists C such that H is agnostic
PAC learnable with sample complexity

    m_H(ε,δ) ≤ C · (d + ln(1/δ)) / ε².

We will prove the slightly looser bound:

    m_H(ε,δ) ≤ C · (d log(d/ε) + ln(1/δ)) / ε².    (28.1)
The tighter bound in the theorem statement requires a more involved proof, in
which the Rademacher complexity is analyzed more carefully using a technique
called “chaining.” This is beyond the scope of this book.
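To get a feel for the difference between these two bounds, the following short Python sketch (not part of the original text) plugs concrete values of d, ε, and δ into both expressions; the constant C = 1 and the specific values of d, ε, δ below are arbitrary placeholders, and natural logarithms are used throughout.

    import math

    def tight_bound(d, eps, delta, C=1.0):
        # Sample complexity from the theorem statement: C * (d + ln(1/delta)) / eps^2
        return C * (d + math.log(1 / delta)) / eps ** 2

    def loose_bound(d, eps, delta, C=1.0):
        # The slightly looser bound (28.1): C * (d * log(d/eps) + ln(1/delta)) / eps^2
        return C * (d * math.log(d / eps) + math.log(1 / delta)) / eps ** 2

    print(tight_bound(10, 0.1, 0.05))   # roughly 1.3e3 (up to the unknown constant C)
    print(loose_bound(10, 0.1, 0.05))   # roughly 4.9e3, larger by about a log(d/eps) factor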
To prove Equation (28.1), it suffices to show that applying the ERM with a
sample size

    m ≥ 4 · (32d/ε²) · log(64d/ε²) + (8/ε²) · (8d log(e/d) + 2 log(4/δ))

yields an (ε,δ)-learner for H. We prove this result on the basis of Theorem 26.5.
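As a purely numerical illustration (not part of the original argument), the Python sketch below evaluates this sample-size expression for some arbitrary values of d, ε, and δ, using natural logarithms. The point is only the order of growth matching Equation (28.1), not the particular constants.

    import math

    def erm_sample_size(d, eps, delta):
        # Evaluates 4*(32d/eps^2)*log(64d/eps^2) + (8/eps^2)*(8d*log(e/d) + 2*log(4/delta)).
        term1 = 4 * (32 * d / eps ** 2) * math.log(64 * d / eps ** 2)
        term2 = (8 / eps ** 2) * (8 * d * math.log(math.e / d) + 2 * math.log(4 / delta))
        return term1 + term2

    print(erm_sample_size(d=2, eps=0.1, delta=0.05))   # about 2.5e5 examples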
Let (x_1,y_1),...,(x_m,y_m) be a classification training set. Recall that the Sauer-Shelah
lemma tells us that if VCdim(H) = d then

    |{(h(x_1),...,h(x_m)) : h ∈ H}| ≤ (em/d)^d.
Denote A = {(1_[h(x_1)≠y_1],...,1_[h(x_m)≠y_m]) : h ∈ H}. This clearly implies that

    |A| ≤ (em/d)^d.
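The growth bound above is easy to check empirically on a toy class. The sketch below is an illustration added here, not taken from the text: it uses threshold functions h_t(x) = 1[x ≥ t] on the real line, a class with VC dimension d = 1, and verifies that the number of distinct label vectors on m (integer-spaced) points does not exceed (em/d)^d = em.

    import math

    def distinct_labelings(points):
        # Count the distinct vectors (h_t(x_1),...,h_t(x_m)) over thresholds h_t(x) = 1[x >= t],
        # assuming the points are integer-spaced so p + 0.5 falls strictly between neighbors.
        pts = sorted(set(points))
        thresholds = [pts[0] - 1.0] + [p + 0.5 for p in pts]
        return len({tuple(1 if x >= t else 0 for x in pts) for t in thresholds})

    m = 10
    count = distinct_labelings(range(m))   # m + 1 = 11 labelings for threshold functions
    bound = (math.e * m / 1) ** 1          # Sauer-Shelah bound with d = 1: em, about 27.2
    print(count, bound, count <= bound)    # 11 27.18... True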