Understanding Machine Learning: From Theory to Algorithms

6.5 Proof of Theorem 6.7 73

1.Hhas the uniform convergence property with sample complexity

C 1 d+ log(1/δ) ^2

≤mUCH(,δ)≤C 2 d+ log(1/δ) ^2 2.His agnostic PAC learnable with sample complexity

C 1

d+ log(1/δ) ^2

≤ mH(,δ)≤C 2

d+ log(1/δ) ^2 3.His PAC learnable with sample complexity

C 1 d+ log(1/δ)

≤mH(,δ)≤C 2 dlog(1/) + log(1/δ) The proof of this theorem is given in Chapter 28. Remark 6.3 We stated the fundamental theorem for binary classification tasks. A similar result holds for some other learning problems such as regression with the absolute loss or the squared loss. However, the theorem does not hold for all learning tasks. In particular, learnability is sometimes possible even though the uniform convergence property does not hold (we will see an example in Chapter 13, Exercise 2). Furthermore, in some situations, the ERM rule fails but learnability is possible with other learning rules.

6.5 Proof of Theorem 6.7

We have already seen that 1→2 in Chapter 4. The implications 2→3 and 3 →4 are trivial and so is 2→5. The implications 4→6 and 5→6 follow from the No-Free-Lunch theorem. The difficult part is to show that 6→1. The proof is based on two main claims:

If VCdim(H) =d, then even thoughHmight be infinite, when restricting it
to a finite setC⊂ X, its “effective” size,|HC|, is onlyO(|C|d). That is,
the size ofHCgrows polynomially rather than exponentially with|C|. This
claim is often referred to asSauer’s lemma, but it has also been stated and
proved independently by Shelah and by Perles. The formal statement is
given in Section 6.5.1 later.

In Section 4 we have shown that finite hypothesis classes enjoy the uniform
convergence property. In Section 6.5.2 later we generalize this result and
show that uniform convergence holds whenever the hypothesis class has a
“small effective size.” By “small effective size” we mean classes for which
|HC|grows polynomially with|C|.

6.5.1 Sauer’s Lemma and the Growth Function

We defined the notion ofshattering, by considering the restriction ofHto a finite set of instances. The growth function measures the maximal “effective” size of Hon a set ofmexamples. Formally:

Understanding Machine Learning: From Theory to Algorithms

6.5 Proof of Theorem 6.7

6.5.1 Sauer’s Lemma and the Growth Function

Get our desktop app

Company

Features

Documentation

Resources