72 The VC-Dimension
6.3.4 Finite Classes
Let H be a finite class. Then, clearly, for any set C we have |H_C| ≤ |H| and thus C
cannot be shattered if |H| < 2^{|C|}. This implies that VCdim(H) ≤ log_2(|H|). This
shows that the PAC learnability of finite classes follows from the more general
statement of PAC learnability of classes with finite VC-dimension, which we shall
see in the next section. Note, however, that the VC-dimension of a finite class
H can be significantly smaller than log_2(|H|). For example, let X = {1, ..., k},
for some integer k, and consider the class of threshold functions (as defined in
Example 6.2). Then, |H| = k but VCdim(H) = 1. Since k can be arbitrarily
large, the gap between log_2(|H|) and VCdim(H) can be arbitrarily large.
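This gap can be checked by brute force on a small instance. The sketch below encodes a concrete family of k threshold hypotheses on X = {1, ..., k} (the particular encoding h_a(x) = 1 iff x < a is an illustrative choice, not fixed by the text) and computes the VC-dimension by testing every subset for shattering:

```python
from itertools import combinations

k = 8
X = list(range(1, k + 1))
# One concrete encoding of k distinct threshold hypotheses on X = {1,...,k}:
# h_a(x) = 1 iff x < a, for a = 2, ..., k+1.  (Illustrative choice; Example 6.2
# defines thresholds over the reals.)  Each hypothesis is stored as its labeling
# vector on X.
H = [tuple(1 if x < a else 0 for x in X) for a in range(2, k + 2)]

def is_shattered(C, H):
    # C is shattered iff restricting H to C realizes all 2^|C| labelings.
    restrictions = {tuple(h[x - 1] for x in C) for h in H}
    return len(restrictions) == 2 ** len(C)

def vcdim(H, X):
    # Largest subset size d such that some subset of size d is shattered.
    d = 0
    for size in range(1, len(X) + 1):
        if any(is_shattered(C, H) for C in combinations(X, size)):
            d = size
        else:
            break
    return d

print(len(set(H)), vcdim(H, X))  # |H| = k = 8, but VCdim(H) = 1
```

No pair of points can be shattered because every hypothesis labels X monotonically: the labeling (0, 1) on a pair x_1 < x_2 is never realized, so the VC-dimension stays at 1 while |H| = k grows.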
6.3.5 VC-Dimension and the Number of Parameters
In the previous examples, the VC-dimension happened to equal the number of
parameters defining the hypothesis class. While this is often the case, it is not
always true. Consider, for example, the domain X = R, and the hypothesis class
H = {h_θ : θ ∈ R}, where h_θ : X → {0, 1} is defined by h_θ(x) = ⌈0.5 sin(θx)⌉. It
is possible to prove that VCdim(H) = ∞; namely, for every d, one can find d
points that are shattered by H (see Exercise 8).
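As a small numerical illustration (not a proof of the full claim, which is Exercise 8), one can verify that this one-parameter class already shatters the three points {1, 2, 3}: since ⌈0.5 sin(θx)⌉ equals 1 exactly when sin(θx) > 0, sweeping θ over a fine grid on (0, 2π) produces all 2^3 = 8 labelings. The point set and grid resolution below are illustrative choices:

```python
import numpy as np

# h_theta(x) = ceil(0.5 * sin(theta * x)) equals 1 iff sin(theta * x) > 0.
# Sweep theta over (0, 2*pi) and record which labelings of {1, 2, 3} occur.
points = np.array([1.0, 2.0, 3.0])
labelings = set()
for theta in np.linspace(0.01, 2 * np.pi, 10000):
    labels = tuple(int(np.ceil(0.5 * np.sin(theta * x))) for x in points)
    labelings.add(labels)

print(len(labelings))  # 8: every labeling of the three points is realized
```

All eight sign patterns of (sin θ, sin 2θ, sin 3θ) appear on subintervals of (0, 2π) of width at least π/6, so the grid is easily fine enough to hit each one; shattering d points for arbitrary d requires choosing the points more carefully, as in the exercise.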
6.4 The Fundamental Theorem of PAC learning
We have already shown that a class of infinite VC-dimension is not learnable. The
converse statement is also true, leading to the fundamental theorem of statistical
learning theory:
theorem 6.7 (The Fundamental Theorem of Statistical Learning) Let H be a
hypothesis class of functions from a domain X to {0, 1} and let the loss function
be the 0−1 loss. Then, the following are equivalent:
1. H has the uniform convergence property.
2. Any ERM rule is a successful agnostic PAC learner for H.
3. H is agnostic PAC learnable.
4. H is PAC learnable.
5. Any ERM rule is a successful PAC learner for H.
6. H has a finite VC-dimension.
The proof of the theorem is given in the next section.
Not only does the VC-dimension characterize PAC learnability; it even determines
the sample complexity.
theorem 6.8 (The Fundamental Theorem of Statistical Learning – Quantitative
Version) Let H be a hypothesis class of functions from a domain X to {0, 1}
and let the loss function be the 0−1 loss. Assume that VCdim(H) = d < ∞.
Then, there are absolute constants C_1, C_2 such that: