72 The VC-Dimension
6.3.4 Finite Classes
Let H be a finite class. Then, clearly, for any set C we have |H_C| ≤ |H| and thus C
cannot be shattered if |H| < 2^{|C|}. This implies that VCdim(H) ≤ log_2(|H|). This
shows that the PAC learnability of finite classes follows from the more general
statement of PAC learnability of classes with finite VC-dimension, which we shall
see in the next section. Note, however, that the VC-dimension of a finite class
H can be significantly smaller than log_2(|H|). For example, let X = {1, ..., k},
for some integer k, and consider the class of threshold functions (as defined in
Example 6.2). Then, |H| = k but VCdim(H) = 1. Since k can be arbitrarily
large, the gap between log_2(|H|) and VCdim(H) can be arbitrarily large.
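This gap can be checked by brute force on a small instance. The sketch below encodes a concrete family of k threshold hypotheses on X = {1, ..., k} (the particular encoding h_a(x) = 1 iff x < a is an illustrative choice, not fixed by the text) and computes the VC-dimension by testing every subset for shattering:

```python
from itertools import combinations

k = 8
X = list(range(1, k + 1))
# One concrete encoding of k distinct threshold hypotheses on X = {1,...,k}:
# h_a(x) = 1 iff x < a, for a = 2, ..., k+1.  (Illustrative choice; Example 6.2
# defines thresholds over the reals.)  Each hypothesis is stored as its labeling
# vector on X.
H = [tuple(1 if x < a else 0 for x in X) for a in range(2, k + 2)]

def is_shattered(C, H):
    # C is shattered iff restricting H to C realizes all 2^|C| labelings.
    restrictions = {tuple(h[x - 1] for x in C) for h in H}
    return len(restrictions) == 2 ** len(C)

def vcdim(H, X):
    # Largest subset size d such that some subset of size d is shattered.
    d = 0
    for size in range(1, len(X) + 1):
        if any(is_shattered(C, H) for C in combinations(X, size)):
            d = size
        else:
            break
    return d

print(len(set(H)), vcdim(H, X))  # |H| = k = 8, but VCdim(H) = 1
```

No pair of points can be shattered because every hypothesis labels X monotonically: the labeling (0, 1) on a pair x_1 < x_2 is never realized, so the VC-dimension stays at 1 while |H| = k grows.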
6.3.5 VC-Dimension and the Number of Parameters
In the previous examples, the VC-dimension happened to equal the number of
parameters defining the hypothesis class. While this is often the case, it is not
always true. Consider, for example, the domain X = R, and the hypothesis class
H = {h_θ : θ ∈ R}, where h_θ : X → {0, 1} is defined by h_θ(x) = ⌈0.5 sin(θx)⌉. It
is possible to prove that VCdim(H) = ∞; namely, for every d, one can find d
points that are shattered by H (see Exercise 8).
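As a small numerical illustration (not a proof of the full claim, which is Exercise 8), one can verify that this one-parameter class already shatters the three points {1, 2, 3}: since ⌈0.5 sin(θx)⌉ equals 1 exactly when sin(θx) > 0, sweeping θ over a fine grid on (0, 2π) produces all 2^3 = 8 labelings. The point set and grid resolution below are illustrative choices:

```python
import numpy as np

# h_theta(x) = ceil(0.5 * sin(theta * x)) equals 1 iff sin(theta * x) > 0.
# Sweep theta over (0, 2*pi) and record which labelings of {1, 2, 3} occur.
points = np.array([1.0, 2.0, 3.0])
labelings = set()
for theta in np.linspace(0.01, 2 * np.pi, 10000):
    labels = tuple(int(np.ceil(0.5 * np.sin(theta * x))) for x in points)
    labelings.add(labels)

print(len(labelings))  # 8: every labeling of the three points is realized
```

All eight sign patterns of (sin θ, sin 2θ, sin 3θ) appear on subintervals of (0, 2π) of width at least π/6, so the grid is easily fine enough to hit each one; shattering d points for arbitrary d requires choosing the points more carefully, as in the exercise.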
6.4 The Fundamental Theorem of PAC learning
We have already shown that a class of infinite VC-dimension is not learnable. The
converse statement is also true, leading to the fundamental theorem of statistical
learning theory:
theorem 6.7 (The Fundamental Theorem of Statistical Learning) Let H be a
hypothesis class of functions from a domain X to {0, 1} and let the loss function
be the 0−1 loss. Then, the following are equivalent:
1. H has the uniform convergence property.
2. Any ERM rule is a successful agnostic PAC learner for H.
3. H is agnostic PAC learnable.
4. H is PAC learnable.
5. Any ERM rule is a successful PAC learner for H.
6. H has a finite VC-dimension.
The proof of the theorem is given in the next section.
Not only does the VC-dimension characterize PAC learnability; it even determines
the sample complexity.
theorem 6.8 (The Fundamental Theorem of Statistical Learning – Quantitative
Version) Let H be a hypothesis class of functions from a domain X to {0, 1}
and let the loss function be the 0−1 loss. Assume that VCdim(H) = d < ∞.
Then, there are absolute constants C_1, C_2 such that: