Understanding Machine Learning: From Theory to Algorithms


The VC-Dimension


6.3.4 Finite Classes


Let $H$ be a finite class. Then, clearly, for any set $C$ we have $|H_C| \le |H|$, and thus $C$ cannot be shattered if $|H| < 2^{|C|}$. This implies that $\mathrm{VCdim}(H) \le \log_2(|H|)$. This shows that the PAC learnability of finite classes follows from the more general statement of PAC learnability of classes with finite VC-dimension, which we shall see in the next section. Note, however, that the VC-dimension of a finite class $H$ can be significantly smaller than $\log_2(|H|)$. For example, let $X = \{1, \ldots, k\}$ for some integer $k$, and consider the class of threshold functions (as defined in Example 6.2). Then $|H| = k$ but $\mathrm{VCdim}(H) = 1$. Since $k$ can be arbitrarily large, the gap between $\log_2(|H|)$ and $\mathrm{VCdim}(H)$ can be arbitrarily large.
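Both claims in this example can be checked by brute force for a small $k$. The sketch below is illustrative rather than code from the book: the encoding $h_a(x) = \mathbb{1}[x \le a]$ and the helper names `shatters` and `vcdim` are our own choices. It enumerates the $k$ threshold hypotheses on $X = \{1, \ldots, k\}$ and confirms that $|H| = k$ while the VC-dimension is only 1:

```python
import math
from itertools import combinations

k = 8
X = range(1, k + 1)
# One threshold hypothesis per a in {1, ..., k}: h_a(x) = 1 iff x <= a.
# Each hypothesis is stored as its tuple of labels on X, so |H| = k.
H = [tuple(1 if x <= a else 0 for x in X) for a in range(1, k + 1)]

def shatters(C):
    """True iff H realizes all 2^|C| labelings on the points in C."""
    idx = [x - 1 for x in C]
    realized = {tuple(h[i] for i in idx) for h in H}
    return len(realized) == 2 ** len(C)

def vcdim():
    """Largest set size at which some subset of X is shattered."""
    d = 0
    for size in range(1, k + 1):
        if any(shatters(C) for C in combinations(X, size)):
            d = size
        else:
            break
    return d

print(len(H), math.log2(len(H)), vcdim())  # -> 8 3.0 1
```

For two points $x < y$, the labeling $(0, 1)$ would require some $a$ with $a < x$ and $a \ge y$, which is impossible; that is why the search stops at size 1 even though $\log_2(|H|) = 3$.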

6.3.5 VC-Dimension and the Number of Parameters


In the previous examples, the VC-dimension happened to equal the number of parameters defining the hypothesis class. While this is often the case, it is not always true. Consider, for example, the domain $X = \mathbb{R}$ and the hypothesis class $H = \{h_\theta : \theta \in \mathbb{R}\}$, where $h_\theta : X \to \{0, 1\}$ is defined by $h_\theta(x) = \lceil 0.5 \sin(\theta x) \rceil$. It is possible to prove that $\mathrm{VCdim}(H) = \infty$; namely, for every $d$, one can find $d$ points that are shattered by $H$ (see Exercise 8).
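One concrete way to see the shattering, sketched below under assumptions of our own (the choice of points $x_i = \pi / 2^i$ and the encoding of $\theta$ are one construction, not the proof Exercise 8 asks for): note that $h_\theta(x) = 1$ exactly when $\sin(\theta x) > 0$, and for an odd integer $\theta$ the sign of $\sin(\theta \pi / 2^i)$ is determined by one binary digit of $\theta$. Encoding the flipped labels into those digits realizes every labeling of the $d$ points:

```python
import math
from itertools import product

d = 4
# Points x_i = pi / 2**i for i = 1..d (an illustrative choice).
xs = [math.pi / 2 ** i for i in range(1, d + 1)]

def h(theta, x):
    # h_theta(x) = ceil(0.5 * sin(theta * x)): equals 1 iff sin(theta*x) > 0.
    return math.ceil(0.5 * math.sin(theta * x))

# For each labeling y, pick the odd integer theta whose binary digit i+1
# is the flipped label 1 - y[i]; then sin(theta * x_i) > 0 iff y[i] = 1.
for y in product([0, 1], repeat=d):
    theta = 1 + sum((1 - y[i]) * 2 ** (i + 1) for i in range(d))
    assert all(h(theta, xs[i]) == y[i] for i in range(d)), y
print("all", 2 ** d, "labelings realized")  # -> all 16 labelings realized
```

A single real parameter $\theta$ thus carries arbitrarily many bits of information, which is why counting parameters says nothing about the VC-dimension here.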

6.4 The Fundamental Theorem of PAC learning


We have already shown that a class of infinite VC-dimension is not learnable. The converse statement is also true, leading to the fundamental theorem of statistical learning theory:

Theorem 6.7 (The Fundamental Theorem of Statistical Learning) Let $H$ be a hypothesis class of functions from a domain $X$ to $\{0, 1\}$ and let the loss function be the $0\text{-}1$ loss. Then, the following are equivalent:
1. $H$ has the uniform convergence property.
2. Any ERM rule is a successful agnostic PAC learner for $H$.
3. $H$ is agnostic PAC learnable.
4. $H$ is PAC learnable.
5. Any ERM rule is a successful PAC learner for $H$.
6. $H$ has a finite VC-dimension.
The proof of the theorem is given in the next section.

Not only does the VC-dimension characterize PAC learnability; it even determines the sample complexity.

Theorem 6.8 (The Fundamental Theorem of Statistical Learning – Quantitative Version) Let $H$ be a hypothesis class of functions from a domain $X$ to $\{0, 1\}$ and let the loss function be the $0\text{-}1$ loss. Assume that $\mathrm{VCdim}(H) = d < \infty$. Then, there are absolute constants $C_1, C_2$ such that:
