Understanding Machine Learning: From Theory to Algorithms

29.3 Calculating the Natarajan Dimension

By Sauer’s lemma, |(H_bin)_C| ≤ |C|^d. We conclude that

    2^{|C|} ≤ |(H^{OvA,k}_bin)_C| ≤ |C|^{dk}.

The proof follows by taking the logarithm and applying Lemma A.1.
How tight is Lemma 29.5? It is not hard to see that for some classes, Ndim(H^{OvA,k}_bin)
can be much smaller than dk (see Exercise 1). However, there are several natural
binary classes, H_bin (e.g., halfspaces), for which Ndim(H^{OvA,k}_bin) = Ω(dk) (see
Exercise 6).
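To make the One-versus-All construction H^{OvA,k}_bin concrete, here is a minimal Python sketch (not the book's code); the base class of one-dimensional thresholds and the tie-breaking rule (predict the largest label whose binary classifier fires, defaulting to 0) are illustrative assumptions:

```python
# Minimal sketch of One-versus-All: k binary hypotheses from a
# base class H_bin are combined into one multiclass predictor.
# H_bin here is (hypothetically) the class of 1-D thresholds.

def threshold(t):
    """A binary hypothesis: x -> 1 iff x >= t."""
    return lambda x: 1 if x >= t else 0

def ova(hypotheses):
    """Combine binary hypotheses (h_1, ..., h_k) into a multiclass
    predictor: output the largest label i with h_i(x) = 1,
    defaulting to label 0 (an assumed tie-breaking rule)."""
    def predict(x):
        fired = [i for i, h in enumerate(hypotheses) if h(x) == 1]
        return max(fired) if fired else 0
    return predict

# Three binary classifiers, one per label in {0, 1, 2}
h = ova([threshold(0.0), threshold(1.0), threshold(2.0)])
# h(-1.0) -> 0 (no classifier fires), h(1.5) -> 1, h(2.5) -> 2
```

Each hypothesis in this class is a tuple of k base hypotheses, which is exactly the counting used in the proof above.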

29.3.2 General Multiclass-to-Binary Reductions


The same reasoning used to establish Lemma 29.5 can be used to upper bound
the Natarajan dimension of more general multiclass-to-binary reductions. These
reductions train several binary classifiers on the data. Then, given a new in-
stance, they predict its label by using some rule that takes into account the
labels predicted by the binary classifiers. These reductions include One-versus-
All and All-Pairs.
Suppose that such a method trains l binary classifiers from a binary class H_bin,
and r : {0,1}^l → [k] is the rule that determines the (multiclass) label according
to the predictions of the binary classifiers. The hypothesis class corresponding
to this method can be defined as follows. For every h̄ = (h_1, ..., h_l) ∈ (H_bin)^l
define R(h̄) : X → [k] by

    R(h̄)(x) = r(h_1(x), ..., h_l(x)).

Finally, let

    H^r_bin = {R(h̄) : h̄ ∈ (H_bin)^l}.
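As an illustration, the map h̄ ↦ R(h̄) can be sketched as follows; the particular rule r (a binary decoding of the l classifiers' outputs, so l = 2 classifiers distinguish k = 4 labels) and the base hypotheses are hypothetical choices, not from the book:

```python
# Minimal sketch of the general multiclass-to-binary reduction:
# l binary hypotheses plus a rule r : {0,1}^l -> [k].

def make_R(rule, hypotheses):
    """R(h_bar)(x) = r(h_1(x), ..., h_l(x))."""
    def predict(x):
        return rule(tuple(h(x) for h in hypotheses))
    return predict

# Hypothetical rule: decode the l = 2 output bits as a label in [4].
def r(bits):
    b1, b2 = bits
    return 2 * b1 + b2

# Two hypothetical binary classifiers on integer instances.
h1 = lambda x: 1 if x >= 2 else 0      # high bit
h2 = lambda x: 1 if x % 2 == 1 else 0  # low bit
R = make_R(r, [h1, h2])
# [R(x) for x in [0, 1, 2, 3]] -> [0, 1, 2, 3]
```

Note that One-versus-All and All-Pairs are both instances of this scheme for suitable choices of l and r.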
Similarly to Lemma 29.5, it can be proven that:

lemma 29.6  If d = VCdim(H_bin) then

    Ndim(H^r_bin) ≤ 3 l d log(l d).

The proof is left as Exercise 2.

29.3.3 Linear Multiclass Predictors


Next, we consider the class of linear multiclass predictors (see Section 17.2). Let
Ψ : X × [k] → R^d be some class-sensitive feature mapping and let

    H_Ψ = { x ↦ argmax_{i ∈ [k]} 〈w, Ψ(x, i)〉 : w ∈ R^d }.    (29.1)
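A minimal sketch of the predictor in Equation (29.1), assuming (as one concrete choice) the "multivector" feature map of Section 17.2, which places the instance vector in the block of coordinates owned by label i, so d = k·n for n-dimensional instances:

```python
import numpy as np

def psi(x, i, k):
    """Multivector class-sensitive feature map
    Psi : X x [k] -> R^(k * len(x))."""
    out = np.zeros(k * len(x))
    out[i * len(x):(i + 1) * len(x)] = x
    return out

def linear_multiclass(w, k):
    """h_w(x) = argmax_{i in [k]} <w, Psi(x, i)>."""
    def predict(x):
        scores = [np.dot(w, psi(x, i, k)) for i in range(k)]
        return int(np.argmax(scores))
    return predict

# k = 3 labels, 2-dimensional instances, so w lives in R^6.
w = np.array([1.0, 0.0, 0.0, 1.0, -1.0, -1.0])
h = linear_multiclass(w, 3)
# h([1, 0]) -> 0 and h([0, 1]) -> 1: each block of w scores its label.
```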

Each hypothesis in H_Ψ is determined by d parameters, namely, a vector w ∈ R^d.
Therefore, we would expect that the Natarajan dimension would be upper
bounded by d. Indeed: