Understanding Machine Learning: From Theory to Algorithms

29.3 Calculating the Natarajan Dimension

By Sauer’s lemma, |(H_bin)_C| ≤ |C|^d. We conclude that

    2^{|C|} ≤ |(H^{OvA,k}_bin)_C| ≤ |C|^{dk}.

The proof follows by taking the logarithm and applying Lemma A.1.
How tight is Lemma 29.5? It is not hard to see that for some classes, Ndim(H^{OvA,k}_bin)
can be much smaller than dk (see Exercise 1). However, there are several natural
binary classes, H_bin (e.g., halfspaces), for which Ndim(H^{OvA,k}_bin) = Ω(dk) (see
Exercise 6).
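To make the One-versus-All construction H^{OvA,k}_bin concrete, here is a minimal Python sketch (not the book's code); the base class of one-dimensional thresholds and the tie-breaking rule (predict the largest label whose binary classifier fires, defaulting to 0) are illustrative assumptions:

```python
# Minimal sketch of One-versus-All: k binary hypotheses from a
# base class H_bin are combined into one multiclass predictor.
# H_bin here is (hypothetically) the class of 1-D thresholds.

def threshold(t):
    """A binary hypothesis: x -> 1 iff x >= t."""
    return lambda x: 1 if x >= t else 0

def ova(hypotheses):
    """Combine binary hypotheses (h_1, ..., h_k) into a multiclass
    predictor: output the largest label i with h_i(x) = 1,
    defaulting to label 0 (an assumed tie-breaking rule)."""
    def predict(x):
        fired = [i for i, h in enumerate(hypotheses) if h(x) == 1]
        return max(fired) if fired else 0
    return predict

# Three binary classifiers, one per label in {0, 1, 2}
h = ova([threshold(0.0), threshold(1.0), threshold(2.0)])
# h(-1.0) -> 0 (no classifier fires), h(1.5) -> 1, h(2.5) -> 2
```

Each hypothesis in this class is a tuple of k base hypotheses, which is exactly the counting used in the proof above.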

29.3.2 General Multiclass-to-Binary Reductions


The same reasoning used to establish Lemma 29.5 can be used to upper bound
the Natarajan dimension of more general multiclass-to-binary reductions. These
reductions train several binary classifiers on the data. Then, given a new in-
stance, they predict its label by using some rule that takes into account the
labels predicted by the binary classifiers. These reductions include One-versus-
All and All-Pairs.
Suppose that such a method trains l binary classifiers from a binary class H_bin,
and r : {0,1}^l → [k] is the rule that determines the (multiclass) label according
to the predictions of the binary classifiers. The hypothesis class corresponding
to this method can be defined as follows. For every h̄ = (h_1, ..., h_l) ∈ (H_bin)^l
define R(h̄) : X → [k] by

    R(h̄)(x) = r(h_1(x), ..., h_l(x)).

Finally, let

    H^r_bin = {R(h̄) : h̄ ∈ (H_bin)^l}.
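As an illustration, the map h̄ ↦ R(h̄) can be sketched as follows; the particular rule r (a binary decoding of the l classifiers' outputs, so l = 2 classifiers distinguish k = 4 labels) and the base hypotheses are hypothetical choices, not from the book:

```python
# Minimal sketch of the general multiclass-to-binary reduction:
# l binary hypotheses plus a rule r : {0,1}^l -> [k].

def make_R(rule, hypotheses):
    """R(h_bar)(x) = r(h_1(x), ..., h_l(x))."""
    def predict(x):
        return rule(tuple(h(x) for h in hypotheses))
    return predict

# Hypothetical rule: decode the l = 2 output bits as a label in [4].
def r(bits):
    b1, b2 = bits
    return 2 * b1 + b2

# Two hypothetical binary classifiers on integer instances.
h1 = lambda x: 1 if x >= 2 else 0      # high bit
h2 = lambda x: 1 if x % 2 == 1 else 0  # low bit
R = make_R(r, [h1, h2])
# [R(x) for x in [0, 1, 2, 3]] -> [0, 1, 2, 3]
```

Note that One-versus-All and All-Pairs are both instances of this scheme for suitable choices of l and r.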
Similarly to Lemma 29.5, it can be proven that:

lemma 29.6  If d = VCdim(H_bin) then

    Ndim(H^r_bin) ≤ 3 l d log(l d).

The proof is left as Exercise 2.

29.3.3 Linear Multiclass Predictors


Next, we consider the class of linear multiclass predictors (see Section 17.2). Let
Ψ : X × [k] → R^d be some class-sensitive feature mapping and let

    H_Ψ = { x ↦ argmax_{i ∈ [k]} 〈w, Ψ(x, i)〉 : w ∈ R^d }.    (29.1)
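A minimal sketch of the predictor in Equation (29.1), assuming (as one concrete choice) the "multivector" feature map of Section 17.2, which places the instance vector in the block of coordinates owned by label i, so d = k·n for n-dimensional instances:

```python
import numpy as np

def psi(x, i, k):
    """Multivector class-sensitive feature map
    Psi : X x [k] -> R^(k * len(x))."""
    out = np.zeros(k * len(x))
    out[i * len(x):(i + 1) * len(x)] = x
    return out

def linear_multiclass(w, k):
    """h_w(x) = argmax_{i in [k]} <w, Psi(x, i)>."""
    def predict(x):
        scores = [np.dot(w, psi(x, i, k)) for i in range(k)]
        return int(np.argmax(scores))
    return predict

# k = 3 labels, 2-dimensional instances, so w lives in R^6.
w = np.array([1.0, 0.0, 0.0, 1.0, -1.0, -1.0])
h = linear_multiclass(w, 3)
# h([1, 0]) -> 0 and h([0, 1]) -> 1: each block of w scores its label.
```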

Each hypothesis in H_Ψ is determined by d parameters, namely, a vector w ∈ R^d.
Therefore, we would expect that the Natarajan dimension would be upper
bounded by d. Indeed: