Understanding Machine Learning: From Theory to Algorithms


Multiclass Learnability


Theorem 29.7  $\mathrm{Ndim}(\mathcal{H}^\Psi) \le d$.
Proof  Let $C \subseteq \mathcal{X}$ be a shattered set, and let $f_0, f_1 : C \to [k]$ be the two functions that witness the shattering (in particular, $f_0(x) \neq f_1(x)$ for every $x \in C$). We need to show that $|C| \le d$. For every $x \in C$ let $\rho(x) = \Psi(x, f_0(x)) - \Psi(x, f_1(x))$. We claim that the set $\rho(C) \stackrel{\mathrm{def}}{=} \{\rho(x) : x \in C\}$ consists of $|C|$ elements (i.e., $\rho$ is one-to-one) and is shattered by the binary hypothesis class of homogeneous linear separators on $\mathbb{R}^d$,
$$\mathcal{H} = \{x \mapsto \mathrm{sign}(\langle w, x \rangle) : w \in \mathbb{R}^d\}.$$
Since $\mathrm{VCdim}(\mathcal{H}) = d$, it will follow that $|C| = |\rho(C)| \le d$, as required.

To establish our claim it is enough to show that $|\mathcal{H}_{\rho(C)}| = 2^{|C|}$. Indeed, given a subset $B \subseteq C$, by the definition of shattering there exists $h_B \in \mathcal{H}^\Psi$ for which
$$\forall x \in B,\ h_B(x) = f_0(x) \quad \text{and} \quad \forall x \in C \setminus B,\ h_B(x) = f_1(x).$$

Let $w_B \in \mathbb{R}^d$ be a vector that defines $h_B$. For every $x \in B$ we have $h_B(x) = f_0(x)$, so $f_0(x)$ maximizes $y \mapsto \langle w_B, \Psi(x, y) \rangle$, and therefore
$$\langle w_B, \Psi(x, f_0(x)) \rangle > \langle w_B, \Psi(x, f_1(x)) \rangle \ \Rightarrow\ \langle w_B, \rho(x) \rangle > 0.$$

Similarly, for every $x \in C \setminus B$,
$$\langle w_B, \rho(x) \rangle < 0.$$

It follows that the hypothesis $g_B \in \mathcal{H}$ defined by the same $w_B$ labels the points in $\rho(B)$ by $1$ and the points in $\rho(C \setminus B)$ by $0$. Since this holds for every $B \subseteq C$, the restriction of $\mathcal{H}$ to $\rho(C)$ realizes all $2^{|C|}$ sign patterns on $C$; this is possible only if $\rho$ is one-to-one, hence $|C| = |\rho(C)|$ and $|\mathcal{H}_{\rho(C)}| = 2^{|C|}$, which concludes our proof.
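
To see the reduction in action, the following minimal sketch checks the proof's central implication numerically. It is an illustration only, under assumed toy sizes: the domain is a finite set of $m$ points, `Psi` is stored as a random array standing in for an arbitrary class-sensitive feature map, labels are 0-based, and the names `Psi`, `h`, and `rho` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k, d = 5, 4, 6                  # toy sizes (assumptions): |X|, #classes, feature dim
Psi = rng.normal(size=(m, k, d))   # arbitrary class-sensitive feature map on a finite domain

def h(w, x):
    """A hypothesis from H^Psi: h_w(x) = argmax_y <w, Psi(x, y)>."""
    return int(np.argmax(Psi[x] @ w))

# Key step of the proof: if h_w(x) = f0(x) != f1(x), then the binary separator
# sign(<w, rho(x)>), with rho(x) = Psi(x, f0(x)) - Psi(x, f1(x)), labels x by +1;
# symmetrically, h_w(x) = f1(x) yields -1.
for _ in range(2000):
    w = rng.normal(size=d)
    x = rng.integers(m)
    f0, f1 = rng.choice(k, size=2, replace=False)
    rho = Psi[x, f0] - Psi[x, f1]
    if h(w, x) == f0:
        assert w @ rho > 0   # strict inequality holds a.s.: ties have probability zero here
    elif h(w, x) == f1:
        assert w @ rho < 0
```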
The theorem is tight in the sense that there are mappings $\Psi$ for which $\mathrm{Ndim}(\mathcal{H}^\Psi) = \Omega(d)$. For example, this is true for the multivector construction (see Section 17.2 and the Bibliographic Remarks at the end of this chapter). We therefore conclude:
Corollary 29.8  Let $\mathcal{X} = \mathbb{R}^n$ and let $\Psi : \mathcal{X} \times [k] \to \mathbb{R}^{nk}$ be the class sensitive feature mapping for the multi-vector construction:
$$\Psi(x, y) = \big[\ \underbrace{0, \ldots, 0}_{\in\, \mathbb{R}^{(y-1)n}},\ \underbrace{x_1, \ldots, x_n}_{\in\, \mathbb{R}^{n}},\ \underbrace{0, \ldots, 0}_{\in\, \mathbb{R}^{(k-y)n}}\ \big].$$
Let $\mathcal{H}^\Psi$ be as defined in Equation (29.1). Then, the Natarajan dimension of $\mathcal{H}^\Psi$ satisfies
$$(k-1)(n-1) \ \le\ \mathrm{Ndim}(\mathcal{H}^\Psi) \ \le\ kn.$$
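
For intuition on where the $kn$ upper bound comes from, here is a small sketch (an illustration, not part of the text; the toy sizes `n`, `k` and the 0-based labels are assumptions) showing that a single $w \in \mathbb{R}^{nk}$ combined with this $\Psi$ acts as $k$ stacked per-class weight vectors, so the class is parameterized by $kn$ real numbers:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 3, 4                         # toy sizes (assumptions)

def psi(x, y):
    """Multi-vector map: x placed in the y-th of k blocks (0-based y; the text uses 1-based)."""
    out = np.zeros(n * k)
    out[y * n:(y + 1) * n] = x
    return out

w = rng.normal(size=n * k)          # one weight vector in R^{nk} ...
W = w.reshape(k, n)                 # ... viewed as k stacked per-class vectors

x = rng.normal(size=n)
scores_psi = np.array([w @ psi(x, y) for y in range(k)])
assert np.allclose(scores_psi, W @ x)   # <w, Psi(x, y)> = <W[y], x> for every y
print(int(np.argmax(W @ x)))            # the multiclass prediction argmax_y <w, Psi(x, y)>
```

This block view is informally consistent with the corollary: the Natarajan dimension sits between $(k-1)(n-1)$ and the parameter count $kn$.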

29.4 On Good and Bad ERMs


In this section we present an example of a hypothesis class with the property that not all ERMs for the class are equally successful. Furthermore, if we allow an infinite number of labels, we will also obtain an example of a class that is learnable by some ERM, but other ERMs will fail to learn it.