17.2 Linear Multiclass Predictors
Consider running Multiclass SVM with $\lambda = \sqrt{\frac{2\rho^2}{B^2 m}}$ on a training set $S \sim \mathcal{D}^m$, and let $h_{\mathbf{w}}$ be the output of Multiclass SVM. Then,
\[
\mathbb{E}_{S \sim \mathcal{D}^m}\!\left[ L_{\mathcal{D}}^{\Delta}(h_{\mathbf{w}}) \right]
\;\le\;
\mathbb{E}_{S \sim \mathcal{D}^m}\!\left[ L_{\mathcal{D}}^{\text{g-hinge}}(\mathbf{w}) \right]
\;\le\;
\min_{\mathbf{u} : \|\mathbf{u}\| \le B} L_{\mathcal{D}}^{\text{g-hinge}}(\mathbf{u}) + \sqrt{\frac{8 \rho^2 B^2}{m}},
\]
where $L_{\mathcal{D}}^{\Delta}(h) = \mathbb{E}_{(\mathbf{x},y) \sim \mathcal{D}}[\Delta(h(\mathbf{x}), y)]$ and $L_{\mathcal{D}}^{\text{g-hinge}}(\mathbf{w}) = \mathbb{E}_{(\mathbf{x},y) \sim \mathcal{D}}[\ell^{\text{g-hinge}}(\mathbf{w}, (\mathbf{x}, y))]$, with $\ell^{\text{g-hinge}}$ being the generalized hinge loss as defined in Equation (17.3).
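For a concrete sense of these quantities (an illustrative instantiation; the numbers are not from the text), take $\rho = 1$, $B = 10$, and $m = 10{,}000$:
\[
\lambda = \sqrt{\frac{2 \rho^2}{B^2 m}} = \sqrt{\frac{2}{10^6}} \approx 1.4 \times 10^{-3},
\qquad
\sqrt{\frac{8 \rho^2 B^2}{m}} = \sqrt{\frac{800}{10^4}} \approx 0.28,
\]
so the excess generalized hinge loss shrinks at the rate $1/\sqrt{m}$.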
We can also apply the SGD learning framework for minimizing $L_{\mathcal{D}}^{\text{g-hinge}}(\mathbf{w})$, as described in Chapter 14. Recall Claim 14.6, which dealt with subgradients of max functions. In light of this claim, in order to find a subgradient of the generalized hinge loss all we need to do is find $y \in \mathcal{Y}$ that achieves the maximum in the definition of the generalized hinge loss. This yields the following algorithm:
SGD for Multiclass Learning
parameters:
    scalar $\eta > 0$, integer $T > 0$
    loss function $\Delta : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}_+$
    class-sensitive feature mapping $\Psi : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}^d$
initialize: $\mathbf{w}^{(1)} = \mathbf{0} \in \mathbb{R}^d$
for $t = 1, 2, \ldots, T$
    sample $(\mathbf{x}, y) \sim \mathcal{D}$
    find $\hat{y} \in \operatorname{argmax}_{y' \in \mathcal{Y}} \left( \Delta(y', y) + \langle \mathbf{w}^{(t)}, \Psi(\mathbf{x}, y') - \Psi(\mathbf{x}, y) \rangle \right)$
    set $\mathbf{v}_t = \Psi(\mathbf{x}, \hat{y}) - \Psi(\mathbf{x}, y)$
    update $\mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} - \eta \mathbf{v}_t$
output: $\bar{\mathbf{w}} = \frac{1}{T} \sum_{t=1}^{T} \mathbf{w}^{(t)}$
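As a minimal sketch of how this procedure might look in code (Python here; the callables sample, Psi, Delta and the label set labels are hypothetical user-supplied pieces, not from the text):

    import numpy as np

    def sgd_multiclass(sample, Psi, Delta, labels, d, eta, T):
        # sample() -> (x, y): draws one example (x, y) ~ D
        # Psi(x, y) -> length-d numpy array (class-sensitive features)
        # Delta(y_pred, y) -> nonnegative label loss
        # labels: the finite label set Y
        w = np.zeros(d)              # w^(1) = 0
        w_sum = np.zeros(d)          # accumulates w^(1) + ... + w^(T)
        for t in range(T):
            x, y = sample()
            # y_hat attains the max in the generalized hinge loss;
            # by Claim 14.6 this yields a subgradient at w^(t)
            y_hat = max(labels, key=lambda yp:
                        Delta(yp, y) + w @ (Psi(x, yp) - Psi(x, y)))
            v = Psi(x, y_hat) - Psi(x, y)   # subgradient v_t
            w_sum += w                      # record w^(t) before the update
            w = w - eta * v                 # w^(t+1) = w^(t) - eta * v_t
        return w_sum / T                    # averaged output w_bar

Prediction with the output then follows the usual linear rule $h_{\bar{\mathbf{w}}}(\mathbf{x}) = \operatorname{argmax}_{y \in \mathcal{Y}} \langle \bar{\mathbf{w}}, \Psi(\mathbf{x}, y) \rangle$.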
Our general analysis of SGD given in Corollary 14.12 immediately implies:
corollary 17.2  Let $\mathcal{D}$ be a distribution over $\mathcal{X} \times \mathcal{Y}$, let $\Psi : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}^d$, and assume that for all $\mathbf{x} \in \mathcal{X}$ and $y \in \mathcal{Y}$ we have $\|\Psi(\mathbf{x}, y)\| \le \rho/2$. Let $B > 0$. Then, for every $\epsilon > 0$, if we run SGD for multiclass learning with a number of iterations (i.e., number of examples)
\[
T \ge \frac{B^2 \rho^2}{\epsilon^2}
\]
and with $\eta = \sqrt{\frac{B^2}{\rho^2 T}}$, then the output of SGD satisfies
\[
\mathbb{E}_{S \sim \mathcal{D}^m}\!\left[ L_{\mathcal{D}}^{\Delta}(h_{\bar{\mathbf{w}}}) \right]
\;\le\;
\mathbb{E}_{S \sim \mathcal{D}^m}\!\left[ L_{\mathcal{D}}^{\text{g-hinge}}(\bar{\mathbf{w}}) \right]
\;\le\;
\min_{\mathbf{u} : \|\mathbf{u}\| \le B} L_{\mathcal{D}}^{\text{g-hinge}}(\mathbf{u}) + \epsilon .
\]
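To illustrate the parameter choices (again with illustrative numbers not from the text): for $\rho = 1$, $B = 10$, and accuracy $\epsilon = 0.1$, the corollary calls for
\[
T \ge \frac{B^2 \rho^2}{\epsilon^2} = \frac{100}{0.01} = 10^4
\qquad \text{and} \qquad
\eta = \sqrt{\frac{B^2}{\rho^2 T}} = \sqrt{\frac{100}{10^4}} = 0.1 .
\]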
Remark 17.3  It is interesting to note that the risk bounds given in Corollary 17.1 and Corollary 17.2 do not depend explicitly on the size of the label set $\mathcal{Y}$, a fact we will rely on in the next section. However, the bounds may depend implicitly on the size of $\mathcal{Y}$ via the norm of $\Psi(\mathbf{x}, y)$ and the fact that the bounds are meaningful only when there exists some vector $\mathbf{u}$, $\|\mathbf{u}\| \le B$, for which $L_{\mathcal{D}}^{\text{g-hinge}}(\mathbf{u})$ is not excessively large.