Understanding Machine Learning: From Theory to Algorithms

248 Multiclass, Ranking, and Complex Prediction Problems


in (Daniely et al. 2011, Daniely, Sabato & Shwartz 2012). See also Chapter 29,
in which we analyze the sample complexity of multiclass learning.
Direct approaches to multiclass learning with linear predictors have been stud-
ied in (Vapnik 1998, Weston & Watkins 1999, Crammer & Singer 2001). In par-
ticular, the multivector construction is due to Crammer & Singer (2001).
Collins (2000) has shown how to apply the Perceptron algorithm for structured
output problems. See also Collins (2002). A related approach is discriminative
learning of conditional random fields; see Lafferty, McCallum & Pereira (2001).
Structured output SVM has been studied in (Weston, Chapelle, Vapnik, Elisseeff
& Schölkopf 2002, Taskar, Guestrin & Koller 2003, Tsochantaridis, Hofmann,
Joachims & Altun 2004).
The dynamic procedure we have presented for calculating the prediction h_w(x)
in the structured output section is similar to the forward-backward variables
calculated by the Viterbi procedure in HMMs (see, for instance, (Rabiner &
Juang 1986)). More generally, solving the maximization problem in structured
output is closely related to the problem of inference in graphical models (see, for
example, Koller & Friedman (2009)).
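The decomposable maximization alluded to above can be made concrete with a small sketch. The following is a minimal, hypothetical Viterbi-style dynamic program for maximizing a score that splits into per-position terms plus pairwise transition terms, the kind of objective the text relates to HMM inference; the function name and score matrices are illustrative, not from the book.

```python
import numpy as np

def viterbi(emit, trans):
    """Viterbi-style DP (illustrative sketch).
    emit: (T, S) array of per-position label scores.
    trans: (S, S) array of pairwise transition scores.
    Returns the label sequence maximizing
    sum_t emit[t, y_t] + sum_t trans[y_{t-1}, y_t]."""
    T, S = emit.shape
    score = emit[0].copy()              # best score of a prefix ending in each label
    back = np.zeros((T, S), dtype=int)  # backpointers for path recovery
    for t in range(1, T):
        # cand[i, j] = best score ending in label i at t-1, then moving to j at t
        cand = score[:, None] + trans + emit[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    # Backtrack from the best final label.
    y = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        y.append(int(back[t, y[-1]]))
    return y[::-1]

# With zero transition scores, the maximizer is just the per-position argmax.
print(viterbi(np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]), np.zeros((2, 2))))
```

Each step keeps only the best score per ending label, so the maximization over exponentially many sequences costs O(T·S²).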
Chapelle, Le & Smola (2007) proposed to learn a ranking function with respect
to the NDCG loss using ideas from structured output learning. They also ob-
served that the maximization problem in the definition of the generalized hinge
loss is equivalent to the assignment problem.
Agarwal & Roth (2005) analyzed the sample complexity of bipartite ranking.
Joachims (2005) studied the applicability of structured output SVM to bipartite
ranking with multivariate performance measures.

17.8 Exercises



  1. Consider a set S of examples in R^n × [k] for which there exist vectors μ_1, ..., μ_k
     such that every example (x, y) ∈ S falls within a ball centered at μ_y whose
     radius is r ≥ 1. Assume also that for every i ≠ j, ‖μ_i − μ_j‖ ≥ 4r. Consider
     concatenating each instance with the constant 1 and then applying the
     multivector construction, namely,


     Ψ(x, y) = [ 0, ..., 0, x_1, ..., x_n, 1, 0, ..., 0 ],

     where the first block of zeros lies in R^{(y−1)(n+1)}, the middle block
     (x_1, ..., x_n, 1) lies in R^{n+1}, and the final block of zeros lies in
     R^{(k−y)(n+1)}.

     Show that there exists a vector w ∈ R^{k(n+1)} such that ℓ(w, (x, y)) = 0 for
     every (x, y) ∈ S.
     Hint: Observe that for every example (x, y) ∈ S we can write x = μ_y + v for
     some v with ‖v‖ ≤ r. Now take w = [w_1, ..., w_k], where w_i = [μ_i, −‖μ_i‖²/2].
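The construction in the hint can be checked numerically. The following sketch uses hypothetical parameters (k = 3 well-separated centers in the plane, r = 1) to build the multivector map Ψ and the suggested w, and evaluates the generalized multiclass hinge loss; labels are zero-indexed here for convenience.

```python
import numpy as np

# Illustrative setup: centers with pairwise distance 10 >= 4r, r = 1.
k, n, r = 3, 2, 1.0
mus = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])

def psi(x, y):
    """Multivector construction on the augmented instance [x, 1].
    y is zero-indexed (the text uses labels in [k])."""
    out = np.zeros(k * (n + 1))
    out[y * (n + 1):(y + 1) * (n + 1)] = np.append(x, 1.0)
    return out

# w = [w_1, ..., w_k] with w_i = [mu_i, -||mu_i||^2 / 2], as in the hint.
w = np.concatenate([np.append(mu, -np.dot(mu, mu) / 2) for mu in mus])

def hinge_loss(x, y):
    """Generalized hinge: max_j [ 1[j != y] + <w, psi(x, j)> - <w, psi(x, y)> ]_+."""
    scores = np.array([w @ psi(x, j) for j in range(k)])
    margins = scores - scores[y] + (np.arange(k) != y)
    return max(0.0, margins.max())

# Points within distance r of their center all incur zero loss.
offsets = np.array([[0.5, 0.5], [-0.7, 0.0], [0.0, 0.9]])  # all norms <= r
losses = [hinge_loss(mus[y] + v, y) for y in range(k) for v in offsets]
print(max(losses))  # 0.0 under the stated separation assumption
```

This matches the hint's calculation: for x = μ_y + v the score gap against any j ≠ y is at least ‖μ_y − μ_j‖²/2 − r‖μ_y − μ_j‖ ≥ 4r² ≥ 1, so the hinge term vanishes.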
  2. Multiclass Perceptron: Consider the following algorithm: