Understanding Machine Learning: From Theory to Algorithms

17.4 Ranking

In our case, $y'_i - y'_j = \langle w, x_i - x_j\rangle$. It follows that we can use the hinge loss upper bound as follows:

$$\mathbb{1}_{[\operatorname{sign}(y_i-y_j)(y'_i-y'_j)\le 0]} \;\le\; \max\{0,\; 1-\operatorname{sign}(y_i-y_j)\langle w, x_i-x_j\rangle\}.$$
Taking the average over the pairs we obtain the following surrogate convex loss
for the Kendall tau loss function:
$$\Delta(h_w(\bar{x}),y) \;\le\; \frac{2}{r(r-1)}\sum_{i=1}^{r-1}\sum_{j=i+1}^{r} \max\{0,\; 1-\operatorname{sign}(y_i-y_j)\langle w, x_i-x_j\rangle\}.$$

The right-hand side is convex with respect to $w$ and upper bounds the Kendall tau loss. It is also a $\rho$-Lipschitz function with parameter $\rho \le \max_{i,j}\|x_i - x_j\|$.
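As a concrete illustration, the following sketch (in Python with NumPy; the names `kendall_tau_hinge_surrogate`, `w`, `X`, and `y` are illustrative, not from the text) evaluates the right-hand side of this bound for a given weight vector.

```python
import numpy as np

def kendall_tau_hinge_surrogate(w, X, y):
    """Average pairwise hinge loss upper-bounding the Kendall tau loss.

    w : (d,) weight vector
    X : (r, d) instance matrix, one row per item x_i
    y : (r,) ground-truth ranking values y_i
    """
    r = len(y)
    scores = X @ w                      # y'_i = <w, x_i>
    total = 0.0
    for i in range(r - 1):
        for j in range(i + 1, r):
            s = np.sign(y[i] - y[j])
            # hinge upper bound of 1[sign(y_i - y_j)(y'_i - y'_j) <= 0]
            total += max(0.0, 1.0 - s * (scores[i] - scores[j]))
    return 2.0 * total / (r * (r - 1))
```

Since each pairwise term is a maximum of affine functions of $w$, the sum is convex in $w$, so a subgradient method can be applied to it directly.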


A Hinge Loss for the NDCG Loss Function:


The NDCG loss function depends on the predicted ranking vector $y' \in \mathbb{R}^r$ via the permutation it induces. To derive a surrogate loss function we first make the following observation. Let $V$ be the set of all permutations of $[r]$ encoded as vectors; namely, each $v \in V$ is a vector in $[r]^r$ such that for all $i \ne j$ we have $v_i \ne v_j$. Then (see Exercise 4),


$$\pi(y') = \operatorname*{argmax}_{v\in V}\; \sum_{i=1}^{r} v_i y'_i. \qquad (17.7)$$

Let us denote $\Psi(\bar{x}, v) = \sum_{i=1}^{r} v_i x_i$; it follows that

$$\pi(h_w(\bar{x})) = \operatorname*{argmax}_{v\in V}\; \sum_{i=1}^{r} v_i \langle w, x_i\rangle
= \operatorname*{argmax}_{v\in V}\; \left\langle w,\, \sum_{i=1}^{r} v_i x_i \right\rangle
= \operatorname*{argmax}_{v\in V}\; \langle w, \Psi(\bar{x}, v)\rangle.$$
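To make the observation behind Equation (17.7) concrete, here is a small brute-force check (a sketch only; all names are illustrative, and it assumes a small $r$ with distinct scores, since it enumerates all $r!$ permutation vectors). By the rearrangement inequality, the maximizer assigns the largest value of $v$ to the largest score $\langle w, x_i\rangle$, i.e. it is exactly the ranking of the scores.

```python
import itertools
import numpy as np

def pi_bruteforce(scores):
    """argmax over all permutation vectors v of sum_i v_i * scores_i (Eq. 17.7)."""
    r = len(scores)
    best_v, best_val = None, -np.inf
    for perm in itertools.permutations(range(1, r + 1)):
        val = np.dot(perm, scores)
        if val > best_val:
            best_v, best_val = np.array(perm), val
    return best_v

def pi_by_sorting(scores):
    """The maximizer gives the largest v_i to the largest score:
    v_i is the rank of scores_i (1 = smallest, r = largest)."""
    r = len(scores)
    v = np.empty(r, dtype=int)
    v[np.argsort(scores)] = np.arange(1, r + 1)
    return v

rng = np.random.default_rng(0)
w, X = rng.normal(size=3), rng.normal(size=(5, 3))
scores = X @ w                      # y'_i = <w, x_i>, distinct with probability 1
assert np.array_equal(pi_bruteforce(scores), pi_by_sorting(scores))
```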

On the basis of this observation, we can use the generalized hinge loss for cost-
sensitive multiclass classification as a surrogate loss function for the NDCG loss
as follows:


$$\begin{aligned}
\Delta(h_w(\bar{x}), y) &\le \Delta(h_w(\bar{x}), y) + \langle w, \Psi(\bar{x}, \pi(h_w(\bar{x})))\rangle - \langle w, \Psi(\bar{x}, \pi(y))\rangle \\
&\le \max_{v\in V}\,\big[\Delta(v, y) + \langle w, \Psi(\bar{x}, v)\rangle - \langle w, \Psi(\bar{x}, \pi(y))\rangle\big] \\
&= \max_{v\in V}\,\Big[\Delta(v, y) + \sum_{i=1}^{r}(v_i - \pi(y)_i)\langle w, x_i\rangle\Big]. \qquad (17.8)
\end{aligned}$$

The right-hand side is a convex function with respect to $w$.
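To make the resulting optimization step concrete, here is a brute-force sketch (feasible only for small $r$; the function and argument names are illustrative, and `delta` stands for whatever ranking loss $\Delta$ is used, e.g. the NDCG loss) of finding the $v$ attaining the maximum in Equation (17.8), which is exactly what the subgradient computation discussed next requires.

```python
import itertools
import numpy as np

def max_v_in_17_8(w, X, y, pi_y, delta):
    """Brute-force search for the v maximizing
    Delta(v, y) + sum_i (v_i - pi(y)_i) * <w, x_i>   (Eq. 17.8).

    w     : (d,) weight vector
    X     : (r, d) instance matrix, one row per x_i
    y     : (r,) true relevance values
    pi_y  : (r,) permutation vector induced by y
    delta : callable(v, y) -> ranking loss of predicting permutation v
    Enumerates all r! permutation vectors, so only usable for small r.
    """
    scores = X @ w                      # <w, x_i>
    r = X.shape[0]
    best_v, best_val = None, -np.inf
    for perm in itertools.permutations(range(1, r + 1)):
        v = np.array(perm)
        val = delta(v, y) + np.dot(v - pi_y, scores)
        if val > best_val:
            best_v, best_val = v, val
    return best_v

# With v* the returned maximizer, Psi(x_bar, v*) - Psi(x_bar, pi(y)),
# i.e. sum_i (v*_i - pi(y)_i) x_i, is a subgradient of the surrogate at w.
```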
We can now solve the learning problem using SGD as described in Section 17.2.5.
The main computational bottleneck is calculating a subgradient of the loss function, which is equivalent to finding $v$ that achieves the maximum in Equation (17.8) (see Claim 14.6). Using the definition of the NDCG loss, this is
