In our case, $y'_i - y'_j = \langle \mathbf{w}, \mathbf{x}_i - \mathbf{x}_j \rangle$. It follows that we can use the hinge loss upper
bound as follows:
\[
\mathbb{1}_{[\operatorname{sign}(y_i - y_j)(y'_i - y'_j) \le 0]} \;\le\; \max\bigl\{0,\; 1 - \operatorname{sign}(y_i - y_j)\,\langle \mathbf{w}, \mathbf{x}_i - \mathbf{x}_j \rangle\bigr\}.
\]
Taking the average over the pairs we obtain the following surrogate convex loss
for the Kendall tau loss function:
\[
\Delta(h_{\mathbf{w}}(\bar{\mathbf{x}}), \mathbf{y}) \;\le\; \frac{2}{r(r-1)} \sum_{i=1}^{r-1} \sum_{j=i+1}^{r} \max\bigl\{0,\; 1 - \operatorname{sign}(y_i - y_j)\,\langle \mathbf{w}, \mathbf{x}_i - \mathbf{x}_j \rangle\bigr\}.
\]
The right-hand side is convex with respect to $\mathbf{w}$ and upper bounds the Kendall tau loss. It is also a $\rho$-Lipschitz function with parameter $\rho \le \max_{i,j} \|\mathbf{x}_i - \mathbf{x}_j\|$.
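For concreteness, here is a minimal NumPy sketch of this surrogate; the function name and the convention that the rows of `X` hold the instances $\mathbf{x}_1, \dots, \mathbf{x}_r$ are our own, not from the text.

```python
import numpy as np

def kendall_tau_hinge_surrogate(w, X, y):
    """Average pairwise hinge upper bound on the Kendall tau loss.

    w : (d,) weight vector
    X : (r, d) array; row i holds the instance x_i
    y : (r,) ground-truth ranking values
    """
    r = len(y)
    scores = X @ w  # predicted values y'_i = <w, x_i>
    total = 0.0
    for i in range(r - 1):
        for j in range(i + 1, r):
            # hinge term max{0, 1 - sign(y_i - y_j) <w, x_i - x_j>}
            margin = np.sign(y[i] - y[j]) * (scores[i] - scores[j])
            total += max(0.0, 1.0 - margin)
    return 2.0 * total / (r * (r - 1))
```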
A Hinge Loss for the NDCG Loss Function:
The NDCG loss function depends on the predicted ranking vector $\mathbf{y}' \in \mathbb{R}^r$ via the permutation it induces. To derive a surrogate loss function we first make the following observation. Let $V$ be the set of all permutations of $[r]$ encoded as vectors; namely, each $\mathbf{v} \in V$ is a vector in $[r]^r$ such that for all $i \neq j$ we have $v_i \neq v_j$. Then (see Exercise 4),
\[
\pi(\mathbf{y}') = \operatorname*{argmax}_{\mathbf{v} \in V} \sum_{i=1}^{r} v_i\, y'_i. \tag{17.7}
\]
Let us denote $\Psi(\bar{\mathbf{x}}, \mathbf{v}) = \sum_{i=1}^{r} v_i \mathbf{x}_i$; it follows that
\[
\pi(h_{\mathbf{w}}(\bar{\mathbf{x}})) = \operatorname*{argmax}_{\mathbf{v} \in V} \sum_{i=1}^{r} v_i \langle \mathbf{w}, \mathbf{x}_i \rangle = \operatorname*{argmax}_{\mathbf{v} \in V} \Bigl\langle \mathbf{w}, \sum_{i=1}^{r} v_i \mathbf{x}_i \Bigr\rangle = \operatorname*{argmax}_{\mathbf{v} \in V} \langle \mathbf{w}, \Psi(\bar{\mathbf{x}}, \mathbf{v}) \rangle.
\]
On the basis of this observation, we can use the generalized hinge loss for cost-
sensitive multiclass classification as a surrogate loss function for the NDCG loss
as follows:
\begin{align*}
\Delta(h_{\mathbf{w}}(\bar{\mathbf{x}}), \mathbf{y}) &\le \Delta(h_{\mathbf{w}}(\bar{\mathbf{x}}), \mathbf{y}) + \langle \mathbf{w}, \Psi(\bar{\mathbf{x}}, \pi(h_{\mathbf{w}}(\bar{\mathbf{x}}))) \rangle - \langle \mathbf{w}, \Psi(\bar{\mathbf{x}}, \pi(\mathbf{y})) \rangle \\
&\le \max_{\mathbf{v} \in V} \bigl[ \Delta(\mathbf{v}, \mathbf{y}) + \langle \mathbf{w}, \Psi(\bar{\mathbf{x}}, \mathbf{v}) \rangle - \langle \mathbf{w}, \Psi(\bar{\mathbf{x}}, \pi(\mathbf{y})) \rangle \bigr] \\
&= \max_{\mathbf{v} \in V} \Bigl[ \Delta(\mathbf{v}, \mathbf{y}) + \sum_{i=1}^{r} (v_i - \pi(\mathbf{y})_i) \langle \mathbf{w}, \mathbf{x}_i \rangle \Bigr]. \tag{17.8}
\end{align*}
The first inequality holds because $\pi(h_{\mathbf{w}}(\bar{\mathbf{x}}))$ maximizes $\langle \mathbf{w}, \Psi(\bar{\mathbf{x}}, \mathbf{v}) \rangle$ over $V$, so the added difference of inner products is nonnegative.
The right-hand side is a convex function with respect to $\mathbf{w}$.
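As an illustration, the sketch below first computes the permutation of Equation (17.7) by sorting (the rearrangement inequality gives the maximizer directly), and then evaluates the bound of Equation (17.8) by brute force over all $r!$ permutations. The brute force is only to make the definition concrete for small $r$; the callable `delta`, standing in for the NDCG loss $\Delta(\mathbf{v}, \mathbf{y})$ defined earlier in the chapter, and all other names are our own.

```python
import itertools
import numpy as np

def induced_permutation(scores):
    """pi(y') from Equation (17.7): v_i is the ascending rank of
    scores[i] (1 = smallest, r = largest). By the rearrangement
    inequality this choice maximizes sum_i v_i * scores[i]."""
    return np.argsort(np.argsort(scores)) + 1

def ndcg_generalized_hinge(w, X, y, delta):
    """Right-hand side of Equation (17.8), by brute force over V.

    delta(v, y) is assumed to implement the NDCG loss between a
    permutation vector v and the true ranking y; the enumeration is
    feasible only for small r, since |V| = r!.
    """
    r = len(y)
    scores = X @ w                                       # <w, x_i> for each i
    base = scores @ induced_permutation(np.asarray(y))   # <w, Psi(x, pi(y))>
    best = -np.inf
    for perm in itertools.permutations(range(1, r + 1)):
        v = np.array(perm)
        # Delta(v, y) + <w, Psi(x, v)> - <w, Psi(x, pi(y))>
        best = max(best, delta(v, y) + scores @ v - base)
    return best
```

The maximizing $\mathbf{v}$ found in this loop is also exactly what the subgradient computation described next requires.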
We can now solve the learning problem using SGD as described in Section 17.2.5. The main computational bottleneck is calculating a subgradient of the loss function, which is equivalent to finding $\mathbf{v}$ that achieves the maximum in Equation (17.8) (see Claim 14.6). Using the definition of the NDCG loss, this is
