In our case, $y'_i - y'_j = \langle \mathbf{w}, \mathbf{x}_i - \mathbf{x}_j \rangle$. It follows that we can use the hinge loss upper bound as follows:
$$\mathbb{1}_{[\operatorname{sign}(y_i - y_j)(y'_i - y'_j) \le 0]} \;\le\; \max\bigl\{0,\; 1 - \operatorname{sign}(y_i - y_j)\,\langle \mathbf{w}, \mathbf{x}_i - \mathbf{x}_j \rangle\bigr\}.$$
Taking the average over the pairs, we obtain the following convex surrogate for the Kendall tau loss function:
$$\Delta(h_{\mathbf{w}}(\bar{\mathbf{x}}), \mathbf{y}) \;\le\; \frac{2}{r(r-1)} \sum_{i=1}^{r-1} \sum_{j=i+1}^{r} \max\bigl\{0,\; 1 - \operatorname{sign}(y_i - y_j)\,\langle \mathbf{w}, \mathbf{x}_i - \mathbf{x}_j \rangle\bigr\}.$$
The right-hand side is convex with respect to $\mathbf{w}$ and upper bounds the Kendall tau loss. It is also a $\rho$-Lipschitz function with parameter $\rho \le \max_{i,j} \|\mathbf{x}_i - \mathbf{x}_j\|$.
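For concreteness, the surrogate above can be evaluated directly. Below is a minimal Python sketch; the function and variable names are our own, not from the text:

```python
import numpy as np

def kendall_tau_surrogate(w, X, y):
    """Pairwise hinge upper bound on the Kendall tau loss.

    X is an (r, d) matrix whose rows are x_1, ..., x_r; y holds the
    ground-truth values; w is the linear predictor.
    """
    r = X.shape[0]
    total = 0.0
    for i in range(r - 1):
        for j in range(i + 1, r):
            # Hinge term for the pair (i, j), as in the bound above.
            margin = np.sign(y[i] - y[j]) * np.dot(w, X[i] - X[j])
            total += max(0.0, 1.0 - margin)
    return 2.0 / (r * (r - 1)) * total
```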
A Hinge Loss for the NDCG Loss Function:
The NDCG loss function depends on the predicted ranking vector $\mathbf{y}' \in \mathbb{R}^r$ via the permutation it induces. To derive a surrogate loss function we first make the following observation. Let $V$ be the set of all permutations of $[r]$ encoded as vectors; namely, each $\mathbf{v} \in V$ is a vector in $[r]^r$ such that for all $i \neq j$ we have $v_i \neq v_j$. Then (see Exercise 4),
$$\pi(\mathbf{y}') \;=\; \operatorname*{argmax}_{\mathbf{v} \in V} \sum_{i=1}^{r} v_i\, y'_i. \tag{17.7}$$
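By the rearrangement inequality, the maximizer in Equation (17.7) assigns larger values $v_i$ to larger scores $y'_i$, so $\pi(\mathbf{y}')$ can be computed by sorting rather than by enumerating $V$. A small sketch, with illustrative names and assuming the scores are distinct:

```python
import itertools
import numpy as np

def induced_permutation(y_pred):
    """pi(y') of Equation (17.7): rank each score so that the smallest
    score gets 1 and the largest gets r (assumes distinct scores)."""
    v = np.empty(len(y_pred), dtype=int)
    v[np.argsort(y_pred)] = np.arange(1, len(y_pred) + 1)
    return v

# Brute-force check against every permutation of [r] (small r only).
y_pred = np.array([0.3, -1.2, 2.0, 0.7])
best = max(itertools.permutations(range(1, len(y_pred) + 1)),
           key=lambda v: np.dot(v, y_pred))
assert tuple(induced_permutation(y_pred)) == best
```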
Let us denote $\Psi(\bar{\mathbf{x}}, \mathbf{v}) = \sum_{i=1}^{r} v_i \mathbf{x}_i$; it follows that
$$\begin{aligned}
\pi(h_{\mathbf{w}}(\bar{\mathbf{x}})) &= \operatorname*{argmax}_{\mathbf{v} \in V} \sum_{i=1}^{r} v_i \langle \mathbf{w}, \mathbf{x}_i \rangle \\
&= \operatorname*{argmax}_{\mathbf{v} \in V} \Bigl\langle \mathbf{w}, \sum_{i=1}^{r} v_i \mathbf{x}_i \Bigr\rangle \\
&= \operatorname*{argmax}_{\mathbf{v} \in V} \langle \mathbf{w}, \Psi(\bar{\mathbf{x}}, \mathbf{v}) \rangle.
\end{aligned}$$
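This identity is easy to check numerically; continuing the sketch above (again with illustrative names):

```python
def Psi(X, v):
    """Psi(x_bar, v) = sum_i v_i x_i, with the x_i as rows of X."""
    return X.T @ np.asarray(v, dtype=float)

# Ranking the scores <w, x_i> directly agrees with maximizing
# <w, Psi(x_bar, v)> over all permutations v.
rng = np.random.default_rng(0)
X, w = rng.normal(size=(4, 3)), rng.normal(size=3)
best = max(itertools.permutations(range(1, 5)),
           key=lambda v: np.dot(w, Psi(X, v)))
assert tuple(induced_permutation(X @ w)) == best
```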
On the basis of this observation, we can use the generalized hinge loss for cost-sensitive multiclass classification as a surrogate loss function for the NDCG loss as follows:
$$\begin{aligned}
\Delta(h_{\mathbf{w}}(\bar{\mathbf{x}}), \mathbf{y}) &\le \Delta(h_{\mathbf{w}}(\bar{\mathbf{x}}), \mathbf{y}) + \langle \mathbf{w}, \Psi(\bar{\mathbf{x}}, \pi(h_{\mathbf{w}}(\bar{\mathbf{x}}))) \rangle - \langle \mathbf{w}, \Psi(\bar{\mathbf{x}}, \pi(\mathbf{y})) \rangle \\
&\le \max_{\mathbf{v} \in V} \bigl[ \Delta(\mathbf{v}, \mathbf{y}) + \langle \mathbf{w}, \Psi(\bar{\mathbf{x}}, \mathbf{v}) \rangle - \langle \mathbf{w}, \Psi(\bar{\mathbf{x}}, \pi(\mathbf{y})) \rangle \bigr] \\
&= \max_{\mathbf{v} \in V} \Bigl[ \Delta(\mathbf{v}, \mathbf{y}) + \sum_{i=1}^{r} \bigl(v_i - \pi(\mathbf{y})_i\bigr) \langle \mathbf{w}, \mathbf{x}_i \rangle \Bigr].
\end{aligned} \tag{17.8}$$
The first inequality holds because $\pi(h_{\mathbf{w}}(\bar{\mathbf{x}}))$ maximizes $\langle \mathbf{w}, \Psi(\bar{\mathbf{x}}, \mathbf{v}) \rangle$ over $V$, so the added difference of inner products is nonnegative; the second holds because $\pi(h_{\mathbf{w}}(\bar{\mathbf{x}})) \in V$ and the NDCG loss depends on $h_{\mathbf{w}}(\bar{\mathbf{x}})$ only through the permutation it induces.
The right-hand side is a convex function with respect to $\mathbf{w}$.
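For small $r$, the right-hand side of Equation (17.8) can be evaluated by brute force over $V$; the same enumeration also yields the maximizing $\mathbf{v}$ needed for the subgradient discussed next. A sketch reusing the helpers above; `delta(v, y)` stands for the loss $\Delta$ (for instance, the NDCG loss) and must be supplied by the caller:

```python
def generalized_hinge(w, X, y, delta):
    """Right-hand side of Equation (17.8), computed by enumerating all
    permutations; exponential in r, so suitable only for sanity checks."""
    r = X.shape[0]
    scores = X @ w                 # <w, x_i> for each i
    pi_y = induced_permutation(y)  # pi(y), via Equation (17.7)
    return max(delta(np.array(v), y) + np.dot(np.array(v) - pi_y, scores)
               for v in itertools.permutations(range(1, r + 1)))
```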
We can now solve the learning problem using SGD as described in Section 17.2.5. The main computational bottleneck is calculating a subgradient of the loss function, which is equivalent to finding the $\mathbf{v}$ that achieves the maximum in Equation (17.8) (see Claim 14.6). Using the definition of the NDCG loss, this is