32.3 Regret analysis 371
Proof of Theorem 32.2 The first step in the proof is an upper bound on the
expected number of clicks in the optimal lista∗. Fix timet, blockPtdand recall
thatItd∗ =minPtdis the most attractive item inPtd. Letk=A−t^1 (Itd∗) be the
position of itemItd∗ andσbe the permutation that exchanges itemskandItd∗.
By Lemma 32.6, on the eventFtcwe haveItd∗ ≤k. From Parts (c) and (d) of
Assumption 32.1 we havev(At,k)≥v(σ◦At,k)≥v(a∗,k). Hence on the event
Ftcthe expected number of clicks onI∗tdis bounded from below by those on items
ina∗,
Et− 1
[
CtI∗td
]
=
∑
k∈Itd
Pt− 1 (A−t^1 (Itd∗) =k)Et− 1 [v(At,k)|A−t^1 (I∗td) =k]
=
1
|Itd|
∑
k∈Itd
Et− 1 [v(At,k)|A−t^1 (Itd∗) =k]≥
1
|Itd|
∑
k∈Itd
v(a∗,k),
where we also used the fact that TopRank randomizes within each block to
guarantee thatPt− 1 (A−t^1 (I∗td) =k) = 1/|Itd|for anyk∈Itd. Using this and the
design of TopRank,
∑m
k=1
v(a∗,k) =
∑Mt
d=1
∑
k∈Itd
v(a∗,k)≤
∑Mt
d=1
|Itd|Et− 1
[
CtItd∗
]
.
Therefore under eventFtcthe conditional expected regret in roundtis bounded
by
∑m
k=1
v(a∗,k)−Et− 1
∑`
j=1
Ctj
≤Et− 1
∑Mt
d=1
|Ptd|CtItd∗−
∑`
j=1
Ctj
=Et− 1
∑Mt
d=1
∑
j∈Ptd
(CtI∗td−Ctj)
=
∑Mt
d=1
∑
j∈Ptd
Et− 1 [UtI∗tdj]
≤
∑`
j=1
min{∑m,j− 1 }
i=1
Et− 1 [Utij]. (32.4)
The last inequality follows by noting thatEt− 1 [UtItd∗j]≤
∑min{m,j− 1 }
i=1 Et−^1 [Utij].
To see this use part (a) of Lemma 32.3 to show thatEt− 1 [Utij]≥0 fori < jand
Lemma 32.6 to show that whenI∗td> m, then neitherItd∗ norjare not shown to
the user in roundtso thatUtI∗tdj= 0. Substituting the bound in Eq. (32.4) into
the regret leads to
Rn≤nmP(Fn) +
∑`
j=1
min{∑m,j− 1 }
i=1
E[I{Fnc}Snij], (32.5)
where we used the fact that the maximum number of clicks overnrounds is