32.3 Regret analysis 371Proof of Theorem 32.2 The first step in the proof is an upper bound on the
expected number of clicks in the optimal lista∗. Fix timet, blockPtdand recall
thatItd∗ =minPtdis the most attractive item inPtd. Letk=A−t^1 (Itd∗) be the
position of itemItd∗ andσbe the permutation that exchanges itemskandItd∗.
By Lemma 32.6, on the eventFtcwe haveItd∗ ≤k. From Parts (c) and (d) of
Assumption 32.1 we havev(At,k)≥v(σ◦At,k)≥v(a∗,k). Hence on the event
Ftcthe expected number of clicks onI∗tdis bounded from below by those on items
ina∗,
Et− 1[
CtI∗td]
=
∑
k∈ItdPt− 1 (A−t^1 (Itd∗) =k)Et− 1 [v(At,k)|A−t^1 (I∗td) =k]=
1
|Itd|∑
k∈ItdEt− 1 [v(At,k)|A−t^1 (Itd∗) =k]≥1
|Itd|∑
k∈Itdv(a∗,k),where we also used the fact that TopRank randomizes within each block to
guarantee thatPt− 1 (A−t^1 (I∗td) =k) = 1/|Itd|for anyk∈Itd. Using this and the
design of TopRank,
∑m
k=1v(a∗,k) =∑Mtd=1∑
k∈Itdv(a∗,k)≤∑Mtd=1|Itd|Et− 1[
CtItd∗]
.
Therefore under eventFtcthe conditional expected regret in roundtis bounded
by
∑mk=1v(a∗,k)−Et− 1
∑`
j=1Ctj
≤Et− 1
∑Mtd=1|Ptd|CtItd∗−∑`
j=1Ctj
=Et− 1
∑Mtd=1∑
j∈Ptd(CtI∗td−Ctj)
=
∑Mtd=1∑
j∈PtdEt− 1 [UtI∗tdj]≤
∑`
j=1min{∑m,j− 1 }i=1Et− 1 [Utij]. (32.4)The last inequality follows by noting thatEt− 1 [UtItd∗j]≤
∑min{m,j− 1 }
i=1 Et−^1 [Utij].
To see this use part (a) of Lemma 32.3 to show thatEt− 1 [Utij]≥0 fori < jand
Lemma 32.6 to show that whenI∗td> m, then neitherItd∗ norjare not shown to
the user in roundtso thatUtI∗tdj= 0. Substituting the bound in Eq. (32.4) into
the regret leads to
Rn≤nmP(Fn) +∑`
j=1min{∑m,j− 1 }i=1E[I{Fnc}Snij], (32.5)where we used the fact that the maximum number of clicks overnrounds is