Bandit Algorithms

(Jeff_L) #1
32.3 Regret analysis 370

On the other hand, wheniandjare in the same block,Eu− 1 [Uuji|Uuji 6 = 0]≤ 0
almost surely by Lemma 32.3. Based on these observations,

Ssji<


2 Nsjilog

(c
δ


Nsji

)


for alls < t,

which by the design of TopRank implies that (i,j)∈/Gt.


Lemma32.6. LetI∗td=minPtdbe the most attractive item inPtd. Then on
eventFtcit holds thatItd∗≤1 +



c<d|Ptd|for alld∈[Mt].
Proof Leti∗=min∪c≥dPtc. Theni∗≤1 +


c<d|Ptd|holds trivially for any
Pt 1 ,...,PtMtandd∈[Mt]. Now consider two cases. Suppose thati∗∈Ptd. Then
it must be true thati∗=Itd∗ and our claim holds. On the other hand, suppose
thati∗∈Ptcfor somec > d. Then by Lemma 32.5 and the design of the partition,
there must exist a sequence of itemsid,...,icin blocksPtd,...,Ptcsuch that
id<···< ic=i∗. From the definition ofI∗td,Itd∗≤id< i∗. This concludes our
proof.

Lemma32.7.On the eventFncand for al li < jit holds that

Snij≤1 +

6(α(i) +α(j))
∆ij log

(


c


n
δ

)


.


Proof The result is trivial whenNnij= 0. Assume from now on thatNnij>0.
By the definition of the algorithm armsiandjare not in the same block once
Stijgrows too large relative toNtij, which means that

Snij≤1 +


2 Nnijlog

(c
δ


Nnij

)


.


On the eventFncand part (a) of Lemma 32.3 it also follows that

Snij≥

∆ijNnij
α(i) +α(j)



2 Nnijlog

(c
δ


Nnij

)


.


Combining the previous two displays shows that

∆ijNnij
α(i) +α(j)



2 Nnijlog

(c
δ


Nnij

)


≤Snij≤1 +


2 Nnijlog

(c
δ


Nnij

)


≤(1 +



2)



Nnijlog

(c
δ


Nnij

)


. (32.3)


Using the fact thatNnij≤nand rearranging the terms in the previous display
shows that

Nnij≤

(1 + 2



2)^2 (α(i) +α(j))^2
∆^2 ij log

(


c


n
δ

)


.


The result is completed by substituting this into Eq. (32.3).

Free download pdf