Bandit Algorithms

(Jeff_L) #1
23.2 Elimination on the hypercube 264

Clearly ifθi= 0, thenRni= 0. And so it suffices to boundRnifor eachiwith
|θi|>0. Suppose that|θi|>0 for someiand the failure eventFigiven in
Lemma 23.3 does not occur. Thenθi∈Cτi+1,tand by definition of the algorithm
Ati= sign(θi) for allt≥τi. Therefore

Rni=n|θi|−E

[n

t=1

Atiθi

]


=E


[n

t=1

|θi|(1−Atisign(θi))

]


≤ 2 n|θi|P(Fi) +|θi|E[I{Fic}τi] (23.4)

Sinceτiis the first roundtwhen 0∈C/ t+1,iit follows that ifFidoes not occur,
thenθ∈Cτi,iand 0∈Cτi,i. Thus the width of the confidence intervalCτi,imust
be at least|θi|and so

2 Wτi− 1 = 4

√(


1


τi− 1 +

1


(τi−1)^2

)


log

(


n


2 τi− 1

)


≥|θi|,

which after rearranging shows for some universal constantC >0 that


I{Fic}(τi−1)≤1 +Clog(n)
θ^2 i

.


Combining this result with Eq. (23.4) leads to

Rni≤ 2 n|θi|P(Fi) +|θi|+

Clog(n)
|θi|

.


Using Lemma 23.3 to boundP(Fi)and substituting into the decomposition
Eq. (23.3) completes the proof of the first part. The second part is left as a treat
for you (Exercise 23.2).

Proof of Lemma 23.3 LetSti=


j 6 =iAtjθjandZti=Atiηt+AtiSti. Fort≤τi,

θˆti−θi=^1
t

∑t

s=1

Zsi.

The next step is to show thatZtiis conditionally



2-subgaussian fort≤τi.

E[exp(λZti)|Gt− 1 ] =E[E[exp(λZti)|Ft− 1 ]|Gt− 1 ]
=E[exp(λAtiSti)E[exp(λAtiηt)|Ft− 1 ]|Gt− 1 ]

≤E

[


exp(λAtiSti) exp

(


λ^2
2

)∣∣


∣∣Gt− 1

]


= exp

(


λ^2
2

)


E[E[exp(λAtiSti)|Gt− 1 ,Sti]|Gt− 1 ]

≤exp

(


λ^2
2

)


E


[


exp

(


λ^2 Sti^2
2

)∣∣


∣∣Gt− 1

]


≤exp(λ^2 ).

The first inequality used the fact thatηtis conditionally 1-subgaussian. The second

Free download pdf