Bandit Algorithms

23.2 Elimination on the hypercube 264

Clearly ifθi= 0, thenRni= 0. And so it suffices to boundRnifor eachiwith |θi|>0. Suppose that|θi|>0 for someiand the failure eventFigiven in Lemma 23.3 does not occur. Thenθi∈Cτi+1,tand by definition of the algorithm Ati= sign(θi) for allt≥τi. Therefore

Rni=n|θi|−E

[n ∑

t=1

Atiθi

]

=E

[n ∑

t=1

|θi|(1−Atisign(θi))

]

≤ 2 n|θi|P(Fi) +|θi|E[I{Fic}τi] (23.4)

Sinceτiis the first roundtwhen 0∈C/ t+1,iit follows that ifFidoes not occur, thenθ∈Cτi,iand 0∈Cτi,i. Thus the width of the confidence intervalCτi,imust be at least|θi|and so

2 Wτi− 1 = 4

√(

1

τi− 1 +

1

(τi−1)^2

)

log

(

n

√

2 τi− 1

)

≥|θi|,

which after rearranging shows for some universal constantC >0 that

I{Fic}(τi−1)≤1 +Clog(n) θ^2 i

.

Combining this result with Eq. (23.4) leads to

Rni≤ 2 n|θi|P(Fi) +|θi|+

Clog(n) |θi|

.

Using Lemma 23.3 to boundP(Fi)and substituting into the decomposition Eq. (23.3) completes the proof of the first part. The second part is left as a treat for you (Exercise 23.2).

Proof of Lemma 23.3 LetSti=

∑

j 6 =iAtjθjandZti=Atiηt+AtiSti. Fort≤τi,

θˆti−θi=^1 t

∑t

s=1

Zsi.

The next step is to show thatZtiis conditionally

√

2-subgaussian fort≤τi.

E[exp(λZti)|Gt− 1 ] =E[E[exp(λZti)|Ft− 1 ]|Gt− 1 ] =E[exp(λAtiSti)E[exp(λAtiηt)|Ft− 1 ]|Gt− 1 ]

≤E

[

exp(λAtiSti) exp

(

λ^2 2

)∣∣

∣∣Gt− 1

]

= exp

(

λ^2 2

)

E[E[exp(λAtiSti)|Gt− 1 ,Sti]|Gt− 1 ]

≤exp

(

λ^2 2

)

E

[

exp

(

λ^2 Sti^2 2

)∣∣

∣∣Gt− 1

]

≤exp(λ^2 ).

The first inequality used the fact thatηtis conditionally 1-subgaussian. The second

Bandit Algorithms

]

=E

]

√(

1

1

)

(

√

)

.

.

∑

√

[

(

)∣∣

]

(

)

(

)

E

[

(

)∣∣

]

Get our desktop app

Company

Features

Documentation

Resources