
Proof of Lemma 12.2  Fix $t \in [n]$ and let $\mathbb{E}_t[\cdot] = \mathbb{E}[\cdot \mid \mathcal{F}_t]$ denote the conditional expectation with respect to $\mathcal{F}_t$. By Lemma 12.5 and the assumption that $0 \le \alpha_{ti} \tilde{Y}_{ti} \le 2\lambda_{ti}$ we have
\[
\exp\left( \frac{\alpha_{ti} \tilde{Y}_{ti}}{1 + \lambda_{ti}} \right) \le 1 + \alpha_{ti} \tilde{Y}_{ti} \,.
\]

Taking the product of these inequalities over $i$,
\[
\mathbb{E}_{t-1}\left[ \exp\left( \sum_{i=1}^k \frac{\alpha_{ti} \tilde{Y}_{ti}}{1 + \lambda_{ti}} \right) \right]
\le \mathbb{E}_{t-1}\left[ \prod_{i=1}^k \left(1 + \alpha_{ti} \tilde{Y}_{ti}\right) \right]
\le 1 + \mathbb{E}_{t-1}\left[ \sum_{i=1}^k \alpha_{ti} \tilde{Y}_{ti} \right]
= 1 + \sum_{i=1}^k \alpha_{ti} y_{ti}
\le \exp\left( \sum_{i=1}^k \alpha_{ti} y_{ti} \right) \,,
\tag{12.10}
\]


where the second inequality follows from the assumption that for $S \subset [k]$ with $|S| > 1$, $\mathbb{E}_{t-1}\left[ \prod_{i \in S} \tilde{Y}_{ti} \right] \le 0$, the third one follows from the assumption that $\mathbb{E}_{t-1}[\tilde{Y}_{ti}] = y_{ti}$, while the last one follows from $1 + x \le \exp(x)$. Define
\[
Z_t = \exp\left( \sum_{i=1}^k \alpha_{ti} \left( \frac{\tilde{Y}_{ti}}{1 + \lambda_{ti}} - y_{ti} \right) \right)
\]
and let $M_t = Z_1 \cdots Z_t$ for $t \in [n]$ with $M_0 = 1$. By (12.10), $\mathbb{E}_{t-1}[Z_t] \le 1$. Therefore
\[
\mathbb{E}[M_t] = \mathbb{E}\left[\mathbb{E}_{t-1}[M_t]\right] = \mathbb{E}\left[M_{t-1} \mathbb{E}_{t-1}[Z_t]\right] \le \mathbb{E}[M_{t-1}] \le \cdots \le \mathbb{E}[M_0] = 1 \,.
\]
Setting $t = n$ and combining the above display with Markov's inequality leads to
\[
\mathbb{P}\left(\log(M_n) \ge \log(1/\delta)\right) = \mathbb{P}\left(M_n \delta \ge 1\right) \le \mathbb{E}[M_n]\, \delta \le \delta \,.
\]
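Since $\log(M_n) = \sum_{t=1}^n \sum_{i=1}^k \alpha_{ti}\bigl(\tilde{Y}_{ti}/(1+\lambda_{ti}) - y_{ti}\bigr)$ by the definition of $Z_t$, the last display states explicitly that with probability at least $1 - \delta$,
\[
\sum_{t=1}^n \sum_{i=1}^k \alpha_{ti}\left( \frac{\tilde{Y}_{ti}}{1 + \lambda_{ti}} - y_{ti} \right) < \log(1/\delta) \,.
\]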

12.3 Notes


1 An upper bound on the expected regret of Exp3-IX can be obtained by integrating the tail:
\[
R_n \le \mathbb{E}\left[(\hat{R}_n)_+\right] = \int_0^\infty \mathbb{P}\left( (\hat{R}_n)_+ \ge x \right) dx \le \int_0^\infty \mathbb{P}\left( \hat{R}_n \ge x \right) dx \,,
\]
where the first equality follows from Proposition 2.8. The result is completed using the high-probability bound in Theorem 12.1 and straightforward integration. We leave the details to the reader in Exercise 12.4; a sketch of the integration step is given after these notes.
2 The analysis presented here uses a fixed learning rate that depends on the horizon. Replacing $\eta$ and $\gamma$ with $\eta_t = \sqrt{\log(k)/(kt)}$ and $\gamma_t = \eta_t/2$ leads to an anytime algorithm with about the same regret [Kocák et al., 2014, Neu, 2015a]. A code sketch of this anytime variant is also given after these notes.
3 The modified importance-weighted estimators used by Exp3-IX have another advantage: they lead to improved regret in the special case that one of the arms has small losses. Specifically, it is possible to show that
\[
R_n = O\left( \sqrt{k \min_{i \in [k]} L_{in} \log(k)} \right),
\]
where $L_{in}$ denotes the total loss of arm $i$ after $n$ rounds.
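For the tail-integration argument in Note 1, the calculation runs along the following lines. Suppose, purely for illustration, that the high-probability bound takes the form $\mathbb{P}(\hat{R}_n \ge a + b\log(1/\delta)) \le \delta$ for all $\delta \in (0,1)$, where $a, b \ge 0$ do not depend on $\delta$ (the actual constants are those appearing in Theorem 12.1). Substituting $x = a + b\log(1/\delta)$, or equivalently $\delta = \exp(-(x-a)/b)$, and splitting the integral at $a$ gives
\[
\int_0^\infty \mathbb{P}\left( \hat{R}_n \ge x \right) dx \le a + \int_a^\infty \exp\left( -\frac{x-a}{b} \right) dx = a + b \,,
\]
so the expected regret is bounded by the value of the high-probability bound at $\delta = 1/e$.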
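For the anytime variant in Note 2, the following is a minimal sketch of what an implementation might look like, assuming losses in $[0,1]$, exponential weights over cumulative loss estimates, and the implicit-exploration estimator of this chapter; the rates $\eta_t = \sqrt{\log(k)/(kt)}$ and $\gamma_t = \eta_t/2$ are those of the note, while the function name and the loss-feedback interface are illustrative only.

import math
import random

def exp3_ix_anytime(k, n, observe_loss, rng=None):
    # Anytime Exp3-IX sketch: eta_t = sqrt(log(k)/(k*t)), gamma_t = eta_t/2.
    # observe_loss(t, arm) should return the loss of the chosen arm in round t, in [0, 1].
    rng = rng or random.Random()
    loss_est = [0.0] * k  # cumulative importance-weighted loss estimates
    arms = []
    for t in range(1, n + 1):
        eta = math.sqrt(math.log(k) / (k * t))
        gamma = eta / 2.0
        # Exponential-weights distribution; shift by the minimum for numerical stability.
        m = min(loss_est)
        w = [math.exp(-eta * (loss_est[i] - m)) for i in range(k)]
        total = sum(w)
        p = [wi / total for wi in w]
        # Sample an arm and observe only its loss.
        a = rng.choices(range(k), weights=p)[0]
        y = observe_loss(t, a)
        # Implicit-exploration (IX) estimator: the extra gamma in the denominator
        # biases the estimate downward but keeps its variance under control.
        loss_est[a] += y / (p[a] + gamma)
        arms.append(a)
    return arms

The only change relative to running Exp3 with the same learning-rate schedule is the additive $\gamma_t$ in the denominator of the loss estimate.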
