Bandit Algorithms

16.5 Exercises 203

(e) Show thatα≥ n∆(ν)

2 8 Clog(n)and conclude that

Rn(π,ν)≥^1 2∆(ν)

log

(

n∆(ν)^2 8 Clog(n)

)

.

(f) Generalize the argument to an arbitrary number of arms.

16.7(Lower bounds on variance) Letk >1 andE ⊂ ENk be the set of
k-armed Gaussian bandits with mean rewards in [0,1] for all arms. Suppose that
πis a policy such that for allν∈E,

lim sup n→∞

Rn(π,ν) log(n)

≤

∑

i:∆i> 0

2(1 +p) ∆i

.

Prove that

lim sup n→∞

sup ν∈E

log(V[Rˆn(π,ν)]) (1−p) log(n)

≥ 1 ,

whereRˆn(π,ν) =nμ∗(ν)−

∑n t=1μAt(ν).

Bandit Algorithms

(

)

.

≤

∑

.

≥ 1 ,

Get our desktop app

Company

Features

Documentation

Resources