Bandit Algorithms

(Jeff_L) #1
16.5 Exercises 203

(e) Show thatα≥ n∆(ν)

2
8 Clog(n)and conclude that

Rn(π,ν)≥^1
2∆(ν)

log

(


n∆(ν)^2
8 Clog(n)

)


.


(f) Generalize the argument to an arbitrary number of arms.

16.7(Lower bounds on variance) Letk >1 andE ⊂ ENk be the set of
k-armed Gaussian bandits with mean rewards in [0,1] for all arms. Suppose that
πis a policy such that for allν∈E,


lim sup
n→∞

Rn(π,ν)
log(n)



i:∆i> 0

2(1 +p)
∆i

.


Prove that

lim sup
n→∞

sup
ν∈E

log(V[Rˆn(π,ν)])
(1−p) log(n)

≥ 1 ,


whereRˆn(π,ν) =nμ∗(ν)−


∑n
t=1μAt(ν).
Free download pdf