16.5 Exercises 203
(e) Show thatα≥ n∆(ν)
2
8 Clog(n)and conclude that
Rn(π,ν)≥^1
2∆(ν)
log
(
n∆(ν)^2
8 Clog(n)
)
.
(f) Generalize the argument to an arbitrary number of arms.
16.7(Lower bounds on variance) Letk >1 andE ⊂ ENk be the set of
k-armed Gaussian bandits with mean rewards in [0,1] for all arms. Suppose that
πis a policy such that for allν∈E,
lim sup
n→∞
Rn(π,ν)
log(n)
≤
∑
i:∆i> 0
2(1 +p)
∆i
.
Prove that
lim sup
n→∞
sup
ν∈E
log(V[Rˆn(π,ν)])
(1−p) log(n)
≥ 1 ,
whereRˆn(π,ν) =nμ∗(ν)−
∑n
t=1μAt(ν).