15.5 Exercises 194
(a)Use Pinsker’s inequality (Eq. 14.11) and Lemma 15.1 and the result of
Exercise 14.4 to show
Ei[Ti(n)]≤E 0 [Ti(n)] +n
√
1
4
∆^2 E 0 [Ti(n)] =E 0 [Ti(n)] +c
√
nkE 0 [Ti(n)].
(b)Using the previous part, Jensen’s inequality and the identity
∑k
i=1E^0 [Ti(n)] =
n, show that
∑k
i=1
Ei[Ti(n)]≤n+c
∑k
i=1
√
nkE 0 [Ti(n)]≤n+ckn.
(c) LetRi=Rn(π,Gμ(i)). Find a choice ofc >0 for which
∑k
i=1
Ri= ∆
∑k
i=1
(n−Ei[Ti(n)])≥∆ (nk−n−ckn)
= 2c
√
k
n
(nk−n−ckn)≥
nk
8
√
k
n
(d) Conclude there exists ani∈[k] such that
Ri≥
1
8
√
kn.
The method used in this exercise is borrowed from Auer et al. [2002b] and
is closely related to the lower bound technique known as Assouad’s method
in statistics [Yu, 1997].
15.3(Lower bound for small horizons) Letk >1 andn < k. Prove that
for any policyπthere exists a Gaussian bandit with unit variance and means
μ∈[0,1]ksuch thatRn(π,νμ)≥n(2k−n−1)/(2k)> n/2.
15.4(Lower bounds for Bernoulli bandits) Recall from Table 4.1 that
EBkis the set ofk-armed Bernoulli bandits. Show that there exists a universal
constantc >0 such that for any 2≤k≤nit holds that:
Rn∗(EBk) = infπ sup
ν∈EBk
Rn(π,ν)≥c
√
nk.
Hint Use the fact that KL divergence is upper bounded by theχ-squared
distance (Eq. (14.15)).
15.5In Chapter 9 we proved that ifπis the MOSS policy andν∈ESGk (1), then
Rn(π,ν)≤C
√
kn+
∑
i:∆i> 0
∆i