15.5 Exercises

(a) Use Pinsker's inequality (Eq. (14.11)), Lemma 15.1 and the result of Exercise 14.4 to show that
\[
E_i[T_i(n)] \le E_0[T_i(n)] + n\sqrt{\tfrac{1}{4}\Delta^2\, E_0[T_i(n)]}
= E_0[T_i(n)] + c\sqrt{nk\, E_0[T_i(n)]}\,.
\]
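The following is only a sketch of how the pieces fit together, not the intended full solution, and it assumes Exercise 14.4 states that $|E_P[X] - E_Q[X]| \le b\sqrt{\mathrm{D}(P,Q)/2}$ for any random variable $X$ taking values in $[0,b]$. Writing $\mathbb{P}_0$ and $\mathbb{P}_i$ for the laws of the interaction under the two bandits and applying that inequality with $X = T_i(n) \in [0,n]$, together with the divergence decomposition of Lemma 15.1 (only arm $i$ has a different reward distribution in the two bandits),
\[
E_i[T_i(n)] \le E_0[T_i(n)] + n\sqrt{\tfrac{1}{2}\,\mathrm{D}(\mathbb{P}_0,\mathbb{P}_i)}
\qquad\text{and}\qquad
\mathrm{D}(\mathbb{P}_0,\mathbb{P}_i) = E_0[T_i(n)]\,\mathrm{D}\big(\mathcal{N}(0,1),\mathcal{N}(\Delta,1)\big) = \frac{\Delta^2}{2}\,E_0[T_i(n)]\,.
\]
Substituting the second display into the first gives the middle expression above, and the final equality is consistent with the choice $\Delta = 2c\sqrt{k/n}$ that appears in part (c).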
(b) Using the previous part, Jensen's inequality and the identity $\sum_{i=1}^k E_0[T_i(n)] = n$, show that
\[
\sum_{i=1}^k E_i[T_i(n)] \le n + c\sum_{i=1}^k \sqrt{nk\, E_0[T_i(n)]} \le n + ckn\,.
\]
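As a hedged reminder of where Jensen's inequality typically enters (a sketch rather than the full argument): by concavity of the square root and the stated identity,
\[
\sum_{i=1}^k \sqrt{E_0[T_i(n)]} \le k\sqrt{\frac{1}{k}\sum_{i=1}^k E_0[T_i(n)]} = \sqrt{kn}\,,
\]
so that $c\sum_{i=1}^k \sqrt{nk\,E_0[T_i(n)]} = c\sqrt{nk}\sum_{i=1}^k \sqrt{E_0[T_i(n)]} \le c\sqrt{nk}\cdot\sqrt{kn} = ckn$.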
(c) Let $R_i = R_n(\pi, G_{\mu^{(i)}})$. Find a choice of $c > 0$ for which
\[
\sum_{i=1}^k R_i = \Delta \sum_{i=1}^k \big(n - E_i[T_i(n)]\big)
\ge \Delta\big(nk - n - ckn\big)
= 2c\sqrt{\frac{k}{n}}\,\big(nk - n - ckn\big)
\ge \frac{nk}{8}\sqrt{\frac{k}{n}}\,.
\]
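A quick arithmetic check of the constraint this places on $c$ (a sketch, assuming $k \ge 2$ as elsewhere in the chapter): dividing both sides of the last inequality by $n\sqrt{k/n}$ reduces it to $2c(k - 1 - ck) \ge k/8$, and since $k - 1 \ge k/2$ for $k \ge 2$ this follows from $c(1 - 2c) \ge 1/8$; for instance $c = 1/4$ achieves equality.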
(d) Conclude that there exists an $i \in [k]$ such that
\[
R_i \ge \frac{1}{8}\sqrt{kn}\,.
\]

The method used in this exercise is borrowed from Auer et al. [2002b] and is closely related to the lower bound technique known as Assouad's method in statistics [Yu, 1997].
15.3 (Lower bound for small horizons) Let $k > 1$ and $n < k$. Prove that for any policy $\pi$ there exists a Gaussian bandit with unit variance and means $\mu \in [0,1]^k$ such that
\[
R_n(\pi, \nu_\mu) \ge \frac{n(2k - n - 1)}{2k} > \frac{n}{2}\,.
\]
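It may help to note (a hedged observation about the target quantity, not the intended proof) that the right-hand side is a sum of per-round terms:
\[
\frac{1}{k}\sum_{t=1}^{n}(k - t) = n - \frac{n(n+1)}{2k} = \frac{n(2k - n - 1)}{2k}\,,
\]
which suggests lower bounding the expected regret incurred in round $t$ separately for each $t \le n$.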
15.4 (Lower bounds for Bernoulli bandits) Recall from Table 4.1 that $\mathcal{E}_{\mathrm{B}}^k$ is the set of $k$-armed Bernoulli bandits. Show that there exists a universal constant $c > 0$ such that for any $2 \le k \le n$ it holds that:
\[
R_n^*(\mathcal{E}_{\mathrm{B}}^k) = \inf_\pi \sup_{\nu \in \mathcal{E}_{\mathrm{B}}^k} R_n(\pi, \nu) \ge c\sqrt{nk}\,.
\]

Hint: Use the fact that the KL divergence is upper bounded by the $\chi$-squared distance (Eq. (14.15)).
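A hedged note on how the hint is usually deployed, assuming Eq. (14.15) states $\mathrm{D}(P,Q) \le \chi^2(P,Q)$: for Bernoulli distributions the $\chi$-squared distance has the closed form
\[
\chi^2\big(\mathcal{B}(p), \mathcal{B}(q)\big) = \frac{(p - q)^2}{q(1 - q)}\,,
\]
so for means bounded away from $0$ and $1$ the divergence between two arms scales as $(p - q)^2$, matching the quadratic behaviour that drives the Gaussian lower bound argument in this chapter.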
15.5 In Chapter 9 we proved that if $\pi$ is the MOSS policy and $\nu \in \mathcal{E}_{\mathrm{SG}}^k(1)$, then
\[
R_n(\pi, \nu) \le C\sqrt{kn} + \sum_{i : \Delta_i > 0} \Delta_i
\]