16.5 Exercises 202E=Mkandπbe a consistent policy forE. Prove that the asymptotic upper
bound on the regret proven in Exercise 10.5 is tight.16.4(Unknown subgaussian constant) Let
M={
P: there exists aσ^2 ≥0 such thatPisσ^2 -subgaussian}
.
(a) Find a distributionPsuch thatP /∈M.
(b)Suppose thatP∈Mhas meanμ∈R. Prove thatdinf(P,μ∗,M) = 0 for all
μ∗> μ.
(c)LetE={(Pi) :Pi∈Mfor all 1≤i≤k}. Prove that ifk >1, then for all
consistent policiesπ,
lim infn→∞ Rn(π,ν)
log(n)=∞ for allν∈E.(d)Letf:N→[0,∞) be any increasing function withlimn→∞f(n)/log(n) =∞.
Prove there exists a policyπsuch that
lim sup
n→∞Rn(π,ν)
f(n)= 0 for allν∈E,whereEis as in the previous part.
(e) Conclude there exists a consistent policy forE.16.5(Minimax lower bound) Use Lemma 16.3 to prove Theorem 15.2, possibly
with different constants.
16.6(Refining the lower-order terms) Letk= 2 and forν∈ EN^2 let
∆(ν) =max{∆ 1 (ν),∆ 2 (ν)}. Suppose thatπis a policy such that for allν∈EN^2
with ∆(ν)≤1 it holds that
Rn(π,ν)≤Clog(n)
∆(ν). (16.6)
(a) Give an example of a policy satisfying Eq. (16.6).
(b)Assume that i = 2 is suboptimal forν and α ∈ (0,1) be such that
Eνπ[T 2 (n)] = 2∆(^1 ν) 2 log(α). Letν′be the alternative environment where
μ 1 (ν′) =μ 1 (ν) andμ 2 (ν′) =μ 1 (ν) + 2∆(ν). Show that
exp(−D(Pνπ,Pν′π)) =^1
α.
(c) LetAbe the event thatT 2 (n)≥n/2. Show thatPνπ(A)≤2 Clog(n)
n∆(ν)^2and Pν′π(A)≥1
2 α−
2 Clog(n)
n∆(ν)^2.
(d) Show that
Rn(π,ν′)≥n∆(ν)
2(
1
2 α−
2 Clog(n)
n∆(ν)^2