16.5 Exercises 202
E=Mkandπbe a consistent policy forE. Prove that the asymptotic upper
bound on the regret proven in Exercise 10.5 is tight.
16.4(Unknown subgaussian constant) Let
M=
{
P: there exists aσ^2 ≥0 such thatPisσ^2 -subgaussian
}
.
(a) Find a distributionPsuch thatP /∈M.
(b)Suppose thatP∈Mhas meanμ∈R. Prove thatdinf(P,μ∗,M) = 0 for all
μ∗> μ.
(c)LetE={(Pi) :Pi∈Mfor all 1≤i≤k}. Prove that ifk >1, then for all
consistent policiesπ,
lim infn→∞ Rn(π,ν)
log(n)
=∞ for allν∈E.
(d)Letf:N→[0,∞) be any increasing function withlimn→∞f(n)/log(n) =∞.
Prove there exists a policyπsuch that
lim sup
n→∞
Rn(π,ν)
f(n)
= 0 for allν∈E,
whereEis as in the previous part.
(e) Conclude there exists a consistent policy forE.
16.5(Minimax lower bound) Use Lemma 16.3 to prove Theorem 15.2, possibly
with different constants.
16.6(Refining the lower-order terms) Letk= 2 and forν∈ EN^2 let
∆(ν) =max{∆ 1 (ν),∆ 2 (ν)}. Suppose thatπis a policy such that for allν∈EN^2
with ∆(ν)≤1 it holds that
Rn(π,ν)≤
Clog(n)
∆(ν)
. (16.6)
(a) Give an example of a policy satisfying Eq. (16.6).
(b)Assume that i = 2 is suboptimal forν and α ∈ (0,1) be such that
Eνπ[T 2 (n)] = 2∆(^1 ν) 2 log(α). Letν′be the alternative environment where
μ 1 (ν′) =μ 1 (ν) andμ 2 (ν′) =μ 1 (ν) + 2∆(ν). Show that
exp(−D(Pνπ,Pν′π)) =^1
α
.
(c) LetAbe the event thatT 2 (n)≥n/2. Show that
Pνπ(A)≤
2 Clog(n)
n∆(ν)^2
and Pν′π(A)≥
1
2 α
−
2 Clog(n)
n∆(ν)^2
.
(d) Show that
Rn(π,ν′)≥
n∆(ν)
2
(
1
2 α
−
2 Clog(n)
n∆(ν)^2