29.7 Exercises 335
where`(At) =〈At,θ〉+ε(At) andε= supa∈Aε(a) and|Yt|≤1 almost surely.
(a)Suppose thatA=Bd 2. Show that the expected regret of an appropriately
tuned version of the algorithm in Section 28.4 satisfies
Rn≤C(d
√
nlog(n) +εn),
whereC >0 is a universal constant.
(b) Explain why the result from Part (a) is essentially optimal.
(c)Suppose thatAis finite. What goes wrong in the analysis of exponential
weights with Kiefer–Wolfowitz exploration? (Algorithm 15).