Bandit Algorithms

(Jeff_L) #1
29.7 Exercises 335

where`(At) =〈At,θ〉+ε(At) andε= supa∈Aε(a) and|Yt|≤1 almost surely.


(a)Suppose thatA=Bd 2. Show that the expected regret of an appropriately
tuned version of the algorithm in Section 28.4 satisfies


Rn≤C(d


nlog(n) +εn),
whereC >0 is a universal constant.
(b) Explain why the result from Part (a) is essentially optimal.
(c)Suppose thatAis finite. What goes wrong in the analysis of exponential
weights with Kiefer–Wolfowitz exploration? (Algorithm 15).

Free download pdf