Bandit Algorithms

34.9 Exercises

(a) If $\pi$ is the unique Bayesian optimal policy given prior $Q$, then $\pi$ is admissible.
(b) There is an example when $\pi$ is a Bayesian optimal policy and $\pi$ is inadmissible.
(c) If $\mathcal{E}$ is countable and $\operatorname{Supp}(Q) = \mathcal{E}$, then $\pi$ is admissible.
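A small numerical sanity check can make the role of the prior in parts (a) to (c) concrete. The sketch below is not a proof and rests on assumptions that are not in the text: a single round ($n = 1$), two Bernoulli arms, a hypothetical two-element environment class, and the usual reading of admissibility (no other policy has regret at most as large in every environment and strictly smaller in at least one). With one round, a policy is just the probability `p` of playing arm 1. The prior puts all of its mass on an environment in which both arms are equally good, so every policy is Bayesian optimal, yet always playing arm 1 is dominated.

```python
from typing import List, Sequence, Tuple

Env = Tuple[float, float]  # (mean of arm 1, mean of arm 2); illustrative type


def regret(p: float, env: Env) -> float:
    """One-round regret of the policy that plays arm 1 with probability p."""
    mu1, mu2 = env
    best = max(mu1, mu2)
    return p * (best - mu1) + (1.0 - p) * (best - mu2)


def bayes_regret(p: float, envs: Sequence[Env], prior: Sequence[float]) -> float:
    """Regret of policy p averaged over the prior weights."""
    return sum(q * regret(p, env) for q, env in zip(prior, envs))


def dominates(p_new: float, p_old: float, envs: Sequence[Env],
              tol: float = 1e-12) -> bool:
    """True if p_new is never worse than p_old and strictly better somewhere."""
    gaps = [regret(p_old, env) - regret(p_new, env) for env in envs]
    return all(g >= -tol for g in gaps) and any(g > tol for g in gaps)


# Hypothetical environment class: arms tie in the first, arm 2 wins in the second.
envs: List[Env] = [(0.5, 0.5), (0.2, 0.8)]
# Prior concentrated on the first environment, so Supp(Q) is not all of the class.
prior = [1.0, 0.0]

policies = [i / 100 for i in range(101)]
min_bayes = min(bayes_regret(p, envs, prior) for p in policies)

always_arm1 = 1.0
is_bayes_optimal = abs(bayes_regret(always_arm1, envs, prior) - min_bayes) < 1e-12
is_dominated = dominates(0.0, always_arm1, envs)
```

In this toy instance the Bayesian optimal policy is not unique and the prior does not have full support, so neither the hypothesis of (a) nor that of (c) holds, which is consistent with the dominated policy being Bayesian optimal.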

34.13 (Admissible policies are Bayesian for Bernoulli bandits) Let $\mathcal{E}$ be the set of $k$-armed Bernoulli bandits. Prove that every admissible policy is Bayesian optimal for some prior.

Hint Argue that all policies can be written as convex combinations of deterministic policies using an appropriate linear structure. Then identify the spaces of environments and policies with compact metric spaces. Let $(\nu_j)_{j=1}^{\infty}$ be a dense subset of $\mathcal{E}$ and repeat the argument in the previous exercise with each finite subset $\{\nu_1, \ldots, \nu_j\}$ and then take the limit as $j \to \infty$. You will probably find Theorem 2.14 useful.

34.14 Let $\mathcal{E} = \mathcal{E}_{\mathcal{B}}^k$ be the space of $k$-armed Bernoulli bandits. Endow $\mathcal{E}$ with a topology via the natural bijection to $[0,1]^k$ and let $\mathcal{Q}$ be the space of all probability measures on $(\mathcal{E}, \mathfrak{B}(\mathcal{E}))$ with the weak* topology. Prove that

$$\max_{Q \in \mathcal{Q}} \mathrm{BR}_n^*(Q) = R_n^*(\mathcal{E}).$$
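The identity in Exercise 34.14 can be checked numerically in the smallest possible case. The sketch below is an illustration under assumptions that are not in the text: horizon $n = 1$, $k = 2$ arms, a finite grid standing in for $[0,1]^2$, and a hand-picked two-point prior. It reads $\mathrm{BR}_n^*(Q)$ as the smallest Bayesian regret achievable under prior $Q$ and $R_n^*(\mathcal{E})$ as the minimax regret. With one round, a policy is the probability `p` of playing arm 1.

```python
import itertools


def regret(p: float, mu1: float, mu2: float) -> float:
    """One-round regret of the policy that plays arm 1 with probability p."""
    best = max(mu1, mu2)
    return p * (best - mu1) + (1.0 - p) * (best - mu2)


# Finite grids standing in for the environment space [0,1]^2 and the policy space.
means = [i / 20 for i in range(21)]
envs = list(itertools.product(means, means))
policies = [i / 100 for i in range(101)]


def worst_case_regret(p: float) -> float:
    """Largest regret of policy p over the environment grid."""
    return max(regret(p, mu1, mu2) for mu1, mu2 in envs)


# Minimax regret on the grid: best policy against its own worst environment.
minimax_regret = min(worst_case_regret(p) for p in policies)

# Hypothetical prior: equal mass on the two environments with deterministic arms.
prior = {(0.0, 1.0): 0.5, (1.0, 0.0): 0.5}


def bayes_regret(p: float) -> float:
    """Regret of policy p averaged over the two-point prior."""
    return sum(q * regret(p, mu1, mu2) for (mu1, mu2), q in prior.items())


# Optimal Bayesian regret for this prior.
optimal_bayes_regret = min(bayes_regret(p) for p in policies)
```

Both quantities come out as $1/2$ on this grid. Since the Bayesian regret of any policy is an average of regrets and so can never exceed its worst-case regret, a prior whose optimal Bayesian regret matches the minimax regret attains the maximum on the left-hand side in this toy case. The exercise asks for the general statement, which this check does not establish.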
