Bandit Algorithms

34.9 Exercises

34.10 Consider the setup in Example 34.2. A Bayesian learner observes $X \sim P_\theta$ and should choose an action $A_t \in [0,1]$ that is $\sigma(X)$-measurable. Their loss is $\mathbb{I}\{A_t \neq \theta\}$.

(a) Show that the optimal choice is $A_t = X_t$.
(b) Give a Bayesian optimal algorithm with $A_t \neq X_t$ on some nonempty (measure zero) event.
(c) Give a Bayesian optimal algorithm and $\theta$ such that the loss when $\theta$ is true (and so $X \sim P_\theta$) is not zero.

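34.11 (Admissible policies are Bayesian for finite environments) Let $\mathcal{E} = \{\nu_1, \ldots, \nu_N\}$ and $\Pi$ be sets. Call the elements of $\mathcal{E}$ environments and the elements of $\Pi$ policies (this is just to help make the connection to the rest of the material). Let $\ell : \Pi \times \mathcal{E} \to [0, \infty)$ be a positive loss function. Given a policy $\pi$, let $\ell(\pi) = (\ell(\pi, \nu_1), \ldots, \ell(\pi, \nu_N))$ be the loss vector resulting from policy $\pi$. Define $S = \{\ell(\pi) : \pi \in \Pi\} \subset \mathbb{R}^N$ and

$$\lambda(S) = \{x \in \operatorname{cl}(S) : y \not< x \text{ for all } y \in S\},$$

where $y \not< x$ is defined to mean it is not true that $y_i \le x_i$ for all $i$ with strict inequality for at least one $i$ ($\lambda(S)$ is the Pareto frontier of the set $S$; its elements are the nondominated loss-outcome vectors in $\operatorname{cl}(S)$). Prove that if $\lambda(S) \subseteq S$ and $S$ is convex, then for every $\pi^* \in \Pi$ such that $\ell(\pi^*) \in \lambda(S)$ there exists a prior $q \in \mathcal{P}(\mathcal{E})$ such that

$$\sum_{\nu \in \mathcal{E}} q(\nu)\, \ell(\pi^*, \nu) = \min_{\pi \in \Pi} \sum_{\nu \in \mathcal{E}} q(\nu)\, \ell(\pi, \nu).$$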

Hint Use the supporting hyperplane theorem, stated in the hint after Exercise 26.1.

By identifying the elements of $\mathcal{E}$ as 'criteria', the result of the exercise has the following interpretation in multicriteria optimization: for nonempty, convex, closed loss-sets, solutions on the Pareto frontier (policies $\pi$ such that $\ell(\pi) \in \lambda(S)$) can be obtained by minimizing a convex combination of the individual criteria. There is also a connection to constrained optimization where the constraints are expressed as bounds on linear combinations of the losses.
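The role of convexity in this remark can be checked numerically on a toy instance. The following is a minimal sketch, not a solution to the exercise: the policy names and loss vectors are hypothetical, $N = 2$, and the prior is searched only over a finite grid on the simplex. The finite sets below are not themselves convex; the first can be read as the extreme points of a convex set of mixtures, while the second is chosen so that one nondominated point lies strictly above the segment joining two others.

```python
from fractions import Fraction


def dominates(y, x):
    """True if y_i <= x_i for all i, with strict inequality for at least one i."""
    return (all(yi <= xi for yi, xi in zip(y, x))
            and any(yi < xi for yi, xi in zip(y, x)))


def pareto_frontier(losses):
    """Names of policies whose loss vector is not dominated by any other in the set."""
    return {name for name, x in losses.items()
            if not any(dominates(y, x) for y in losses.values())}


def bayes_optimal(q, losses):
    """Names of policies minimizing the q-weighted loss sum_nu q(nu) * loss(pi, nu)."""
    risk = {name: sum(qi * li for qi, li in zip(q, x))
            for name, x in losses.items()}
    best = min(risk.values())
    return {name for name, r in risk.items() if r == best}


def supported(losses, grid=100):
    """Policies that are Bayes optimal for at least one prior on a grid (N = 2 only)."""
    found = set()
    for k in range(grid + 1):
        q = (Fraction(k, grid), Fraction(grid - k, grid))  # exact arithmetic
        found |= bayes_optimal(q, losses)
    return found


# Hypothetical loss vectors (loss under nu_1, loss under nu_2).
convex_like = {"a": (1, 4), "b": (2, 2), "c": (4, 1), "d": (3, 3)}
# Here b sits above the segment between a and c, so a mixture of a and c beats it.
non_convex = {"a": (1, 4), "b": (Fraction(13, 5), Fraction(13, 5)), "c": (4, 1)}

print(sorted(pareto_frontier(convex_like)))  # ['a', 'b', 'c']
print(sorted(supported(convex_like)))        # ['a', 'b', 'c']
print(sorted(pareto_frontier(non_convex)))   # ['a', 'b', 'c']
print(sorted(supported(non_convex)))         # ['a', 'c']
```

In the second instance, policy b is nondominated within the finite set but is not Bayes optimal for any prior, since its weighted loss is always 13/5 while the better of a and c never exceeds 5/2. This is the situation that the convexity assumption rules out.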

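34.12 (Uniquely Bayes optimal policies are admissible) Let $(\mathcal{E}, \mathcal{G})$ be a measurable space and $\Pi$ an arbitrary set, the elements of which we call policies. Let $\ell : \Pi \times \mathcal{E} \to \mathbb{R}$ be a function with $\ell(\pi, \cdot)$ being $\mathcal{G}$-measurable. Given a probability measure $Q$ on $(\mathcal{E}, \mathcal{G})$, a policy $\pi$ is called Bayesian optimal with respect to $Q$ if

$$\int_{\mathcal{E}} \ell(\pi, \nu)\, dQ(\nu) = \inf_{\pi' \in \Pi} \int_{\mathcal{E}} \ell(\pi', \nu)\, dQ(\nu).$$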

Prove the following:
