Bandit Algorithms


34.6 Bayesian regret


Recall that the regret of policy $\pi$ in a $k$-armed bandit environment $\nu$ over $n$ rounds is
\[
R_n(\pi, \nu) = n\mu^* - \mathbb{E}\left[\sum_{t=1}^n X_t\right]\,, \tag{34.7}
\]
where $\mu^* = \max_{i \in [k]} \mu_i$ and $\mu_i$ is the mean of $P_{\nu i}$. Given a $k$-armed Bayesian
bandit environment $(\mathcal{E}, \mathcal{G}, Q, P)$ and a policy $\pi$, the Bayesian regret is
\[
\mathrm{BR}_n(\pi, Q) = \int_{\mathcal{E}} R_n(\pi, \nu)\, dQ(\nu)\,.
\]
The dependence on $\mathcal{E}$, $\mathcal{G}$ and $P$ is omitted on the grounds that these are
always self-evident from the context. The Bayesian optimal regret is $\mathrm{BR}_n^*(Q) = \inf_\pi \mathrm{BR}_n(\pi, Q)$ and the optimal (regret-minimizing) policy is
\[
\pi^* = \operatorname{argmin}_\pi \mathrm{BR}_n(\pi, Q)\,. \tag{34.8}
\]
Note that the regret-minimizing policy is the same as the reward-maximizing
policy $\pi^* = \operatorname{argmax}_\pi \mathbb{E}\left[\sum_{t=1}^n X_t\right]$, where the expectation is over the interaction of $\pi$ with an environment sampled from the prior $Q$. This policy is known as the Bayesian optimal
policy under prior $Q$. In general there is no guarantee that the (Bayesian)
optimal policy exists, but the nonnegativity of the Bayesian regret ensures that
for any $\varepsilon > 0$ there exists a policy $\pi$ with $\mathrm{BR}_n(\pi, Q) \le \mathrm{BR}_n^*(Q) + \varepsilon$.
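To make these definitions concrete, here is a minimal sketch (not from the book) that estimates the Bayesian regret $\mathrm{BR}_n(\pi, Q)$ by Monte Carlo for a two-armed Bernoulli bandit whose means are drawn independently from a Uniform(0, 1) prior. The choice of prior, the policy and the function names (run_policy, bayesian_regret) are illustrative assumptions; the policy is a simple explore-then-commit rule, not the Bayesian optimal policy.

# Monte Carlo estimate of the Bayesian regret BR_n(pi, Q) -- illustrative sketch only.
import numpy as np

rng = np.random.default_rng(0)

def run_policy(mu, n, m=10):
    """Explore-then-commit: play each arm m times, then commit to the arm with
    the highest empirical mean. Returns the total reward collected over n rounds."""
    k = len(mu)
    counts = np.zeros(k)
    sums = np.zeros(k)
    total = 0.0
    for t in range(n):
        if t < k * m:
            a = t % k                              # round-robin exploration
        else:
            a = int(np.argmax(sums / counts))      # commit to the empirical best arm
        x = rng.binomial(1, mu[a])                 # Bernoulli reward X_t
        counts[a] += 1
        sums[a] += x
        total += x
    return total

def bayesian_regret(n, num_samples=2000):
    """Estimate BR_n(pi, Q) = E_{nu ~ Q}[n * mu_star - E[sum_t X_t]] by sampling
    environments from the prior and running the policy once in each."""
    regrets = []
    for _ in range(num_samples):
        mu = rng.uniform(0.0, 1.0, size=2)         # environment nu sampled from the prior Q
        regrets.append(n * mu.max() - run_policy(mu, n))
    return float(np.mean(regrets))

print(bayesian_regret(n=200))

Averaging the single-run regret over environments sampled from $Q$ gives an unbiased estimate of $\mathrm{BR}_n(\pi, Q)$, since the inner expectation in (34.7) is itself estimated by one interaction per sampled environment.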

The fact that the expected regret $R_n(\pi, \nu)$ is nonnegative for all $\nu$ and $\pi$
means that the Bayesian regret is always nonnegative. Perhaps less obviously,
the Bayesian regret of the Bayesian optimal policy can be strictly greater
than zero (Exercise 34.5).
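One way to build intuition for this (a sketch under specific assumptions, not a substitute for the exercise): take horizon $n = 1$, two Bernoulli arms and independent Uniform(0, 1) priors on the means. Any policy must choose its single arm using the prior alone, so the best achievable expected reward is $\max(\mathbb{E}[\mu_1], \mathbb{E}[\mu_2]) = 1/2$, while the expected optimal reward is $\mathbb{E}[\max(\mu_1, \mu_2)] = 2/3$. Hence
\[
\mathrm{BR}_1^*(Q) = \mathbb{E}\left[\max(\mu_1, \mu_2)\right] - \max\bigl(\mathbb{E}[\mu_1], \mathbb{E}[\mu_2]\bigr) = \tfrac{2}{3} - \tfrac{1}{2} = \tfrac{1}{6} > 0\,.
\]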

34.7 Notes


1 In Chapter 4 we defined $\mathcal{E}$ as a set of tuples of probability distributions over
the reals. In a Bayesian bandit environment $(\mathcal{E}, \mathcal{G}, Q, P)$ the set $\mathcal{E}$ is arbitrary.
The reward distributions are given by the probability kernel $P$. The probability
kernel is needed because we are now integrating the regret over $\mathcal{E}$, which may
not be measurable without additional conditions.
2 The Bayesian regret of an algorithm is less informative than the frequentist
regret. By this we mean that a bound on $\mathrm{BR}_n(\pi, Q)$ does not generally imply
a meaningful bound on $R_n(\pi, \nu)$, while if $R_n(\pi, \nu) \le f(\nu)$ for a measurable
function $f$, then $\mathrm{BR}_n(\pi, Q) \le \mathbb{E}[f(\nu)]$, where the expectation is over $\nu$ sampled
from the prior $Q$. This is not an argument against using
a Bayesian algorithm, but rather an argument for the need to analyze the
frequentist regret of Bayesian algorithms.
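As a simple illustration of this transfer (an added example, not from the text): if a policy satisfies a worst-case bound $R_n(\pi, \nu) \le C\sqrt{kn}$ for every $\nu \in \mathcal{E}$ and some constant $C$, then the same bound holds for the Bayesian regret under any prior, since
\[
\mathrm{BR}_n(\pi, Q) = \int_{\mathcal{E}} R_n(\pi, \nu)\, dQ(\nu) \le \int_{\mathcal{E}} C\sqrt{kn}\, dQ(\nu) = C\sqrt{kn}\,.
\]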