Bandit Algorithms

10.5 Exercises 137

(f)Find a choice ofhandTsuch that{Pθ:θ∈Θ}is the family of Bernoulli
distributions.
(g)Find a choice ofhandTsuch that{Pθ:θ∈Θ}is the family of Gaussian
distributions with unit variance means inR.

Exponential families represent a wide range of statistical models. We discuss them in more detail in Chapter 34. The functiond(θ,θ′) is called therelative entropybetweenPθandPθ′. We discuss this concept more in Chapter 14.

10.5(KL-UCB for exponential families) LetM={Pθ:θ∈Θ}be a
regular non-singular exponential family with sufficient statisticT(x) =xand
E={(Pθi)ki=1:θ∈Θk}be the set of bandits with reward distributions inM.
Design a policyπsuch that for allν∈Eit holds that

nlim→∞

Rn(π,ν) log(n)

≤

∑

i:∆i> 0

∆i di,inf

,

whereμ(θ) =

∫

RxdPθ(x) is the mean ofPθanddi,inf=inf{d(θ,φ) :μ(φ)> μ∗,φ∈Φ}.

Hint Repeat the proof of Theorem 10.6, adapting as necessary. Working through Exercise 10.4 will probably be useful. See also the paper by Capp ́e et al. [2013].

10.6(KL-UCB for non-canonical exponential families) Repeat the
previous exercise, but relax the assumption thatT(x) =x.

Hint This is a subtle problem. You should adapt the algorithm so that if there are ties in the upper confidence bounds, then an arm with the largest number of plays is chosen. A solution is available. Korda et al. [2013] analyzed Thompson sampling in this setting. Their result only holds whenθ7→

∫

Rxpθ(x)dh(x) is invertible, which does not always hold.

In the analysis of KL-UCB for canonical exponential families the asymptotic rate is a good indicator of the finite-time regret in the sense that theo(log(n)) term hidden by the asymptotics has roughly the same leading constant as the dominant term. By contrast, the analysis here indicates that

E[Ti(n)]≈log(n) di,inf

+^1

di,min

,

wheredi,min=di,min(0). Although the latter term is negligible asymptotically, it may be the dominant term for all reasonablen.

10.7(Comparison to UCB) In this exercise you compare KL-UCB and UCB
empirically.

Bandit Algorithms

≤

∑

,

∫

∫

+^1

,

Get our desktop app

Company

Features

Documentation

Resources