Bandit Algorithms

(Jeff_L) #1
10.5 Exercises 137

(f)Find a choice ofhandTsuch that{Pθ:θ∈Θ}is the family of Bernoulli
distributions.
(g)Find a choice ofhandTsuch that{Pθ:θ∈Θ}is the family of Gaussian
distributions with unit variance means inR.


Exponential families represent a wide range of statistical models. We discuss
them in more detail in Chapter 34. The functiond(θ,θ′) is called therelative
entropybetweenPθandPθ′. We discuss this concept more in Chapter 14.

10.5(KL-UCB for exponential families) LetM={Pθ:θ∈Θ}be a
regular non-singular exponential family with sufficient statisticT(x) =xand
E={(Pθi)ki=1:θ∈Θk}be the set of bandits with reward distributions inM.
Design a policyπsuch that for allν∈Eit holds that


nlim→∞

Rn(π,ν)
log(n)



i:∆i> 0

∆i
di,inf

,


whereμ(θ) =



RxdPθ(x) is the mean ofPθanddi,inf=inf{d(θ,φ) :μ(φ)>
μ∗,φ∈Φ}.

Hint Repeat the proof of Theorem 10.6, adapting as necessary. Working through
Exercise 10.4 will probably be useful. See also the paper by Capp ́e et al. [2013].

10.6(KL-UCB for non-canonical exponential families) Repeat the
previous exercise, but relax the assumption thatT(x) =x.


Hint This is a subtle problem. You should adapt the algorithm so that if there
are ties in the upper confidence bounds, then an arm with the largest number of
plays is chosen. A solution is available. Korda et al. [2013] analyzed Thompson
sampling in this setting. Their result only holds whenθ7→


Rxpθ(x)dh(x) is
invertible, which does not always hold.

In the analysis of KL-UCB for canonical exponential families the asymptotic
rate is a good indicator of the finite-time regret in the sense that theo(log(n))
term hidden by the asymptotics has roughly the same leading constant as
the dominant term. By contrast, the analysis here indicates that

E[Ti(n)]≈log(n)
di,inf

+^1


di,min

,


wheredi,min=di,min(0). Although the latter term is negligible asymptotically,
it may be the dominant term for all reasonablen.

10.7(Comparison to UCB) In this exercise you compare KL-UCB and UCB
empirically.

Free download pdf