Bandit Algorithms

35.7 Bibliographical remarks 426

9 The algorithm in Section 35.5 for computing Gittins index is calledVaraiya’s algorithm. In the bibliographic remarks we give some pointers on where to look for more sophisticated methods. The assumption that|S|is finite is less severe than it may appear. When the discount rate is not too close to 1, then for many problems the Gittins index can be approximated by removing states that are not reachable from the start state before the discounting means they becomes close to irrelevant. When the state space is infinite there is often a topological structure that makes a discretization possible. 10 We met the notion ofsufficient statisticthroughout the book multiple times in connection to exponential families. However, the concept is more general: In the context of mathematical statistics,Yis a sufficient statistic forXgiven a family of distributions (Pθ)θ∈Θover the probability space carrying bothXand Yif(i)Yisσ(X)-measurable (in other words,Yis a measurable function of X, also known as a ‘statistic’ givenX) and(ii)for allθ∈Θ, the conditional distribution ofXgivenYunderPθ,Pθ(X∈·|Y) is independent of the value of θ. Formally,(ii)means the existence of a probability kernelP= (P(y,·))yfrom YtoX(X∈X,Y∈Y) such that for anyθ∈Θ,Pθ(X∈·|Y) =P(Y,·) holds Pθ-almost surely. Informally, this means, that givenY, there is no information left aboutθinX. Denoting byPX,θthe distribution ofX underPθand without loss of generality lettingY=y(X) for somey:X → Ymeasurable map (recall Lemma 2.5), and assuming thatX,Ytake values in Borel spaces, with the help of the so-called disintegration theorem, it is not hard to see that if (PX,θ)θhave a common dominatingσ-finite measureμand for any θ∈Θ,dPdμX,θ(x) =h(x)gθ(y(x)) holdsμ-almost surely for allx∈Xfor some someh:X →[0,∞) andgθ:Y →[0,∞) Borel measurable maps thenY is a sufficient statistic forX. TheFisher-Neyman factorization theorem states that the converse also holds. Now, with some creative matching of concepts, we can see that in single-parameter exponential families what we called a sufficient statistic does indeed satisfy the more general definition.

35.7 Bibliographical remarks

The classic text on optimal stopping is by Robbins et al. [1971], while a more modern text is by Peskir and Shiryaev [2006], which includes a proof of Theorem 35.2 (see Theorem 1.2). With a little extra work you can also extract the proof of Theorem 35.3 from Section 1.2 of that book. We are not aware of a reference for Theorem 35.1, but Lai [1987] has shown that for sufficiently regular priors and noise models the asymptotic Bayesian optimal regret isBR∗n∼clog(n)^2 for some constantc >0 that depends on the prior/model (see Theorem 3 of Lai [1987]). The Bayesian approach dominated research on bandits from 1960–1980, with Gittins’ result (Theorem 35.9) receiving the most attention [Gittins, 1979]. Gittins et al. [2011] has written a whole book on Bayesian bandits. Another book that focusses mostly on the Bayesian problem is by Berry and Fristedt [1985].

Bandit Algorithms

35.7 Bibliographical remarks

Get our desktop app

Company

Features

Documentation

Resources