35.7 Bibliographical remarks 426
9 The algorithm in Section 35.5 for computing Gittins index is calledVaraiya’s
algorithm. In the bibliographic remarks we give some pointers on where to
look for more sophisticated methods. The assumption that|S|is finite is less
severe than it may appear. When the discount rate is not too close to 1, then
for many problems the Gittins index can be approximated by removing states
that are not reachable from the start state before the discounting means they
becomes close to irrelevant. When the state space is infinite there is often a
topological structure that makes a discretization possible.
10 We met the notion ofsufficient statisticthroughout the book multiple times
in connection to exponential families. However, the concept is more general: In
the context of mathematical statistics,Yis a sufficient statistic forXgiven a
family of distributions (Pθ)θ∈Θover the probability space carrying bothXand
Yif(i)Yisσ(X)-measurable (in other words,Yis a measurable function of
X, also known as a ‘statistic’ givenX) and(ii)for allθ∈Θ, the conditional
distribution ofXgivenYunderPθ,Pθ(X∈·|Y) is independent of the value of
θ. Formally,(ii)means the existence of a probability kernelP= (P(y,·))yfrom
YtoX(X∈X,Y∈Y) such that for anyθ∈Θ,Pθ(X∈·|Y) =P(Y,·) holds
Pθ-almost surely. Informally, this means, that givenY, there is no information
left aboutθinX. Denoting byPX,θthe distribution ofX underPθand
without loss of generality lettingY=y(X) for somey:X → Ymeasurable
map (recall Lemma 2.5), and assuming thatX,Ytake values in Borel spaces,
with the help of the so-called disintegration theorem, it is not hard to see
that if (PX,θ)θhave a common dominatingσ-finite measureμand for any
θ∈Θ,dPdμX,θ(x) =h(x)gθ(y(x)) holdsμ-almost surely for allx∈Xfor some
someh:X →[0,∞) andgθ:Y →[0,∞) Borel measurable maps thenY
is a sufficient statistic forX. TheFisher-Neyman factorization theorem
states that the converse also holds. Now, with some creative matching of
concepts, we can see that in single-parameter exponential families what we
called a sufficient statistic does indeed satisfy the more general definition.
35.7 Bibliographical remarks
The classic text on optimal stopping is by Robbins et al. [1971], while a
more modern text is by Peskir and Shiryaev [2006], which includes a proof
of Theorem 35.2 (see Theorem 1.2). With a little extra work you can also extract
the proof of Theorem 35.3 from Section 1.2 of that book. We are not aware of a
reference for Theorem 35.1, but Lai [1987] has shown that for sufficiently regular
priors and noise models the asymptotic Bayesian optimal regret isBR∗n∼clog(n)^2
for some constantc >0 that depends on the prior/model (see Theorem 3 of Lai
[1987]). The Bayesian approach dominated research on bandits from 1960–1980,
with Gittins’ result (Theorem 35.9) receiving the most attention [Gittins, 1979].
Gittins et al. [2011] has written a whole book on Bayesian bandits. Another book
that focusses mostly on the Bayesian problem is by Berry and Fristedt [1985].