Bandit Algorithms


side is the minimax regret R*_n(E). Choosing appropriate topologies on P and
Q is not always easy. For examples see Exercises 34.14 and 36.11.
6 The issue of conditioning on measure zero sets has been described in many
places. We do not know of a practical situation where things go awry. Sensible
choices yield sensible posteriors. The curious reader could probably burn a few
weeks reading through the literature on the Borel–Kolmogorov paradox
[Jaynes, 2003, §15.7].
7 Suppose that (Pθ : θ ∈ Θ) is a probability kernel from (Θ, G) to (R, B(R)) for
which there exists a measure λ on (R, B(R)) such that Pθ ≪ λ for all θ ∈ Θ.
Then there exists a family of densities pθ : R → [0, ∞) such that pθ(x) is jointly
measurable as a function of θ and x and pθ = dPθ/dλ for all θ ∈ Θ. See the
proof of Lemma 1.2 in [Ghosal and van der Vaart, 2017] or Sections 1.3 and
1.4 of the book by Strasser [2011].
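
As an illustration of this note (an example chosen here, not taken from the text), assume Pθ is the Gaussian distribution with mean θ and unit variance, Θ = R, and λ is the Lebesgue measure. Then Pθ ≪ λ for every θ and the densities can be written explicitly:

```latex
\[
  p_\theta(x) = \frac{dP_\theta}{d\lambda}(x)
              = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{(x-\theta)^2}{2}\right),
  \qquad \theta \in \Theta = \mathbb{R},\ x \in \mathbb{R}.
\]
```

In this case the map (θ, x) ↦ pθ(x) is continuous on R × R, so it is jointly measurable when G is taken to be the Borel σ-algebra B(R).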

34.8 Bibliographic remarks


There are many texts on Bayesian statistics. For an introduction to the applied
side there is the book by Gelman et al. [2014], which has lots of discussion and
examples. A more philosophical book that takes a foundational look at probability
theory from a Bayesian perspective is by Jaynes [2003]. The careful definition
of the posterior can be found in several places, but the recent book by Ghosal
and van der Vaart [2017] does an impeccable job. A worthy mention goes to the
article by Chang and Pollard [1997], which uses disintegration (Theorem 3.12)
to formalize the “private calculations” that probabilists so frequently make
before writing everything carefully using Radon–Nikodym derivatives and regular
versions. Theorem 34.1 is well known. For a simple proof see Theorem 5.3 in the
book by Kallenberg [2002]. For a detailed presentation of exponential families see
the book by Lehmann and Casella [2006]. A compendium of conjugate priors is
by Fink [1997].

34.9 Exercises


34.1 (Posterior calculations) Evaluate the posteriors for each pair of
conjugate priors in Section 34.3.

34.2 (Uniqueness of beta/Bernoulli posterior) Explain why the posterior
for the Bernoulli model with a beta prior is unique.
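
For orientation, the beta/Bernoulli pair can be checked numerically. The following is a minimal sketch, assuming the standard conjugate update in which a Beta(α, β) prior becomes Beta(α + successes, β + failures) after observing Bernoulli data. The function names are hypothetical and chosen only for this illustration. The sketch does not address the uniqueness question the exercise asks about.

```python
from fractions import Fraction


def beta_bernoulli_posterior(alpha, beta, observations):
    """Return the (alpha, beta) parameters of the posterior.

    Assumes a Beta(alpha, beta) prior on the Bernoulli mean and
    observations that are each 0 or 1 (hypothetical helper, for illustration).
    """
    if any(x not in (0, 1) for x in observations):
        raise ValueError("observations must be 0 or 1")
    successes = sum(observations)
    failures = len(observations) - successes
    # Conjugate update: add the success and failure counts to the prior parameters.
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution, as an exact fraction."""
    return Fraction(alpha) / (Fraction(alpha) + Fraction(beta))


prior = (1, 1)           # Beta(1, 1) is the uniform prior on [0, 1]
data = [1, 0, 1, 1]      # three successes, one failure
posterior = beta_bernoulli_posterior(*prior, data)
print(posterior)             # (4, 2)
print(beta_mean(*posterior)) # 2/3
```

With no observations the function returns the prior parameters unchanged, which matches the fact that the posterior equals the prior before any data are seen.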

34.3 (Posterior in terms of density) Let P = (Pθ : θ ∈ Θ) be a probability
kernel from (Θ, G) to (X, H), let Q be a probability measure on Θ, and let ℙ = Q ⊗ P
on Θ × X. As usual, let θ and X be the coordinate projections on Θ × X. Let
ν and μ be probability measures on (Θ, G) and (X, H) such that Q ≪ ν and