
{X ∈ C} has measure zero and there is little cause to worry about events that
happen with probability zero. But for a frequentist using Bayesian techniques
for inference this actually matters. If θ is not sampled from Q, then nothing
prevents the situation that θ ∈ C, and the nonuniqueness of the posterior is an
issue (Exercise 34.10). Probability theory does not provide a way around this
issue.

One should take care to specify the version of the posterior used when
applying Bayesian techniques for inference in a frequentist setting. This is
important because in the frequentist viewpoint θ is not part of the probability
space and results are proven for P_θ for arbitrary fixed θ ∈ Θ. By contrast,
the all-in Bayesians include θ in the probability space and thus do not worry
about events with negligible prior probability; for them, any version of
the posterior will do.

34.3 Conjugate pairs, conjugate priors and the exponential family


One of the strengths of the Bayesian approach is the ability to explicitly specify
and incorporate prior beliefs into the uncertainty models in a natural way via
the prior. When it comes to Bayesian algorithms, this advantage is belied a little
by the competing necessity of choosing a prior for which the posterior can be
efficiently computed, or sampled from. The ease of computing (or sampling from) the
posterior depends on the interplay between the prior and the model. Given the
importance of computation, it is hardly surprising that researchers have worked
hard to find models and priors that behave well together. A prior and model are
called a conjugate pair if the posterior has the same parametric form as the
prior. In this case, the prior is called a conjugate prior to the model.

Gaussian model/Gaussian prior
Suppose that (Θ, G) = (Ω, F) = (R, B(R)), X : Ω → Ω is the identity, and
P_θ is Gaussian with mean θ and known signal variance σ_S². If the prior Q is
Gaussian with mean μ_P and prior variance σ_P², then the posterior distribution
having observed X = x can be chosen to be

\[
Q(\cdot \mid x) = \mathcal{N}\left( \frac{\mu_P/\sigma_P^2 + x/\sigma_S^2}{1/\sigma_P^2 + 1/\sigma_S^2},\; \left( \frac{1}{\sigma_S^2} + \frac{1}{\sigma_P^2} \right)^{-1} \right).
\]
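The update is just a precision-weighted average of the prior mean and the observation, which is easy to check numerically. Below is a minimal sketch in Python; the function and variable names (posterior_params, mu_p, var_p, var_s) are illustrative choices, not from the text.

# A minimal sketch of the Gaussian model / Gaussian prior conjugate
# update above. All names here are illustrative, not from the book.

def posterior_params(mu_p: float, var_p: float, x: float, var_s: float):
    """Posterior mean and variance for a N(theta, var_s) model with a
    N(mu_p, var_p) prior, after observing X = x."""
    precision = 1.0 / var_p + 1.0 / var_s          # posterior precision
    mean = (mu_p / var_p + x / var_s) / precision  # precision-weighted average
    return mean, 1.0 / precision


# Example: a confident prior (small var_p) pulls the posterior mean
# strongly towards mu_p.
print(posterior_params(mu_p=0.0, var_p=0.1, x=1.0, var_s=1.0))
# -> (approximately 0.0909, 0.0909)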


The proof is left to the reader in Exercise 34.1. The limiting regimes as the
prior/signal variance tends to zero or infinity are quite illuminating. For example,
as σ_P² → 0 the posterior tends to a Gaussian N(μ_P, σ_P²), which is equal to the
prior and indicates that no learning occurs. This is consistent with intuition: if
the prior variance is zero, then the statistician is already certain of the mean and
no observation can change that belief.
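To see this limit concretely, one can reuse the posterior_params sketch above: shrinking var_p pins the posterior to the prior no matter what is observed.

# Reusing posterior_params from the sketch above: as the prior variance
# shrinks, the posterior ignores the observation x.
for var_p in (1.0, 1e-2, 1e-6):
    print(var_p, posterior_params(mu_p=0.0, var_p=var_p, x=10.0, var_s=1.0))
# The posterior mean tends to mu_p = 0 and the posterior variance to
# var_p, matching the claim that no learning occurs in this limit.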