34.3 Conjugate pairs, conjugate priors and the exponential family

no amount of data can change their belief. On the other hand, as $\sigma_P^2$ tends to infinity we see that the mean of the posterior has no dependence on the prior mean, which means that all prior knowledge is washed away with just one sample. You should think about what happens when $\sigma_S^2 \to \{0, \infty\}$.

Notice how the model has fixed $\sigma_S^2$, suggesting that the model variance is known. The Bayesian can also incorporate their uncertainty over the variance. In this case the model parameters are $\Theta = \mathbb{R} \times [0, \infty)$ and $P_\theta = \mathcal{N}(\theta_1, \theta_2)$. But is there a conjugate prior in this case? Already things are getting complicated, so we will simply let you know that the family of Gaussian-inverse-gamma distributions is conjugate.
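To make the limiting behaviour concrete, here is a minimal Python sketch of the Gaussian conjugate update for a single observation, assuming the standard posterior formulas for a $\mathcal{N}(\mu_P, \sigma_P^2)$ prior on the mean and a $\mathcal{N}(\theta, \sigma_S^2)$ model with known variance. The symbol $\mu_P$ and the numerical values are ours, introduced purely for illustration.

```python
# Sketch of the Gaussian conjugate update for a single observation x,
# assuming a N(mu_p, sp2) prior on the unknown mean and a N(theta, ss2)
# model with known variance ss2 (notation mu_p is ours).

def gaussian_posterior(mu_p, sp2, x, ss2):
    """Posterior mean and variance of theta after observing X = x."""
    post_var = 1.0 / (1.0 / sp2 + 1.0 / ss2)
    post_mean = post_var * (mu_p / sp2 + x / ss2)
    return post_mean, post_var

# As sp2 -> infinity the posterior mean approaches x (prior washed away);
# as ss2 -> infinity the posterior barely moves from the prior;
# as ss2 -> 0 the posterior concentrates on the observation x.
print(gaussian_posterior(0.0, 1e12, 3.0, 1.0))   # ~ (3.0, 1.0)
print(gaussian_posterior(0.0, 1.0, 3.0, 1e12))   # ~ (0.0, 1.0)
print(gaussian_posterior(0.0, 1.0, 3.0, 1e-12))  # ~ (3.0, 0.0)
```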


Bernoulli model/beta prior
Suppose that $\Theta = [0, 1]$ and $P_\theta = \mathcal{B}(\theta)$ is Bernoulli with parameter $\theta$. In this case it turns out that the family of beta distributions is conjugate, which for parameters $(\alpha, \beta) \in (0, \infty)^2$ is given in terms of its probability density function with respect to the Lebesgue measure:
$$
p_{\alpha,\beta}(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha - 1} (1 - x)^{\beta - 1}\,, \qquad (34.3)
$$
where $\Gamma(x)$ is the Gamma function. Then the posterior having observed $X = x \in \{0, 1\}$ is also a beta distribution with parameters $(\alpha + x, \beta + 1 - x)$.
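The update rule just stated is simple enough to express as a short Python sketch; the uniform prior and the observation sequence below are hypothetical choices of ours, used only to show the bookkeeping.

```python
# Minimal sketch of the Beta-Bernoulli conjugate update: starting from a
# Beta(alpha, beta) prior, each observation x in {0, 1} maps the
# parameters to (alpha + x, beta + 1 - x), as stated above.

def beta_bernoulli_update(alpha, beta, x):
    """Posterior Beta parameters after observing X = x in {0, 1}."""
    return alpha + x, beta + 1 - x

alpha, beta = 1.0, 1.0            # uniform prior on [0, 1] (hypothetical)
for x in [1, 1, 0, 1]:            # hypothetical observations
    alpha, beta = beta_bernoulli_update(alpha, beta, x)
print(alpha, beta)                # (4.0, 2.0): three successes, one failure
print(alpha / (alpha + beta))     # posterior mean of theta, here 2/3
```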


Here and in what follows, in line with the literature, we sweep under the rug that this posterior is just one of many choices. This is done to simplify the language, which is justified by the fact that all posteriors must agree almost everywhere, and thus the slight imprecision will hopefully not lead to confusion.

Unlike in the Gaussian case, the posterior for the Bernoulli model and beta prior
is unique (Exercise 34.2).

Exponential families
Both the Gaussian and Bernoulli families are examples of a more general family.
Let $h$ be a measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ and $T, \eta : \mathbb{R} \to \mathbb{R}$ be two 'suitable' functions, where $T$ is called the sufficient statistic. Together, $h$, $\eta$ and $T$ define a measure $P_\theta$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ for each $\theta \in \Theta \subset \mathbb{R}$ in terms of its density with respect to $h$:
$$
\frac{dP_\theta}{dh}(x) = \exp\left(\eta(\theta) T(x) - A(\theta)\right),
$$
where $A(\theta) = \log \int_{\mathbb{R}} \exp(\eta(\theta) T(x))\, dh(x)$ is the log-partition function and $\Theta = \operatorname{dom}(A) = \{\theta : A(\theta) < \infty\}$ is the domain of $A$. Integrating the density shows that for any $B \in \mathcal{B}(\mathbb{R})$ and $\theta \in \Theta$,
$$
P_\theta(B) = \int_B \frac{dP_\theta}{dh}(x)\, dh(x) = \int_B \exp\left(\eta(\theta) T(x) - A(\theta)\right) dh(x).
$$
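As a concrete check (our own illustration, using only the definitions above), the Bernoulli family from earlier in this section fits this form: take $h$ to be the counting measure on $\{0, 1\}$, $T(x) = x$ and $\eta(\theta) = \log(\theta / (1 - \theta))$ for $\theta \in (0, 1)$. Then
$$
A(\theta) = \log \int_{\mathbb{R}} \exp(\eta(\theta) T(x))\, dh(x) = \log\left(1 + \frac{\theta}{1 - \theta}\right) = -\log(1 - \theta),
$$
and hence
$$
\frac{dP_\theta}{dh}(x) = \exp\left(x \log\frac{\theta}{1 - \theta} + \log(1 - \theta)\right) = \theta^x (1 - \theta)^{1 - x},
$$
which is exactly the Bernoulli probability mass function.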