Bandit Algorithms

34.3 Conjugate pairs, conjugate priors and the exponential family 401

The collection{Pθ:θ∈Θ}is called asingle parameter exponential family.
We have encountered exponential families already in Exercises 10.4 to 10.6. Now
is a good time to review the first of these exercises.
Example34.3.Letσ^2 >0 andh=N(0,σ^2 ) andη(θ) =θσandT(x) =xσ. An
easy calculation shows thatA(θ) =θ^2 /(2σ^2 ), which has domain Θ =Rand
Pθ=N(θ,σ^2 ).

Example34.4. Leth=δ 0 +δ 1 be the sum of Dirac measures andT(x) =x andη(θ) =θ. ThenA(θ) =log(1 +exp(θ)) and Θ =RandPθ=B(σ(θ)) where σ(θ) = exp(θ)/(1 + exp(θ)) is the logistic function.

Example34.5.The same family can be parameterized in many different ways. Leth=δ 0 +δ 1 andT(x) =xandη(θ) =log(θ/(1−θ)). ThenA(θ) =−log(1−θ) and Θ = (0,1) andPθ=B(θ). Exponential families have many nice properties. Of most interest to us here is the existence of conjugate priors. Suppose that{Pθ:θ∈Θ}is a single parameter exponential family determined byh,ηandTwithT(x) =xassumed to be the identity. Letx 0 ,n 0 ∈Rand define prior measureQon (Θ,B(Θ)) in terms of its densityq=dQ/dλwithλthe Lebesgue measure:

q(θ) =∫ exp (n^0 x^0 η(θ)−n^0 A(θ)) Θexp (n^0 x^0 η(θ)−n^0 A(θ))dθ

, (34.4)

where we assume the existence and strict positivity of the integral in the
denominator. Suppose we observeX=x. Then a choice of posterior has density
with respect to the Lebesgue measure given by

q(θ|x) = ∫ exp (η(θ)(x+n^0 x^0 )−(1 +n^0 )A(θ)) Θexp (η(θ)(x+n^0 x^0 )−(1 +n^0 )A(θ))dλ(θ)

.

What this means is that after observing the valuex, the posterior takes the form
of the prior except that the parameters (x 0 ,n 0 ) associated with the prior get
updated to ((n 0 x 0 +x)/(n 0 + 1),n 0 + 1). The posterior is both easy to represent
and maintain. To see how exponential families recover previous examples, consider
the Bernoulli case of Example 34.5. Since

exp(n 0 x 0 η(θ)−n 0 A(θ)) =

(

θ 1 −θ

)n 0 x 0 (1−θ)n^0 =θn^0 x^0 (1−θ)n^0 (1−x^0 ),

we see that the prior from (34.4) is a beta distribution with parameters
α= 1 +n 0 x 0 andβ= 1 +n 0 (1−x 0 ) as can be seen from(34.3). As expected,
the posterior update also works as described earlier.

There are important parametric families with conjugate priors that are not exponential families. One example is the uniform family{U(a,b) :a < b}, which is conjugate to the Pareto family.

Bandit Algorithms

, (34.4)

.

(

Get our desktop app

Company

Features

Documentation

Resources