Bandit Algorithms

34.3 Conjugate pairs, conjugate priors and the exponential family

34.3.1 Sequences of random variables and the Markov chain view


Let $P = (P_\theta : \theta \in \Theta)$ be a Markov kernel from $(\Theta, \mathcal{G})$ to $(\mathcal{X}, \mathcal{H})$. Then let $P^n = (P^n_\theta : \theta \in \Theta)$ be the unique Markov kernel from $(\Theta, \mathcal{G})$ to $(\mathcal{X}^n, \mathcal{H}^n)$ defined on rectangles by
$$
P^n_\theta(B_1 \times \cdots \times B_n) = \prod_{t=1}^n P_\theta(B_t)\,.
$$

Given a prior $Q$ on $(\Theta, \mathcal{G})$, let $\theta \in \Theta$ be a random element and $X_1, X_2, \ldots, X_n$ be a sequence of random elements on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$ such that

(a) $\mathbb{P}(\theta \in \cdot) = Q(\cdot)$.
(b) $\mathbb{P}((X_1, \ldots, X_n) \in B_1 \times \cdots \times B_n \mid \theta) = P^n_\theta(B_1 \times \cdots \times B_n)$ almost surely for all $B_1, \ldots, B_n \in \mathcal{H}$.

In words, given $\theta$, whose distribution is $Q$, the random elements $X_1, \ldots, X_n$ are independent and share the common distribution $P_\theta$. The posterior after $t$ observations is a Markov kernel $Q_t$ from $(\mathcal{X}^t, \mathcal{H}^t)$ to $(\Theta, \mathcal{G})$ such that
$$
\mathbb{E}[\mathbb{I}\{\theta \in B\} \mid X_1, \ldots, X_t] = Q_t(B \mid X_1, \ldots, X_t) \quad \text{almost surely.}
$$

Then the conditional distribution of $X_{t+1}$ given $X_1, \ldots, X_t$ is
$$
\mathbb{P}(X_{t+1} \in B \mid X_1, \ldots, X_t) = \int_\Theta P_\theta(B)\, Q_t(d\theta \mid X_1, \ldots, X_t)\,.
$$

This means the conditional distribution of $X_{t+1}$ can be written in terms of the model and the posterior. In many cases the posterior has a simple form that makes this view rather convenient. The sequence of measures $Q_0, Q_1, \ldots, Q_n$ can be viewed as a Markov chain on the space of measures on $(\Theta, \mathcal{G})$.
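The integral in the display above can also be approximated numerically: drawing $\theta$ from the posterior and averaging $P_\theta(B)$ gives a Monte Carlo estimate of the predictive probability. A minimal Python sketch, with function names chosen for illustration (they are not from the text):

```python
import random

def predictive_prob(sample_posterior, prob_B_given_theta, n_samples=20000,
                    rng=random.Random(0)):
    """Monte Carlo estimate of P(X_{t+1} in B | X_1, ..., X_t), i.e. the
    integral over Theta of P_theta(B) Q_t(dtheta | X_1, ..., X_t)."""
    total = 0.0
    for _ in range(n_samples):
        theta = sample_posterior(rng)       # draw theta ~ Q_t(. | history)
        total += prob_B_given_theta(theta)  # evaluate P_theta(B)
    return total / n_samples

# Beta-Bernoulli illustration: Q_t = Beta(2, 2) and B = {1}, so
# P_theta(B) = theta and the exact predictive probability is
# E[theta] = 2 / (2 + 2) = 0.5.
p = predictive_prob(lambda rng: rng.betavariate(2, 2), lambda th: th)
```

For conjugate pairs such as the Beta-Bernoulli example below, this integral has a closed form and no sampling is needed; the sketch is only meant to show that the predictive distribution depends on the history solely through the posterior.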

Example 34.6. Suppose that $\Theta = [0,1]$, $\mathcal{G} = \mathfrak{B}([0,1])$, $Q = \mathrm{Beta}(\alpha, \beta)$ and $P_\theta = \mathcal{B}(\theta)$ is Bernoulli. Then the posterior after $t$ observations is $Q_t = \mathrm{Beta}(\alpha + S_t, \beta + t - S_t)$, where $S_t = \sum_{s=1}^t X_s$. Furthermore, $\mathbb{E}[X_{t+1} \mid X_1, \ldots, X_t] = (\alpha + S_t)/(\alpha + \beta + t)$ and hence
$$
\mathbb{P}(S_{t+1} = S_t + 1 \mid S_t) = \frac{\alpha + S_t}{\alpha + \beta + t}\,, \qquad
\mathbb{P}(S_{t+1} = S_t \mid S_t) = 1 - \mathbb{P}(S_{t+1} = S_t + 1 \mid S_t)\,.
$$
So the posterior after $t$ observations is a Beta distribution depending on $S_t$, and $S_1, S_2, \ldots, S_n$ follows a Markov chain evolving according to the above display.
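The Markov chain on $S_1, S_2, \ldots, S_n$ in Example 34.6 is easy to simulate: at each step the count increments with the posterior-mean probability from the display above. A short sketch (the function name is illustrative, not from the text):

```python
import random

def simulate_counts(alpha, beta, n, rng=random.Random(1)):
    """Simulate S_1, ..., S_n where, starting from S_0 = 0, the count
    satisfies S_{t+1} = S_t + 1 with probability
    (alpha + S_t) / (alpha + beta + t)."""
    s, path = 0, []
    for t in range(n):
        if rng.random() < (alpha + s) / (alpha + beta + t):
            s += 1
        path.append(s)
    return path

path = simulate_counts(alpha=1.0, beta=1.0, n=100)
# After t steps the posterior is Beta(alpha + path[t-1],
# beta + t - path[t-1]).
```

Note that only $S_t$, not the full history, is needed to evolve the chain, which is exactly the point of the example.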

Example 34.7. Let $(\Theta, \mathcal{G}) = (\mathbb{R}, \mathfrak{B}(\mathbb{R}))$, $Q = \mathcal{N}(\mu, \sigma^2)$ and $P_\theta = \mathcal{N}(\theta, 1)$. Then, using the same notation as above, the posterior is almost surely $Q_t = \mathcal{N}(\mu_t, \sigma_t^2)$, where
$$
\mu_t = \frac{\mu/\sigma^2 + S_t}{1/\sigma^2 + t}\,, \qquad
\sigma_t^2 = \left(\frac{1}{\sigma^2} + t\right)^{-1}\,.
$$