Bandit Algorithms

34.3 Conjugate pairs, conjugate priors and the exponential family

34.3.1 Sequences of random variables and the Markov chain view


Let $P = (P_\theta : \theta \in \Theta)$ be a Markov kernel from $(\Theta, \mathcal{G})$ to $(\mathcal{X}, \mathcal{H})$. Then let $P^n = (P^n_\theta : \theta \in \Theta)$ be the unique Markov kernel from $(\Theta, \mathcal{G})$ to $(\mathcal{X}^n, \mathcal{H}^n)$ defined on rectangles by
$$
P^n_\theta(B_1 \times \cdots \times B_n) = \prod_{t=1}^n P_\theta(B_t)\,.
$$

Given a prior $Q$ on $(\Theta, \mathcal{G})$, let $\theta \in \Theta$ be a random element and $X_1, X_2, \ldots, X_n$ be a sequence of random elements on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$ such that

(a) $\mathbb{P}(\theta \in \cdot) = Q(\cdot)$.
(b) $\mathbb{P}((X_1, \ldots, X_n) \in B_1 \times \cdots \times B_n \mid \theta) = P^n_\theta(B_1 \times \cdots \times B_n)$ almost surely for all $B_1, \ldots, B_n \in \mathcal{H}$.

In words, given $\theta$, whose distribution is $Q$, the random elements $X_1, \ldots, X_n$ are independent and share the common distribution $P_\theta$. The posterior after $t$ observations is a Markov kernel $Q_t$ from $(\mathcal{X}^t, \mathcal{H}^t)$ to $(\Theta, \mathcal{G})$ such that
$$
\mathbb{E}[\mathbb{I}\{\theta \in B\} \mid X_1, \ldots, X_t] = Q_t(B \mid X_1, \ldots, X_t) \quad \text{almost surely.}
$$

Then the conditional distribution of $X_{t+1}$ given $X_1, \ldots, X_t$ is
$$
\mathbb{P}(X_{t+1} \in B \mid X_1, \ldots, X_t) = \int_\Theta P_\theta(B)\, Q_t(d\theta \mid X_1, \ldots, X_t)\,.
$$

This means the conditional distribution of $X_{t+1}$ can be written in terms of the model and the posterior. In many cases the posterior has a simple form that makes this view rather convenient. The sequence of measures $Q_0, Q_1, \ldots, Q_n$ can be viewed as a Markov chain on the space of measures on $(\Theta, \mathcal{G})$.
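The integral in the display above can also be approximated numerically: drawing $\theta$ from the posterior and averaging $P_\theta(B)$ gives a Monte Carlo estimate of the predictive probability. A minimal Python sketch, with function names chosen for illustration (they are not from the text):

```python
import random

def predictive_prob(sample_posterior, prob_B_given_theta, n_samples=20000,
                    rng=random.Random(0)):
    """Monte Carlo estimate of P(X_{t+1} in B | X_1, ..., X_t), i.e. the
    integral over Theta of P_theta(B) Q_t(dtheta | X_1, ..., X_t)."""
    total = 0.0
    for _ in range(n_samples):
        theta = sample_posterior(rng)       # draw theta ~ Q_t(. | history)
        total += prob_B_given_theta(theta)  # evaluate P_theta(B)
    return total / n_samples

# Beta-Bernoulli illustration: Q_t = Beta(2, 2) and B = {1}, so
# P_theta(B) = theta and the exact predictive probability is
# E[theta] = 2 / (2 + 2) = 0.5.
p = predictive_prob(lambda rng: rng.betavariate(2, 2), lambda th: th)
```

For conjugate pairs such as the Beta-Bernoulli example below, this integral has a closed form and no sampling is needed; the sketch is only meant to show that the predictive distribution depends on the history solely through the posterior.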

Example 34.6. Suppose that $\Theta = [0,1]$, $\mathcal{G} = \mathfrak{B}([0,1])$, $Q = \mathrm{Beta}(\alpha, \beta)$ and $P_\theta = \mathcal{B}(\theta)$ is Bernoulli. Then the posterior after $t$ observations is $Q_t = \mathrm{Beta}(\alpha + S_t, \beta + t - S_t)$, where $S_t = \sum_{s=1}^t X_s$. Furthermore, $\mathbb{E}[X_{t+1} \mid X_1, \ldots, X_t] = (\alpha + S_t)/(\alpha + \beta + t)$ and hence
$$
\mathbb{P}(S_{t+1} = S_t + 1 \mid S_t) = \frac{\alpha + S_t}{\alpha + \beta + t}\,, \qquad
\mathbb{P}(S_{t+1} = S_t \mid S_t) = 1 - \mathbb{P}(S_{t+1} = S_t + 1 \mid S_t)\,.
$$
So the posterior after $t$ observations is a Beta distribution depending on $S_t$, and $S_1, S_2, \ldots, S_n$ follows a Markov chain evolving according to the above display.
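The Markov chain on $S_1, S_2, \ldots, S_n$ in Example 34.6 is easy to simulate: at each step the count increments with the posterior-mean probability from the display above. A short sketch (the function name is illustrative, not from the text):

```python
import random

def simulate_counts(alpha, beta, n, rng=random.Random(1)):
    """Simulate S_1, ..., S_n where, starting from S_0 = 0, the count
    satisfies S_{t+1} = S_t + 1 with probability
    (alpha + S_t) / (alpha + beta + t)."""
    s, path = 0, []
    for t in range(n):
        if rng.random() < (alpha + s) / (alpha + beta + t):
            s += 1
        path.append(s)
    return path

path = simulate_counts(alpha=1.0, beta=1.0, n=100)
# After t steps the posterior is Beta(alpha + path[t-1],
# beta + t - path[t-1]).
```

Note that only $S_t$, not the full history, is needed to evolve the chain, which is exactly the point of the example.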

Example 34.7. Let $(\Theta, \mathcal{G}) = (\mathbb{R}, \mathfrak{B}(\mathbb{R}))$, $Q = \mathcal{N}(\mu, \sigma^2)$ and $P_\theta = \mathcal{N}(\theta, 1)$. Then, using the same notation as above, the posterior is almost surely $Q_t = \mathcal{N}(\mu_t, \sigma_t^2)$, where
$$
\mu_t = \frac{\mu/\sigma^2 + S_t}{1/\sigma^2 + t}\,, \qquad
\sigma_t^2 = \left(\frac{1}{\sigma^2} + t\right)^{-1}\,.
$$