Bandit Algorithms

3.2 Markov chains 47

3.2 Markov chains

A Markov chain is an infinite sequence of random elementsX 1 ,X 2 ,...where the conditional distribution ofXt+1givenX 1 ,...,Xtis the same as the conditional distribution ofXt+1givenXt. The sequence has the property that given the last element, the history is irrelevant to ‘predict’ the future. Such random sequences appear throughout probability theory and have many applications besides. The theory is too rich to explain in detail, so we give the basics and point towards the literature for more details at the end. The focus here is mostly on the definition and existence of Markov chains. Let (X,F) and (Y,G) be measurable spaces. Aprobability kernelorMarkov kernelbetween (X,F) and (Y,G) is a functionK:X ×G →[0,1] such that: (a)K(x,·) is a measure for allx∈X. (b)K(·,A) isF-measurable for allA∈G. The idea here is thatKdescribes a stochastic transition. Having arrived atx, a process’s next state is sampledY∼K(x,·). Occasionally one sees the notation Kx(A) orK(A|x) rather thanK(x,A). IfK 1 is a (X,F)→(Y,G) probability kernel andK 2 is a (Y,G)→(Z,H) probability kernel, then theproduct kernelK 1 ⊗K 2 is the probability kernel from (X,F)→(Y ×Z,G×H) defined by

(K 1 ⊗K 2 )(x,A) =

∫

Y

∫

Z

IA((y,z))K 2 (y,dz)K 1 (x,dy).

WhenPis a measure on (X,F) andKis a kernel fromXtoY, thenP⊗Kis a measure on (X ×Y,F ×G) defined by

(P⊗K)(A) =

∫

X

∫

Y

IA((x,y))K(x,dy)dP(x).

There operations can be composed. WhenPis a probability measure onXand K 1 a kernel fromXtoYandK 2 a kernel fromX ×YtoZ, thenP⊗K 1 ⊗K 2 is a probability measure onX ×Y ×Z. The following provides a counterpart of Theorem 3.2. Theorem3.3 (Ionescu-Tulcea).Let(Ωn,Fn)∞n=1be a sequence of measurable spaces and K 1 be a probability measure on(Ω 1 ,F 1 )and forn≥ 2 letKn be a probability kernel from

∏n− 1 t=1Ωt toΩn. Then there exists a probability space(Ω,F,P)and random elements(Xt)∞t=1 withXt : Ω →Ωt such that PX 1 ,...,Xn=

⊗n t=1Ktfor alln∈N

+.

Ahomogeneous Markov chainis a sequence of random elements (Xt)∞t=1 taking values instate spaceS= (X,F) and with P(Xt+1∈·|X 1 ,...,Xt) =P(Xt+1∈·|Xt) =μ(Xt,·) almost surely, whereμis a probability kernel from (X,F) to (X,F) and we assume that P(X 1 ∈·) =μ 0 (·) for some measureμ 0 on (X,F).

Bandit Algorithms

3.2 Markov chains

∫

∫

∫

∫

+.

Get our desktop app

Company

Features

Documentation

Resources