Pattern Recognition and Machine Learning

540 11. SAMPLING METHODS

conditional probabilities for subsequent variables in the form of transition probabilities T_m(z^{(m)}, z^{(m+1)}) ≡ p(z^{(m+1)} | z^{(m)}). A Markov chain is called homogeneous if the transition probabilities are the same for all m.
The marginal probability for a particular variable can be expressed in terms of
the marginal probability for the previous variable in the chain in the form

p(z^{(m+1)}) = \sum_{z^{(m)}} p(z^{(m+1)} | z^{(m)}) p(z^{(m)}).     (11.38)

A distribution is said to be invariant, or stationary, with respect to a Markov chain
if each step in the chain leaves that distribution invariant. Thus, for a homogeneous
Markov chain with transition probabilities T(z′, z), the distribution p(z) is invariant if

p(z) = \sum_{z′} T(z′, z) p(z′).     (11.39)

Note that a given Markov chain may have more than one invariant distribution. For
instance, if the transition probabilities are given by the identity transformation, then
any distribution will be invariant.
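For a finite state space, the update (11.38) is a vector-matrix product and an invariant distribution is a left eigenvector of the transition matrix with eigenvalue 1. A minimal numerical sketch of the invariance condition (11.39), using a small hypothetical 3-state chain:

```python
import numpy as np

# Hypothetical 3-state Markov chain: T[i, j] = p(next state j | current state i).
T = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

# An invariant distribution satisfies p = p @ T, i.e. eq. (11.39) in matrix
# form, so it is a left eigenvector of T with eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(T.T)
p = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
p = p / p.sum()                 # normalise to a probability distribution

print(np.allclose(p @ T, p))   # → True: p is invariant under T
```

Note that if T were the identity matrix, every distribution p would pass this check, matching the remark above that invariant distributions need not be unique.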
A sufficient (but not necessary) condition for ensuring that the required distribution p(z) is invariant is to choose the transition probabilities to satisfy the property of detailed balance, defined by

p(z) T(z, z′) = p(z′) T(z′, z)     (11.40)

for the particular distribution p(z). It is easily seen that a transition probability that satisfies detailed balance with respect to a particular distribution will leave that distribution invariant, because

\sum_{z′} p(z′) T(z′, z) = \sum_{z′} p(z) T(z, z′) = p(z) \sum_{z′} p(z′ | z) = p(z).     (11.41)

A Markov chain that respects detailed balance is said to be reversible.
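The argument in (11.41) can be checked numerically for a finite chain. The sketch below builds a transition matrix satisfying detailed balance by a Metropolis-style construction (propose another state uniformly, accept with probability min(1, p(z′)/p(z))); the 4-state target distribution is hypothetical:

```python
import numpy as np

# Hypothetical target distribution over 4 states.
p = np.array([0.1, 0.2, 0.3, 0.4])
K = len(p)

# Metropolis-style kernel: uniform proposal over the other K-1 states,
# acceptance probability min(1, p_j / p_i); rejected mass stays on the diagonal.
T = np.zeros((K, K))
for i in range(K):
    for j in range(K):
        if i != j:
            T[i, j] = (1.0 / (K - 1)) * min(1.0, p[j] / p[i])
    T[i, i] = 1.0 - T[i].sum()

# Detailed balance (11.40): p_i T_ij == p_j T_ji, i.e. the flow matrix is symmetric.
flow = p[:, None] * T
assert np.allclose(flow, flow.T)

# Hence invariance (11.41): summing p(z') T(z', z) over z' recovers p(z).
assert np.allclose(p @ T, p)
```

The assertions mirror the two sides of the derivation: the first is (11.40) for every pair of states, and the second is the conclusion of (11.41).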
Our goal is to use Markov chains to sample from a given distribution. We can
achieve this if we set up a Markov chain such that the desired distribution is invariant.
However, we must also require that for m → ∞, the distribution p(z^{(m)}) converges to the required invariant distribution p(z), irrespective of the choice of initial distribution p(z^{(0)}). This property is called ergodicity, and the invariant distribution is then called the equilibrium distribution. Clearly, an ergodic Markov chain can have only one equilibrium distribution. It can be shown that a homogeneous Markov chain will be ergodic, subject only to weak restrictions on the invariant distribution and the transition probabilities (Neal, 1993).
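Ergodicity can be illustrated by iterating (11.38) from two very different initial distributions and observing that both converge to the same equilibrium. The 3-state chain below is a hypothetical example whose strictly positive transition probabilities guarantee ergodicity:

```python
import numpy as np

# Hypothetical ergodic chain (all transition probabilities strictly positive).
T = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

# Two very different initial distributions p(z^(0)).
p_a = np.array([1.0, 0.0, 0.0])
p_b = np.array([0.0, 0.0, 1.0])

# Repeatedly apply eq. (11.38): p(z^(m+1)) = sum_{z^(m)} p(z^(m+1)|z^(m)) p(z^(m)).
for _ in range(100):
    p_a = p_a @ T
    p_b = p_b @ T

print(np.allclose(p_a, p_b))   # → True: both reach the same equilibrium
```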
In practice we often construct the transition probabilities from a set of ‘base’ transitions B_1, ..., B_K. This can be achieved through a mixture distribution of the form

T(z′, z) = \sum_{k=1}^{K} α_k B_k(z′, z)     (11.42)
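Because each base transition leaves p(z) invariant, so does any convex combination (11.42) of them, since invariance is linear in the kernel. A small numerical check; the target distribution, the pairing of states, and the helper metropolis_kernel are all hypothetical choices for illustration:

```python
import numpy as np

# Hypothetical target distribution over 4 states.
p = np.array([0.1, 0.2, 0.3, 0.4])

def metropolis_kernel(p, pairs):
    """Base transition that only moves probability between the given pairs
    of states (a hypothetical helper, loosely analogous to a base transition
    that updates only a subset of the variables)."""
    K = len(p)
    B = np.eye(K)
    for i, j in pairs:
        a_ij = 0.5 * min(1.0, p[j] / p[i])   # propose j from i w.p. 1/2, then accept
        a_ji = 0.5 * min(1.0, p[i] / p[j])
        B[i, i] -= a_ij; B[i, j] += a_ij
        B[j, j] -= a_ji; B[j, i] += a_ji
    return B

# Each base transition satisfies detailed balance, hence leaves p invariant ...
B1 = metropolis_kernel(p, [(0, 1), (2, 3)])
B2 = metropolis_kernel(p, [(0, 2), (1, 3)])
assert np.allclose(p @ B1, p) and np.allclose(p @ B2, p)

# ... so any mixture T = sum_k alpha_k B_k, eq. (11.42), leaves p invariant too.
alphas = [0.3, 0.7]
T = alphas[0] * B1 + alphas[1] * B2
assert np.allclose(p @ T, p)
```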