Pattern Recognition and Machine Learning

good approximation to the true continuous-time dynamics, it is necessary for the
leapfrog integration scale ε to be smaller than the shortest length-scale over which
the potential is varying significantly. This is governed by the smallest value of σ_i,
which we denote by σ_min. Recall that the goal of the leapfrog integration in hybrid
Monte Carlo is to move a substantial distance through phase space to a new state
that is relatively independent of the initial state and still achieve a high probability of
acceptance. In order to achieve this, the leapfrog integration must be continued for a
number of iterations of order σ_max/σ_min.
By contrast, consider the behaviour of a simple Metropolis algorithm with an
isotropic Gaussian proposal distribution of variance s^2, considered earlier. In order
to avoid high rejection rates, the value of s must be of order σ_min. The exploration of
state space then proceeds by a random walk and takes of order (σ_max/σ_min)^2 steps
to arrive at a roughly independent state.
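
As a rough illustration of this scaling argument, the following sketch (not from the book; the two-dimensional Gaussian target and the particular values of σ_min and σ_max are assumptions chosen purely for illustration) runs a random-walk Metropolis sampler with an isotropic Gaussian proposal and shows that a proposal width of order σ_min is needed to keep the acceptance rate reasonable, at the cost of random-walk exploration of the σ_max direction:

```python
import numpy as np

# Toy anisotropic Gaussian target: independent components with standard
# deviations sigma_min and sigma_max (values chosen purely for illustration).
sigma_min, sigma_max = 0.1, 1.0
sigmas = np.array([sigma_min, sigma_max])

def log_p(z):
    # Log of the unnormalized Gaussian density exp(-E(z)).
    return -0.5 * np.sum((z / sigmas) ** 2)

def metropolis(n_steps, step_size, rng):
    # Random-walk Metropolis with an isotropic Gaussian proposal of width step_size.
    z = np.zeros(2)
    accepted = 0
    for _ in range(n_steps):
        z_new = z + step_size * rng.standard_normal(2)
        if np.log(rng.random()) < log_p(z_new) - log_p(z):
            z, accepted = z_new, accepted + 1
    return accepted / n_steps

rng = np.random.default_rng(0)

# A proposal width of order sigma_min keeps the acceptance rate reasonable, but
# exploration of the sigma_max direction then proceeds as a random walk, needing
# roughly (sigma_max / sigma_min)**2 steps per independent state; a width of
# order sigma_max instead drives the acceptance rate down.
for step in [sigma_min, sigma_max]:
    acc = metropolis(20000, step, rng)
    print(f"proposal width {step:.2f}: acceptance rate {acc:.2f}")
```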

11.6 Estimating the Partition Function


As we have seen, most of the sampling algorithms considered in this chapter re-
quire only the functional form of the probability distribution up to a multiplicative
constant. Thus if we write

\[
p_E(z) = \frac{1}{Z_E} \exp(-E(z))
\tag{11.71}
\]

then the value of the normalization constant Z_E, also known as the partition func-
tion, is not needed in order to draw samples from p(z). However, knowledge of the
value of Z_E can be useful for Bayesian model comparison since it represents the
model evidence (i.e., the probability of the observed data given the model), and so
it is of interest to consider how its value might be obtained. We assume that direct
evaluation by summing, or integrating, the function exp(−E(z)) over the state space
of z is intractable.
For model comparison, it is actually the ratio of the partition functions for two
models that is required. Multiplication of this ratio by the ratio of prior probabilities
gives the ratio of posterior probabilities, which can then be used for model selection
or model averaging.
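
Written out explicitly (with models M_1, M_2 and data D introduced here purely for illustration, the two partition functions playing the role of the model evidences), one way to express the relation just described is

\[
\frac{p(\mathcal{M}_1 \mid \mathcal{D})}{p(\mathcal{M}_2 \mid \mathcal{D})}
= \frac{Z_{E_1}}{Z_{E_2}} \times \frac{p(\mathcal{M}_1)}{p(\mathcal{M}_2)}
\]

where Z_{E_1} and Z_{E_2} denote the partition functions of the two models.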
One way to estimate a ratio of partition functions is to use importance sampling
from a distribution with energy function G(z)

\[
\frac{Z_E}{Z_G}
= \frac{\sum_z \exp(-E(z))}{\sum_z \exp(-G(z))}
= \frac{\sum_z \exp(-E(z)+G(z))\,\exp(-G(z))}{\sum_z \exp(-G(z))}
= \mathbb{E}_{G(z)}\!\left[\exp(-E+G)\right]
\simeq \frac{1}{L}\sum_{l} \exp\!\left(-E(z^{(l)})+G(z^{(l)})\right)
\tag{11.72}
\]

where the z^(l) are samples drawn from the distribution defined by exp(−G(z)).
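
A minimal sketch of this estimator, assuming simple one-dimensional quadratic energy functions chosen here so that the exact ratio is known in closed form (neither the particular example nor the sample size L comes from the text), might look as follows; the samples z^(l) are drawn from the distribution defined by G(z), as prescribed above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1D example (an assumption for illustration, not from the book):
# E(z) = z**2 / 2             -> Z_E = sqrt(2*pi)        (standard Gaussian)
# G(z) = z**2 / (2 * 2.0**2)  -> Z_G = sqrt(2*pi) * 2.0  (Gaussian, std 2.0)
# so the exact ratio Z_E / Z_G equals 0.5.
def E(z):
    return 0.5 * z**2

def G(z):
    return 0.5 * (z / 2.0) ** 2

# Draw L samples from the distribution defined by exp(-G(z)).  Here this can be
# done exactly because G is Gaussian; in general these samples would come from
# one of the sampling algorithms of this chapter.
L = 100_000
z = 2.0 * rng.standard_normal(L)

# Importance-sampling estimate of Z_E / Z_G, following equation (11.72).
ratio_estimate = np.mean(np.exp(-E(z) + G(z)))
print(f"estimated Z_E/Z_G = {ratio_estimate:.4f}  (exact value 0.5)")
```

The estimate is reliable only when the distribution defined by G(z) is a close, somewhat broader match to the one defined by E(z), which is why the wider Gaussian serves as the importance distribution in this toy example.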