7.3.2 Importance sampling
Let us return to the problem of calculating the multidimensional integral
\[
I = \int d\mathbf{x}\, \phi(\mathbf{x}) f(\mathbf{x}). \tag{7.3.12}
\]
Instead of sampling the distribution $f(\mathbf{x})$, we could sample a different distribution $h(\mathbf{x})$ by rewriting the integral as
\[
I = \int d\mathbf{x} \left[\frac{\phi(\mathbf{x}) f(\mathbf{x})}{h(\mathbf{x})}\right] h(\mathbf{x}) \tag{7.3.13}
\]
and introducing $\psi(\mathbf{x}) = \phi(\mathbf{x}) f(\mathbf{x})/h(\mathbf{x})$. When this is done, eqn. (7.2.3) leads to
\[
I = \int d\mathbf{x}\, \psi(\mathbf{x}) h(\mathbf{x}) = \frac{1}{M}\sum_{i=1}^{M} \psi(\mathbf{x}_i) \pm \frac{1}{\sqrt{M}}\left[\langle \psi^2 \rangle_h - \langle \psi \rangle_h^2\right]^{1/2}, \tag{7.3.14}
\]
where the vectors $\mathbf{x}_i$ are sampled from the distribution $h(\mathbf{x})$. The use of the distribution $h(\mathbf{x})$ in lieu of $f(\mathbf{x})$ is known as \emph{importance sampling}.
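As a minimal numerical sketch of the estimator in eqn. (7.3.14), the following Python snippet (not from the text) estimates $I = \langle x^2 \rangle$ for a one-dimensional standard normal $f(x)$ by sampling a broader normal importance function $h(x)$; the specific choices of $\phi$, $f$, and $h$ are assumptions made purely for illustration.

```python
import numpy as np

# Illustrative importance-sampling estimate of eqn. (7.3.14):
# phi(x) = x^2 averaged over a standard normal f(x), sampled via a
# broader normal h(x).  All function choices here are assumptions
# for the purpose of the example; the exact answer is 1.

rng = np.random.default_rng(42)

def phi(x):                       # observable phi(x)
    return x**2

def f(x):                         # target distribution f(x): N(0, 1)
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def h(x):                         # importance function h(x): N(0, 2)
    s = 2.0
    return np.exp(-0.5 * (x / s)**2) / (s * np.sqrt(2.0 * np.pi))

M = 100_000
x = rng.normal(0.0, 2.0, size=M)        # x_i sampled from h(x)
psi = phi(x) * f(x) / h(x)              # psi(x) = phi(x) f(x) / h(x)

I_est = psi.mean()
err = np.sqrt((np.mean(psi**2) - psi.mean()**2) / M)   # error bar of eqn. (7.3.14)
print(f"I = {I_est:.4f} +/- {err:.4f}  (exact value: 1)")
```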
There are several reasons to employ an importance function $h(\mathbf{x})$ in a Monte Carlo calculation. First, the function $h(\mathbf{x})$ might be easier to sample than $f(\mathbf{x})$. If $h(\mathbf{x})$ retains some of the most important features of $f(\mathbf{x})$, then $h(\mathbf{x})$ will be a good choice for an importance function. In this sense, employing importance sampling is akin to using a reference potential in molecular dynamics, which we discussed in Section 3.11. A second reason concerns the behavior of the integrand $\phi(\mathbf{x})$ itself. If $\phi(\mathbf{x})$ is a highly oscillatory function, then positive and negative contributions will tend to cancel in the Monte Carlo evaluation of eqn. (7.3.12), rendering the convergence of the sampling algorithm extremely slow and inefficient because of the large variance. A judiciously chosen importance function can help tame such oscillatory behavior, leading to a smaller variance and better convergence.
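The variance reduction can be made concrete with a small comparison (again a sketch, not from the text): for $\phi(x) = e^{2x}$ and $f(x)$ a standard normal, direct sampling of $f(x)$ suffers from a large variance, whereas sampling a normal $h(x)$ centered where $\phi(x) f(x)$ peaks reduces the error bar dramatically; the choices of $\phi$, $f$, and $h$ are illustrative assumptions.

```python
import numpy as np

# Comparison of direct sampling of f(x) with importance sampling from h(x)
# for I = <exp(2x)> over a standard normal f(x); exact value is e^2.
# Because the chosen h(x) = N(2, 1) happens to track phi(x) f(x) closely,
# the importance-sampled error bar nearly vanishes (illustrative example).

rng = np.random.default_rng(0)
M = 200_000

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma)**2) / (sigma * np.sqrt(2.0 * np.pi))

phi = lambda x: np.exp(2.0 * x)

# Direct sampling from f(x) = N(0, 1)
x_f = rng.normal(0.0, 1.0, size=M)
est_f = phi(x_f).mean()
err_f = phi(x_f).std() / np.sqrt(M)

# Importance sampling from h(x) = N(2, 1)
x_h = rng.normal(2.0, 1.0, size=M)
psi = phi(x_h) * normal_pdf(x_h, 0.0, 1.0) / normal_pdf(x_h, 2.0, 1.0)
est_h = psi.mean()
err_h = psi.std() / np.sqrt(M)

print(f"direct sampling:     I = {est_f:.4f} +/- {err_f:.4f}")
print(f"importance sampling: I = {est_h:.4f} +/- {err_h:.4f}   (exact: {np.exp(2.0):.4f})")
```

This comparison also hints at the question addressed next: how close $h(\mathbf{x})$ can be brought to $\phi(\mathbf{x}) f(\mathbf{x})$ determines how small the variance becomes.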
We now ask if there is an optimal choice for an importance function $h(\mathbf{x})$. The best choice is one that leads to the smallest possible variance. According to eqn. (7.3.14), the variance, which is a functional of $h(\mathbf{x})$, is given by
\[
\sigma^2[h] = \left[\int d\mathbf{x}\, \psi^2(\mathbf{x}) h(\mathbf{x}) - \left(\int d\mathbf{x}\, \psi(\mathbf{x}) h(\mathbf{x})\right)^2\right]
= \left[\int d\mathbf{x}\, \frac{\phi^2(\mathbf{x}) f^2(\mathbf{x})}{h^2(\mathbf{x})}\, h(\mathbf{x}) - \left(\int d\mathbf{x}\, \phi(\mathbf{x}) f(\mathbf{x})\right)^2\right]. \tag{7.3.15}
\]
We seek to minimize this variance with respect to the choice of $h(\mathbf{x})$ subject to the constraint that $h(\mathbf{x})$ be properly normalized:
\[
\int d\mathbf{x}\, h(\mathbf{x}) = 1. \tag{7.3.16}
\]
This can be done by introducing a Lagrange multiplier and minimizing the functional