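\[
F[h] = \sigma^2[h] - \lambda \int dx\, h(x). \tag{7.3.17}
\]
Computing the functional derivative $\delta F/\delta h(x)$, we obtain the condition
\[
\frac{\phi^2(x) f^2(x)}{h^2(x)} + \lambda = 0 \tag{7.3.18}
\]
or
\[
h(x) = \frac{1}{\sqrt{-\lambda}}\,\phi(x) f(x). \tag{7.3.19}
\]
The Lagrange multiplier can be determined by requiring that $h(x)$ be normalized so that
\[
\int dx\, h(x) = \frac{1}{\sqrt{-\lambda}} \int dx\, \phi(x) f(x) = 1 \tag{7.3.20}
\]
or
\[
\sqrt{-\lambda} = \int dx\, \phi(x) f(x) = I.
\]
Thus, the optimal choice for $h(x)$ is
\[
h(x) = \frac{\phi(x) f(x)}{I}. \tag{7.3.21}
\]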
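The functional derivative behind eqn. (7.3.18) can be written out explicitly. The sketch below assumes that the variance functional introduced earlier in the section has the form $\sigma^2[h] = \int dx\, \phi^2(x) f^2(x)/h(x) - I^2$; that form is defined outside this passage and is taken here as an assumption. Since $I^2$ does not depend on $h$, only the first term and the constraint contribute:
\[
\frac{\delta F}{\delta h(x)}
= \frac{\delta}{\delta h(x)} \left[ \int dx'\, \frac{\phi^2(x') f^2(x')}{h(x')} - I^2 - \lambda \int dx'\, h(x') \right]
= -\frac{\phi^2(x) f^2(x)}{h^2(x)} - \lambda = 0.
\]
Multiplying by $-1$ gives eqn. (7.3.18). Solving for $h(x)$ and taking the positive root, which is the appropriate one when $\phi(x) f(x) \geq 0$ so that $h(x)$ can serve as a probability density, gives eqn. (7.3.19).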
In fact, with this choice of $h(x)$, the variance is identically zero, meaning that a perfect
Monte Carlo algorithm can be constructed based on $h(x)$. Of course, this choice of $h(x)$
is only of academic interest because if we knew $I$, we would not need to perform the
calculation in the first place! However, eqn. (7.3.21) provides a guideline for choosing
$h(x)$ so as to keep the variance low.
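The zero-variance property can be checked numerically. The following is a minimal sketch, not a practical algorithm: it takes $\phi(x) = \exp(-x)$ with $f(x)$ uniform on $(0,1)$ as an illustrative case, and it uses the exact value $I = 1 - 1/e$ to build $h(x) = \exp(-x)/I$, which is precisely the information one would not have in practice. The function names `sample_optimal` and `importance_estimate` are hypothetical, and only the Python standard library is assumed.

```python
import math
import random


def sample_optimal(rng, I):
    """Draw x from h(x) = exp(-x)/I on (0, 1) by inverting its CDF.

    The CDF is H(x) = (1 - exp(-x))/I, so H(x) = u gives x = -ln(1 - u*I).
    """
    u = rng.random()
    return -math.log(1.0 - u * I)


def importance_estimate(n, seed=0):
    """Return the sample mean and sample variance of phi(x) f(x) / h(x)."""
    rng = random.Random(seed)
    I = 1.0 - math.exp(-1.0)  # exact answer; assumed known only for this demo
    total = 0.0
    total_sq = 0.0
    for _ in range(n):
        x = sample_optimal(rng, I)
        h = math.exp(-x) / I           # optimal importance function
        w = math.exp(-x) * 1.0 / h     # phi(x) * f(x) / h(x), with f(x) = 1
        total += w
        total_sq += w * w
    mean = total / n
    var = max(total_sq / n - mean * mean, 0.0)  # clip tiny negative round-off
    return mean, var


if __name__ == "__main__":
    mean, var = importance_estimate(1000)
    print(f"mean = {mean:.12f}, variance = {var:.3e}")
```

Every sample contributes the same weight, namely $I$ itself, so the sample variance vanishes up to floating-point round-off regardless of the number of steps.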
As an example, consider the Monte Carlo evaluation of the integral
\[
I = \int_0^1 dx\, e^{-x} = 1 - \frac{1}{e} = 0.632120558829. \tag{7.3.22}
\]
The simplest Monte Carlo sampling scheme for this problem consists in sampling
$x$ uniformly on the interval $(0,1)$ ($f(x) = 1$ for $x \in (0,1)$ and 0 otherwise) and
then evaluating the function $\phi(x) = \exp(-x)$. The integrand $\exp(-x)$ is shown as
the solid line in Fig. 7.1(a), and the instantaneous value of the estimator $\phi(x) =
\exp(-x)$ is shown in Fig. 7.1(b). After $10^6$ steps, uniform sampling gives the answer
as $I \approx 0.6322 \pm 0.000181$. Now let us attempt to devise an importance function $h(x)$
capable of reducing the variance. We might be tempted to try a first-order Taylor
expansion $\exp(-x) \approx 1 - x$, which is shown as the dotted line in Fig. 7.1(a). After
normalization, $h(x)$ becomes $h(x) = 2(1 - x)$, and the use of this importance function
gives, after $10^6$ steps of sampling, $I \approx 0.6318 \pm 0.000592$. Interestingly, this importance
function makes things worse, yielding a larger variance than simple uniform sampling!
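The comparison between uniform sampling and the importance function $h(x) = 2(1 - x)$ can be reproduced with a short script. This is a sketch under stated assumptions rather than the code used to generate the numbers quoted above: the names `estimate`, `sample_uniform`, `sample_linear`, `density_uniform`, and `density_linear` are hypothetical, sampling from $h(x)$ is done by inverse-CDF sampling (one of several possible methods), and the error bar is taken to be the standard error of the mean.

```python
import math
import random


def estimate(sampler, density, n, seed=0):
    """Importance-sampling estimate of the integral of exp(-x) over (0, 1).

    Returns the sample mean of phi(x) f(x) / h(x) and its standard error.
    """
    rng = random.Random(seed)
    total = 0.0
    total_sq = 0.0
    for _ in range(n):
        x = sampler(rng)
        w = math.exp(-x) / density(x)  # f(x) = 1 on (0, 1)
        total += w
        total_sq += w * w
    mean = total / n
    var = total_sq / n - mean * mean
    return mean, math.sqrt(var / n)


def sample_uniform(rng):
    """Draw x uniformly on [0, 1), i.e. h(x) = f(x) = 1."""
    return rng.random()


def density_uniform(x):
    return 1.0


def sample_linear(rng):
    """Draw x from h(x) = 2(1 - x) by inverting H(x) = 1 - (1 - x)^2.

    1 - rng.random() lies in (0, 1], so x < 1 and h(x) > 0 for every draw.
    """
    return 1.0 - math.sqrt(1.0 - rng.random())


def density_linear(x):
    return 2.0 * (1.0 - x)


if __name__ == "__main__":
    n = 10**6
    for name, sampler, density in [
        ("uniform", sample_uniform, density_uniform),
        ("h(x) = 2(1 - x)", sample_linear, density_linear),
    ]:
        mean, err = estimate(sampler, density, n)
        print(f"{name}: I = {mean:.4f} +/- {err:.6f}")
```

The exact figures depend on the random number generator and the seed, so a run of this sketch need not reproduce the quoted values digit for digit; the qualitative outcome, a larger error bar for $h(x) = 2(1 - x)$ than for uniform sampling, is what it is meant to show.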
The reason for the failure of this importance function is that $1 - x$ is only a good
representation of $\exp(-x)$ for $x$ very close to 0, as Fig. 7.1(a) clearly shows. Over the
full interval $x \in (0,1)$, however, $1 - x$ does not accurately represent $\exp(-x)$ and,
therefore, biases the sampling toward regions of $x$ that are more unfavorable than
uniform sampling. Consider, next, using a more general linear function $h(x) = 1 - ax$,
where the parameter $a$ is chosen to give a better representation of $\exp(-x)$ over the
