Pattern Recognition and Machine Learning

11. SAMPLING METHODS

$$ y_1 = z_1 \left( \frac{-2 \ln r^2}{r^2} \right)^{1/2} \tag{11.10} $$

$$ y_2 = z_2 \left( \frac{-2 \ln r^2}{r^2} \right)^{1/2} \tag{11.11} $$

where $r^2 = z_1^2 + z_2^2$ (Exercise 11.4). Then the joint distribution of $y_1$ and $y_2$ is given by


$$ p(y_1, y_2) = p(z_1, z_2) \left| \frac{\partial(z_1, z_2)}{\partial(y_1, y_2)} \right| = \left[ \frac{1}{\sqrt{2\pi}} \exp(-y_1^2/2) \right] \left[ \frac{1}{\sqrt{2\pi}} \exp(-y_2^2/2) \right] \tag{11.12} $$

and so $y_1$ and $y_2$ are independent and each has a Gaussian distribution with zero mean and unit variance.
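The derivation above can be checked numerically. Here is a minimal Python sketch of this polar generator (the function name is mine; it assumes, consistent with the transformation above, that $(z_1, z_2)$ is drawn uniformly from within the unit circle):

```python
import math
import random

def box_muller_polar():
    """Return two independent N(0, 1) samples via the polar method."""
    while True:
        # Propose (z1, z2) uniform on the square, keep only the unit disc
        z1 = random.uniform(-1.0, 1.0)
        z2 = random.uniform(-1.0, 1.0)
        r2 = z1 * z1 + z2 * z2
        if 0.0 < r2 < 1.0:
            # Transformation (11.10)-(11.11): y_i = z_i sqrt(-2 ln r^2 / r^2)
            factor = math.sqrt(-2.0 * math.log(r2) / r2)
            return z1 * factor, z2 * factor
```

The while-loop implements the rejection of points outside the unit circle; each accepted pair yields two independent unit Gaussians.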
If $y$ has a Gaussian distribution with zero mean and unit variance, then $\sigma y + \mu$ will have a Gaussian distribution with mean $\mu$ and variance $\sigma^2$. To generate vector-valued variables having a multivariate Gaussian distribution with mean $\mu$ and covariance $\Sigma$, we can make use of the Cholesky decomposition, which takes the form $\Sigma = LL^{\mathrm{T}}$ (Press et al., 1992). Then, if $z$ is a vector-valued random variable whose components are independent and Gaussian distributed with zero mean and unit variance, then $y = \mu + Lz$ will have mean $\mu$ and covariance $\Sigma$ (Exercise 11.5).
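In code, the Cholesky construction is only a few lines; a sketch using NumPy (the function name and the use of NumPy's generator API are my choices):

```python
import numpy as np

def sample_mvn(mu, Sigma, n, rng):
    """Draw n samples from N(mu, Sigma) via y = mu + L z, with Sigma = L L^T."""
    L = np.linalg.cholesky(Sigma)          # lower-triangular Cholesky factor
    z = rng.standard_normal((n, len(mu)))  # independent unit Gaussians
    return mu + z @ L.T                    # each row is one sample y
```

For example, the sample mean and covariance of a large batch should closely match `mu` and `Sigma`.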
Obviously, the transformation technique depends for its success on the ability to calculate and then invert the indefinite integral of the required distribution. Such operations will only be feasible for a limited number of simple distributions, and so we must turn to alternative approaches in search of a more general strategy. Here we consider two techniques called rejection sampling and importance sampling. Although mainly limited to univariate distributions and thus not directly applicable to complex problems in many dimensions, they do form important components in more general strategies.


11.1.2 Rejection sampling


The rejection sampling framework allows us to sample from relatively complex
distributions, subject to certain constraints. We begin by considering univariate dis-
tributions and discuss the extension to multiple dimensions subsequently.
Suppose we wish to sample from a distribution $p(z)$ that is not one of the simple, standard distributions considered so far, and that sampling directly from $p(z)$ is difficult. Furthermore suppose, as is often the case, that we are easily able to evaluate $p(z)$ for any given value of $z$, up to some normalizing constant $Z_p$, so that

$$ p(z) = \frac{1}{Z_p} \widetilde{p}(z) \tag{11.13} $$

where $\widetilde{p}(z)$ can readily be evaluated, but $Z_p$ is unknown.
In order to apply rejection sampling, we need some simpler distribution $q(z)$, sometimes called a proposal distribution, from which we can readily draw samples.
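As a preview, the standard rejection scheme scales the proposal by a constant $k$ chosen so that $k q(z) \geq \widetilde{p}(z)$ everywhere, and accepts a draw $z \sim q$ with probability $\widetilde{p}(z)/(k q(z))$. A Python sketch for the case where $\widetilde{p}$ is an unnormalized unit Gaussian (the Cauchy proposal, the value of $k$, and all names here are illustrative choices of mine, not from the text):

```python
import math
import random

def p_tilde(z):
    # Unnormalized target: a unit Gaussian up to its constant Z_p
    return math.exp(-0.5 * z * z)

def q_density(z):
    # Proposal: standard Cauchy, easy to sample by inverting its CDF
    return 1.0 / (math.pi * (1.0 + z * z))

def sample_q():
    u = random.random()
    return math.tan(math.pi * (u - 0.5))

# Smallest k with k * q(z) >= p_tilde(z) for all z (maximum is at z = +/-1)
K = 2.0 * math.pi * math.exp(-0.5)

def rejection_sample():
    """Draw one sample from p(z) proportional to p_tilde(z)."""
    while True:
        z = sample_q()
        # Accept with probability p_tilde(z) / (K * q(z)) <= 1
        if random.random() * K * q_density(z) <= p_tilde(z):
            return z
```

The accepted samples are distributed according to the normalized $p(z)$, here a unit Gaussian, even though $Z_p$ is never computed.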