Pattern Recognition and Machine Learning

2.3. The Gaussian Distribution

Figure 2.12 Illustration of Bayesian inference for the mean μ of a Gaussian distribution, in which the variance is assumed to be known. The curves show the prior distribution over μ (the curve labelled N = 0), which in this case is itself Gaussian, along with the posterior distribution given by (2.140) for increasing numbers N of data points. The data points are generated from a Gaussian of mean 0.8 and variance 0.1, and the prior is chosen to have mean 0. In both the prior and the likelihood function, the variance is set to the true value.

[Plot: posterior curves labelled N = 0, N = 1, N = 2, and N = 10, over μ from −1 to 1; vertical axis (density) from 0 to 5.]

We illustrate our analysis of Bayesian inference for the mean of a Gaussian distribution in Figure 2.12. The generalization of this result to the case of a D-dimensional Gaussian random variable x with known covariance and unknown mean is straightforward (Exercise 2.40).
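As a concrete illustration of the setting in Figure 2.12, the following Python sketch (assuming NumPy is available; the variable names are ours) computes the posterior mean and variance after N = 0, 1, 2, and 10 observations using the conjugate-update formulas for a Gaussian mean with known variance, μ_N = (σ² μ₀ + N σ₀² μ_ML)/(N σ₀² + σ²) and 1/σ_N² = 1/σ₀² + N/σ² (equations (2.141) and (2.142) in the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Setting of Figure 2.12: data drawn from a Gaussian with mean 0.8
# and variance 0.1; the prior over mu has mean 0, and both the prior
# and the likelihood use the true variance.
sigma2 = 0.1                # known data variance
mu0, sigma02 = 0.0, 0.1     # prior mean and variance
x = rng.normal(0.8, np.sqrt(sigma2), size=10)

for N in [0, 1, 2, 10]:
    if N == 0:
        mu_N, sigma2_N = mu0, sigma02   # the prior itself
    else:
        mu_ml = x[:N].mean()            # maximum likelihood mean from first N points
        # Posterior mean and variance, equations (2.141) and (2.142)
        mu_N = (sigma2 * mu0 + N * sigma02 * mu_ml) / (N * sigma02 + sigma2)
        sigma2_N = 1.0 / (1.0 / sigma02 + N / sigma2)
    print(f"N={N:2d}: posterior mean {mu_N:.3f}, variance {sigma2_N:.4f}")
```

As N grows, the posterior mean moves toward the true value 0.8 and the posterior variance shrinks, which is exactly the sharpening of the curves seen in the figure.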
We have already seen (Section 2.3.5) how the maximum likelihood expression for the mean of a Gaussian can be re-cast as a sequential update formula in which the mean after observing N data points was expressed in terms of the mean after observing N − 1 data points together with the contribution from data point x_N. In fact, the Bayesian paradigm leads very naturally to a sequential view of the inference problem. To see this in the context of inferring the mean of a Gaussian, we write the posterior distribution with the contribution from the final data point x_N separated out, so that


$$
p(\mu \mid \mathcal{D}) \propto \left[\, p(\mu) \prod_{n=1}^{N-1} p(x_n \mid \mu) \right] p(x_N \mid \mu). \tag{2.144}
$$

The term in square brackets is (up to a normalization coefficient) just the posterior distribution after observing N − 1 data points. We see that this can be viewed as a prior distribution, which is combined using Bayes' theorem with the likelihood function associated with data point x_N to arrive at the posterior distribution after observing N data points. This sequential view of Bayesian inference is very general and applies to any problem in which the observed data are assumed to be independent and identically distributed.
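To see the sequential view of (2.144) in action, here is a minimal Python sketch (again assuming NumPy; the helper `update` is ours) that absorbs one data point at a time, treating the current posterior as the prior for the next observation, and checks that the result agrees with the batch posterior computed from all N points at once:

```python
import numpy as np

def update(mu_prior, var_prior, x_n, var_lik):
    """One-point Bayesian update for a Gaussian mean with known variance:
    combine the current Gaussian prior with the likelihood of x_n."""
    var_post = 1.0 / (1.0 / var_prior + 1.0 / var_lik)
    mu_post = var_post * (mu_prior / var_prior + x_n / var_lik)
    return mu_post, var_post

rng = np.random.default_rng(0)
x = rng.normal(0.8, np.sqrt(0.1), size=10)

mu, var = 0.0, 0.1           # prior, as in Figure 2.12
for x_n in x:                # absorb one data point at a time, as in (2.144)
    mu, var = update(mu, var, x_n, var_lik=0.1)

# Batch posterior from equations (2.141)-(2.142) for comparison
N = len(x)
var_batch = 1.0 / (1.0 / 0.1 + N / 0.1)
mu_batch = var_batch * (0.0 / 0.1 + x.sum() / 0.1)
print(np.allclose([mu, var], [mu_batch, var_batch]))  # True
```

The agreement holds because multiplying the prior by the N likelihood factors one at a time is the same product as multiplying by all of them at once.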
So far, we have assumed that the variance of the Gaussian distribution over the data is known and our goal is to infer the mean. Now let us suppose that the mean is known and we wish to infer the variance. Again, our calculations will be greatly simplified if we choose a conjugate form for the prior distribution. It turns out to be most convenient to work with the precision λ ≡ 1/σ². The likelihood function for λ takes the form

$$
p(\mathbf{X} \mid \lambda) = \prod_{n=1}^{N} \mathcal{N}(x_n \mid \mu, \lambda^{-1}) \propto \lambda^{N/2} \exp\left\{ -\frac{\lambda}{2} \sum_{n=1}^{N} (x_n - \mu)^2 \right\}. \tag{2.145}
$$
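Since λ enters (2.145) only through λ^{N/2} exp{−λS/2} with S = Σ_n (x_n − μ)², the log-likelihood is (N/2) ln λ − (λ/2) S up to an additive constant, and setting its derivative to zero places the maximum at λ = N/S. The short Python sketch below (assuming NumPy; a numerical check of this shape, not part of the text) verifies this on simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 0.0                                  # known mean
x = rng.normal(mu, 1.0, size=100)
S = np.sum((x - mu) ** 2)                 # sufficient statistic in (2.145)
N = len(x)

def log_lik(lam):
    # Log of (2.145), dropping the lambda-independent constant
    return 0.5 * N * np.log(lam) - 0.5 * lam * S

lams = np.linspace(0.1, 3.0, 1000)
lam_hat = lams[np.argmax(log_lik(lams))]
print(lam_hat, N / S)                     # grid maximum sits near N / S
```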
