Pattern Recognition and Machine Learning

1.2. Probability Theory

function can be written in the form

$$\ln p(\mathbf{x} \mid \mu, \sigma^2) = -\frac{1}{2\sigma^2} \sum_{n=1}^{N} (x_n - \mu)^2 - \frac{N}{2} \ln \sigma^2 - \frac{N}{2} \ln(2\pi). \tag{1.54}$$
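As a quick numerical check of (1.54), the log likelihood can be evaluated directly from its three terms. The following is a minimal sketch using only the Python standard library; the function name `gaussian_log_likelihood` and its argument names are our own illustrative choices and do not appear in the text.

```python
import math


def gaussian_log_likelihood(xs, mu, sigma2):
    """Evaluate (1.54) for i.i.d. observations xs, mean mu, variance sigma2.

    The name and signature are illustrative assumptions, not from the text.
    """
    if sigma2 <= 0:
        raise ValueError("sigma2 must be positive")
    n = len(xs)
    # First term: sum of squared deviations from mu, scaled by 1 / (2 sigma^2).
    sq_err = sum((x - mu) ** 2 for x in xs)
    return (-sq_err / (2.0 * sigma2)
            - 0.5 * n * math.log(sigma2)
            - 0.5 * n * math.log(2.0 * math.pi))
```

For a single observation x = 0 with μ = 0 and σ^2 = 1, the first two terms vanish and the function returns −(1/2) ln(2π), as (1.54) requires.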

Maximizing (1.54) with respect to μ, we obtain the maximum likelihood solution
(Exercise 1.11) given by

$$\mu_{\mathrm{ML}} = \frac{1}{N} \sum_{n=1}^{N} x_n \tag{1.55}$$

which is the sample mean, i.e., the mean of the observed values {x_n}. Similarly, maximizing (1.54) with respect to σ^2, we obtain the maximum likelihood solution for the variance in the form

$$\sigma^2_{\mathrm{ML}} = \frac{1}{N} \sum_{n=1}^{N} (x_n - \mu_{\mathrm{ML}})^2 \tag{1.56}$$

which is the sample variance measured with respect to the sample mean μ_ML. Note
that we are performing a joint maximization of (1.54) with respect to μ and σ^2, but
in the case of the Gaussian distribution the solution for μ decouples from that for σ^2,
so that we can first evaluate (1.55) and then subsequently use this result to evaluate
(1.56).
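The two-step evaluation just described can be sketched in a few lines: compute the sample mean (1.55) first, then reuse it in (1.56). This is an illustrative sketch only; the function name `gaussian_ml_fit` and the small data set are assumptions made for the example, not taken from the text.

```python
def gaussian_ml_fit(xs):
    """Return (mu_ML, sigma2_ML) for a univariate Gaussian, per (1.55)-(1.56).

    The name and the example data below are illustrative assumptions.
    """
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one observation")
    # (1.55): the solution for mu does not depend on sigma^2.
    mu_ml = sum(xs) / n
    # (1.56): the variance is measured about mu_ML, with divisor N.
    sigma2_ml = sum((x - mu_ml) ** 2 for x in xs) / n
    return mu_ml, sigma2_ml


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu_ml, sigma2_ml = gaussian_ml_fit(data)
print(mu_ml, sigma2_ml)  # prints: 5.0 4.0
```

Here the eight values sum to 40, giving μ_ML = 5, and the squared deviations sum to 32, giving σ^2_ML = 4.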
Later in this chapter, and also in subsequent chapters, we shall highlight the
significant limitations of the maximum likelihood approach. Here we give an indication
of the problem in the context of our solutions for the maximum likelihood
parameter settings for the univariate Gaussian distribution. In particular, we shall show
that the maximum likelihood approach systematically underestimates the variance
of the distribution. This is an example of a phenomenon called bias and is related
to the problem of over-fitting encountered in the context of polynomial curve fitting
(Section 1.1). We first note that the maximum likelihood solutions μ_ML and σ^2_ML are
functions of the data set values x_1, ..., x_N. Consider the expectations of these
quantities with respect to the data set values, which themselves come from a Gaussian
distribution with parameters μ and σ^2. It is straightforward to show (Exercise 1.12) that

$$\mathbb{E}[\mu_{\mathrm{ML}}] = \mu \tag{1.57}$$

$$\mathbb{E}[\sigma^2_{\mathrm{ML}}] = \left( \frac{N-1}{N} \right) \sigma^2 \tag{1.58}$$

so that on average the maximum likelihood estimate will obtain the correct mean but will underestimate the true variance by a factor (N−1)/N. The intuition behind this result is given by Figure 1.15. From (1.58) it follows that the following estimate for the variance parameter is unbiased

$$\tilde{\sigma}^2 = \frac{N}{N-1} \sigma^2_{\mathrm{ML}} = \frac{1}{N-1} \sum_{n=1}^{N} (x_n - \mu_{\mathrm{ML}})^2. \tag{1.59}$$
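The bias in (1.58) and its correction in (1.59) can be illustrated by simulation: draw many small data sets from a known Gaussian, compute both estimates for each, and average. The sketch below is a rough Monte Carlo illustration, not a proof; the function names, the seed, the number of trials and the choice N = 2 are all assumptions made for the example.

```python
import random


def ml_variance(xs):
    """Maximum likelihood variance (1.56): divisor N, about the sample mean."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


def average_estimates(n, trials, mu=0.0, sigma=1.0, seed=0):
    """Average the ML and corrected variance estimates over many data sets.

    All parameter names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total_ml = 0.0
    total_unbiased = 0.0
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        s2_ml = ml_variance(xs)
        total_ml += s2_ml
        # (1.59): rescale the ML estimate by N / (N - 1).
        total_unbiased += s2_ml * n / (n - 1)
    return total_ml / trials, total_unbiased / trials


# With N = 2 and true variance 1, (1.58) predicts an average ML estimate of 1/2.
avg_ml, avg_unbiased = average_estimates(n=2, trials=20000)
print(avg_ml, avg_unbiased)
```

With these settings the first average should come out close to (N−1)/N = 0.5 and the second close to the true variance 1, up to sampling noise. Increasing `n` shrinks the gap between the two, since the factor (N−1)/N approaches 1.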
