Pattern Recognition and Machine Learning

2.3. The Gaussian Distribution

Figure 2.15 Plot of Student’s t-distribution (2.159) for μ = 0 and λ = 1 for various values of ν. The limit ν → ∞ corresponds to a Gaussian distribution with mean μ and precision λ. [Plot: curves for ν → ∞, ν = 1.0, and ν = 0.1; horizontal axis from −5 to 5, vertical axis from 0 to 0.5.]

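\begin{align}
p(x \mid \mu, a, b) &= \int_0^\infty \mathcal{N}(x \mid \mu, \tau^{-1})\, \mathrm{Gam}(\tau \mid a, b)\, \mathrm{d}\tau \tag{2.158} \\
&= \int_0^\infty \frac{b^a e^{-b\tau} \tau^{a-1}}{\Gamma(a)} \left( \frac{\tau}{2\pi} \right)^{1/2} \exp\left\{ -\frac{\tau}{2} (x - \mu)^2 \right\} \mathrm{d}\tau \notag \\
&= \frac{b^a}{\Gamma(a)} \left( \frac{1}{2\pi} \right)^{1/2} \left[ b + \frac{(x - \mu)^2}{2} \right]^{-a - 1/2} \Gamma(a + 1/2) \notag
\end{align}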

where we have made the change of variable z = τ[b + (x − μ)²/2]. By convention we define new parameters given by ν = 2a and λ = a/b, in terms of which the distribution p(x|μ, a, b) takes the form

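\begin{equation}
\mathrm{St}(x \mid \mu, \lambda, \nu) = \frac{\Gamma(\nu/2 + 1/2)}{\Gamma(\nu/2)} \left( \frac{\lambda}{\pi\nu} \right)^{1/2} \left[ 1 + \frac{\lambda (x - \mu)^2}{\nu} \right]^{-\nu/2 - 1/2} \tag{2.159}
\end{equation}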

which is known as Student’s t-distribution. The parameter λ is sometimes called the
precision of the t-distribution, even though it is not in general equal to the inverse
of the variance. The parameter ν is called the degrees of freedom, and its effect is
illustrated in Figure 2.15. For the particular case of ν = 1, the t-distribution reduces
to the Cauchy distribution, while in the limit ν → ∞ the t-distribution St(x|μ, λ, ν)
becomes a Gaussian N(x|μ, λ⁻¹) with mean μ and precision λ (Exercise 2.47).
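
The two limiting cases can be checked numerically. The following is a minimal sketch, not part of the original text, that evaluates (2.159) in log space using only the Python standard library and compares it with the Cauchy density at ν = 1 and with the Gaussian density for a large value of ν. The function names `student_t_pdf`, `cauchy_pdf`, and `gaussian_pdf` are illustrative assumptions, and ν = 10⁶ is an arbitrary stand-in for the limit ν → ∞.

```python
import math


def student_t_pdf(x, mu, lam, nu):
    """Density St(x | mu, lam, nu) of (2.159), evaluated in log space."""
    log_norm = (
        math.lgamma(nu / 2 + 0.5)
        - math.lgamma(nu / 2)
        + 0.5 * math.log(lam / (math.pi * nu))
    )
    log_kernel = -(nu / 2 + 0.5) * math.log1p(lam * (x - mu) ** 2 / nu)
    return math.exp(log_norm + log_kernel)


def cauchy_pdf(x, mu, lam):
    """Cauchy density with location mu and scale lam ** -0.5."""
    return math.sqrt(lam) / (math.pi * (1 + lam * (x - mu) ** 2))


def gaussian_pdf(x, mu, lam):
    """Gaussian density N(x | mu, 1 / lam), parameterized by precision lam."""
    return math.sqrt(lam / (2 * math.pi)) * math.exp(-0.5 * lam * (x - mu) ** 2)


if __name__ == "__main__":
    mu, lam = 0.0, 1.0  # the settings used in Figure 2.15
    for x in (0.0, 1.0, 3.0, 5.0):
        print(
            x,
            student_t_pdf(x, mu, lam, 1.0),   # should agree with the Cauchy value
            cauchy_pdf(x, mu, lam),
            student_t_pdf(x, mu, lam, 1e6),   # should be close to the Gaussian value
            gaussian_pdf(x, mu, lam),
        )
```

Working with `math.lgamma` rather than `math.gamma` avoids overflow in the ratio of gamma functions when ν is large.
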
From (2.158), we see that Student’s t-distribution is obtained by adding up an
infinite number of Gaussian distributions having the same mean but different
precisions. This can be interpreted as an infinite mixture of Gaussians (Gaussian mixtures
will be discussed in detail in Section 2.3.9). The result is a distribution that in
general has longer ‘tails’ than a Gaussian, as was seen in Figure 2.15. This gives the
t-distribution an important property called robustness, which means that it is much less
sensitive than the Gaussian to the presence of a few data points which are outliers.
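
The mixture interpretation of (2.158) can be illustrated by simulation. The sketch below, which is not part of the original text, draws precisions τ from Gam(τ|a, b), averages the Gaussian densities N(x|μ, τ⁻¹) over those draws, and compares the result with the closed form (2.159) using ν = 2a and λ = a/b. The names `mixture_pdf_estimate` and `sample_student_t`, the sample size, and the seed are illustrative assumptions. Note that `random.gammavariate` takes a shape and a scale, so the rate b used here enters as the scale 1/b.

```python
import math
import random


def student_t_pdf(x, mu, lam, nu):
    """Closed-form density St(x | mu, lam, nu) of (2.159)."""
    log_norm = (
        math.lgamma(nu / 2 + 0.5)
        - math.lgamma(nu / 2)
        + 0.5 * math.log(lam / (math.pi * nu))
    )
    return math.exp(log_norm - (nu / 2 + 0.5) * math.log1p(lam * (x - mu) ** 2 / nu))


def mixture_pdf_estimate(x, mu, a, b, n_samples=100_000, seed=0):
    """Monte Carlo estimate of the integral in (2.158)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        tau = rng.gammavariate(a, 1.0 / b)  # shape a, scale 1/b (i.e. rate b)
        # Gaussian density N(x | mu, 1/tau) for this sampled precision
        total += math.sqrt(tau / (2 * math.pi)) * math.exp(-0.5 * tau * (x - mu) ** 2)
    return total / n_samples


def sample_student_t(mu, a, b, n, seed=0):
    """Draw n samples by first sampling a precision, then a Gaussian value."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        tau = rng.gammavariate(a, 1.0 / b)
        samples.append(rng.gauss(mu, 1.0 / math.sqrt(tau)))
    return samples


if __name__ == "__main__":
    mu, a, b = 0.0, 2.0, 1.0
    nu, lam = 2 * a, a / b
    for x in (0.0, 0.5, 2.0):
        print(x, mixture_pdf_estimate(x, mu, a, b), student_t_pdf(x, mu, lam, nu))
```

The Monte Carlo estimate agrees with the closed form only up to sampling error, which shrinks as `n_samples` grows.
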
The robustness of the t-distribution is illustrated in Figure 2.16, which compares the
maximum likelihood solutions for a Gaussian and a t-distribution. Note that the
maximum likelihood solution for the t-distribution can be found using the
expectation-maximization (EM) algorithm (Exercise 12.24). Here we see that the effect of a small number of
