Figure 2.15 Plot of Student's t-distribution (2.159) for μ = 0 and λ = 1 for various values of ν, showing the curves ν = 0.1, ν = 1.0, and ν → ∞. The limit ν → ∞ corresponds to a Gaussian distribution with mean μ and precision λ.
$$
\begin{aligned}
p(x|\mu, a, b) &= \int_0^\infty \mathcal{N}\!\left(x|\mu, \tau^{-1}\right) \mathrm{Gam}(\tau|a, b)\, \mathrm{d}\tau \\
&= \int_0^\infty \frac{b^a e^{-b\tau} \tau^{a-1}}{\Gamma(a)} \left( \frac{\tau}{2\pi} \right)^{1/2} \exp\left\{ -\frac{\tau}{2}(x - \mu)^2 \right\} \mathrm{d}\tau \\
&= \frac{b^a}{\Gamma(a)} \left( \frac{1}{2\pi} \right)^{1/2} \left[ b + \frac{(x - \mu)^2}{2} \right]^{-a - 1/2} \Gamma(a + 1/2)
\end{aligned}
\tag{2.158}
$$
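As a quick sanity check on this marginalization (an illustration, not part of the text), the integral in (2.158) can be evaluated numerically and compared against the closed form on the last line. The parameter values below are arbitrary choices for the check:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma as Gamma

# Arbitrary illustrative parameter values (not from the text)
mu, a, b = 0.0, 2.0, 3.0
x = 1.5

# Integrand of (2.158): N(x | mu, tau^-1) * Gam(tau | a, b)
def integrand(tau):
    gauss = np.sqrt(tau / (2 * np.pi)) * np.exp(-0.5 * tau * (x - mu) ** 2)
    gam = b ** a * np.exp(-b * tau) * tau ** (a - 1) / Gamma(a)
    return gauss * gam

numeric, _ = quad(integrand, 0, np.inf)

# Closed form from the last line of the derivation
closed = (b ** a / Gamma(a)) * (1 / (2 * np.pi)) ** 0.5 \
         * (b + (x - mu) ** 2 / 2) ** (-a - 0.5) * Gamma(a + 0.5)

print(numeric, closed)  # the two values should agree to numerical precision
```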
where we have made the change of variable z = τ[b + (x − μ)²/2]. By convention we define new parameters given by ν = 2a and λ = a/b, in terms of which the distribution p(x|μ, a, b) takes the form
$$
\mathrm{St}(x|\mu, \lambda, \nu) = \frac{\Gamma(\nu/2 + 1/2)}{\Gamma(\nu/2)} \left( \frac{\lambda}{\pi\nu} \right)^{1/2} \left[ 1 + \frac{\lambda(x - \mu)^2}{\nu} \right]^{-\nu/2 - 1/2}
\tag{2.159}
$$
which is known as Student's t-distribution. The parameter λ is sometimes called the precision of the t-distribution, even though it is not in general equal to the inverse of the variance. The parameter ν is called the degrees of freedom, and its effect is illustrated in Figure 2.15. For the particular case of ν = 1, the t-distribution reduces to the Cauchy distribution, while in the limit ν → ∞ the t-distribution St(x|μ, λ, ν) becomes a Gaussian N(x|μ, λ⁻¹) with mean μ and precision λ (Exercise 2.47).
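These limiting cases are easy to confirm numerically. The sketch below (an illustration, not from the text) implements (2.159) directly and checks it against SciPy's built-in Cauchy, Gaussian, and Student's t densities, using the reparameterization scale = λ^(−1/2):

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import t, cauchy, norm

def st_pdf(x, mu, lam, nu):
    """Student's t-density of equation (2.159), computed in log space."""
    log_norm = (gammaln(nu / 2 + 0.5) - gammaln(nu / 2)
                + 0.5 * np.log(lam / (np.pi * nu)))
    return np.exp(log_norm) * (1 + lam * (x - mu) ** 2 / nu) ** (-nu / 2 - 0.5)

x = np.linspace(-5, 5, 11)
mu, lam = 0.0, 1.0

# nu = 1 recovers the Cauchy distribution
assert np.allclose(st_pdf(x, mu, lam, 1.0),
                   cauchy.pdf(x, loc=mu, scale=lam ** -0.5))

# very large nu approaches a Gaussian with mean mu and precision lam
assert np.allclose(st_pdf(x, mu, lam, 1e8),
                   norm.pdf(x, loc=mu, scale=lam ** -0.5), atol=1e-6)

# general case matches SciPy's t-distribution with scale = lam**-0.5
assert np.allclose(st_pdf(x, mu, lam, 5.0),
                   t.pdf(x, df=5.0, loc=mu, scale=lam ** -0.5))
```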
From (2.158), we see that Student's t-distribution is obtained by adding up an infinite number of Gaussian distributions having the same mean but different precisions. This can be interpreted as an infinite mixture of Gaussians (Gaussian mixtures will be discussed in detail in Section 2.3.9). The result is a distribution that in general has longer 'tails' than a Gaussian, as was seen in Figure 2.15. This gives the t-distribution an important property called robustness, which means that it is much less sensitive than the Gaussian to the presence of a few data points which are outliers.
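A minimal numerical sketch of this robustness follows (hypothetical data, not from the text; SciPy's generic maximum likelihood fitting is used here rather than the EM algorithm discussed below):

```python
import numpy as np
from scipy.stats import norm, t

rng = np.random.default_rng(0)

# Hypothetical data: samples near zero plus a few gross outliers
data = np.concatenate([rng.normal(0.0, 1.0, 100), [15.0, 18.0, 20.0]])

# Gaussian ML: the sample mean is dragged toward the outliers
gauss_mu, gauss_sigma = norm.fit(data)

# Student's t ML (SciPy uses numerical optimization, not EM)
t_df, t_mu, t_scale = t.fit(data)

print(f"Gaussian ML mean: {gauss_mu:.3f}")  # noticeably shifted by outliers
print(f"t ML location:    {t_mu:.3f}")      # typically stays close to zero
```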
The robustness of the t-distribution is illustrated in Figure 2.16, which compares the maximum likelihood solutions for a Gaussian and a t-distribution. Note that the maximum likelihood solution for the t-distribution can be found using the expectation-maximization (EM) algorithm (Exercise 12.24). Here we see that the effect of a small number of