Pattern Recognition and Machine Learning

688 B. PROBABILITY DISTRIBUTIONS

Gamma


The gamma is a probability distribution over a positive random variable τ > 0
governed by parameters a and b that are subject to the constraints a > 0 and b > 0
to ensure that the distribution can be normalized.

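$$\mathrm{Gam}(\tau \mid a, b) = \frac{1}{\Gamma(a)}\, b^{a} \tau^{a-1} e^{-b\tau} \tag{B.26}$$

$$\mathbb{E}[\tau] = \frac{a}{b} \tag{B.27}$$

$$\mathrm{var}[\tau] = \frac{a}{b^{2}} \tag{B.28}$$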

$$\mathrm{mode}[\tau] = \frac{a-1}{b} \quad \text{for } a \geq 1 \tag{B.29}$$

$$\mathbb{E}[\ln \tau] = \psi(a) - \ln b \tag{B.30}$$

$$\mathrm{H}[\tau] = \ln \Gamma(a) - (a-1)\psi(a) - \ln b + a \tag{B.31}$$

where ψ(·) is the digamma function defined by (B.25). The gamma distribution is
the conjugate prior for the precision (inverse variance) of a univariate Gaussian. For
a ≥ 1 the density is everywhere finite, and the special case of a = 1 is known as the
exponential distribution.
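
The gamma formulas above can be checked numerically. The following is a minimal sketch using only the Python standard library, not a reference implementation. The helper names (`gamma_log_pdf`, `gamma_pdf`, `digamma`, `integrate`, `expect`), the values a = 3 and b = 2, and the integration cutoff are all assumptions made for illustration. The digamma function is approximated by the usual recurrence plus a truncated asymptotic series, because the standard library does not provide it.

```python
import math

def gamma_log_pdf(tau, a, b):
    """ln Gam(tau | a, b) from (B.26); requires tau > 0, a > 0, b > 0."""
    return a * math.log(b) - math.lgamma(a) + (a - 1.0) * math.log(tau) - b * tau

def gamma_pdf(tau, a, b):
    """Gam(tau | a, b), evaluated through the log density for stability."""
    return math.exp(gamma_log_pdf(tau, a, b))

def digamma(x):
    """Approximate psi(x) for x > 0: shift x up to at least 6 with
    psi(x) = psi(x + 1) - 1/x, then apply a truncated asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # ln x - 1/(2x) - 1/(12 x^2) + 1/(120 x^4) - 1/(252 x^6)
    result += math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return result

def integrate(f, lo, hi, n=50_000):
    """Midpoint rule on [lo, hi]; a crude numerical check only."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

a, b = 3.0, 2.0   # illustrative parameter values (assumption)
upper = 30.0      # assumed cutoff; the tail beyond it is negligible for these values

def expect(g):
    """Numerical E[g(tau)] under Gam(tau | a, b)."""
    return integrate(lambda t: g(t) * gamma_pdf(t, a, b), 0.0, upper)

# Each entry pairs a numerical integral with the closed form it should match.
checks = {
    "normalization": (expect(lambda t: 1.0), 1.0),
    "mean (B.27)": (expect(lambda t: t), a / b),
    "variance (B.28)": (expect(lambda t: (t - a / b) ** 2), a / b**2),
    "E[ln tau] (B.30)": (expect(math.log), digamma(a) - math.log(b)),
    "entropy (B.31)": (
        expect(lambda t: -gamma_log_pdf(t, a, b)),
        math.lgamma(a) - (a - 1.0) * digamma(a) - math.log(b) + a,
    ),
}
for name, (numeric, closed_form) in checks.items():
    print(f"{name}: numeric={numeric:.6f}  closed form={closed_form:.6f}")

# Mode (B.29), valid here because a >= 1.
mode = (a - 1.0) / b
print("mode:", mode)
```

With a = 1 the same `gamma_pdf` reduces to b·exp(−bτ), the exponential density mentioned in the text.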

Gaussian


The Gaussian is the most widely used distribution for continuous variables. It is also
known as the normal distribution. In the case of a single variable x ∈ (−∞, ∞) it is
governed by two parameters, the mean μ ∈ (−∞, ∞) and the variance σ^2 > 0.

$$\mathcal{N}(x \mid \mu, \sigma^{2}) = \frac{1}{(2\pi\sigma^{2})^{1/2}} \exp\left\{ -\frac{1}{2\sigma^{2}} (x-\mu)^{2} \right\} \tag{B.32}$$

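$$\mathbb{E}[x] = \mu \tag{B.33}$$

$$\mathrm{var}[x] = \sigma^{2} \tag{B.34}$$

$$\mathrm{mode}[x] = \mu \tag{B.35}$$

$$\mathrm{H}[x] = \frac{1}{2} \ln \sigma^{2} + \frac{1}{2}\left(1 + \ln(2\pi)\right). \tag{B.36}$$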
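
As a sanity check on (B.32) to (B.36), the sketch below evaluates the univariate Gaussian density and compares numerically integrated moments and entropy against the closed forms. It uses only the Python standard library. The function names, the values μ = 1 and σ^2 = 4, and the ±12 standard deviation integration window are illustrative assumptions.

```python
import math

def gaussian_log_pdf(x, mu, sigma2):
    """ln N(x | mu, sigma^2) from (B.32); sigma2 is the variance."""
    return -0.5 * math.log(2.0 * math.pi * sigma2) - (x - mu) ** 2 / (2.0 * sigma2)

def gaussian_entropy(sigma2):
    """Differential entropy (B.36), in nats."""
    return 0.5 * math.log(sigma2) + 0.5 * (1.0 + math.log(2.0 * math.pi))

def integrate(f, lo, hi, n=50_000):
    """Midpoint rule on [lo, hi]; a crude numerical check only."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

mu, sigma2 = 1.0, 4.0                  # illustrative values (assumption)
half_width = 12.0 * math.sqrt(sigma2)  # assumed window; mass outside is negligible
lo, hi = mu - half_width, mu + half_width

def expect(g):
    """Numerical E[g(x)] under N(x | mu, sigma2)."""
    return integrate(lambda x: g(x) * math.exp(gaussian_log_pdf(x, mu, sigma2)), lo, hi)

total = expect(lambda x: 1.0)                              # should be close to 1
mean = expect(lambda x: x)                                 # (B.33)
var = expect(lambda x: (x - mu) ** 2)                      # (B.34)
entropy = expect(lambda x: -gaussian_log_pdf(x, mu, sigma2))  # (B.36)

print("normalization:", total)
print("mean:", mean, "closed form:", mu)
print("variance:", var, "closed form:", sigma2)
print("entropy:", entropy, "closed form:", gaussian_entropy(sigma2))
```

Note that the entropy depends only on σ^2, not on μ, which is why `gaussian_entropy` takes a single argument.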

The inverse of the variance τ = 1/σ^2 is called the precision, and the square root
of the variance σ is called the standard deviation. The conjugate prior for μ is the
Gaussian, and the conjugate prior for τ is the gamma distribution. If both μ and τ
are unknown, their joint conjugate prior is the Gaussian-gamma distribution.
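
Conjugacy means the posterior stays in the same family as the prior, so inference reduces to updating a few parameters. The sketch below illustrates the two single-parameter cases using the standard update formulas, which are not stated in this section: for known precision τ and prior N(μ | μ0, 1/τ0), the posterior precision is τ0 + Nτ and the posterior mean is (τ0 μ0 + τ Σ x_n) / (τ0 + Nτ); for known mean μ and prior Gam(τ | a0, b0), the posterior is Gam(τ | a0 + N/2, b0 + ½ Σ (x_n − μ)^2). The function names and the toy data are hypothetical.

```python
def update_mean_prior(mu0, tau0, xs, tau):
    """Posterior over mu for Gaussian data with known precision tau.

    Prior: N(mu | mu0, 1 / tau0). Returns (posterior mean, posterior precision).
    """
    n = len(xs)
    tau_n = tau0 + n * tau
    mu_n = (tau0 * mu0 + tau * sum(xs)) / tau_n
    return mu_n, tau_n

def update_precision_prior(a0, b0, xs, mu):
    """Posterior over tau for Gaussian data with known mean mu.

    Prior: Gam(tau | a0, b0). Returns the posterior parameters (a_n, b_n).
    """
    n = len(xs)
    a_n = a0 + 0.5 * n
    b_n = b0 + 0.5 * sum((x - mu) ** 2 for x in xs)
    return a_n, b_n

xs = [1.0, 3.0]  # toy observations (assumption)

mu_n, tau_n = update_mean_prior(mu0=0.0, tau0=1.0, xs=xs, tau=1.0)
a_n, b_n = update_precision_prior(a0=1.0, b0=1.0, xs=xs, mu=2.0)

print("posterior over mu: mean", mu_n, "precision", tau_n)
print("posterior over tau: a", a_n, "b", b_n, "mean a/b", a_n / b_n)
```

The posterior mean of τ is a_n / b_n by (B.27), so each observation pulls the precision estimate toward the inverse of the observed squared deviation.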
For a D-dimensional vector x, the Gaussian is governed by a D-dimensional
mean vector μ and a D × D covariance matrix Σ that must be symmetric and