Pattern Recognition and Machine Learning

688 B. PROBABILITY DISTRIBUTIONS

Gamma

The Gamma is a probability distribution over a positive random variable τ > 0 governed by parameters a and b that are subject to the constraints a > 0 and b > 0 to ensure that the distribution can be normalized.

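\begin{align}
\mathrm{Gam}(\tau \mid a, b) &= \frac{1}{\Gamma(a)}\, b^{a} \tau^{a-1} e^{-b\tau} \tag{B.26} \\
\mathbb{E}[\tau] &= \frac{a}{b} \tag{B.27} \\
\operatorname{var}[\tau] &= \frac{a}{b^{2}} \tag{B.28} \\
\operatorname{mode}[\tau] &= \frac{a-1}{b} \quad \text{for } a \geq 1 \tag{B.29} \\
\mathbb{E}[\ln \tau] &= \psi(a) - \ln b \tag{B.30} \\
\mathrm{H}[\tau] &= \ln \Gamma(a) - (a-1)\psi(a) - \ln b + a \tag{B.31}
\end{align}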

where ψ(·) is the digamma function defined by (B.25). The gamma distribution is the conjugate prior for the precision (inverse variance) of a univariate Gaussian. For a ≥ 1 the density is everywhere finite, and the special case of a = 1 is known as the exponential distribution.
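As an illustration, the quantities (B.26) to (B.31) and the conjugacy property can be sketched in a few lines of Python using only the standard library. The function names `digamma`, `gamma_pdf`, `gamma_summary` and `posterior_precision` are hypothetical, chosen for this sketch. Because the standard library has no digamma function, ψ(a) is approximated here by the recurrence ψ(x) = ψ(x + 1) − 1/x followed by a truncated asymptotic series. The update in `posterior_precision` assumes the standard result for a Gaussian with known mean μ and N observations, which is not stated on this page: a_N = a_0 + N/2 and b_N = b_0 + (1/2) Σ_n (x_n − μ)^2.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def digamma(x: float) -> float:
    """Approximate the digamma function psi(x) for x > 0 (stdlib-only helper)."""
    if x <= 0:
        raise ValueError("x must be positive")
    result = 0.0
    # Recurrence psi(x) = psi(x + 1) - 1/x shifts x up to where the series is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # Asymptotic series: ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6).
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series


def gamma_pdf(tau: float, a: float, b: float) -> float:
    """Density (B.26), evaluated in log space to avoid overflow in Gamma(a)."""
    if tau <= 0 or a <= 0 or b <= 0:
        raise ValueError("tau, a and b must all be positive")
    log_p = a * math.log(b) + (a - 1.0) * math.log(tau) - b * tau - math.lgamma(a)
    return math.exp(log_p)


def gamma_summary(a: float, b: float) -> Dict[str, Optional[float]]:
    """Closed-form statistics (B.27) to (B.31) of Gam(tau | a, b)."""
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive")
    psi_a = digamma(a)
    return {
        "mean": a / b,                                    # (B.27)
        "var": a / b ** 2,                                # (B.28)
        "mode": (a - 1.0) / b if a >= 1.0 else None,      # (B.29), only for a >= 1
        "mean_log": psi_a - math.log(b),                  # (B.30)
        "entropy": math.lgamma(a) - (a - 1.0) * psi_a - math.log(b) + a,  # (B.31)
    }


def posterior_precision(
    a0: float, b0: float, mu: float, data: Sequence[float]
) -> Tuple[float, float]:
    """Gamma posterior over the precision of a Gaussian with known mean mu.

    Assumes the standard conjugate update, not given on this page.
    """
    n = len(data)
    sum_sq = sum((x - mu) ** 2 for x in data)
    return a0 + 0.5 * n, b0 + 0.5 * sum_sq


if __name__ == "__main__":
    # Prior Gam(tau | 1, 1), i.e. an exponential prior, updated with two points.
    a_n, b_n = posterior_precision(1.0, 1.0, mu=2.0, data=[1.0, 3.0])
    print(a_n, b_n)
    print(gamma_summary(a_n, b_n))
```

Evaluating the density in log space is a design choice rather than a requirement: Γ(a) overflows double precision for moderately large a, whereas `math.lgamma` remains finite.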

Gaussian

The Gaussian is the most widely used distribution for continuous variables. It is also known as the normal distribution. In the case of a single variable x ∈ (−∞, ∞) it is governed by two parameters, the mean μ ∈ (−∞, ∞) and the variance σ^2 > 0.

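\begin{align}
\mathcal{N}(x \mid \mu, \sigma^{2}) &= \frac{1}{(2\pi\sigma^{2})^{1/2}} \exp\left\{ -\frac{1}{2\sigma^{2}} (x-\mu)^{2} \right\} \tag{B.32} \\
\mathbb{E}[x] &= \mu \tag{B.33} \\
\operatorname{var}[x] &= \sigma^{2} \tag{B.34} \\
\operatorname{mode}[x] &= \mu \tag{B.35} \\
\mathrm{H}[x] &= \frac{1}{2} \ln \sigma^{2} + \frac{1}{2}\left(1 + \ln(2\pi)\right) \tag{B.36}
\end{align}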
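The univariate results (B.32) to (B.36) can be checked numerically. The following is a minimal sketch in standard-library Python: `gaussian_pdf` and `gaussian_entropy` transcribe (B.32) and (B.36), and `integrate` is a simple composite trapezoid rule used only as a sanity check. All three names, the parameter values and the integration range of ten standard deviations either side of the mean are assumptions made for this illustration.

```python
import math
from typing import Callable


def gaussian_pdf(x: float, mu: float, var: float) -> float:
    """Univariate Gaussian density (B.32) with mean mu and variance var."""
    if var <= 0:
        raise ValueError("variance must be positive")
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gaussian_entropy(var: float) -> float:
    """Differential entropy (B.36); it does not depend on the mean."""
    if var <= 0:
        raise ValueError("variance must be positive")
    return 0.5 * math.log(var) + 0.5 * (1.0 + math.log(2.0 * math.pi))


def integrate(f: Callable[[float], float], lo: float, hi: float,
              steps: int = 20000) -> float:
    """Composite trapezoid rule, used here only as a numerical sanity check."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h


if __name__ == "__main__":
    mu, var = 1.5, 4.0  # illustrative values only
    sd = math.sqrt(var)
    lo, hi = mu - 10.0 * sd, mu + 10.0 * sd

    def p(x: float) -> float:
        return gaussian_pdf(x, mu, var)

    mass = integrate(p, lo, hi)                                       # should be close to 1
    mean = integrate(lambda x: x * p(x), lo, hi)                      # (B.33)
    variance = integrate(lambda x: (x - mean) ** 2 * p(x), lo, hi)    # (B.34)
    entropy = integrate(lambda x: -p(x) * math.log(p(x)), lo, hi)     # (B.36)
    print(mass, mean, variance)
    print(entropy, gaussian_entropy(var))
```

The numerical integrals should agree with the closed forms to several decimal places, since the probability mass lying beyond ten standard deviations is negligible.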

The inverse of the variance τ = 1/σ^2 is called the precision, and the square root of the variance σ is called the standard deviation. The conjugate prior for μ is the Gaussian, and the conjugate prior for τ is the gamma distribution. If both μ and τ are unknown, their joint conjugate prior is the Gaussian-gamma distribution.

For a D-dimensional vector x, the Gaussian is governed by a D-dimensional mean vector μ and a D×D covariance matrix Σ that must be symmetric and
