Pattern Recognition and Machine Learning

116 2. PROBABILITY DISTRIBUTIONS

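which, after some simple rearrangement (Exercise 2.57), can be cast in the standard exponential family form (2.194) with

$$\boldsymbol{\eta} = \begin{pmatrix} \mu/\sigma^2 \\ -1/(2\sigma^2) \end{pmatrix} \tag{2.220}$$

$$\mathbf{u}(x) = \begin{pmatrix} x \\ x^2 \end{pmatrix} \tag{2.221}$$

$$h(x) = (2\pi)^{-1/2} \tag{2.222}$$

$$g(\boldsymbol{\eta}) = (-2\eta_2)^{1/2} \exp\left( \frac{\eta_1^2}{4\eta_2} \right). \tag{2.223}$$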
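As a quick sanity check on (2.220)-(2.223), the following minimal sketch evaluates the exponential family form h(x) g(η) exp{ηᵀu(x)} and compares it with the usual Gaussian density. The function names and the test values of μ and σ² are illustrative assumptions, not part of the text.

```python
import math

def gaussian_pdf(x, mu, sigma2):
    """Standard univariate Gaussian density with mean mu and variance sigma2."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def natural_params(mu, sigma2):
    """Natural parameters eta = (mu/sigma^2, -1/(2 sigma^2)), as in (2.220)."""
    return (mu / sigma2, -1.0 / (2.0 * sigma2))

def exp_family_pdf(x, eta):
    """Evaluate h(x) g(eta) exp(eta^T u(x)) using (2.221)-(2.223)."""
    eta1, eta2 = eta
    h = (2.0 * math.pi) ** -0.5                                      # (2.222)
    g = math.sqrt(-2.0 * eta2) * math.exp(eta1 ** 2 / (4.0 * eta2))  # (2.223)
    return h * g * math.exp(eta1 * x + eta2 * x ** 2)                # u(x) = (x, x^2)

# Illustrative values (assumed): the two expressions should agree up to rounding.
mu, sigma2 = 1.5, 0.7
for x in (-1.0, 0.3, 2.0):
    diff = abs(exp_family_pdf(x, natural_params(mu, sigma2)) - gaussian_pdf(x, mu, sigma2))
    print(x, diff)
```

The differences printed should be at the level of floating-point rounding, which is consistent with the two forms being the same density.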

2.4.1 Maximum likelihood and sufficient statistics

Let us now consider the problem of estimating the parameter vector $\boldsymbol{\eta}$ in the general exponential family distribution (2.194) using the technique of maximum likelihood. Taking the gradient of both sides of (2.195) with respect to $\boldsymbol{\eta}$, we have

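$$\nabla g(\boldsymbol{\eta}) \int h(\mathbf{x}) \exp\left\{ \boldsymbol{\eta}^{\mathrm{T}} \mathbf{u}(\mathbf{x}) \right\} \mathrm{d}\mathbf{x} + g(\boldsymbol{\eta}) \int h(\mathbf{x}) \exp\left\{ \boldsymbol{\eta}^{\mathrm{T}} \mathbf{u}(\mathbf{x}) \right\} \mathbf{u}(\mathbf{x}) \, \mathrm{d}\mathbf{x} = 0. \tag{2.224}$$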

Rearranging, and making use again of (2.195) then gives

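$$-\frac{1}{g(\boldsymbol{\eta})} \nabla g(\boldsymbol{\eta}) = g(\boldsymbol{\eta}) \int h(\mathbf{x}) \exp\left\{ \boldsymbol{\eta}^{\mathrm{T}} \mathbf{u}(\mathbf{x}) \right\} \mathbf{u}(\mathbf{x}) \, \mathrm{d}\mathbf{x} = \mathbb{E}[\mathbf{u}(\mathbf{x})] \tag{2.225}$$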

where we have used (2.194). We therefore obtain the result

$$-\nabla \ln g(\boldsymbol{\eta}) = \mathbb{E}[\mathbf{u}(\mathbf{x})]. \tag{2.226}$$
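The result (2.226) can be checked numerically for the univariate Gaussian, for which u(x) = (x, x²) and hence E[u(x)] = (μ, μ² + σ²). The sketch below differentiates ln g(η) from (2.223) by central finite differences; the step size `eps`, the helper names, and the parameter values are assumptions chosen for illustration.

```python
import math

def log_g(eta1, eta2):
    """ln g(eta) for the univariate Gaussian, taken from (2.223)."""
    return 0.5 * math.log(-2.0 * eta2) + eta1 ** 2 / (4.0 * eta2)

def neg_grad_log_g(eta1, eta2, eps=1e-6):
    """Central finite-difference estimate of -grad ln g(eta)."""
    d1 = (log_g(eta1 + eps, eta2) - log_g(eta1 - eps, eta2)) / (2.0 * eps)
    d2 = (log_g(eta1, eta2 + eps) - log_g(eta1, eta2 - eps)) / (2.0 * eps)
    return (-d1, -d2)

# Illustrative parameter values (assumed).
mu, sigma2 = 1.5, 0.7
eta1, eta2 = mu / sigma2, -1.0 / (2.0 * sigma2)

estimate = neg_grad_log_g(eta1, eta2)
expected = (mu, mu ** 2 + sigma2)  # E[x] and E[x^2] for a Gaussian
print(estimate, expected)
```

The two tuples should agree to within the finite-difference error, illustrating that the moments of u(x) follow from derivatives of ln g(η) alone.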

Note that the covariance of $\mathbf{u}(\mathbf{x})$ can be expressed in terms of the second derivatives of $g(\boldsymbol{\eta})$, and similarly for higher order moments (Exercise 2.58). Thus, provided we can normalize a distribution from the exponential family, we can always find its moments by simple differentiation.
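For the second-order case, one way to see this is to differentiate the middle expression in (2.225) once more with respect to η. The following is a sketch of that step, not necessarily the route intended by the exercise; it uses only (2.194) and (2.225).

```latex
\begin{align*}
-\nabla\nabla \ln g(\boldsymbol{\eta})
  &= \nabla \left[ g(\boldsymbol{\eta}) \int h(\mathbf{x})
     \exp\left\{ \boldsymbol{\eta}^{\mathrm{T}} \mathbf{u}(\mathbf{x}) \right\}
     \mathbf{u}(\mathbf{x})^{\mathrm{T}} \, \mathrm{d}\mathbf{x} \right] \\
  &= \frac{\nabla g(\boldsymbol{\eta})}{g(\boldsymbol{\eta})} \,
     \mathbb{E}[\mathbf{u}(\mathbf{x})]^{\mathrm{T}}
     + \mathbb{E}\left[ \mathbf{u}(\mathbf{x}) \mathbf{u}(\mathbf{x})^{\mathrm{T}} \right] \\
  &= \mathbb{E}\left[ \mathbf{u}(\mathbf{x}) \mathbf{u}(\mathbf{x})^{\mathrm{T}} \right]
     - \mathbb{E}[\mathbf{u}(\mathbf{x})] \, \mathbb{E}[\mathbf{u}(\mathbf{x})]^{\mathrm{T}}
   = \operatorname{cov}[\mathbf{u}(\mathbf{x})].
\end{align*}
```

The last line substitutes $\nabla g(\boldsymbol{\eta}) / g(\boldsymbol{\eta}) = -\mathbb{E}[\mathbf{u}(\mathbf{x})]$ from (2.225).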
Now consider a set of independent identically distributed data denoted by $\mathbf{X} = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$, for which the likelihood function is given by

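$$p(\mathbf{X}|\boldsymbol{\eta}) = \left( \prod_{n=1}^{N} h(\mathbf{x}_n) \right) g(\boldsymbol{\eta})^{N} \exp\left\{ \boldsymbol{\eta}^{\mathrm{T}} \sum_{n=1}^{N} \mathbf{u}(\mathbf{x}_n) \right\}. \tag{2.227}$$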

Setting the gradient of $\ln p(\mathbf{X}|\boldsymbol{\eta})$ with respect to $\boldsymbol{\eta}$ to zero, we get the following condition to be satisfied by the maximum likelihood estimator $\boldsymbol{\eta}_{\mathrm{ML}}$

$$-\nabla \ln g(\boldsymbol{\eta}_{\mathrm{ML}}) = \frac{1}{N} \sum_{n=1}^{N} \mathbf{u}(\mathbf{x}_n) \tag{2.228}$$
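To make condition (2.228) concrete, the sketch below solves it in closed form for the univariate Gaussian, where −∇ ln g(η) = (μ, μ² + σ²) and u(x) = (x, x²). The function names and the small data set are illustrative assumptions; for other members of the exponential family the condition may need to be solved numerically.

```python
def mean_sufficient_stats(xs):
    """Right-hand side of (2.228): (1/N) sum_n u(x_n) with u(x) = (x, x^2)."""
    n = len(xs)
    return (sum(xs) / n, sum(x * x for x in xs) / n)

def eta_ml_gaussian(xs):
    """Solve -grad ln g(eta) = mean of u(x_n) for the univariate Gaussian."""
    m1, m2 = mean_sufficient_stats(xs)
    mu = m1                # matches E[x] = mu
    sigma2 = m2 - m1 ** 2  # matches E[x^2] = mu^2 + sigma^2
    return (mu / sigma2, -1.0 / (2.0 * sigma2))  # natural parameters (2.220)

# Illustrative data (assumed); any sample with nonzero variance would do.
data = [1.0, 2.0, 3.0, 4.0]
eta1, eta2 = eta_ml_gaussian(data)

# Map back to the familiar parameters to inspect the solution.
mu_ml = -eta1 / (2.0 * eta2)
sigma2_ml = -1.0 / (2.0 * eta2)
print(mu_ml, sigma2_ml)
```

Note that the data enter only through the two averages returned by `mean_sufficient_stats`, so the individual observations are not needed once those sums have been computed.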
