Pattern Recognition and Machine Learning

116 2. PROBABILITY DISTRIBUTIONS

which, after some simple rearrangement (Exercise 2.57), can be cast in the
standard exponential family form (2.194) with


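$$
\boldsymbol{\eta} = \begin{pmatrix} \mu/\sigma^2 \\ -1/(2\sigma^2) \end{pmatrix} \tag{2.220}
$$

$$
\mathbf{u}(x) = \begin{pmatrix} x \\ x^2 \end{pmatrix} \tag{2.221}
$$

$$
h(x) = (2\pi)^{-1/2} \tag{2.222}
$$

$$
g(\boldsymbol{\eta}) = (-2\eta_2)^{1/2} \exp\left( \frac{\eta_1^2}{4\eta_2} \right). \tag{2.223}
$$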
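As a quick numerical sanity check of (2.220)-(2.223), the following minimal sketch evaluates the univariate Gaussian both in its usual form and in the exponential family form and compares the two. The function names, the test values of μ and σ², and the evaluation grid are illustrative assumptions, not part of the text.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma2):
    """Usual form of the univariate Gaussian density."""
    return np.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)

def natural_params(mu, sigma2):
    """Natural parameters eta = (mu/sigma^2, -1/(2 sigma^2)), as in (2.220)."""
    return np.array([mu / sigma2, -1.0 / (2.0 * sigma2)])

def exp_family_pdf(x, eta):
    """h(x) g(eta) exp(eta^T u(x)) built from (2.221)-(2.223)."""
    x = np.asarray(x, dtype=float)
    h = (2.0 * np.pi) ** -0.5                                          # (2.222)
    g = np.sqrt(-2.0 * eta[1]) * np.exp(eta[0] ** 2 / (4.0 * eta[1]))  # (2.223)
    return h * g * np.exp(eta[0] * x + eta[1] * x ** 2)                # u(x) = (x, x^2)

# Arbitrary illustrative values (assumptions, not from the text).
mu, sigma2 = 1.5, 0.7
xs = np.linspace(-3.0, 5.0, 9)

max_gap = np.max(np.abs(gaussian_pdf(xs, mu, sigma2)
                        - exp_family_pdf(xs, natural_params(mu, sigma2))))
print(max_gap)  # should be at the level of floating-point round-off
```

The two evaluations should agree to round-off, since substituting (2.220) into (2.223) gives g(η) = σ⁻¹ exp(−μ²/(2σ²)).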


2.4.1 Maximum likelihood and sufficient statistics


Let us now consider the problem of estimating the parameter vector η in the
general exponential family distribution (2.194) using the technique of maximum
likelihood. Taking the gradient of both sides of (2.195) with respect to η, we have

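$$
\nabla g(\boldsymbol{\eta}) \int h(\mathbf{x}) \exp\left\{ \boldsymbol{\eta}^{\mathrm{T}} \mathbf{u}(\mathbf{x}) \right\} \mathrm{d}\mathbf{x}
+ g(\boldsymbol{\eta}) \int h(\mathbf{x}) \exp\left\{ \boldsymbol{\eta}^{\mathrm{T}} \mathbf{u}(\mathbf{x}) \right\} \mathbf{u}(\mathbf{x}) \, \mathrm{d}\mathbf{x} = 0. \tag{2.224}
$$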

Rearranging, and making use again of (2.195) then gives


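$$
-\frac{1}{g(\boldsymbol{\eta})} \nabla g(\boldsymbol{\eta})
= g(\boldsymbol{\eta}) \int h(\mathbf{x}) \exp\left\{ \boldsymbol{\eta}^{\mathrm{T}} \mathbf{u}(\mathbf{x}) \right\} \mathbf{u}(\mathbf{x}) \, \mathrm{d}\mathbf{x}
= \mathbb{E}[\mathbf{u}(\mathbf{x})] \tag{2.225}
$$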

where we have used (2.194). We therefore obtain the result

$$
-\nabla \ln g(\boldsymbol{\eta}) = \mathbb{E}[\mathbf{u}(\mathbf{x})]. \tag{2.226}
$$

Note that the covariance of u(x) can be expressed in terms of the second derivatives
of g(η), and similarly for higher order moments (Exercise 2.58). Thus, provided we
can normalize a distribution from the exponential family, we can always find its
moments by simple differentiation.
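To illustrate (2.226) on a concrete case, the sketch below takes ln g(η) for the univariate Gaussian from (2.223) and differentiates it numerically. For this distribution E[u(x)] = (E[x], E[x²]) = (μ, μ² + σ²), so the negative gradient should reproduce those two moments. The finite-difference step, the helper names, and the test values are assumptions made for illustration.

```python
import numpy as np

def log_g(eta):
    """ln g(eta) for the univariate Gaussian, taken from (2.223)."""
    return 0.5 * np.log(-2.0 * eta[1]) + eta[0] ** 2 / (4.0 * eta[1])

def neg_grad_log_g(eta, step=1e-6):
    """Central finite-difference approximation to -grad ln g(eta)."""
    grad = np.zeros_like(eta)
    for i in range(eta.size):
        delta = np.zeros_like(eta)
        delta[i] = step
        grad[i] = (log_g(eta + delta) - log_g(eta - delta)) / (2.0 * step)
    return -grad

# Arbitrary illustrative values (assumptions, not from the text).
mu, sigma2 = 1.5, 0.7
eta = np.array([mu / sigma2, -1.0 / (2.0 * sigma2)])  # natural parameters (2.220)

moments = neg_grad_log_g(eta)                   # left-hand side of (2.226)
expected_u = np.array([mu, mu ** 2 + sigma2])   # E[x] and E[x^2] for a Gaussian
print(moments, expected_u)
```

Finite differences are used here only to keep the example dependency-free; an automatic differentiation library would serve equally well.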
Now consider a set of independent identically distributed data denoted by
X = {x_1, ..., x_N}, for which the likelihood function is given by


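$$
p(\mathbf{X} \mid \boldsymbol{\eta}) = \left( \prod_{n=1}^{N} h(\mathbf{x}_n) \right) g(\boldsymbol{\eta})^{N} \exp\left\{ \boldsymbol{\eta}^{\mathrm{T}} \sum_{n=1}^{N} \mathbf{u}(\mathbf{x}_n) \right\}. \tag{2.227}
$$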


Setting the gradient of ln p(X|η) with respect to η to zero, we get the following
condition to be satisfied by the maximum likelihood estimator η_ML

$$
-\nabla \ln g(\boldsymbol{\eta}_{\mathrm{ML}}) = \frac{1}{N} \sum_{n=1}^{N} \mathbf{u}(\mathbf{x}_n) \tag{2.228}
$$
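For the univariate Gaussian, condition (2.228) can be solved in closed form. Differentiating (2.223) gives −∇ ln g(η) = (μ, μ² + σ²), so matching this to the sample averages of x and x² yields the familiar estimates of the mean and variance. The sketch below works under that assumption; the function names, the random seed, and the synthetic data are illustrative and not taken from the text.

```python
import numpy as np

def sufficient_stats_mean(x):
    """(1/N) sum_n u(x_n) with u(x) = (x, x^2): the right-hand side of (2.228)."""
    x = np.asarray(x, dtype=float)
    return np.array([x.mean(), (x ** 2).mean()])

def eta_ml_gaussian(x):
    """Solve (2.228) for a univariate Gaussian.

    Uses -grad ln g(eta) = (mu, mu^2 + sigma^2), which follows from (2.223).
    """
    s1, s2 = sufficient_stats_mean(x)
    mu_ml = s1                    # first component: mu = mean of x
    sigma2_ml = s2 - s1 ** 2      # second component: mu^2 + sigma^2 = mean of x^2
    eta_ml = np.array([mu_ml / sigma2_ml, -1.0 / (2.0 * sigma2_ml)])  # (2.220)
    return eta_ml, mu_ml, sigma2_ml

# Synthetic data with an arbitrary seed (assumptions for illustration only).
rng = np.random.default_rng(0)
data = rng.normal(loc=1.5, scale=0.8, size=1000)

eta_ml, mu_ml, sigma2_ml = eta_ml_gaussian(data)
print(eta_ml, mu_ml, sigma2_ml)
```

Only the two averages returned by `sufficient_stats_mean` are needed to compute the estimate; the individual data points play no further role, which is the sense in which Σ u(x_n) is a sufficient statistic.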