which, after some simple rearrangement (Exercise 2.57), can be cast in the standard exponential family form (2.194) with
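\[
\boldsymbol{\eta} = \begin{pmatrix} \mu/\sigma^2 \\ -1/(2\sigma^2) \end{pmatrix} \tag{2.220}
\]
\[
\mathbf{u}(x) = \begin{pmatrix} x \\ x^2 \end{pmatrix} \tag{2.221}
\]
\[
h(x) = (2\pi)^{-1/2} \tag{2.222}
\]
\[
g(\boldsymbol{\eta}) = (-2\eta_2)^{1/2} \exp\left( \frac{\eta_1^2}{4\eta_2} \right). \tag{2.223}
\]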
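As a quick numerical sanity check, the following sketch evaluates h(x) g(η) exp{ηᵀu(x)} using (2.220) to (2.223) and compares it with the familiar Gaussian density. The function names are illustrative choices, not notation from the text, and the code assumes the univariate Gaussian with mean μ and variance σ².

```python
import numpy as np

def gaussian_natural_params(mu, sigma2):
    """Natural parameters (2.220) for mean mu and variance sigma2."""
    return np.array([mu / sigma2, -1.0 / (2.0 * sigma2)])

def exp_family_density(x, eta):
    """Evaluate h(x) g(eta) exp(eta^T u(x)) using (2.221)-(2.223)."""
    u = np.array([x, x ** 2])                    # sufficient statistics u(x)
    h = (2.0 * np.pi) ** -0.5                    # h(x), constant in x
    g = np.sqrt(-2.0 * eta[1]) * np.exp(eta[0] ** 2 / (4.0 * eta[1]))
    return h * g * np.exp(eta @ u)

def gaussian_pdf(x, mu, sigma2):
    """Standard form of the univariate Gaussian density."""
    return np.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)

eta = gaussian_natural_params(mu=1.0, sigma2=2.0)
difference = exp_family_density(0.3, eta) - gaussian_pdf(0.3, 1.0, 2.0)
```

The value of `difference` should be zero up to floating-point rounding, since the two expressions are algebraically identical.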
2.4.1 Maximum likelihood and sufficient statistics
Let us now consider the problem of estimating the parameter vector η in the general exponential family distribution (2.194) using the technique of maximum likelihood. Taking the gradient of both sides of (2.195) with respect to η, we have
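\[
\nabla g(\boldsymbol{\eta}) \int h(\mathbf{x}) \exp\left\{ \boldsymbol{\eta}^{\mathrm{T}} \mathbf{u}(\mathbf{x}) \right\} \mathrm{d}\mathbf{x}
+ g(\boldsymbol{\eta}) \int h(\mathbf{x}) \exp\left\{ \boldsymbol{\eta}^{\mathrm{T}} \mathbf{u}(\mathbf{x}) \right\} \mathbf{u}(\mathbf{x}) \, \mathrm{d}\mathbf{x} = 0. \tag{2.224}
\]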
Rearranging, and making use again of (2.195) then gives
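\[
-\frac{1}{g(\boldsymbol{\eta})} \nabla g(\boldsymbol{\eta})
= g(\boldsymbol{\eta}) \int h(\mathbf{x}) \exp\left\{ \boldsymbol{\eta}^{\mathrm{T}} \mathbf{u}(\mathbf{x}) \right\} \mathbf{u}(\mathbf{x}) \, \mathrm{d}\mathbf{x}
= \mathbb{E}[\mathbf{u}(\mathbf{x})] \tag{2.225}
\]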
where we have used (2.194). We therefore obtain the result
\[
-\nabla \ln g(\boldsymbol{\eta}) = \mathbb{E}[\mathbf{u}(\mathbf{x})]. \tag{2.226}
\]
Note that the covariance of u(x) can be expressed in terms of the second derivatives of g(η), and similarly for higher order moments (Exercise 2.58). Thus, provided we can normalize a distribution from the exponential family, we can always find its moments by simple differentiation.
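To make the idea of moments by differentiation concrete, the sketch below takes the Gaussian g(η) of (2.223) and differentiates −ln g(η) numerically. The gradient is compared with E[u(x)] = (μ, μ² + σ²) as in (2.226), and the Hessian with the covariance of u(x), using the standard identity −∇∇ ln g(η) = cov[u(x)], which the text leaves as an exercise. The finite-difference helpers and step sizes are assumptions made for illustration only.

```python
import numpy as np

def neg_log_g(eta):
    """-ln g(eta) for the univariate Gaussian, from (2.223)."""
    return -0.5 * np.log(-2.0 * eta[1]) - eta[0] ** 2 / (4.0 * eta[1])

def numerical_gradient(f, eta, eps=1e-6):
    """Central finite-difference gradient of f at eta."""
    grad = np.zeros(eta.size)
    for i in range(eta.size):
        step = np.zeros(eta.size)
        step[i] = eps
        grad[i] = (f(eta + step) - f(eta - step)) / (2.0 * eps)
    return grad

def numerical_hessian(f, eta, eps=1e-4):
    """Central finite-difference Hessian of f at eta."""
    n = eta.size
    hess = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ej = np.zeros(n)
            ei[i] = eps
            ej[j] = eps
            hess[i, j] = (f(eta + ei + ej) - f(eta + ei - ej)
                          - f(eta - ei + ej) + f(eta - ei - ej)) / (4.0 * eps ** 2)
    return hess

mu, sigma2 = 1.5, 0.8
eta = np.array([mu / sigma2, -1.0 / (2.0 * sigma2)])

mean_u = numerical_gradient(neg_log_g, eta)   # should approximate E[u(x)]
cov_u = numerical_hessian(neg_log_g, eta)     # should approximate cov[u(x)]

# Closed-form Gaussian moments for comparison.
expected_mean = np.array([mu, mu ** 2 + sigma2])
expected_cov = np.array([
    [sigma2, 2.0 * mu * sigma2],
    [2.0 * mu * sigma2, 4.0 * mu ** 2 * sigma2 + 2.0 * sigma2 ** 2],
])
```

Finite differences are used here only to keep the example dependency-free; an automatic differentiation library would serve equally well.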
Now consider a set of independent identically distributed data denoted by X = {x_1, ..., x_N}, for which the likelihood function is given by
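\[
p(\mathbf{X} \mid \boldsymbol{\eta}) = \left( \prod_{n=1}^{N} h(\mathbf{x}_n) \right) g(\boldsymbol{\eta})^{N} \exp\left\{ \boldsymbol{\eta}^{\mathrm{T}} \sum_{n=1}^{N} \mathbf{u}(\mathbf{x}_n) \right\}. \tag{2.227}
\]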
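The structure of (2.227) can be illustrated for the univariate Gaussian, where h(x) is constant, so the log likelihood depends on the data only through N and the summed statistics Σ u(x_n). The sketch below compares that form with a direct sum of log densities; the function names and the small data set are made up for illustration.

```python
import numpy as np

def log_likelihood_from_stats(eta, sum_u, n_points):
    """ln p(X|eta) from (2.227) for the Gaussian, using only sum_n u(x_n) and N."""
    log_h = -0.5 * np.log(2.0 * np.pi)                                  # ln h(x)
    log_g = 0.5 * np.log(-2.0 * eta[1]) + eta[0] ** 2 / (4.0 * eta[1])  # ln g(eta)
    return n_points * (log_h + log_g) + eta @ sum_u

def log_likelihood_direct(x, mu, sigma2):
    """Sum of Gaussian log densities evaluated point by point."""
    return np.sum(-0.5 * np.log(2.0 * np.pi * sigma2) - (x - mu) ** 2 / (2.0 * sigma2))

x = np.array([0.2, -1.3, 0.7, 2.1, 1.0])             # illustrative data
mu, sigma2 = 0.5, 1.2
eta = np.array([mu / sigma2, -1.0 / (2.0 * sigma2)])
sum_u = np.array([x.sum(), (x ** 2).sum()])          # sum over n of u(x_n)

ll_stats = log_likelihood_from_stats(eta, sum_u, x.size)
ll_direct = log_likelihood_direct(x, mu, sigma2)
```

The two values agree up to rounding, even though `log_likelihood_from_stats` never sees the individual data points.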
Setting the gradient of ln p(X|η) with respect to η to zero, we get the following condition to be satisfied by the maximum likelihood estimator η_ML
\[
-\nabla \ln g(\boldsymbol{\eta}_{\mathrm{ML}}) = \frac{1}{N} \sum_{n=1}^{N} \mathbf{u}(\mathbf{x}_n) \tag{2.228}
\]
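For the univariate Gaussian, condition (2.228) can be solved in closed form: since −∇ ln g(η) = (μ, μ² + σ²), matching it to the sample averages of x and x² gives the usual estimates of the mean and the (biased) variance. The following is a minimal sketch under that assumption; the function name, the synthetic data, and the random seed are illustrative.

```python
import numpy as np

def gaussian_ml_from_sufficient_stats(x):
    """Solve (2.228) for a univariate Gaussian by moment matching."""
    x = np.asarray(x, dtype=float)
    s1 = x.mean()                 # (1/N) sum_n x_n
    s2 = (x ** 2).mean()          # (1/N) sum_n x_n^2
    mu_ml = s1                    # from E[x] = mu
    sigma2_ml = s2 - s1 ** 2      # from E[x^2] = mu^2 + sigma^2
    eta_ml = np.array([mu_ml / sigma2_ml, -1.0 / (2.0 * sigma2_ml)])
    return mu_ml, sigma2_ml, eta_ml

# Synthetic data drawn from a Gaussian with mean 2.0 and standard deviation 1.5.
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=10_000)
mu_ml, sigma2_ml, eta_ml = gaussian_ml_from_sufficient_stats(data)
```

Only the two averages `s1` and `s2` are needed, so the data could be processed in a single streaming pass without storing the individual points.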