116 2. PROBABILITY DISTRIBUTIONS
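which, after some simple rearrangement, can be cast in the standard exponential family form (2.194) (Exercise 2.57) with

\[ \boldsymbol{\eta} = \begin{pmatrix} \mu/\sigma^2 \\ -1/(2\sigma^2) \end{pmatrix} \tag{2.220} \]

\[ \mathbf{u}(x) = \begin{pmatrix} x \\ x^2 \end{pmatrix} \tag{2.221} \]

\[ h(x) = (2\pi)^{-1/2} \tag{2.222} \]

\[ g(\boldsymbol{\eta}) = (-2\eta_2)^{1/2} \exp\left( \frac{\eta_1^2}{4\eta_2} \right). \tag{2.223} \]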
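As a quick sanity check of (2.220)-(2.223), the following minimal Python sketch evaluates the product h(x) g(η) exp{ηᵀu(x)} and compares it with the usual Gaussian density. The function names (`natural_params`, `exp_family_gaussian`, `gaussian_pdf`) and the test values are illustrative assumptions, not part of the text.

```python
import math

def natural_params(mu, sigma2):
    """Map (mu, sigma^2) to the natural parameters eta of (2.220)."""
    return (mu / sigma2, -1.0 / (2.0 * sigma2))

def exp_family_gaussian(x, eta):
    """Evaluate h(x) * g(eta) * exp(eta^T u(x)) using (2.221)-(2.223)."""
    eta1, eta2 = eta
    h = (2.0 * math.pi) ** -0.5                                       # (2.222)
    g = math.sqrt(-2.0 * eta2) * math.exp(eta1 ** 2 / (4.0 * eta2))  # (2.223)
    return h * g * math.exp(eta1 * x + eta2 * x * x)                  # u(x) = (x, x^2)

def gaussian_pdf(x, mu, sigma2):
    """Standard form of the univariate Gaussian density."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

# Illustrative values only (hypothetical).
mu, sigma2, x = 1.5, 2.0, 0.3
difference = abs(exp_family_gaussian(x, natural_params(mu, sigma2)) - gaussian_pdf(x, mu, sigma2))
print(difference)  # should be at rounding-error level
```

The two forms should agree to rounding error for any x, provided η₂ < 0, which holds whenever σ² > 0.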
2.4.1 Maximum likelihood and sufficient statistics
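Let us now consider the problem of estimating the parameter vector $\boldsymbol{\eta}$ in the general exponential family distribution (2.194) using the technique of maximum likelihood. Taking the gradient of both sides of (2.195) with respect to $\boldsymbol{\eta}$, we have

\[ \nabla g(\boldsymbol{\eta}) \int h(\mathbf{x}) \exp\left\{ \boldsymbol{\eta}^{\mathrm{T}} \mathbf{u}(\mathbf{x}) \right\} \mathrm{d}\mathbf{x} + g(\boldsymbol{\eta}) \int h(\mathbf{x}) \exp\left\{ \boldsymbol{\eta}^{\mathrm{T}} \mathbf{u}(\mathbf{x}) \right\} \mathbf{u}(\mathbf{x}) \, \mathrm{d}\mathbf{x} = 0. \tag{2.224} \]

Rearranging, and making use again of (2.195), then gives

\[ -\frac{1}{g(\boldsymbol{\eta})} \nabla g(\boldsymbol{\eta}) = g(\boldsymbol{\eta}) \int h(\mathbf{x}) \exp\left\{ \boldsymbol{\eta}^{\mathrm{T}} \mathbf{u}(\mathbf{x}) \right\} \mathbf{u}(\mathbf{x}) \, \mathrm{d}\mathbf{x} = \mathbb{E}[\mathbf{u}(\mathbf{x})] \tag{2.225} \]

where we have used (2.194). We therefore obtain the result

\[ -\nabla \ln g(\boldsymbol{\eta}) = \mathbb{E}[\mathbf{u}(\mathbf{x})]. \tag{2.226} \]

Note that the covariance of $\mathbf{u}(\mathbf{x})$ can be expressed in terms of the second derivatives of $g(\boldsymbol{\eta})$, and similarly for higher order moments (Exercise 2.58). Thus, provided we can normalize a distribution from the exponential family, we can always find its moments by simple differentiation.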
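To illustrate (2.226) on the Gaussian case, the sketch below differentiates ln g(η) from (2.223) by central finite differences and compares the result with the known moments E[x] = μ and E[x²] = μ² + σ². The helper names, the step size `eps`, and the parameter values are assumptions chosen for illustration; an analytic or automatic derivative would serve equally well.

```python
import math

def log_g(eta1, eta2):
    """ln g(eta) for the Gaussian, from (2.223)."""
    return 0.5 * math.log(-2.0 * eta2) + eta1 ** 2 / (4.0 * eta2)

def neg_grad_log_g(eta1, eta2, eps=1e-6):
    """Approximate -grad ln g(eta) by central finite differences."""
    d1 = (log_g(eta1 + eps, eta2) - log_g(eta1 - eps, eta2)) / (2.0 * eps)
    d2 = (log_g(eta1, eta2 + eps) - log_g(eta1, eta2 - eps)) / (2.0 * eps)
    return (-d1, -d2)

# Illustrative parameter values (hypothetical).
mu, sigma2 = 1.5, 2.0
eta1, eta2 = mu / sigma2, -1.0 / (2.0 * sigma2)   # natural parameters (2.220)

moments = neg_grad_log_g(eta1, eta2)              # left-hand side of (2.226)
expected = (mu, mu ** 2 + sigma2)                 # E[x], E[x^2] for a Gaussian
print(moments, expected)
```

Both components should match up to the finite-difference error, which is consistent with the claim that moments follow from differentiating the normalizer.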
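Now consider a set of independent identically distributed data denoted by $\mathbf{X} = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$, for which the likelihood function is given by

\[ p(\mathbf{X} \mid \boldsymbol{\eta}) = \left( \prod_{n=1}^{N} h(\mathbf{x}_n) \right) g(\boldsymbol{\eta})^{N} \exp\left\{ \boldsymbol{\eta}^{\mathrm{T}} \sum_{n=1}^{N} \mathbf{u}(\mathbf{x}_n) \right\}. \tag{2.227} \]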
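The structure of (2.227) means the data enter the η-dependent part of the likelihood only through the sum of u(x_n). As a rough illustration for the univariate Gaussian, the sketch below computes the log likelihood once directly from the data and once from the pair (Σx_n, Σx_n²) alone. The function names and the small data set are hypothetical.

```python
import math

def log_likelihood_direct(xs, mu, sigma2):
    """Sum of Gaussian log densities over the data."""
    return sum(-0.5 * math.log(2.0 * math.pi * sigma2) - (x - mu) ** 2 / (2.0 * sigma2)
               for x in xs)

def sufficient_stats(xs):
    """Sum over n of u(x_n) = (x_n, x_n^2)."""
    return (sum(xs), sum(x * x for x in xs))

def log_likelihood_from_stats(n, stats, mu, sigma2):
    """Log of (2.227) using only N and the summed statistics."""
    eta1, eta2 = mu / sigma2, -1.0 / (2.0 * sigma2)                  # (2.220)
    log_h = -0.5 * math.log(2.0 * math.pi)                           # ln h(x), constant here
    log_g = 0.5 * math.log(-2.0 * eta2) + eta1 ** 2 / (4.0 * eta2)   # ln g(eta), (2.223)
    return n * log_h + n * log_g + eta1 * stats[0] + eta2 * stats[1]

# Hypothetical data and parameters for illustration.
xs = [0.2, 1.1, 2.7, -0.4]
stats = sufficient_stats(xs)
direct = log_likelihood_direct(xs, 1.5, 2.0)
via_stats = log_likelihood_from_stats(len(xs), stats, 1.5, 2.0)
print(direct, via_stats)
```

Once `stats` has been computed, the individual data points are no longer needed to evaluate the likelihood at any other (μ, σ²). For the Gaussian, h(x) is constant, so its contribution is simply N ln h.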
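Setting the gradient of $\ln p(\mathbf{X} \mid \boldsymbol{\eta})$ with respect to $\boldsymbol{\eta}$ to zero, we get the following condition to be satisfied by the maximum likelihood estimator $\boldsymbol{\eta}_{\mathrm{ML}}$

\[ -\nabla \ln g(\boldsymbol{\eta}_{\mathrm{ML}}) = \frac{1}{N} \sum_{n=1}^{N} \mathbf{u}(\mathbf{x}_n) \tag{2.228} \]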