364 Maximum Likelihood Methods
Remark 6.2.1. Note that the information is the weighted mean of either
\[
\left[\frac{\partial \log f(x;\theta)}{\partial \theta}\right]^2 \quad\text{or}\quad -\frac{\partial^2 \log f(x;\theta)}{\partial \theta^2},
\]
where the weights are given by the pdf $f(x;\theta)$. That is, the greater these derivatives are on the average, the more information we get about $\theta$. Clearly, if they were equal to zero [so that $\theta$ would not be in $\log f(x;\theta)$], there would be zero information about $\theta$. The important function
\[
\frac{\partial \log f(x;\theta)}{\partial \theta}
\]
is called the \textbf{score function}. Recall that it determines the estimating equations for the mle; that is, the mle $\hat{\theta}$ solves
\[
\sum_{i=1}^{n} \frac{\partial \log f(x_i;\theta)}{\partial \theta} = 0
\]
for $\theta$.
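As a minimal numerical sketch (in Python; the sample, seed, and helper name are illustrative, not from the text), the estimating equation can be verified directly for the Bernoulli case worked out in Example 6.2.1: the summed score vanishes at the mle, which for a Bernoulli sample is the sample mean.

```python
import random

# Illustrative sketch: for a Bernoulli b(1, theta) observation, the score is
# x/theta - (1 - x)/(1 - theta) (see Example 6.2.1). The mle solves the
# estimating equation sum_i score(x_i, theta) = 0, and the solution is the
# sample mean.
def score(x, theta):
    return x / theta - (1 - x) / (1 - theta)

random.seed(0)  # arbitrary seed for a reproducible simulated sample
sample = [1 if random.random() < 0.3 else 0 for _ in range(1000)]

theta_hat = sum(sample) / len(sample)  # sample mean = mle

# The summed score is zero (up to floating-point rounding) at theta_hat.
total_score = sum(score(x, theta_hat) for x in sample)
```

Analytically, $\sum_i (x_i/\hat\theta) - \sum_i (1-x_i)/(1-\hat\theta) = n - n = 0$ when $\hat\theta = \bar{x}$, so `total_score` differs from zero only by rounding error.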
Example 6.2.1 (Information for a Bernoulli Random Variable). Let $X$ be Bernoulli $b(1,\theta)$. Thus
\begin{align*}
\log f(x;\theta) &= x\log\theta + (1-x)\log(1-\theta) \\
\frac{\partial \log f(x;\theta)}{\partial \theta} &= \frac{x}{\theta} - \frac{1-x}{1-\theta} \\
\frac{\partial^2 \log f(x;\theta)}{\partial \theta^2} &= -\frac{x}{\theta^2} - \frac{1-x}{(1-\theta)^2}.
\end{align*}
Clearly,
\begin{align*}
I(\theta) &= -E\left[-\frac{X}{\theta^2} - \frac{1-X}{(1-\theta)^2}\right] \\
&= \frac{\theta}{\theta^2} + \frac{1-\theta}{(1-\theta)^2} = \frac{1}{\theta} + \frac{1}{1-\theta} = \frac{1}{\theta(1-\theta)},
\end{align*}
which is larger for $\theta$ values close to zero or one.
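Because $X$ has support $\{0,1\}$, this information can be checked by computing $E[(\partial \log f(X;\theta)/\partial\theta)^2]$ exactly as a two-term weighted sum. A short sketch (in Python; the function name is my own):

```python
# Check that the Bernoulli information equals 1/(theta(1 - theta)) by
# computing E[score^2] exactly over the support {0, 1}.
def fisher_info_bernoulli(theta):
    info = 0.0
    for x, prob in ((1, theta), (0, 1.0 - theta)):
        score = x / theta - (1 - x) / (1 - theta)  # d/dtheta log f(x; theta)
        info += prob * score ** 2                  # weight by the pmf f(x; theta)
    return info

# Information grows as theta approaches 0 or 1, matching 1/(theta(1 - theta)).
values = [(theta, fisher_info_bernoulli(theta)) for theta in (0.1, 0.5, 0.9)]
```

For instance, $I(0.5) = 4$ while $I(0.1) = I(0.9) = 1/0.09 \approx 11.1$, illustrating that extreme $\theta$ values carry more information per observation.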
Example 6.2.2 (Information for a Location Family). Consider a random sample $X_1,\ldots,X_n$ such that
\begin{equation}
X_i = \theta + e_i, \quad i = 1,\ldots,n, \tag{6.2.7}
\end{equation}
where $e_1, e_2, \ldots, e_n$ are iid with common pdf $f(x)$ and with support $(-\infty,\infty)$. Then the common pdf of $X_i$ is $f_X(x;\theta) = f(x-\theta)$. We call model (6.2.7) a \textbf{location model}. Assume that $f(x)$ satisfies the regularity conditions. Then the information is
\begin{align}
I(\theta) &= \int_{-\infty}^{\infty} \left(\frac{f'(x-\theta)}{f(x-\theta)}\right)^2 f(x-\theta)\,dx \nonumber \\
&= \int_{-\infty}^{\infty} \left(\frac{f'(z)}{f(z)}\right)^2 f(z)\,dz, \tag{6.2.8}
\end{align}