A. Colin Cameron 745
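Given data independent over $i$, the robust variance matrix estimate uses:
$$
\hat{A} = N^{-1} \sum_i \left. \frac{\partial h(w_i,\theta)}{\partial \theta'} \right|_{\hat{\theta}},
\qquad
\hat{B} = N^{-1} \sum_i h(w_i,\hat{\theta})\, h(w_i,\hat{\theta})'. \tag{14.22}
$$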
The resulting standard errors are called robust standard errors. In some cases the
Hessian $\hat{A}$ in (14.22) may be replaced by the expected Hessian, and $\hat{B}$ may use a
degrees-of-freedom correction such as $(N-q)^{-1}$ rather than $N^{-1}$.
A leading example is the heteroskedasticity-consistent estimate of the
variance-covariance matrix of the OLS estimator. Then $q_i(\beta) = -\tfrac{1}{2}(y_i - x_i'\beta)^2$, where the
factor $\tfrac{1}{2}$ is added for convenience, so that $h_i(\beta) = \partial q_i/\partial\beta = (y_i - x_i'\beta)x_i$, and
$\partial h_i(\beta)/\partial\beta' = -x_i x_i'$. It follows that:
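$$
\hat{V}[\hat{\beta}_{\mathrm{OLS}}] =
\left[ \sum_i x_i x_i' \right]^{-1}
\left[ \sum_i \hat{u}_i^2\, x_i x_i' \right]
\left[ \sum_i x_i x_i' \right]^{-1}, \tag{14.23}
$$
where $\hat{u}_i = y_i - x_i'\hat{\beta}$.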
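As an illustration, (14.23) can be computed directly with NumPy. This is a minimal sketch on simulated data, not a reference implementation: the function name `ols_robust_variance`, the data-generating process, and the seed are all assumptions made for the example, and no degrees-of-freedom correction is applied.

```python
import numpy as np

def ols_robust_variance(X, y):
    """OLS coefficients and the heteroskedasticity-robust variance of (14.23).

    Hypothetical helper for illustration; X is N x q, y has length N.
    """
    XtX_inv = np.linalg.inv(X.T @ X)            # [sum_i x_i x_i']^{-1}
    beta_hat = XtX_inv @ X.T @ y                # OLS estimate
    u_hat = y - X @ beta_hat                    # residuals u_i
    meat = (X * u_hat[:, None] ** 2).T @ X      # sum_i u_i^2 x_i x_i'
    V_robust = XtX_inv @ meat @ XtX_inv         # sandwich form
    return beta_hat, V_robust

# Simulated data (assumed for the example): error variance depends on x.
rng = np.random.default_rng(0)
N = 500
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
u = rng.normal(size=N) * (0.5 + np.abs(x))
y = 1.0 + 2.0 * x + u

beta_hat, V_robust = ols_robust_variance(X, y)
robust_se = np.sqrt(np.diag(V_robust))          # robust standard errors
```

The $N^{-1}$ factors in $\hat{A}$ and $\hat{B}$ cancel against the $N^{-1}$ scaling of the variance, which is why only raw sums appear in the code.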
For ML estimation, use of (14.22) relaxes the traditional information matrix
equality assumption that $A_0 = -B_0$, which gives the simplification $A_0^{-1} B_0 A_0^{-1} = -A_0^{-1}$.
Failure of the information matrix equality generally implies inconsistency of the
MLE. Then $\theta_0$ needs to be reinterpreted as a “pseudo-true value,” which is the value
of $\theta$ that maximizes the probability limit of $1/N$ times the log-likelihood function.
However, the MLE does retain consistency in many standard models with specified
density in the linear exponential family, notably linear, Poisson, logit and probit,
provided the conditional mean function is correctly specified. Robust standard
errors are then especially applicable.
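The Poisson case can be sketched numerically. The example below fits a Poisson quasi-MLE by Newton-Raphson to simulated overdispersed counts, where the conditional mean is correctly specified but the Poisson variance assumption fails, and compares the sandwich variance with the information-matrix-equality variance. Everything here is an assumption for illustration: the function name `poisson_qmle`, the gamma-mixture data-generating process, and the scaling $N^{-1}\hat{A}^{-1}\hat{B}\hat{A}^{-1}$, which is taken to be the form of (14.21).

```python
import numpy as np

def poisson_qmle(X, y, n_iter=50, tol=1e-10):
    """Poisson quasi-MLE with robust and ML-based variance estimates.

    Hypothetical helper: h_i = (y_i - exp(x_i'b)) x_i and
    dh_i/db' = -exp(x_i'b) x_i x_i'.
    """
    N, q = X.shape
    beta = np.zeros(q)
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)                  # sum_i h_i
        hess = -(X * mu[:, None]).T @ X         # sum_i dh_i/db'
        step = np.linalg.solve(hess, score)
        beta = beta - step                      # Newton-Raphson update
        if np.max(np.abs(step)) < tol:
            break
    mu = np.exp(X @ beta)
    A_hat = -(X * mu[:, None]).T @ X / N                  # A-hat in (14.22)
    B_hat = (X * ((y - mu) ** 2)[:, None]).T @ X / N      # B-hat in (14.22)
    A_inv = np.linalg.inv(A_hat)
    V_robust = A_inv @ B_hat @ A_inv / N        # sandwich variance
    V_ml = -A_inv / N                           # valid only if A_0 = -B_0
    return beta, V_robust, V_ml

# Simulated overdispersed counts (assumed): Poisson with gamma heterogeneity
# of mean one, so E[y|x] = exp(0.5 + 0.3 x) but Var[y|x] > E[y|x].
rng = np.random.default_rng(2)
N = 2000
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
mu_true = np.exp(0.5 + 0.3 * x)
y = rng.poisson(mu_true * rng.gamma(shape=2.0, scale=0.5, size=N))

beta_hat, V_robust, V_ml = poisson_qmle(X, y)
robust_se = np.sqrt(np.diag(V_robust))
ml_se = np.sqrt(np.diag(V_ml))
```

With overdispersion of this kind one would expect `robust_se` to exceed `ml_se`, since the ML-based formula understates the sampling variability.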
For independent errors the key early reference is White (1980), who proposed
the special case (14.23). Robust standard errors have been applied to many
estimators, including instrumental variables and generalized method of moments (see
(14.3)). T. Amemiya (1985) and Newey and McFadden (1994) provide quite general
treatments of inference and estimation (see also White, 1984).
The estimates in (14.22) can be extended to clustered data. In that case
observations are grouped into clusters, with correlation permitted within a cluster but
independence assumed across clusters. An example is panel data where the cluster
unit is the individual: observations for a given individual over time are
correlated, but observations across individuals are independent. Let $c = 1, \ldots, C$ denote
clusters and let $j = 1, \ldots, N_c$ denote the $N_c$ observations in cluster $c$. Then the
cluster-robust variance matrix estimate is (14.21), where $\hat{A}$ is again given in (14.22)
but now:
$$
\hat{B} = N^{-1} \sum_{c=1}^{C} \sum_{j=1}^{N_c} \sum_{k=1}^{N_c} h(w_{jc},\hat{\theta})\, h(w_{kc},\hat{\theta})'. \tag{14.24}
$$
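For OLS, where $h(w_{jc},\hat{\theta}) = \hat{u}_{jc} x_{jc}$, the double sum over $j$ and $k$ in (14.24) equals the outer product of the within-cluster score sum with itself, which gives a simple way to compute it. The sketch below assumes a simulated panel with an individual-specific error component. The function name `cluster_robust_variance` and the data-generating process are hypothetical, and no finite-sample correction is applied.

```python
import numpy as np

def cluster_robust_variance(X, y, cluster_ids):
    """OLS coefficients and a cluster-robust variance using B-hat from (14.24).

    Hypothetical helper for illustration; cluster_ids has one label per row.
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    u_hat = y - X @ beta_hat
    scores = X * u_hat[:, None]                 # row i holds u_i x_i'
    q = X.shape[1]
    meat = np.zeros((q, q))
    for c in np.unique(cluster_ids):
        s_c = scores[cluster_ids == c].sum(axis=0)   # sum_j h(w_jc)
        meat += np.outer(s_c, s_c)                   # sum_j sum_k h_jc h_kc'
    return beta_hat, XtX_inv @ meat @ XtX_inv

# Simulated panel (assumed): C individuals observed T times each, with an
# individual effect alpha that induces within-cluster error correlation.
rng = np.random.default_rng(1)
C, T = 50, 5
cluster_ids = np.repeat(np.arange(C), T)
x = rng.normal(size=C * T) + np.repeat(rng.normal(size=C), T)
alpha = np.repeat(rng.normal(size=C), T)
y = 1.0 + 2.0 * x + alpha + rng.normal(size=C * T)
X = np.column_stack([np.ones(C * T), x])

beta_hat, V_cluster = cluster_robust_variance(X, y, cluster_ids)
cluster_se = np.sqrt(np.diag(V_cluster))
```

When every observation is its own cluster, the computation reduces to the heteroskedasticity-robust estimate (14.23).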
The estimator in (14.24), proposed by Liang and Zeger (1986), permits both error
heteroskedasticity and quite flexible error correlation within cluster. It has largely
supplanted the use of a more restrictive random effects or error components model,