A. Colin Cameron 765
positively correlated. The standard procedure in econometrics is to use estimators
that ignore the clustering and to base inference on cluster-robust standard errors,
presented in section 14.4.1. More efficient estimation is possible using feasible GLS
estimators that model the error correlation. In particular, hierarchical linear models
or multilevel models are often used in other social science disciplines, but they are
seldom used in econometrics. If clustering is thought to induce correlation between
the errors and the regressors, then cluster-specific fixed effects, analogous to
individual-specific fixed effects in a panel data model, may also be used.
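As a rough illustration of the cluster-robust standard errors mentioned above, the sketch below computes OLS estimates and the basic sandwich variance $(X'X)^{-1}\left(\sum_g X_g'\hat{u}_g\hat{u}_g'X_g\right)(X'X)^{-1}$, where $g$ indexes clusters and $\hat{u}_g$ are the OLS residuals in cluster $g$. It assumes NumPy, omits the finite-sample adjustments that statistical packages usually apply, and uses simulated data; the function name `ols_cluster_robust` is hypothetical. Section 14.4.1 remains the reference for the estimator itself.

```python
import numpy as np


def ols_cluster_robust(X, y, groups):
    """OLS coefficients and a cluster-robust (CR0) sandwich variance matrix.

    No finite-sample degrees-of-freedom adjustment is applied.
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    u = y - X @ beta                      # OLS residuals
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(groups):
        idx = groups == g
        score_g = X[idx].T @ u[idx]       # sum of x_i * u_i within cluster g
        meat += np.outer(score_g, score_g)
    V = XtX_inv @ meat @ XtX_inv
    return beta, V


# Hypothetical data: 50 clusters of 20 observations, with a shared cluster
# component in both the regressor and the error
rng = np.random.default_rng(0)
G, n_g = 50, 20
groups = np.repeat(np.arange(G), n_g)
x = rng.normal(size=G)[groups] + rng.normal(size=G * n_g)
e = rng.normal(size=G)[groups] + rng.normal(size=G * n_g)
X = np.column_stack([np.ones(G * n_g), x])
y = 1.0 + 0.5 * x + e

beta, V = ols_cluster_robust(X, y, groups)
print("coefficients:", beta)
print("cluster-robust standard errors:", np.sqrt(np.diag(V)))
```

With each observation placed in its own cluster, this estimator reduces to White's heteroskedasticity-robust (HC0) variance, which is a convenient check on an implementation.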
14.7.2 Missing data
The starting point for the analysis of missing data is the terminology and set of
assumptions, due to Rubin (1976), about the nature of the process that leads to
missing data on a variable, say $w_i$. These have many similarities with the potential
outcomes model, where the unobserved counterfactual can also be viewed as a missing
data problem.
If the probability that $w_i$ is missing depends neither on its own value nor on other
data in the data set, then $w_i$ is missing completely at random (MCAR), and missing
data on $w_i$ cause no problems aside from a loss of efficiency. If the probability that
$w_i$ is missing depends on other data in the data set, but not on its own value, then
$w_i$ is missing at random (MAR), and missing data may lead to estimator inconsistency.
If $w_i$ is MAR, then it is possible to adjust for missingness if the missing data
mechanism is ignorable, meaning that the parameters of the missing data mechanism
are unrelated to the parameters being estimated, similar to weak exogeneity.
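A small simulation can make the MCAR/MAR distinction concrete. The hypothetical design below (NumPy assumed; the variables `z` and `w` and the logistic missingness rule are illustrative choices, not from the text) has $E[w_i]=0$ and a fully observed $z_i$ that is correlated with $w_i$. Under MCAR the complete-case mean of $w_i$ stays close to zero, while under MAR, where the chance that $w_i$ is missing rises with $z_i$, the complete-case mean is pulled well below zero, so simply dropping incomplete observations gives an inconsistent estimate.

```python
import numpy as np


def complete_case_means(n=200_000, seed=0):
    """Complete-case mean of w under an MCAR and an MAR missingness mechanism.

    The true mean of w is 0 in this hypothetical design.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)              # always observed
    w = z + rng.normal(size=n)          # subject to missingness; E[w] = 0
    # MCAR: each w_i is missing with probability 0.3, unrelated to any data
    miss_mcar = rng.random(n) < 0.3
    # MAR: the probability that w_i is missing rises with the observed z_i
    # and does not depend on w_i once z_i is known
    miss_mar = rng.random(n) < 1.0 / (1.0 + np.exp(-2.0 * z))
    return w[~miss_mcar].mean(), w[~miss_mar].mean()


mcar_mean, mar_mean = complete_case_means()
print(f"MCAR complete-case mean: {mcar_mean:.3f}")
print(f"MAR  complete-case mean: {mar_mean:.3f}")
```

Because missingness here depends only on the observed $z_i$, conditioning on $z_i$ (for example, imputing from a regression of $w_i$ on $z_i$ fitted to the complete cases) would remove the bias, which is the sense in which MAR data can be adjusted for.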
Simple corrections for missing data include dropping an observation if any variable
is missing (listwise deletion or case deletion) and simple imputation methods, such
as replacing a missing value with the sample average or with a prediction from a
fitted regression model. These corrections are valid if the data are MCAR, or if the
missing data are confined to regressors that are MAR with a probability of missingness
that is independent of the dependent variable.
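The simple corrections just listed are easy to code. The following NumPy sketch, with hypothetical helper names, implements listwise deletion, mean imputation, and single regression imputation for a data matrix in which `np.nan` marks missing entries. `regression_impute` assumes that its predictor columns are fully observed and fills each missing value with a single OLS prediction, so, like the other methods here, it is appropriate only under the MCAR or restricted MAR conditions stated above.

```python
import numpy as np


def listwise_delete(data):
    """Drop every row (observation) with at least one missing value."""
    return data[~np.isnan(data).any(axis=1)]


def mean_impute(data):
    """Replace each missing entry with its column's observed sample mean."""
    out = data.copy()
    col_means = np.nanmean(data, axis=0)
    rows, cols = np.where(np.isnan(out))
    out[rows, cols] = col_means[cols]
    return out


def regression_impute(data, target, predictors):
    """Fill missing values in column `target` with OLS predictions from the
    columns in `predictors` (assumed fully observed), fitted on complete cases."""
    out = data.copy()
    miss = np.isnan(out[:, target])
    Z = np.column_stack([np.ones(len(out)), out[:, predictors]])
    coef, *_ = np.linalg.lstsq(Z[~miss], out[~miss, target], rcond=None)
    out[miss, target] = Z[miss] @ coef
    return out


# Hypothetical 4 x 2 data set with one missing entry
d = np.array([[1.0, 2.0],
              [2.0, np.nan],
              [3.0, 6.0],
              [4.0, 8.0]])
print(listwise_delete(d))
print(mean_impute(d))
print(regression_impute(d, target=1, predictors=[0]))
```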
The modern approach is to use multiple imputation methods, which regard missing
data as random variables and replace them with draws from an assumed underlying
distribution. Let $W=(W_{\text{obs}},W_{\text{miss}})$ denote the data partitioned into
observed and missing observations, and suppose $W$ has density $f(W\mid\theta)$. The
multiple imputation method imputes $W_{\text{miss}}$ under the assumption of MAR with
ignorable missingness. There are several ways to make imputations. A preferred, though
computationally expensive, method is to use data augmentation and MCMC methods.
Given an $r$th-round estimate $\theta^{(r)}$, we impute $W_{\text{miss}}^{(r+1)}$ by making
a draw from $f(W_{\text{miss}}\mid W_{\text{obs}},\theta^{(r)})$. Then a new estimate
$\theta^{(r+1)}$ is obtained by drawing from $f(\theta\mid W_{\text{obs}},W_{\text{miss}}^{(r+1)})$.
The chain is continued to convergence, giving an imputed value
for $W_{\text{miss}}$. Suppose we obtain the imputed value $W_{\text{miss}}^{(I)}$ and then
obtain the MLE based on $f(W_{\text{obs}},W_{\text{miss}}^{(I)}\mid\theta)$. This will overstate
estimator precision, as it fails to account for the uncertainty created by the imputation
of $W_{\text{miss}}^{(I)}$. Multiple imputation overcomes this by obtaining $m$ different
imputed values for $W_{\text{miss}}$, and hence $m$ estimates $\hat{\theta}_r$, $r=1,\ldots,m$,
with associated variance matrices $\hat{V}_r=\hat{V}[\hat{\theta}_r]$. For further details see
Little and Rubin (1987), Rubin (1987) and Schafer (1997).
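The data augmentation and multiple imputation steps can be sketched end to end in Python with NumPy, under assumptions that are not in the text: the data $W$ consist of a fully observed regressor matrix $X$ and an outcome $y$ with some entries missing, $f(y\mid X,\theta)$ is a normal linear regression with $\theta=(\beta,\sigma^2)$, and the prior is the noninformative $p(\beta,\sigma^2)\propto 1/\sigma^2$. Each pass of the chain draws the missing $y$ given $\theta^{(r)}$ and then draws $\theta^{(r+1)}$ given the completed data. The $m$ completed data sets, kept at spaced iterations after a burn-in, are each fitted by OLS (the normal MLE for $\beta$), and the $m$ estimates are combined by the rules of Rubin (1987): $\bar{\theta}=m^{-1}\sum_r\hat{\theta}_r$ with variance $T=\bar{V}+(1+m^{-1})B$, where $\bar{V}=m^{-1}\sum_r\hat{V}_r$ and $B=(m-1)^{-1}\sum_r(\hat{\theta}_r-\bar{\theta})(\hat{\theta}_r-\bar{\theta})'$. All function names and tuning values (`burn_in`, `spacing`) are hypothetical.

```python
import numpy as np


def draw_theta(X, y, rng):
    """Draw theta = (beta, sigma2) from the posterior of the normal linear
    regression y = X beta + e, e ~ N(0, sigma2 I), under the noninformative
    prior p(beta, sigma2) proportional to 1 / sigma2."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    ssr = np.sum((y - X @ beta_hat) ** 2)
    sigma2 = ssr / rng.chisquare(n - k)            # sigma2 | y ~ SSR / chi2(n - k)
    beta = rng.multivariate_normal(beta_hat, sigma2 * XtX_inv)
    return beta, sigma2


def impute_by_data_augmentation(X, y, miss, m, burn_in, spacing, rng):
    """Run the imputation/posterior chain and keep m completed copies of y,
    one every `spacing` iterations after `burn_in` iterations."""
    y_cur = y.copy()
    beta, sigma2 = draw_theta(X[~miss], y[~miss], rng)  # theta^(0) from complete cases
    completed = []
    for r in range(burn_in + m * spacing):
        # Draw W_miss^(r+1) from f(W_miss | W_obs, theta^(r))
        y_cur[miss] = X[miss] @ beta + rng.normal(0.0, np.sqrt(sigma2), miss.sum())
        # Draw theta^(r+1) from f(theta | W_obs, W_miss^(r+1))
        beta, sigma2 = draw_theta(X, y_cur, rng)
        if r >= burn_in and (r - burn_in + 1) % spacing == 0:
            completed.append(y_cur.copy())
    return completed


def ols(X, y):
    """OLS estimate and conventional variance matrix for one completed data set."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    s2 = np.sum((y - X @ b) ** 2) / (n - k)
    return b, s2 * XtX_inv


def rubin_combine(thetas, variances):
    """Combine m >= 2 estimates (m x k) and variance matrices (m x k x k)."""
    thetas = np.asarray(thetas, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = thetas.shape[0]
    theta_bar = thetas.mean(axis=0)
    V_within = variances.mean(axis=0)              # average within-imputation variance
    dev = thetas - theta_bar
    B = dev.T @ dev / (m - 1)                      # between-imputation variance
    T = V_within + (1.0 + 1.0 / m) * B
    return theta_bar, T


# Hypothetical data: y is missing more often when x > 0 (MAR given x)
rng = np.random.default_rng(1)
n = 1000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y_full = 1.0 + 2.0 * x + rng.normal(size=n)
miss = rng.random(n) < np.where(x > 0, 0.4, 0.1)
y_obs = np.where(miss, np.nan, y_full)

completed = impute_by_data_augmentation(X, y_obs, miss, m=5, burn_in=100,
                                        spacing=20, rng=rng)
fits = [ols(X, y_c) for y_c in completed]
theta_bar, T = rubin_combine([b for b, _ in fits], [V for _, V in fits])
print("combined estimate:", theta_bar)
print("standard errors:", np.sqrt(np.diag(T)))
```

The between-imputation term $B$ is what corrects the overstated precision of a single imputation. In this particular design missingness depends only on $X$, so complete-case OLS would already be consistent; the example illustrates the mechanics rather than an efficiency gain.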