A. Colin Cameron 763
a separable model. But this is not a structural model in the sense of the ASF given
earlier.
Controlling for unobserved heterogeneity is an active area in microeconometrics,
as much of the variation in the outcome is due to unobserved factors
since, typically, R^2 < 0.5. It is particularly important when there is sample
selection or self-selection. For example, in OLS regression we essentially require only
that E[u|x] = 0, whereas if the sample is truncated or censored, much stronger
assumptions on u are needed even if semiparametric methods are used. Heckman
(2000, 2005) and related papers explicitly consider heterogeneity and structural
estimation (see also Blundell and Powell, 2004; Wooldridge, 2005).
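The contrast between OLS on a random sample and OLS on a truncated sample can be illustrated with a small simulation. This is only a sketch under assumed values: the linear model y = 1 + 2x + u, the standard normal draws, the truncation point y > 1 and the helper name ols_slope are all hypothetical choices made for illustration, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical data-generating process: y = 1 + 2x + u with E[u|x] = 0.
x = rng.normal(size=n)
u = rng.normal(size=n)
y = 1.0 + 2.0 * x + u


def ols_slope(x, y):
    """Slope from an OLS regression of y on a constant and x."""
    slope, _intercept = np.polyfit(x, y, 1)
    return slope


# Random sample: E[u|x] = 0 is enough for OLS to recover the slope.
slope_full = ols_slope(x, y)

# Truncated sample: only observations with y > 1 are observed, so the
# error no longer has zero conditional mean in the retained sample.
keep = y > 1.0
slope_truncated = ols_slope(x[keep], y[keep])

print(f"full sample slope:      {slope_full:.3f}")
print(f"truncated sample slope: {slope_truncated:.3f}")
```

In this setup the full-sample slope should be close to the assumed value of 2, while the slope from the truncated sample is pulled toward zero, because E[u|x] = 0 no longer holds among the retained observations.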
14.7 Data issues
Microeconometric data are often survey data that come from sampling schemes
more complicated than simple random sampling, and key variables can be mis-
measured or even missing due to nonresponse. These issues are generally ignored
in applied work. Ignoring the sampling scheme is reasonable in the many cases
where the sampling scheme or nonresponse mechanism leads to a sample that is
nonrepresentative only of the regressors, while maintaining representativeness of
the dependent variable conditional on regressors. It is also reasonable to ignore
measurement error if it is classical measurement error in the dependent variable
in a linear model. In other cases standard estimators are often inconsistent and
alternative estimators are needed.
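The claim about classical measurement error can be checked numerically. The sketch below assumes a hypothetical linear model y = 1 + 2x + u with standard normal regressor, error and measurement noise; it compares noise added to the dependent variable with noise added to the regressor, the latter being one example of the "other cases" in which OLS is inconsistent.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical true model: y = 1 + 2x + u.
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

# Classical measurement error: noise independent of x, u and the true value.
y_noisy = y + rng.normal(size=n)   # error in the dependent variable
x_noisy = x + rng.normal(size=n)   # error in the regressor

# np.polyfit returns coefficients from highest degree down: [slope, intercept].
slope_y_error = np.polyfit(x, y_noisy, 1)[0]
slope_x_error = np.polyfit(x_noisy, y, 1)[0]

print(f"slope with error in y: {slope_y_error:.3f}")
print(f"slope with error in x: {slope_x_error:.3f}")
```

With error in y the noise is absorbed into the regression error, so the slope stays near 2 and only precision is lost. With error in x the slope is attenuated by the factor Var(x) / (Var(x) + Var(noise)), which equals 0.5 under the variances assumed here.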
14.7.1 Sampling schemes
Survey data often use stratified and clustered sampling to lower interview costs and
to provide more precise estimates for population sub-groups, such as regions with
relatively few people, than would otherwise be the case. The extensive sample
survey literature, initially focused on estimation of population means but then
extended to the regression case, has generally been ignored by the econometrics
literature.
The first issue raised by survey sampling schemes is that the sample is no longer
representative of the population. For inference on a single variable it is necessary
to adjust for this. For example, average earnings in a nonrepresentative sample
will be an inconsistent estimate of population mean earnings. For regression anal-
ysis, adjustment is necessary if the sample is nonrepresentative for the dependent
variable after conditioning on regressors (endogenous stratification), but may not
be necessary if the sample is nonrepresentative only for the regressors (exogenous
stratification).
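The single-variable case can be sketched as follows. The population, the two regions, the earnings levels and the sampling rates are all invented for illustration. The small region is deliberately oversampled, and inverse-probability weights (one common adjustment, assumed here rather than taken from the text) are used to restore representativeness.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 1_000_000

# Hypothetical population: region B holds about 10% of people and has
# lower mean earnings than region A.
in_b = rng.random(N) < 0.10
earnings = np.where(in_b, 30.0, 50.0) + rng.normal(scale=10.0, size=N)
pop_mean = earnings.mean()

# Stratified design: region B is sampled at five times the rate of region A.
p_select = np.where(in_b, 0.05, 0.01)
sampled = rng.random(N) < p_select

# The unweighted sample mean overrepresents region B.
unweighted_mean = earnings[sampled].mean()

# Weighting each observation by the inverse of its selection probability
# undoes the oversampling.
weighted_mean = np.average(earnings[sampled], weights=1.0 / p_select[sampled])

print(f"population mean: {pop_mean:.2f}")
print(f"unweighted mean: {unweighted_mean:.2f}")
print(f"weighted mean:   {weighted_mean:.2f}")
```

The unweighted mean understates population mean earnings because the low-earnings region is overrepresented, while the weighted mean should be close to the population value.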
For endogenous stratification, where stratification is on the dependent vari-
able in a regression, standard estimation methods lead to inconsistent parameter
estimates.
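The difference between exogenous and endogenous stratification in regression can be illustrated by simulation. Everything below is a hypothetical setup: the model y = 1 + 2x + u, the retention rates, and the helper wls_slope. The inverse-probability weighted regression at the end is one standard correction, included as an assumption for illustration; it is not a method described in the passage above.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

# Hypothetical population model: y = 1 + 2x + u with E[u|x] = 0.
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)


def wls_slope(x, y, w=None):
    """Slope from (weighted) least squares of y on a constant and x."""
    if w is None:
        w = np.ones_like(x)
    X = np.column_stack([np.ones_like(x), x])
    sw = np.sqrt(w)  # scaling rows by sqrt(w) minimizes sum of w * resid^2
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta[1]


draw = rng.random(n)

# Exogenous stratification: retention probability depends only on x.
p_exog = np.where(x > 0.0, 1.0, 0.1)
s_exog = draw < p_exog
slope_exog = wls_slope(x[s_exog], y[s_exog])

# Endogenous stratification: retention probability depends on y.
p_endog = np.where(y > 1.0, 1.0, 0.1)
s_endog = draw < p_endog
slope_endog = wls_slope(x[s_endog], y[s_endog])

# Weighting by inverse retention probabilities (assumed to be known).
slope_endog_weighted = wls_slope(
    x[s_endog], y[s_endog], 1.0 / p_endog[s_endog]
)

print(f"exogenous stratification, OLS:  {slope_exog:.3f}")
print(f"endogenous stratification, OLS: {slope_endog:.3f}")
print(f"endogenous stratification, WLS: {slope_endog_weighted:.3f}")
```

Under these assumptions, unweighted OLS stays close to the true slope of 2 when stratification is on x, falls below it when stratification is on y, and returns to roughly 2 once the endogenously stratified sample is reweighted.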
Consider stratification in a likelihood framework. Let the conditional distribution
of y given x be denoted f(y|x, θ). Usually the joint density of y and x is
g(y, x|θ) = f(y|x, θ) × g(x), where the parameters in g(x) are suppressed. Under