742 Microeconometrics: Methods and Developments
regression if instead the unknown true quantile function is nonlinear, and provide
the asymptotic distribution in that case. The situation is analogous to that for OLS
and ML under model misspecification.
Quantile regression has recently become a very active area of research with exten-
sions including instrumental variables estimation (see Chernozhukov and Hansen,
2005), and a richer range of models for censored data (see Honoré and Hu, 2004).
Koenker (2005) provides many results on quantile regression.
14.3.6 Nonparametric and semiparametric methods
Consider the regression model:

E[y_i | x_i] = m(x_i),  (14.16)

where the function m(x) is unspecified. Nonparametric regression provides a con-
sistent estimate of m(x). At the specific point x = x_0, m(x_0) can be estimated by
taking a local weighted average of y_i over those observations with x_i in a neigh-
borhood of x_0. There are many variations on this approach, including kernel
regression, nearest neighbors regression, local linear, local polynomial, Lowess,
smoothing spline, and series estimators. Fewer than N observations are effectively
used at any point x_0, because a local average is taken, so m̂(x_0) converges in
probability to m(x_0) at a rate slower than the usual N^{-1/2}, although asymptotic
normality still holds.
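The local-weighted-average idea can be sketched with a Gaussian kernel, the standard Nadaraya–Watson form of kernel regression. The data, bandwidth value, and function name below are illustrative assumptions, not from the text:

```python
import numpy as np

def kernel_regression(x0, x, y, h):
    """Estimate m(x0) by a local weighted average of y: Gaussian kernel
    weights downweight observations whose x_i lies far from x0.
    The bandwidth h controls the size of the neighborhood."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)  # kernel weights
    return np.sum(w * y) / np.sum(w)        # weighted average of y

# Hypothetical simulated data with true m(x) = sin(x)
rng = np.random.default_rng(0)
x = rng.uniform(0.0, np.pi, 500)
y = np.sin(x) + rng.normal(0.0, 0.1, 500)

m_hat = kernel_regression(np.pi / 2, x, y, h=0.2)  # true value is sin(pi/2) = 1
```

Only the observations within roughly h of x_0 receive appreciable weight, which is why effectively fewer than N observations are used at each point.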
Fully nonparametric regression works best in practice when there is just a single
regressor. Even then, empirical results vary greatly with the choice of bandwidth
or window width that defines the size of the neighborhood. “Plug-in” estimates of
the bandwidth that work well for density estimation often work poorly for regres-
sion. The standard method is to use leave-one-out cross-validation to select the
bandwidth, but this method is by no means perfect.
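Leave-one-out cross-validation can be sketched as follows: for each candidate bandwidth, predict each y_i from all other observations and pick the bandwidth minimizing the average squared prediction error. The grid, data, and helper name are illustrative assumptions:

```python
import numpy as np

def loo_cv_score(h, x, y):
    """Leave-one-out cross-validation criterion for bandwidth h:
    mean squared error of predicting each y_i from the remaining data
    by a Gaussian-kernel weighted average."""
    n = len(x)
    err = 0.0
    for i in range(n):
        mask = np.arange(n) != i                       # drop observation i
        w = np.exp(-0.5 * ((x[mask] - x[i]) / h) ** 2)
        err += (y[i] - np.sum(w * y[mask]) / np.sum(w)) ** 2
    return err / n

# Hypothetical data; choose h by minimizing the criterion over a grid
rng = np.random.default_rng(1)
x = rng.uniform(0.0, np.pi, 200)
y = np.sin(x) + rng.normal(0.0, 0.1, 200)
grid = [0.05, 0.1, 0.2, 0.4, 0.8]
h_star = min(grid, key=lambda h: loo_cv_score(h, x, y))
```

The criterion trades off variance (small h, too few effective observations) against bias (large h, averaging over regions where m(x) varies), but as the text notes the resulting choice is by no means perfect.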
There is no theoretical obstacle to using nonparametric regression when there are
many regressors. But in practice nonparametric methods usually work poorly with
more than very few regressors, due to a curse of dimensionality that arises because
the local averages will be made over fewer observations. For example, if averaging
is over 10 bins with one regressor then averaging may need to be over 10^2 = 100
bins when there are two regressors. More formally, the optimal convergence rate
using mean squared error as a criterion is N^{-2/(dim[x]+4)}, so the convergence rate
decreases as dim[x] increases. This problem is less severe when some regressors
take only a few values, such as binary indicator variables. Racine and Li (2004)
present results for kernel regression when some regressors are discrete and some
are continuous.
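The deterioration of the rate N^{-2/(dim[x]+4)} can be seen numerically; the sample size below is an arbitrary illustration:

```python
# Optimal MSE convergence rate N^(-2/(d+4)) for d continuous regressors.
# As d grows the exponent shrinks toward zero, so the estimation error
# shrinks ever more slowly in N -- the curse of dimensionality.
N = 10_000
rates = {d: N ** (-2 / (d + 4)) for d in [1, 2, 5, 10]}
for d, rate in rates.items():
    print(f"d = {d:2d}: N^(-2/(d+4)) = {rate:.4f}")
```

For d = 1 the rate is N^{-2/5}, already slower than the parametric N^{-1/2}; by d = 10 it is only N^{-1/7}.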
The microeconometrics literature focuses on semiparametric methods that over-
come the curse of dimensionality by partially parameterizing a model, so that there
is a mix of parametric and nonparametric components. A very early example is the
maximum score estimator for the binary choice model of Manski (1975).
Theoretically, a first step is to determine whether parameters are identified given
only partial specification of the model. Ideally √N-consistent and asymptotically
normal estimates of the parameters can be obtained. Furthermore, it is preferred