where $\tau$ is small and $e_j = [0,\ldots,0,1,0,\ldots,0]'$ is a $K$-vector with unity in the $j$th row and zeros elsewhere. In theory, $\tau$ should be chosen such that $\partial Q_n(\theta)/\partial\theta_j = \lim_{\tau \to 0}\,[Q_n(\theta + \tau e_j) - Q_n(\theta)]/\tau$. Although, in principle, analytical derivatives are preferred, numerical approximations produce virtually identical results in many regular cases.
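As a concrete illustration of the forward-difference scheme just described, the following Python sketch perturbs one coordinate of $\theta$ at a time by $\tau e_j$; the function names and the quadratic test objective are illustrative assumptions rather than anything taken from the chapter.

```python
import numpy as np

def numerical_gradient(Q, theta, tau=1e-6):
    """Forward-difference approximation to the gradient of Q at theta."""
    theta = np.asarray(theta, dtype=float)
    K = theta.size
    grad = np.empty(K)
    for j in range(K):
        e_j = np.zeros(K)
        e_j[j] = 1.0                      # unity in the jth row, zeros elsewhere
        grad[j] = (Q(theta + tau * e_j) - Q(theta)) / tau
    return grad

# Illustrative quadratic objective whose analytical gradient is 2*(theta - 1)
Q = lambda theta: np.sum((theta - 1.0) ** 2)
print(numerical_gradient(Q, np.array([0.0, 2.0, 5.0])))   # close to [-2., 2., 8.]
```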
Whether the application of a gradient-based optimizer turns out to be computationally demanding depends upon many factors, including the following: (i) the dimension of $\theta$; (ii) whether the objective function is well approximated by a quadratic expansion in the neighborhood of the optimum; (iii) whether $\theta$ is robustly identified in the sample; and (iv) whether the iterative algorithm starts sufficiently close to the optimum. Conversely, convergence can be slow when the dimension of $\theta$ is high, some components of $\theta$ are weakly identified in the data, and/or the objective function admits multiple maxima or a flat region around the maximum, and the starting values for the update equation are poor. If an objective function is difficult to maximize, e.g., due to multiple local optima, non-gradient methods may be used, such as simulated annealing and genetic algorithms. A leading example is kernel-based cross-validation (see, e.g., algorithms 15.4.1.1.1 and 15.4.1.2.1 below), where Powell's direction set search algorithm is also a viable alternative (see Press et al., 1992, section 10.5, pp. 412–20).
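For a sense of how these alternatives are invoked in practice, the sketch below contrasts a gradient-based quasi-Newton search with Powell's derivative-free direction set method and a simulated annealing-type global search, using SciPy's optimizers; the multimodal test criterion, starting values, and bounds are purely illustrative assumptions.

```python
import numpy as np
from scipy import optimize

# An assumed bumpy objective with several local minima in each coordinate
def Qn(theta):
    return np.sum((theta - 1.0) ** 2) + 2.0 * np.sum(np.sin(5.0 * theta) ** 2)

x0 = np.array([3.0, -3.0])

# Gradient-based quasi-Newton update (derivatives approximated numerically)
res_bfgs = optimize.minimize(Qn, x0, method="BFGS")

# Powell's direction set search: no derivatives required
res_powell = optimize.minimize(Qn, x0, method="Powell")

# Simulated annealing-type global search over a bounded region
res_sa = optimize.dual_annealing(Qn, bounds=[(-5.0, 5.0)] * 2)

for name, res in [("BFGS", res_bfgs), ("Powell", res_powell), ("annealing", res_sa)]:
    print(name, res.x, res.fun)
```

Starting the gradient-based search far from the optimum typically leaves it trapped in a local minimum of this criterion, whereas the annealing-type search explores the whole bounded region.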
Another example of a case where a gradient-based method is inappropriate is the
least absolute deviation (LAD) regression or the quantile regression. In the case of
LAD regression, the objective function $Q_n(\theta) = n^{-1}\sum_i |y_i - x_i'\beta|$ has no derivative.
In the case of quantile regression, the objective function is minimized over $\beta_q$; i.e.:

$$Q_n(\beta_q) = \sum_{i:\, y_i \ge x_i'\beta_q} q\,|y_i - x_i'\beta_q| + \sum_{i:\, y_i < x_i'\beta_q} (1 - q)\,|y_i - x_i'\beta_q| \qquad (15.3)$$

$$= \sum_i \rho_q(u_{iq}), \qquad (15.4)$$
where $0 < q < 1$, $\rho_q(\lambda) = (q - I(\lambda < 0))\lambda$ denotes the check function, $I(\cdot)$ represents the indicator function that equals 1 if its argument is true and 0 otherwise, and the notation $\beta_q$ emphasizes that, for different choices of $q$, different values of $\beta$ are obtained.
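To fix ideas, the check function and the criterion in (15.3)–(15.4) can be coded directly; the Python sketch below evaluates them on hypothetical toy data (the data-generating choices are illustrative only), and at $q = 0.5$ the criterion is proportional to the LAD objective above.

```python
import numpy as np

def check_function(u, q):
    """rho_q(u) = (q - 1{u < 0}) * u, the asymmetric absolute loss."""
    u = np.asarray(u, dtype=float)
    return (q - (u < 0).astype(float)) * u

def qr_objective(beta_q, y, X, q):
    """Quantile regression criterion: sum of check-function losses of the residuals."""
    u = y - X @ beta_q
    return np.sum(check_function(u, q))

# Hypothetical toy data; at q = 0.5 the criterion equals one half of sum |y - x'b|
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = X @ np.array([1.0, 2.0]) + rng.standard_t(df=3, size=50)
print(qr_objective(np.array([1.0, 2.0]), y, X, q=0.25))
```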
The estimator defined by the minimization problem $\min_{\beta_q} Q_n(\beta_q)$ is an M-estimator and, as such, its asymptotic properties are well established (see Amemiya, 1985). The optimization problem has an interpretation in the GMM framework as well as in a linear programming (LP) framework (see Buchinsky, 1995). To see the LP representation, the QR is written thus:
$$y_i = x_i'\beta_q + u_{iq} = x_i'(\beta_q^{[1]} - \beta_q^{[2]}) + (\varepsilon_{iq}^{[1]} - \varepsilon_{iq}^{[2]}),$$

where $\beta_{q,j}^{[1]} \ge 0$, $\beta_{q,j}^{[2]} \ge 0$, $j = 1, \ldots, K$, and $\varepsilon_{iq}^{[1]} \ge 0$, $\varepsilon_{iq}^{[2]} \ge 0$, $i = 1, \ldots, n$. The
optimization problem can be expressed as that of minimizing a linear objective