1306 Testing Econometric Software
least reliable method for such computation. The classic time series text by Priestley
(1981) presents four methods for computing partial autocorrelation coefficients,
and he presents them in decreasing order of reliability: the Yule–Walker method
is presented fourth. The Yule–Walker equations are, from a computational stand-
point, easy to implement, and they might have been justified back in the days
when computing power was expensive; in the present day, they cannot be jus-
tified. Very few econometric packages offer better methods. One such method is
the Burg algorithm. However, there was no benchmark for the method, so McCullough (1998b) computed one. Some econometric time-series packages now offer the Burg method as an improvement over the Yule–Walker equations.
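The contrast between the two methods can be sketched concretely. The sample partial autocorrelations can be obtained from the Yule–Walker equations via the Levinson–Durbin recursion, or directly from the data via Burg's recursion, which never forms an autocovariance matrix. The helper functions below are illustrative implementations written for this sketch, not code from any particular package, and they do not reproduce McCullough's (1998b) benchmark values.

```python
import numpy as np

def pacf_yule_walker(x, nlags):
    """PACF via Levinson-Durbin applied to the sample autocorrelations
    (the Yule-Walker route that Priestley ranks least reliable)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    r = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(nlags + 1)])
    a = np.zeros(nlags + 1)   # AR coefficients of the current order
    E = r[0]                  # prediction error variance
    pacf = np.empty(nlags)
    for m in range(1, nlags + 1):
        k = (r[m] - np.dot(a[1:m], r[m - 1:0:-1])) / E
        a[1:m] = a[1:m] - k * a[m - 1:0:-1]
        a[m] = k
        E *= 1.0 - k * k
        pacf[m - 1] = k
    return pacf

def pacf_burg(x, nlags):
    """PACF via Burg's recursion: each reflection coefficient minimizes
    the sum of forward and backward prediction errors on the data."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    f, b = x[1:].copy(), x[:-1].copy()   # forward / backward errors
    pacf = np.empty(nlags)
    for m in range(nlags):
        k = 2.0 * np.dot(f, b) / (np.dot(f, f) + np.dot(b, b))
        pacf[m] = k
        # both updates use the old f and b (tuple assignment)
        f, b = f[1:] - k * b[1:], b[:-1] - k * f[:-1]
    return pacf

# Illustrative check on a simulated AR(1) with coefficient 0.6
rng = np.random.default_rng(0)
x = np.zeros(5000)
eps = rng.standard_normal(5000)
for t in range(1, 5000):
    x[t] = 0.6 * x[t - 1] + eps[t]
print(pacf_yule_walker(x, 3))
print(pacf_burg(x, 3))
```

On a well-behaved series like this one, both estimators agree closely: the lag-1 value is near 0.6 and higher lags near zero. The methods diverge in finite samples and near the unit circle, which is exactly where a benchmark is needed to decide which answer is right.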
There are not many advanced benchmarks for econometric software, and there
is precious little tangible evidence that any econometric software package is giving
the correct answer for any moderately complicated problem.
28.6 Benchmarks for ARMA models
In an important article, Newbold, Agiakloglou and Miller (1994) (henceforth NAM)
observed: “fitting the same model to the same data will yield more or less identical
results whatever software is used for multiple regression. That is not the case for
the estimation of the parameters of an ARIMA model.” In part, this may be due to
the fact that NAM placed themselves in the position of a novice user, i.e., “though
many programs allow the user a range of optional modifications, we generally ran
them in default mode.” If one thing has been learned from the literature on sta-
tistical and econometric software accuracy, it is that default options for nonlinear
estimation procedures typically do not produce accurate answers. Such matters
as choice of algorithm, convergence criterion, convergence tolerance, and initial
conditions can all greatly affect the quality of the answer produced by a nonlinear
estimation procedure. For example, in the case of autoregressive integrated mov-
ing average (ARIMA) procedures, some packages conduct preliminary estimations
to determine starting values, while others simply use zeros. This fact alone could
account for much variation between packages. Therefore, it may seem entirely
possible that the packages examined by NAM would have exhibited little varia-
tion in the range of results produced if only they had adopted the posture of an
experienced user. Such, however, turns out not to be the case, as will be shown.
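The point about starting values can be made concrete. The sketch below fits an ARMA(1,1) by conditional least squares (CLS) and compares optimization from zero starting values with optimization from crude preliminary (Hannan–Rissanen-style) estimates. The functions `css` and `preliminary_start` are deliberately simplified stand-ins for what a package's CLS routine does, not any package's actual code.

```python
import numpy as np
from scipy.optimize import minimize

def css(params, x):
    """Conditional sum of squares for an ARMA(1,1):
    e[t] = x[t] - phi*x[t-1] - theta*e[t-1], with e[0] set to zero."""
    phi, theta = params
    e = np.zeros_like(x)
    for t in range(1, len(x)):
        e[t] = x[t] - phi * x[t - 1] - theta * e[t - 1]
    return np.dot(e[1:], e[1:])

def preliminary_start(x, p_long=5):
    """Crude starting values: fit a long AR by OLS, then regress x[t]
    on x[t-1] and the lagged AR residual."""
    n = len(x)
    Y = x[p_long:]
    X = np.column_stack([x[p_long - j: n - j] for j in range(1, p_long + 1)])
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ coef                      # e-hat at t = p_long..n-1
    Z = np.column_stack([x[p_long:n - 1], resid[:-1]])
    start, *_ = np.linalg.lstsq(Z, x[p_long + 1:], rcond=None)
    return start                              # [phi0, theta0]

# Simulated ARMA(1,1): x[t] = 0.5*x[t-1] + eps[t] + 0.4*eps[t-1]
rng = np.random.default_rng(1)
n = 2000
eps = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + eps[t] + 0.4 * eps[t - 1]

# Same objective, two starting points
fit_zero = minimize(css, [0.0, 0.0], args=(x,), method="Nelder-Mead")
fit_prelim = minimize(css, preliminary_start(x), args=(x,), method="Nelder-Mead")
print("from zeros:      ", fit_zero.x)
print("from preliminary:", fit_prelim.x)
```

On this easy problem both starting points lead the optimizer to essentially the same minimum near (0.5, 0.4). The packages NAM studied differ precisely because, on harder problems (e.g., moving-average roots near the unit circle), zero starting values and preliminary-estimation starting values need not lead the optimizer to the same answer.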
Given that the differences are not due to the use of default options, the notion
that algorithmic differences may be responsible comes to mind. In the case of
unconditional least squares (ULS) with backcasting, there is no one preferred
method of backcasting, so perhaps this may account for the differences. NAM
(1994, p. 580) pointedly address this notion in the discussion of their conditional
least squares (CLS) results, for which no such difference is possible. Even in the
cases when point estimates agree, NAM note substantial variation in the estimates
of standard errors.
Thus, the only means to resolve the discrepancies between packages is the production of a benchmark, which typically requires extended-precision computation, i.e., more than double precision. One