1304 Testing Econometric Software
Generally speaking, econometric discussions of simulation simply assume a high-
quality RNG the same way that econometrics texts discussing nonlinear estimation
assume a high-quality solver. For example, the text on simulation in econometrics
by Gouriéroux and Monfort (1997) makes no mention of testing RNGs.
28.4.3 Statistical Distributions
Bai and Perron (2003) (henceforth BP) published an article in the Journal of Applied
Econometrics about structural change models, and used an econometric software
package to write software for their method. Zeileis and Kleiber (2005) (henceforth
ZK) attempted to port the code to R. We first note that the Journal of Applied
Econometrics has a data-only mandatory archive, i.e., BP were under no obligation to
supply their code to would-be replicators, but they did so. While ZK were able
to reproduce much of what BP did, they were unable to match some confidence
intervals for break-points. As one example, both packages estimated a break-point
at the third quarter of 1972, 1972:3, but the BP interval was [1970:3, 1972:4] while
the ZK interval was [1969:1, 1972:4]. After much numerical detective work, ZK
finally ascertained that the BP software package had a function for the Normal
CDF (cumulative distribution function) that was inaccurate in the tails, while the R
package had a Normal CDF that was accurate in the tails. The accuracy of statistical
distributions matters.
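The source does not say which algorithm the BP package used, so the following is only an illustrative sketch of one common way a Normal CDF can lose accuracy in the tails. It assumes the lower tail is computed as 0.5(1 + erf(x/√2)), which suffers cancellation for large negative x, and compares it with the complementary-error-function form. The function names are hypothetical, and Python's standard library stands in for an econometric package.

```python
import math

def norm_cdf_naive(x):
    # Assumed "inaccurate" form: 1 + erf(...) cancels when erf(...) is near -1,
    # so the lower tail loses digits and eventually underflows to exactly 0.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_cdf_tail_safe(x):
    # Same quantity written with erfc, which avoids the subtraction
    # and keeps relative accuracy far into the lower tail.
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

for x in (-2.0, -6.0, -9.0):
    naive = norm_cdf_naive(x)
    safe = norm_cdf_tail_safe(x)
    rel_diff = abs(naive - safe) / safe
    print(f"x = {x:5.1f}  naive = {naive:.6e}  tail-safe = {safe:.6e}  "
          f"relative difference = {rel_diff:.2e}")
```

Near the center of the distribution the two forms agree to almost full double precision; the disagreement only appears in the tails, which is why a discrepancy of this kind can go unnoticed in routine hypothesis testing.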
On a related note, there is a definite need for the profession to find every article
using the Normal CDF function in the package used by BP, and check to see if the
results are wrong. Of course, since journals largely do not require authors even to
identify what package they use, let alone supply the code, there is no hope of ever
accomplishing this task. This example shows how replication can uncover errors
not only in published articles, but in software, too.
To test statistical distributions, one needs an accurate source for the desired
quantities – not just any software package will do. For years the primary sources
were Knüsel’s (1989) ELV package and Brown’s DCDFLIB (available in C and For-
tran). McCullough (2000c) showed that the program Mathematica was at least as
good as ELV and, lately, Yalta (2008) showed that Mathematica produces results
more accurate than ELV. One simply compares the output from one of these
three programs to the output produced by the econometric software in question.
Naturally, one cannot assess all possible outputs, so one examines a carefully
chosen subset. For the normal distribution, one might first test the following
percentiles: {0.0001, 0.001, 0.01, 0.1, 0.2, ..., 0.9, 0.99, 0.999, 0.9999}, and check the
extreme tails to find out where the algorithm in the econometric software breaks
down; it might be completely inaccurate for the 0.9999999999 percentile. A similar
approach might be undertaken for the t-distribution, except it has to be done
for different degrees of freedom. The process gets more complex for distributions
with two or more parameters, e.g., the F-distribution. Complete details for testing
can be found in McCullough (1998a, sec. 6). Most packages are able to compute
these distributions for simple hypothesis testing, but methods that require distribu-
tions to be evaluated in the extreme tails, e.g., value-at-risk testing or saddle-point
approximation, should only be undertaken with packages that have very accurate
distributions.
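The comparison procedure described above can be sketched as a small test harness. This is a minimal illustration under several assumptions, not the procedure of McCullough (1998a): the reference values come from Python's `math.erfc` as a stand-in for ELV, DCDFLIB or Mathematica; the "software under test" is a classical polynomial approximation (Abramowitz and Stegun, formula 26.2.17), chosen only because its absolute error is small while its relative error grows in the tails; accuracy is reported as the approximate number of correct significant digits; and the extra probability 1e-10 is added here to probe the extreme lower tail.

```python
import math
from statistics import NormalDist

def reference_cdf(x):
    # Stand-in reference; a real test would use ELV, DCDFLIB or Mathematica.
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def candidate_cdf(x):
    # Hypothetical "package" CDF: Abramowitz and Stegun 26.2.17.
    b = (0.319381530, -0.356563782, 1.781477937, -1.821255978, 1.330274429)
    t = 1.0 / (1.0 + 0.2316419 * abs(x))
    poly = sum(bk * t ** (k + 1) for k, bk in enumerate(b))
    tail = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi) * poly
    # tail approximates P(Z > |x|); reflect it for nonnegative x.
    return 1.0 - tail if x >= 0 else tail

def correct_digits(candidate, reference):
    # Approximate number of correct significant digits, capped at 16.
    if candidate == reference:
        return 16.0
    return min(16.0, -math.log10(abs(candidate - reference) / abs(reference)))

def accuracy_report(cdf, probabilities):
    # Evaluate both CDFs at the quantile of each probability and compare.
    report = {}
    for p in probabilities:
        x = NormalDist().inv_cdf(p)
        report[p] = correct_digits(cdf(x), reference_cdf(x))
    return report

GRID = [0.0001, 0.001, 0.01] + [k / 10 for k in range(1, 10)] + [0.99, 0.999, 0.9999]
EXTREME = [1e-10]  # assumption: one extra point far in the lower tail

report = accuracy_report(candidate_cdf, GRID + EXTREME)
for p, digits in report.items():
    print(f"p = {p:<8g} correct digits = {digits:4.1f}")
```

For the t- or F-distribution the same loop would simply be repeated over a set of degrees of freedom, with reference values again taken from a trusted source rather than from the package being tested.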