Palgrave Handbook of Econometrics: Applied Econometrics

1300 Testing Econometric Software

(2002), Kitchen, Drachenberg and Symanzik (2003), Teyssiere (2005), Keeling and
Pavur (2004, 2007), Yalta and Yalta (2007), and Yalta (2008), among others.^1

28.4.1 StRD

The StRD presents estimation problems in four suites: univariate summary statis-
tics, one-way analysis of variance, linear regression, and nonlinear least squares.
Each suite contains test problems at three levels of difficulty: lower, average, and
higher. The primary discussions for understanding and applying the StRD are found
in McCullough (1998a, 2000b). For the linear problems, NIST computed accurate
solutions by carrying 500 digits through all calculations, effectively eliminating
rounding error, and then rounding the final answer to 15 digits. Nonlinear problems
were solved using different algorithms in quadruple precision, and the final solution
was rounded to 11 digits. For each nonlinear problem there are two sets of
starting values: Start I and Start II. The former is far from the solution, which
makes it harder for a solver to converge. The latter is closer to the solution,
which makes convergence easier.
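The idea behind the certified values for the linear problems (carry far more digits than needed, then round only the final answer) can be imitated with Python's standard decimal module. The following is a minimal sketch under stated assumptions, not NIST's actual procedure: the function name, the choice of the standard deviation as the statistic, and the three input values are all hypothetical.

```python
from decimal import Decimal, Context, localcontext

def high_precision_sd(values, work_digits=500, report_digits=15):
    """Sample standard deviation with many working digits, rounded at the end.

    `values` are decimal strings, so no binary rounding occurs on input.
    Hypothetical helper for illustration only.
    """
    with localcontext() as ctx:
        ctx.prec = work_digits          # carry 500 digits through every step
        xs = [Decimal(v) for v in values]
        n = len(xs)
        mean = sum(xs) / n
        variance = sum((x - mean) ** 2 for x in xs) / (n - 1)
        sd = variance.sqrt()
    # Round only the final answer, here to 15 significant digits.
    return Context(prec=report_digits).plus(sd)

# Hypothetical data, not taken from the StRD.
print(high_precision_sd(["1.1", "1.2", "1.3"]))
```

Because rounding is applied once, after all arithmetic is finished, the reported digits are not contaminated by intermediate rounding error.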
Each of the nine univariate problems (six of lower, two of average, and one of higher
difficulty) requires the calculation of three summary statistics: mean, standard
deviation, and first-order autocorrelation coefficient. One of the lower-difficulty
problems, NumAcc1, consists of just three observations: 10000001, 10000003,

A program that employs the “calculator formula” will fail this test
starkly, even in double precision. It is a bit easier to diagnose a failure with these
data than with the similar data in the Wilkinson tests. Many packages fail to achieve good
accuracy when calculating the first-order autocorrelation coefficient because they
use numerically unstable algorithms.
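The failure mode of the “calculator formula” can be sketched in a few lines of Python. The data below are illustrative values with a large constant offset, not an StRD dataset. The offset is an assumption chosen so that the squares (near 1e18) exceed the range in which 64-bit floats represent integers exactly. The function names are hypothetical, and the autocorrelation is computed as the lagged cross-product of deviations divided by the sum of squared deviations, which is one common definition.

```python
import math

def calculator_variance(xs):
    """One-pass formula: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(xs)
    sum_x = 0.0
    sum_x2 = 0.0
    for x in xs:
        sum_x += x
        sum_x2 += x * x        # squares of large values lose low-order digits
    return (sum_x2 - sum_x * sum_x / n) / (n - 1)

def two_pass_variance(xs):
    """Compute the mean first, then sum squared deviations from it."""
    n = len(xs)
    mean = math.fsum(xs) / n
    return math.fsum((x - mean) ** 2 for x in xs) / (n - 1)

def lag1_autocorrelation(xs):
    """First-order autocorrelation computed from deviations about the mean."""
    n = len(xs)
    mean = math.fsum(xs) / n
    dev = [x - mean for x in xs]
    numerator = math.fsum(dev[i] * dev[i - 1] for i in range(1, n))
    denominator = math.fsum(d * d for d in dev)
    return numerator / denominator

# Illustrative data (an assumption, not NIST values); true variance is 1.
data = [1000000001.0, 1000000003.0, 1000000002.0]
print(calculator_variance(data), two_pass_variance(data), lag1_autocorrelation(data))
```

With these values the two-pass functions work on deviations of -1, 1, and 0, which are exact in floating point, while the one-pass formula subtracts two nearly equal numbers of magnitude 3e18 and cannot recover a difference as small as 2.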
The ANOVA suite has four lower-, four average-, and three higher-difficulty prob-
lems. Even a good ANOVA algorithm that does not recenter the data will return a
completely inaccurate answer for the most difficult problem. If a package returns
four digits of accuracy on this problem, then it is safe to conclude that the pack-
age recenters the data (to reduce the effect of squaring) before calculating all the
relevant sums of squares. It is not uncommon to see packages, especially those
that have been around for a long time, fail even on the average-difficulty tests.
The reason is that these packages use legacy code left over from the days of
single-precision computers. After the StRD is applied to such packages, the developers
usually update the code.
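The effect of recentering described above can be illustrated with a small one-way layout. This is a sketch under stated assumptions: the textbook raw-totals formulas stand in for whatever a given package actually does, the function names are hypothetical, and the data are made-up values with a constant offset of 1e12 rather than an StRD problem. The true between-group and within-group sums of squares are both 6.

```python
def textbook_sums_of_squares(groups):
    """Between- and within-group sums of squares from raw (uncentered) totals."""
    n_total = sum(len(g) for g in groups)
    grand_total = sum(sum(g) for g in groups)
    raw_ss = sum(x * x for g in groups for x in g)
    group_term = sum(sum(g) ** 2 / len(g) for g in groups)
    ss_between = group_term - grand_total ** 2 / n_total
    ss_within = raw_ss - group_term
    return ss_between, ss_within

def recentered_sums_of_squares(groups):
    """Subtract the grand mean first, then apply the same formulas."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    shifted = [[x - grand_mean for x in g] for g in groups]
    return textbook_sums_of_squares(shifted)

# Hypothetical data: small integers riding on a large constant offset.
offset = 1.0e12
groups = [
    [offset + 1, offset + 2, offset + 3],
    [offset + 2, offset + 3, offset + 4],
    [offset + 3, offset + 4, offset + 5],
]

print(textbook_sums_of_squares(groups))
print(recentered_sums_of_squares(groups))
```

Subtracting the grand mean leaves the sums of squares unchanged in exact arithmetic, but it shrinks the quantities being squared from about 1e12 to single digits, so the squares no longer overflow the precision of a 64-bit float.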
The linear regression suite has 11 test problems, two of lower difficulty, two of
average, and seven of higher difficulty. One of the higher-difficulty test problems is
the Longley benchmark, which most packages can handle easily these days. A problem
that many packages cannot handle is the Filip dataset, which requires fitting a tenth-order
polynomial and whose design matrix is nearly singular. A good package will either produce accurate
coefficients or detect the singularity and refuse to produce a solution. This latter
result is not a failure, because the user has not been misled. A package that fails
this test, e.g., Excel 2000 and earlier (McCullough and Wilson, 2005), is capable
of producing completely inaccurate coefficients when confronted with collinear
data. A package that directly solves the normal equations, e.g., that calculates
