B.D. McCullough 1299
statistical computing, one of the primary tenets of which is this: do not program
the formula from statistics texts. The fact that allegedly competent programmers
would implement the calculator formula underscores the need for users to test their
software.
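To see why the calculator formula is dangerous, the sketch below compares the one-pass calculator formula for the sample variance, $s^2 = \left(\sum x_i^2 - (\sum x_i)^2/n\right)/(n-1)$, with a two-pass formula that subtracts the mean before squaring. It assumes, as in Wilkinson's test data, that BIG holds the values 99999991, ..., 99999999. The function names are hypothetical. The two formulas agree algebraically, so the exact sample variance is 7.5 for both.

```python
# Calculator formula vs. two-pass formula for the sample variance.
# Assumes BIG = 99999991, ..., 99999999 (Wilkinson's test values).

def variance_calculator(xs):
    """One-pass 'calculator' formula: (sum x^2 - (sum x)^2 / n) / (n - 1)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)  # squares near 1e16 are rounded in double precision
    return (sum_x2 - sum_x * sum_x / n) / (n - 1)


def variance_two_pass(xs):
    """Two-pass formula: compute the mean first, then sum squared deviations."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


BIG = [float(v) for v in range(99999991, 100000000)]

print(variance_two_pass(BIG))    # 7.5, the exact answer
print(variance_calculator(BIG))  # catastrophic cancellation: not 7.5
```

The squares of BIG lie near $10^{16}$, beyond $2^{53}$, the range in which every integer is exactly representable in double precision. The final subtraction therefore cancels away most of the significant digits. The two-pass version only ever squares deviations between −4 and 4.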
Calculating the correlation coefficient is conceptually simple, especially since it
is bounded between +1 and −1. Yet some packages returned a correlation between
BIG and HUGE, or LITTLE and ROUND, or X and BIG, that was bigger than unity!
The developers of these packages never revealed why their packages were unable to
correctly compute a correlation coefficient. Note also that the correlation between
ZERO and any other variable is undefined by definition: ZERO is a constant, so its
standard deviation is $\sigma_z = 0$, and the correlation is
$$\rho_{zx} = \frac{\operatorname{cov}(z, x)}{\sigma_z \sigma_x}. \qquad (28.3)$$
The 0 in the denominator means that the value of the ratio is undefined, yet some
packages computed $\rho_{zx} = 0$, and one package even managed to compute $\rho_{zz} = 1$.
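A correlation routine can respect equation (28.3) by checking for a zero standard deviation and reporting the result as undefined rather than inventing a number. The sketch below is one minimal way to do this. Returning Python's `math.nan` as the "undefined" marker is our assumption; a package might instead raise an error or print a missing-value code. It uses centered (two-pass) sums rather than the calculator formula and assumes Wilkinson's values: X = 1, ..., 9, ZERO = nine zeros, BIG = 99999991, ..., 99999999, and LITTLE = 0.99999991, ..., 0.99999999.

```python
import math


def correlation(x, y):
    """Pearson correlation from centered sums; returns math.nan when undefined."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length, n >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    sxx = sum(d * d for d in dx)
    syy = sum(d * d for d in dy)
    if sxx == 0.0 or syy == 0.0:
        # A constant variable has sigma = 0, so equation (28.3) divides by zero.
        return math.nan
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / math.sqrt(sxx * syy)


# Assumed Wilkinson test values.
X = [float(i) for i in range(1, 10)]
ZERO = [0.0] * 9
BIG = [float(v) for v in range(99999991, 100000000)]
LITTLE = [float(f"0.9999999{i}") for i in range(1, 10)]

print(correlation(X, BIG))       # 1.0
print(correlation(BIG, LITTLE))  # 1.0 up to rounding in the last digits
print(correlation(ZERO, X))      # nan: undefined, not 0
print(correlation(ZERO, ZERO))   # nan: undefined, not 1
```

Because the deviations are formed before any squaring, the computed value can still stray above 1 in magnitude, but only at the level of rounding error, not by the gross amounts described above.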
Plotting BIG against LITTLE should obviously produce a straight line. Some packages
were unable to do this. In one case, the software produced a single point in
the middle of the graph, dropping all the other points. Again, the developers did
not reveal the reasons for the failure.
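The developers gave no explanation, so the following is only a hypothetical illustration of how points can vanish from a plot. Suppose, purely as an assumption, that a package stored data in IEEE single precision. Single precision carries 24 significant bits, and near $10^8$ its representable values are 8 apart. The nine BIG values (assumed to be 99999991, ..., 99999999) would then pile onto just two locations. The sketch uses the standard `struct` module to round a double to single precision.

```python
import struct


def to_float32(value):
    """Round a Python float (double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


BIG = [float(v) for v in range(99999991, 100000000)]

distinct_double = sorted(set(BIG))
distinct_single = sorted(set(to_float32(v) for v in BIG))

print(len(distinct_double))  # 9 points in double precision
print(len(distinct_single))  # 2 points after rounding to single precision
print(distinct_single)       # [99999992.0, 100000000.0]
```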
Performing operations that involved the MISS variable revealed that not all
packages correctly handled missing values.
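What "correctly" means depends on the operation. One common convention is that missing cases are excluded, and a statistic with no remaining cases is itself missing. Below is a minimal sketch of that convention. It assumes missing values are encoded as `None` or NaN, and the helper names are hypothetical. MISS is taken to be nine missing values.

```python
import math


def is_missing(v):
    """Treat None and NaN as missing (an assumed encoding)."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def complete_pairs(x, y):
    """Listwise deletion: keep only cases where both values are present."""
    return [(a, b) for a, b in zip(x, y) if not is_missing(a) and not is_missing(b)]


def mean(values):
    """Mean of the non-missing values; missing (NaN) if no values remain."""
    present = [v for v in values if not is_missing(v)]
    if not present:
        return math.nan
    return sum(present) / len(present)


X = [float(i) for i in range(1, 10)]
MISS = [None] * 9  # every case missing

print(mean(X))                       # 5.0
print(mean(MISS))                    # nan
print(len(complete_pairs(X, MISS)))  # 0 usable cases
```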
Regressing X on a constant, BIG, and LITTLE should produce an error, since BIG
and LITTLE are linear transformations of each other; that is, the matrix of
independent variables is singular. If a package does not recognize the singularity,
it can grind through all the calculations and produce an answer: an incorrect
answer, but an answer nonetheless. Not all packages passed this test.
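A package can catch this case by estimating the numerical rank of the design matrix before solving. The sketch below applies modified Gram-Schmidt to columns scaled to unit length. A column is declared dependent when less than a relative tolerance of its length survives orthogonalization against the earlier columns. The tolerance of 1e-10 and the helper name are assumptions. Production code would more likely use a pivoted QR or SVD from a numerical library. The data assume Wilkinson's values for X, BIG, and LITTLE.

```python
import math


def numerical_rank(columns, tol=1e-10):
    """Return (rank, indices of dependent columns) via modified Gram-Schmidt."""
    basis = []
    dependent = []
    for j, col in enumerate(columns):
        norm = math.sqrt(sum(v * v for v in col))
        if norm == 0.0:
            dependent.append(j)
            continue
        # Scale to unit length so `tol` is relative to the column's own size.
        r = [v / norm for v in col]
        for _ in range(2):  # re-orthogonalize once to limit rounding drift
            for q in basis:
                proj = sum(a * b for a, b in zip(q, r))
                r = [a - proj * b for a, b in zip(r, q)]
        rnorm = math.sqrt(sum(v * v for v in r))
        if rnorm < tol:
            dependent.append(j)
        else:
            basis.append([v / rnorm for v in r])
    return len(basis), dependent


CONST = [1.0] * 9
X = [float(i) for i in range(1, 10)]
BIG = [float(v) for v in range(99999991, 100000000)]
LITTLE = [float(f"0.9999999{i}") for i in range(1, 10)]

rank, dependent = numerical_rank([CONST, BIG, LITTLE])
if dependent:
    print(f"singular design: column(s) {dependent} are linear combinations of others")
print(numerical_rank([CONST, X]))  # (2, []): full rank
```

Scaling matters here. BIG is roughly $10^8$ times larger than LITTLE and the constant, so an absolute tolerance applied to the unscaled columns could misjudge the rank.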
Wilkinson tests have been applied to many statistical and econometric software
packages, almost invariably revealing flaws of one sort or another. The only mystery
is why software developers do not apply these tests themselves and fix the errors
before someone writes an article or software review about them. These tests are quick
and easy, taking less than an hour, and every user should make sure that his or her
package passes all of them or, if it does not, have the developer fix the problem. If
the developer does not fix it quickly enough, a well-placed software review will
convince him or her to do so.
28.4 Intermediate tests
McCullough (1998a) proposed an intermediate set of tests covering three areas:
coefficient estimation based on the then recently released National Institute of
Standards and Technology’s (NIST) Statistical Reference Datasets (StRD), random
number generation, and statistical distributions (e.g., the functions used to determine
critical values for various distributions). This methodology has been applied
by McCullough (1999a, 1999b), Vinod (2000), Sall (2002), Altman and McDonald