12.5The Runs Test for Randomness 533
The simulation approach, on the other hand, can require a great deal of computational
time. However, if an immediate answer is not required and great accuracy is desired, then
simulation, by running a large number of cases, can be made accurate to an arbitrarily
prescribed precision.
12.5 The Runs Test for Randomness
A basic assumption in much of statistics is that a set of data constitutes a random sample
from some population. However, it is sometimes the case that the data are not generated
by a truly random process but by one that may follow a trend or a type of cyclical pattern.
In this section, we will consider a test — called the runs test — of the hypothesisH 0 that
a given data set constitutes a random sample.
To begin, let us suppose that each of the data values is eithera0ora1.That is, we shall
assume that each data value can be dichotomized as being either a success or a failure. Let
X 1 ,...,XNdenote the set of data. Any consecutive sequence of either 1’s or 0’s is called
arun. For instance, the data set
1001110010111101000011
contains 11 runs — 6 runs of 1 and 5 runs of 0. Suppose that the data setX 1 ,...,XN
containsn1’s andm0’s, wheren+m=N, and letRdenote the number of runs.
Now, ifH 0 were true, thenX 1 ,...,XNwould be equally likely to be any of theN!/(n!m!)
permutations ofn1’s andm0’s; and therefore, given a total ofn1’s andm0’s, it follows
that, underH 0 , the probability mass function ofR, the number of runs is given by
PH 0 {R=k}=
number of permutations ofn1’s andm0’s resulting inkruns
(
n+m
n
)
This number of permutations can be explicitly determined and it can be shown that
PH 0 {R= 2 k}= 2
(
m− 1
k− 1
)(
n− 1
k− 1
)
(
m+n
n
)
(12.5.1)
PH 0 {R= 2 k+ 1 }=
(
m− 1
k− 1
)(
n− 1
k
)
+
(
m− 1
k
)(
n− 1
k− 1
)
(
n+m
n
)
If the data containn1’s andm0’s, then the runs test calls for rejection of the hypothesis
that the data constitutes a random sample if the observed number of runs is either too