The Essentials of Biostatistics for Physicians, Nurses, and Clinicians

CHAPTER 2 Sampling from Populations

random sample, and the average of this group could be expected to be
higher than the class average. The amount that it is higher is the bias
of the prediction. Bias is something we want to avoid because we usually
do not know its size and so cannot adjust the estimate to correct for it.
In addition to bias (which can be avoided by randomization), an
estimate or prediction will have a variance. The variance is a measure
of the variability in estimates that would be obtained by repeating the
sampling process. While bias cannot be controlled by the sample size,
the variance can. The larger the sample size, the smaller the variance
of the estimate or, in the example above, of the prediction of the
class average.
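The contrast between bias and variance can be illustrated with a small simulation. This is only a sketch under stated assumptions: the 40 scores are made-up data, and the non-random scheme (drawing only from the top 20 students) is a hypothetical stand-in for any selection method that favors stronger students. The names scores, top_half, and summarize are illustrative.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical class of 40 exam scores (made-up data, not from the text).
scores = [random.gauss(70, 10) for _ in range(40)]
class_mean = statistics.mean(scores)

# Assumed non-random scheme: only the top 20 students can be selected.
top_half = sorted(scores)[20:]

def summarize(pool, n, reps=5000):
    """Return (bias, variance) of the sample mean over repeated samples of size n."""
    means = [statistics.mean(random.sample(pool, n)) for _ in range(reps)]
    return statistics.mean(means) - class_mean, statistics.pvariance(means)

results = {}
for n in (5, 10):
    results[n] = {
        "random": summarize(scores, n),
        "top_half": summarize(top_half, n),
    }
    for scheme, (bias, var) in results[n].items():
        print(f"n={n:2d}  {scheme:8s}  bias={bias:6.2f}  variance={var:6.2f}")
```

Under these assumptions, doubling the sample size should roughly halve the variance for both schemes, while the bias of the top-half scheme stays about the same and the bias of the random scheme stays near zero.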
Suppose that instead of taking a random sample of size 5, we took
a random sample of size 10. Then an estimate known to be unbiased
(e.g., the sample mean) will still be unbiased, and its variance will
be lower, meaning that it will tend to be closer to the value for the
entire class. You can imagine that if we chose 39 out of the 40 at
random, the prediction would be extremely close to the class average,
and if we had taken all 40, it would equal the class average and have zero
variance.
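The effect of moving from 5 to 10 to 39 to all 40 students can also be computed exactly rather than imagined. The sketch below uses the standard result for a simple random sample drawn without replacement from a finite population of size N: the variance of the sample mean is (S^2 / n)(1 - n / N), where S^2 is the population variance computed with divisor N - 1. The scores are made-up values, and var_of_sample_mean is an illustrative helper name.

```python
import statistics

# Hypothetical class of N = 40 scores (made-up values for illustration only).
scores = [55 + (7 * i) % 41 for i in range(40)]
N = len(scores)
S2 = statistics.variance(scores)  # variance of the 40 scores, divisor N - 1

def var_of_sample_mean(n):
    """Variance of the sample mean for a simple random sample of
    size n drawn without replacement from the N scores."""
    return (S2 / n) * (1 - n / N)

variances = {n: var_of_sample_mean(n) for n in (5, 10, 39, 40)}
for n, v in variances.items():
    print(f"n={n:2d}  variance of sample mean = {v:.3f}")
```

The factor (1 - n / N) is what drives the variance to exactly zero when all 40 students are taken, since the sample mean is then the class average itself.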
An excellent example that illustrates the need for random sampling
and the bias in prediction when the sample is not random is the Literary
Digest's prediction of the winner of the 1936 U.S. presidential election.
Franklin Roosevelt was the incumbent and the Democratic
nominee. Alfred Landon was the Republican nominee. To predict the
winner, the Literary Digest mailed out 10 million ballots asking registered
voters which candidate they preferred. A total of 2.3 million of
the 10 million ballots were returned, and on the basis of those
2.3 million responses the Literary Digest predicted that Landon would win
by a wide margin.
Although the number of voters in the election was far larger
than the actual or even the intended sample, a sample of 2.3 million is large
enough that, if it were a random sample of those who would vote, the
estimate would have a very small standard deviation (in political surveys,
approximately 2 standard deviations of the estimate is called the
margin of error), and the prediction would be highly reliable. The result
of the election, however, was that Roosevelt won by a landslide, obtaining
about 61% of the popular vote. This high-visibility poll destroyed
the credibility of the Literary Digest and soon caused it to cease publication.
How could they have gone so wrong?
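To see how small the margin of error would have been had the 2.3 million returns been a simple random sample of voters, the calculation can be sketched as follows. The sketch assumes the usual approximation that the standard deviation of a sample proportion is sqrt(p(1 - p)/n), takes the worst case p = 0.5, and ignores any finite population correction. The function name margin_of_error is illustrative.

```python
import math

def margin_of_error(p, n):
    """Approximate margin of error: 2 standard deviations of a sample
    proportion from a simple random sample of size n."""
    return 2 * math.sqrt(p * (1 - p) / n)

returned = 2_300_000  # ballots returned, from the text

# p = 0.5 maximizes p * (1 - p), so this is the largest possible value.
worst_case = margin_of_error(0.5, returned)
print(f"margin of error: {100 * worst_case:.3f} percentage points")
```

Under these assumptions the margin of error is well under one-tenth of a percentage point. That figure measures only sampling variability; it says nothing about bias, which is why so large a sample could still miss the outcome badly.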
