Sampling Distributions
Suppose we drew a sample of size 10 from an approximately normal population with unknown mean and
standard deviation and got $\bar{x} = 18.87$. Two questions arise: (1) what does this sample tell us about the
population from which the sample was drawn, and (2) what would happen if we drew more samples?
Suppose we drew 5 more samples of size 10 from this population and computed the mean of each.
In answer to question (1), we might believe that
the population from which these samples were drawn had a mean around 20 because these averages tend to
group there (in fact, the six samples were drawn from a normal population whose mean is 20 and whose
standard deviation is 4). The mean of the 6 sample means is 19.64, which supports our feeling that the mean of
the original population might have been 20.
The standard deviation of the 6 sample means is 0.68, and you might not have any intuitive sense about how
that relates to the population standard deviation. You might suspect, however, that the standard deviation of
the sample means should be less than the standard deviation of the population, because the chance of an extreme
value for an average should be less than that for an individual term (it just doesn’t seem very likely that
we would draw a lot of extreme values in a single sample).
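The suspicion that sample means vary less than individual values can be checked with a small simulation. The sketch below assumes the normal population described above (mean 20, standard deviation 4) and samples of size 10; the function name, the seed, and the choice of 20,000 repetitions are illustrative assumptions, and the simulated means will not reproduce the specific values quoted in the text.

```python
import random
import statistics

POP_MEAN = 20.0   # population mean stated in the text
POP_SD = 4.0      # population standard deviation stated in the text
SAMPLE_SIZE = 10  # sample size used in the text


def simulate_sample_means(num_samples, seed=1):
    """Draw num_samples samples of size SAMPLE_SIZE and return their means."""
    rng = random.Random(seed)  # arbitrary seed, fixed only for repeatability
    means = []
    for _ in range(num_samples):
        sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]
        means.append(statistics.fmean(sample))
    return means


six_means = simulate_sample_means(6)
many_means = simulate_sample_means(20000)

print("6 sample means:", [round(m, 2) for m in six_means])
print("SD of the 6 means:", round(statistics.stdev(six_means), 2))
print("SD of 20000 means:", round(statistics.stdev(many_means), 2))
```

With only 6 samples the standard deviation of the means bounces around from run to run, but with many samples it settles well below the population standard deviation of 4.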
Suppose we continued to draw samples of size 10 from this population until we were exhausted or
until we had drawn all possible samples of size 10. If we did succeed in drawing all possible samples of
size 10, and computed the mean of each sample, the distribution of these sample means would be the
sampling distribution of $\bar{x}$.
Remembering that a “statistic” is a value that describes a sample, the sampling distribution of a
statistic is the distribution of that statistic for all possible samples of a given size. It’s important to
understand that a dotplot of a statistic from a few samples drawn from a population is not a sampling distribution (it’s a simulation
of one). It becomes a sampling distribution only when all possible samples of a given size are drawn.
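For a very small population, "all possible samples of a given size" can actually be listed. The sketch below is only an illustration: the four-value population and the sample size of 2 are hypothetical choices, not taken from the text, and samples are drawn without replacement.

```python
from collections import Counter
from itertools import combinations
from statistics import fmean

population = [2, 4, 6, 8]  # hypothetical population, chosen for illustration
n = 2                      # hypothetical sample size

# Every possible sample of size n (without replacement, order ignored).
all_samples = list(combinations(population, n))

# The statistic of interest: the mean of each sample.
sample_means = [fmean(sample) for sample in all_samples]

# The sampling distribution: each possible value of the statistic
# together with the number of samples that produce it.
sampling_distribution = Counter(sample_means)

for value in sorted(sampling_distribution):
    print(value, sampling_distribution[value])
```

Here there are only 6 possible samples, so the complete sampling distribution of the sample mean fits in a few lines. For realistic populations the number of possible samples is far too large to list, which is why simulations are used instead.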
Sampling Distribution of a Sample Mean
Suppose we have the sampling distribution of $\bar{x}$. That is, we have formed a distribution of the means of
all possible samples of size n from an unknown population (thus, we know little about its shape, center, or
spread). Let $\mu_{\bar{x}}$ and $\sigma_{\bar{x}}$ represent the mean and standard deviation of the sampling distribution of $\bar{x}$,
respectively.
Then
$\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
for any population with mean μ and standard deviation σ.
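The standard results here are that the mean of the sampling distribution equals the population mean, and its standard deviation equals the population standard deviation divided by the square root of n. For the normal population described earlier (standard deviation 4, samples of size 10), that gives 4/√10 ≈ 1.26. As a rough check, the sketch below lists every sample of size 2 drawn with replacement from a tiny population and compares the results with these formulas; the population values and sample size are hypothetical choices made for illustration.

```python
import math
from itertools import product
from statistics import fmean, pstdev

population = [2, 4, 6, 8]  # hypothetical population, chosen for illustration
n = 2                      # hypothetical sample size

mu = fmean(population)      # population mean
sigma = pstdev(population)  # population standard deviation

# All ordered samples of size n drawn with replacement, so that the
# draws are independent and each sample is equally likely.
sample_means = [fmean(sample) for sample in product(population, repeat=n)]

mean_of_means = fmean(sample_means)
sd_of_means = pstdev(sample_means)

print("mean of sample means:", mean_of_means, "population mean:", mu)
print("SD of sample means:", sd_of_means)
print("sigma / sqrt(n):", sigma / math.sqrt(n))
```

Sampling with replacement is used so that the draws are independent, which is the condition under which the formula for the standard deviation holds exactly.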
(Note: the value given for $\sigma_{\bar{x}}$ above is correct only if there is true independence between
trials, such as when sampling with replacement, or when the population is infinite, such as when tossing a
coin. The formula still works well enough if the sample size n is small compared to the population size N.
A general rule is that n should be no more than 10% of N to use the value given for $\sigma_{\bar{x}}$ (that is, N ≥ 10n).
If n is more than 10% of N, the exact value for the standard deviation of the sampling distribution is
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$.
In practice this usually isn’t a major issue because