1.4. Measures of Spread http://www.ck12.org
cover in later chapters in favor of adjusting for sampling error. Dividing by n − 1 is only necessary for the calculation
of the standard deviation of a sample. When you are calculating the standard deviation of a population, you divide
by the number of numbers (N). But when you have a sample, you are not getting data for the entire population and
there is bound to be random variation due to sampling (remember that this is called sampling error).
When we claim to have the standard deviation, we are making the following statement:
“The typical distance of a point from the mean is... ”
But because we are using a sample, we might be off by a little, so it is better to overestimate s when we use it to
represent the standard deviation.
Sample Standard Deviation:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
Because the variance is the square of the standard deviation, the variance formulas are as follows:
Variance of a population:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N}$$
Variance of a sample:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
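To see how the two divisors change the result, here is a minimal Python sketch. The data values are hypothetical and were chosen only to keep the arithmetic simple; the variable names are likewise illustrative. The hand calculation is cross-checked against `pvariance` and `variance` from Python's standard `statistics` module.

```python
import math
import statistics

# Hypothetical data values, chosen only for illustration (not from the text).
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean: the numerator in both formulas.
sum_sq_dev = sum((x - mean) ** 2 for x in data)

# Treating the data as an entire population: divide by N.
pop_variance = sum_sq_dev / n
pop_sd = math.sqrt(pop_variance)

# Treating the data as a sample: divide by n - 1.
sample_variance = sum_sq_dev / (n - 1)
sample_sd = math.sqrt(sample_variance)

print("population:", pop_variance, pop_sd)
print("sample:    ", sample_variance, sample_sd)

# Cross-check against the standard library.
print("statistics:", statistics.pvariance(data), statistics.variance(data))
```

Because the sample formula divides by the smaller number n − 1, the sample variance and sample standard deviation always come out slightly larger than the population versions computed from the same values.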
Chebyshev’s Theorem
Pafnuty Chebyshev was a 19th-century Russian mathematician. The theorem named for him gives us information
about how many elements of a data set are within a certain number of standard deviations of the mean.
The formal statement is as follows:
The proportion of data that lies within k standard deviations of the mean is at least:
$$1 - \frac{1}{k^2}, \quad \text{where } k > 1$$
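The bound is easy to tabulate for several values of k. The following is a small Python sketch; the function name `chebyshev_lower_bound` is made up for this illustration and is not part of any library.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    # The theorem is only stated for k > 1.
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


# Print the guaranteed minimum proportion for a few choices of k.
for k in (1.5, 2, 3, 4):
    print(k, chebyshev_lower_bound(k))
```

Note that these values are lower bounds that hold for any data set; a particular data set may have a much larger proportion of its values inside the interval.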
As an example, let’s return to the rainfall data from Mobile. The mean yearly rainfall amount is 69.3 and the sample
standard deviation is about 14.4.
Let’s investigate the information that Chebyshev’s Theorem gives us about the proportion of data within 2 standard
deviations of the mean. If we replace k with 2, the result is:
$$1 - \frac{1}{2^2} = 1 - \frac{1}{4} = \frac{3}{4}$$
So the theorem predicts that at least 75% of the data is within 2 standard deviations of the mean.
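To turn this statement into an actual range of rainfall amounts, we can compute the interval from 2 standard deviations below the mean to 2 standard deviations above it. The short Python sketch below uses the mean (69.3) and sample standard deviation (14.4) quoted above; the variable names are illustrative only.

```python
mean = 69.3   # mean yearly rainfall from the text
s = 14.4      # sample standard deviation from the text (approximate)
k = 2         # number of standard deviations

# Endpoints of the interval mean - k*s to mean + k*s.
lower = mean - k * s
upper = mean + k * s

# Chebyshev's lower bound on the proportion of data inside the interval.
bound = 1 - 1 / k ** 2

print(f"At least {bound:.0%} of the data lies between {lower:.1f} and {upper:.1f}")
```

With these inputs the interval runs from about 40.5 to about 98.1, so Chebyshev's Theorem guarantees that at least 75% of the yearly rainfall amounts fall in that range.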