AP Statistics 2017

(Marvins-Underground-K-12) #1

We have a pretty good intuitive sense of what an outlier is: it’s a value far removed from the others.
There is no rigorous mathematical formula for determining whether or not something is an outlier, but
there are a few conventions that people seem to agree on. Not surprisingly, some of them are based on the
mean and some are based on the median!
A commonly agreed-upon way to think of outliers based on the mean is to consider how many
standard deviations away from the mean a term is. Some texts identify a potential outlier as a datapoint
that is more than two or three standard deviations from the mean.
In a mound-shaped, symmetric, distribution, this is a value that has only about a 5% chance (for two
standard deviations) or a 0.3% chance (for three standard deviations) of being as far removed from the
center of the distribution as it is. Think of it as a value that is way out in one of the tails of the distribution.
Most texts now use a median-based measure and identify potential outliers in terms of how far a
datapoint is above or below the quartiles in a distribution. To find if a distribution has any outliers, do the
following (this is known as the “1.5 (IQR) rule”):


• Find the IQR.
• Multiply the IQR by 1.5.
• Find Q1 – 1.5(IQR) and Q3 + 1.5(IQR).
• Any value below Q1 – 1.5(IQR) or above Q3 + 1.5(IQR) is a potential outlier .


Some texts call an outlier defined as above a mild outlier. An extreme outlier would then be one that
lies more than 3 IQRs beyond Q1 or Q3.


example: The    following   data    represent   the amount  of  money,  in  British pounds, spent   weekly  on
tobacco for 11 regions in Britain: 4.03, 3.76, 3.77, 3.34, 3.47, 2.92, 3.20, 2.71, 3.53, 4.51,
4.56. Do any of the regions seem to be spending a lot more or less than the other regions? That
is, are there any outliers in the data?
solution: Using a calculator, we find , Sx = s = .59, Q1 = 3.2, Q3 = 4.03.

• Using means: 3.62 ± 2(0.59) = (2.44, 4.8). There are no values in the dataset less than 2.44 or greater
than 4.8, so there are no outliers by this method. We don’t need to check ± 3s since there were no
outliers using ± 2s .
• Using the 1.5(IQR) rule: Q1 – 1.5(IQR) = 3.2 – 1.5(4.03 – 3.2) = 1.96, Q3 + 1.5(IQR) = 4.03 +
1.5(4.03 – 3.2) = 5.28. Because there are no values in the data less than 1.96 or greater than 5.28, there
are no outliers by this method either.
Outliers are important because they will often tell us that something unusual or unexpected is going on
with the data that we need to know about. A manufacturing process that produces products so far out of
spec that they are outliers often indicates that something is wrong with the process. Sometimes outliers
are just a natural, but rare, variation. Often, however, an outlier can indicate that the process generating
the data is out of control in some fashion.


Position of a Term in a Distribution


Up until now, we have concentrated on the nature of a distribution as a whole. We have been concerned
with the shape, center, and spread of the entire distribution. Now we look briefly at individual cases in
the distribution.


Five-Number Summary

Free download pdf