7.11. Box-and-Whisker Plots http://www.ck12.org
The relative lengths of the whiskers in a box-and-whisker plot tell you the following about the data set:
a. If the whiskers are the same or almost the same length, the distribution is approximately symmetric.
b. If the right whisker is longer than the left whisker, the distribution is positively skewed.
c. If the left whisker is longer than the right whisker, the distribution is negatively skewed.
The length of the whiskers also gives you information about how spread out the data is.
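The whisker comparison above can be sketched in Python. This is a minimal illustration, not a standard routine. The helper names `five_number_summary` and `describe_whiskers` are hypothetical. The quartiles use the median-of-halves method: when the number of values is odd, the overall median is left out of both halves. That is one common textbook convention, and other software may compute quartiles differently. The `tolerance` parameter is an assumed cutoff for "almost the same length".

```python
from statistics import median


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for at least two values.

    Quartiles are the medians of the lower and upper halves. Assumption:
    when the count is odd, the overall median is excluded from both halves.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    upper = values[half + 1:] if n % 2 else values[half:]
    return (values[0], median(lower), median(values), median(upper), values[-1])


def describe_whiskers(summary, tolerance=0.1):
    """Classify the shape of a distribution from its whisker lengths.

    Hypothetical helper: whiskers whose lengths differ by no more than
    `tolerance` times the range are treated as "almost the same length".
    """
    minimum, q1, _, q3, maximum = summary
    left = q1 - minimum    # length of the left whisker
    right = maximum - q3   # length of the right whisker
    if abs(right - left) <= tolerance * (maximum - minimum):
        return "approximately symmetric"
    return "positively skewed" if right > left else "negatively skewed"


if __name__ == "__main__":
    sample = [2, 4, 5, 7, 8, 9, 10, 12, 30]  # made-up values for illustration
    summary = five_number_summary(sample)
    print(summary, describe_whiskers(summary))
```

Take the made-up values 2, 4, 5, 7, 8, 9, 10, 12, 30. Their five-number summary is 2, 4.5, 8, 11, 30. The right whisker (length 19) is much longer than the left whisker (length 2.5), so the sketch reports a positively skewed distribution.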
A box-and-whisker plot is often used when the number of data values is large. The center of the distribution, the shape of the distribution, and the range of the data are easy to see from the graph. The five-number summary divides the data into quarters by using the medians of the lower and upper halves of the data.
Many data sets contain values that are either extremely high or extremely low compared to the rest of the data values. These values are called outliers. There are several reasons why a data set may contain an outlier. Some of these are listed below:
- The value may be the result of an error made in measurement or in observation. The researcher may have measured the variable incorrectly.
- The value may simply be an error made by the researcher in recording the value. The value may have been written or typed incorrectly.
- The value could be a result obtained from a subject not within the defined population. A researcher recording marks from a Math 12 examination may have recorded a mark by a student in grade 11 who was taking Math 12.
- The value could be one that is legitimate but is extreme compared to the other values in the data set. (This rarely occurs, but it is a possibility.)
If an outlier is present because of an error in measurement, observation, or recording, then either the error should be corrected or the outlier should be omitted from the data set. If the outlier is a legitimate value, then the statistician must decide whether to include it in the data set. There is no rule that tells you what to do with an outlier in this case.
One method for checking a data set for outliers is to follow the procedure below:
1. Order the data set from least to greatest and determine the values of $Q_1$ and $Q_3$.
2. Calculate the difference between $Q_3$ and $Q_1$. This difference is called the interquartile range (IQR): $\text{IQR} = Q_3 - Q_1$.
3. Multiply the IQR by 1.5, subtract this result from $Q_1$, and add it to $Q_3$. This gives $Q_1 - 1.5 \times \text{IQR}$ and $Q_3 + 1.5 \times \text{IQR}$.
4. The two results from Step 3 are the lower and upper limits of the range into which all values of the data set should fit. Any values below or above this range are considered outliers.
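The four steps above can be sketched in Python as a minimal illustration. The function names `quartiles`, `iqr_bounds`, and `find_outliers` are hypothetical. The quartile convention is assumed to be median-of-halves, with the overall median excluded when the count is odd. A different quartile convention can shift the limits slightly. The multiplier `k = 1.5` is the factor from Step 3.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q3) as medians of the lower and upper halves.

    Assumption: when the count is odd, the overall median is excluded
    from both halves. Requires at least two values.
    """
    values = sorted(data)  # Step 1: order the data
    n = len(values)
    half = n // 2
    lower = values[:half]
    upper = values[half + 1:] if n % 2 else values[half:]
    return median(lower), median(upper)


def iqr_bounds(data, k=1.5):
    """Steps 2 and 3: return (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(data, k=1.5):
    """Step 4: return the values that fall below or above the bounds."""
    low, high = iqr_bounds(data, k)
    return [x for x in sorted(data) if x < low or x > high]


if __name__ == "__main__":
    sample = [2, 4, 5, 7, 8, 9, 10, 12, 30]  # made-up values for illustration
    print(quartiles(sample), iqr_bounds(sample), find_outliers(sample))
```

For the made-up values 2, 4, 5, 7, 8, 9, 10, 12, 30, the sketch gives $Q_1 = 4.5$ and $Q_3 = 11$. So IQR = 6.5 and 1.5 × 6.5 = 9.75. The values should therefore fit between −5.25 and 20.75, and 30 is flagged as an outlier.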
Example A
For each box-and-whisker plot, list the five-number summary and describe the distribution based on the location of
the median.