Data Analysis with Microsoft Excel: Updated for Office 2007


164 Fundamentals of Statistics


STATPLUS TIPS

• You can select all summary, variability, or distribution statistics by clicking the appropriate checkboxes in the General dialog sheet of the Univariate Statistics dialog box.
• The Univariate Statistics command can display the output table with the statistics arranged in rows or in columns.
• You can add your own custom title to the output from the Univariate Statistics command by typing a title in the Table Title box in the General dialog sheet.

Outliers


As the earlier discussion on means and medians showed, distribution statistics can be heavily affected by extreme values. It’s difficult to analyze a data set in which a single observation dominates all of the others, skewing the results. Such values, known as outliers, don’t seem to belong with the others because they’re too small, too large, or don’t match the properties one would expect for them. As you’ve seen, a large salary can affect an analysis of salary values, pushing the average salary upward. An outlier need not be an extreme value. If you were to analyze fitness data, the record of an extremely fit 75-year-old might not be remarkable compared to all of the values in the distribution, but it might be unusual compared to the values of others in his or her age group.
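
To make the salary point concrete, the short sketch below computes the mean and median of a small set of salaries twice: once with a single extreme value and once without it. The salary figures are hypothetical and are not taken from the book's data files. The sketch uses Python's standard `statistics` module rather than Excel or StatPlus; the same comparison could be done in a worksheet with the AVERAGE and MEDIAN functions.

```python
import statistics

# Hypothetical annual salaries in dollars. The last value is an extreme
# salary included only to illustrate how one outlier affects the results.
salaries = [42_000, 45_000, 47_000, 50_000, 52_000, 55_000, 58_000, 1_200_000]

def summarize(values):
    """Return the mean and the median of a list of numbers."""
    return statistics.mean(values), statistics.median(values)

# Statistics computed with the outlier included.
mean_all, median_all = summarize(salaries)

# Drop the single largest value to see how much it drove the results.
trimmed = sorted(salaries)[:-1]
mean_trim, median_trim = summarize(trimmed)

print(f"With outlier:    mean = {mean_all:,.0f}, median = {median_all:,.0f}")
print(f"Without outlier: mean = {mean_trim:,.0f}, median = {median_trim:,.0f}")
```

With these made-up numbers, the mean falls from 193,625 to about 49,857 when the extreme salary is removed, while the median moves only from 51,000 to 50,000. The mean is pulled toward the outlier; the median barely changes.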
Outliers are caused either by mistakes in data entry or by an unusual or unique situation. A mistake in data entry is easier to deal with: You discover and correct the mistake and then redo the analysis. If there is no mistake, you have a bigger problem. In that case you have to study the outlier and decide whether it really belongs with the other data values. For example, in a study of Big Ten universities, we might decide to remove the results from Northwestern because that school, unlike the other schools, is a small, private institution. In the Albuquerque data, we might remove a high-priced home from the sample if that house were a public landmark and thus uniquely expensive.
However, and this point cannot be emphasized too strongly, merely being an extreme value is not sufficient grounds to remove an observation. Many advances have been made by scientists studying the observations that didn’t seem to fit the expected distribution. Extreme values may be a natural part of the data (as with some salary structures). By removing those values, you are removing an important aspect of the distribution.
One possible solution to the problem of outliers is to perform two analyses: one with the outliers and one without. If your conclusions are the same, you can be confident that the outliers had no effect on them. If the results are very different, you can report both answers with an explanation of the


