Basic Statistics


MEASURES OF LOCATION AND VARIABILITY


5.4.2 Relating Statistics and Data Quality


How clean is the data set? If the data set is likely to include outliers that are not from
the intended population (either blunders, errors, or extreme observations), statistics
that are resistant to these problems should be considered. In this chapter we have
discussed the use of medians and quartiles, two statistics that are recommended for
dealing with questionable data sets. Note that the mean and particularly the standard
deviation are sensitive to observations whose numerical value is distant from the mean
of the remaining values.
If an observation is a known error or blunder, most investigators will remove the
observation or attempt to retake it, but often it is difficult to detect errors unless they
result in highly unusual values.
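
As a rough illustration of this sensitivity, the following sketch compares the mean and standard deviation with the median and quartiles before and after a single blunder is introduced. The data values and the helper name summarize are hypothetical, and the quartiles use the default method of Python's statistics.quantiles, which may differ slightly from the rule used elsewhere in this chapter.

```python
import statistics

def summarize(values):
    """Return location and variability statistics for a list of numbers."""
    # quantiles(n=4) returns the three cut points Q1, median, Q3
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
    }

# Hypothetical measurements (assumption, not data from the text)
clean = [98, 102, 105, 110, 112, 115, 120, 124, 130]
# Same data with one hypothetical keying error: 130 entered as 1300
with_blunder = clean[:-1] + [1300]

before = summarize(clean)
after = summarize(with_blunder)

for name in before:
    print(f"{name:>6}: clean = {before[name]:8.2f}   with blunder = {after[name]:8.2f}")
```

In this made-up example the median and quartiles are identical for both data sets, while the mean and especially the standard deviation change substantially.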


5.4.3 Relating Statistics to the Type of Data

Measurements can be classified by type and then, depending on the type, certain
statistics are recommended. Continuous measurements were discussed in Chapter 4.
Another commonly used system is that given by Stevens. In Stevens’ system, measurements
are classified as nominal, ordinal, interval, or ratio based on what transformations
would not change their classification. Here, we simply present the system
and give the recommended graphical diagrams and statistics. We do not recommend
that this be the sole basis for the choice of statistics or graphs, but it is an important
factor to consider.
In Stevens’ system, variables are called nominal if each observation belongs to one
of several distinct categories that can be arranged in any order. The categories may or
may not be numerical, although numbers are generally used to represent them when
the information is entered into the computer. For example, the type of maltreatment
of children in the United States is classified into neglect, physical abuse, sexual abuse,
emotional abuse, medical neglect, and others. These types could be entered into a
statistical package using a word for each type or simply could be coded 1, 2, 3, 4, 5,
or 6. But note that there is no underlying order. We could code neglect as a 1 or a 2
or a 3, and so on; it makes no difference as long as we are consistent in what number
we use. Other nominal variables include gender, race, or illnesses.
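
The point that the numeric codes of a nominal variable are arbitrary can be sketched as follows. The observations and both coding schemes are hypothetical; the sketch only shows that the most frequent category is the same under either coding, whereas the average of the codes depends entirely on the coding chosen and so carries no meaning.

```python
from collections import Counter

# Hypothetical observations of maltreatment type (assumption, not real data)
observations = [
    "neglect", "physical abuse", "neglect", "sexual abuse",
    "neglect", "emotional abuse", "physical abuse", "neglect",
]

# Two arbitrary but internally consistent codings of the same categories
coding_a = {"neglect": 1, "physical abuse": 2, "sexual abuse": 3,
            "emotional abuse": 4, "medical neglect": 5, "others": 6}
coding_b = {"neglect": 3, "physical abuse": 6, "sexual abuse": 1,
            "emotional abuse": 5, "medical neglect": 2, "others": 4}

def modal_category(values, coding):
    """Code the values, find the most frequent code, and decode it again."""
    codes = [coding[v] for v in values]
    decode = {code: label for label, code in coding.items()}
    modal_code, _count = Counter(codes).most_common(1)[0]
    return decode[modal_code]

def mean_code(values, coding):
    """Average of the numeric codes; not a meaningful statistic for nominal data."""
    codes = [coding[v] for v in values]
    return sum(codes) / len(codes)

print(modal_category(observations, coding_a), modal_category(observations, coding_b))
print(mean_code(observations, coding_a), mean_code(observations, coding_b))
```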
If the categories have an underlying order (can be ranked), the variable is said
to be ordinal. An example would be classification of disease condition as none,
mild, moderate, or severe. A common ordinal measure of health status is obtained
by asking respondents if their health status is poor, fair, good, or excellent. When
the information for this variable is entered into the computer, it will probably be
coded 1, 2, 3, or 4. The order of the numbers is important, but we do not know if
the difference between poor health and fair health is equivalent in magnitude to the
difference between fair health and good health. With ordinal data we can determine
whether one outcome is greater than or less than another, but the magnitude of the
difference is unknown.
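
A minimal sketch of this idea, using hypothetical responses and two hypothetical codings that preserve the same order: comparisons between outcomes and the median category agree under both codings, but the size of the gaps between codes, and therefore the mean, does not.

```python
import statistics

# Two codings of health status with the same order (both are assumptions)
equal_steps = {"poor": 1, "fair": 2, "good": 3, "excellent": 4}
unequal_steps = {"poor": 1, "fair": 2, "good": 5, "excellent": 10}

# Hypothetical survey responses (not real data)
responses = ["fair", "good", "good", "poor", "excellent",
             "good", "fair", "excellent", "good"]

def median_category(values, coding):
    """Return the category at the median position of the coded responses."""
    decode = {code: label for label, code in coding.items()}
    codes = [coding[v] for v in values]
    # median_low always returns an actual data value, so it can be decoded
    return decode[statistics.median_low(codes)]

def mean_code(values, coding):
    """Average of the codes; it depends on the arbitrary spacing of the codes."""
    return statistics.mean(coding[v] for v in values)

print(median_category(responses, equal_steps), median_category(responses, unequal_steps))
print(mean_code(responses, equal_steps), mean_code(responses, unequal_steps))

# The gap between "fair" and "good" depends on the coding chosen
print(equal_steps["good"] - equal_steps["fair"],
      unequal_steps["good"] - unequal_steps["fair"])
```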
An interval data variable is not only ordered but has equal intervals between
successive values. For example, temperature in degrees on a Fahrenheit or Celsius


Relating Statistics to the Type of Data