Statistical Analysis for Education and Psychology Researchers

Deciding which Summary Measure to Use

We have now considered descriptive statistics for measures of central tendency and
measures of dispersion. At this point you may well be wondering which statistic to use
and when. When choosing a statistic you should consider: the properties of the statistic as
an estimator, that is, whether it is an unbiased, efficient, sufficient and resistant
estimator; the level of measurement of the variable; and any subsequent inferential analyses.


Measures of central tendency

The sample mean is generally more widely used as a descriptive statistic than either the
median or the mode. The mean is unbiased, efficient and sufficient, but not resistant. The
median is also unbiased and sufficient, but less efficient than the mean. It has the
advantage, however, of being more resistant than the mean. It is often stated in statistical
textbooks that the mean should not be used with nominal or ordinal data. This is not true.
For nominal data coded 0 and 1 (say females are coded 0 and males 1), the mean is simply
equal to the proportion of males in the distribution. The mean may even be used with
ordered categorical data. An implicit assumption, however, would be that the change from,
say, 1 to 2 is the same amount as the change from 2 to 3. Should this assumption seem
unrealistic, do not use the mean.
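
The point about 0/1 coding can be checked with a short sketch. The sample below is hypothetical (it is not data from the text), and the variable names are chosen only for illustration; the coding follows the example above, with females as 0 and males as 1.

```python
from statistics import mean

# Hypothetical sample of ten people: 0 = female, 1 = male.
sex = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]

# Arithmetic mean of the 0/1 codes.
mean_sex = mean(sex)

# Proportion of males obtained by counting directly.
proportion_male = sex.count(1) / len(sex)

print("mean of codes:", mean_sex)
print("proportion of males:", proportion_male)
```

The two printed values agree because summing the codes counts the cases coded 1. The same reasoning does not extend automatically to ordered categories with more than two values, where the equal-interval assumption described above has to be judged case by case.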
In my view the mean is used more often than it should be and is widely
misunderstood. Chatfield (1993) illustrates misunderstanding about the mean by
reference to ‘the apocryphal story of the politician who said that it was disgraceful for
half the nation’s children to be under average intelligence’ (p. 33). When the mean is
used it should be accompanied by the standard deviation.
The mean should not be used when a distribution is skewed; instead use the median
and interquartile range. Another situation when the mean should not be used is when data
is censored. Educational researchers frequently ask whether, and if so when, events occur.
For example, in a study of teachers’ careers, the response variable of interest might be
‘survival time’, that is, how long it is before a teacher quits teaching. A problem is that,
no matter how long a follow-up study lasts, some teachers may not quit teaching. These
observations are censored: the researcher does not know when, if ever, the teacher will
quit teaching, and in that sense the data is incomplete. For discussion of alternative
strategies to summarize survival time data, see the paper on discrete-time survival
analysis by Singer and Willett (1993).
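
Why the mean fails with censored data can be seen in a small sketch. The figures are invented for illustration: eight hypothetical teachers are followed for 10 years, and three have not quit when the study ends. The two "naive" means shown are common shortcuts, not methods recommended by the text, and this is not the discrete-time survival analysis of Singer and Willett.

```python
from statistics import mean, median

# Hypothetical follow-up length and years-to-quit for eight teachers.
# None marks a censored observation: the teacher was still teaching
# when the 10-year follow-up ended.
FOLLOW_UP_YEARS = 10
years_to_quit = [2, 3, 3, 5, 7, None, None, None]

# Naive shortcut 1: discard the censored teachers entirely.
observed_only = [t for t in years_to_quit if t is not None]
mean_dropped = mean(observed_only)

# Naive shortcut 2: pretend censored teachers quit when follow-up ended.
as_if_quit = [FOLLOW_UP_YEARS if t is None else t for t in years_to_quit]
mean_as_if_quit = mean(as_if_quit)

# The median only needs the ordering of the middle values, so here it does
# not depend on the unknown quitting times of the three censored teachers.
median_years = median(as_if_quit)

print("mean, censored cases dropped:", mean_dropped)
print("mean, censored cases set to 10:", mean_as_if_quit)
print("median:", median_years)
```

Both means understate the true average, because each censored teacher's actual time in teaching exceeds 10 years by an unknown amount. In this particular sample fewer than half the observations are censored, so the median is still determined; that would not hold if the censored cases reached the middle of the ordered data.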


Measures of dispersion

The range is influenced by the sample size and its interpretation is problematic. The
sample standard deviation and variance are biased (although the bias can be adjusted by
using the appropriate df) and sufficient, but they are not resistant statistics. The
interquartile range is more resistant than either the standard deviation or variance but is
less efficient. The standard deviation is most useful with approximately normal data; when
data is skewed, the interquartile range is more appropriate.
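
The df adjustment and the difference in resistance can be illustrated with a brief sketch. The scores are hypothetical, the helper `iqr` is defined here only for illustration, and the quartiles use the default 'exclusive' method of Python's `statistics.quantiles`; other quartile conventions can give slightly different values.

```python
from statistics import pvariance, quantiles, stdev, variance


def iqr(values):
    """Interquartile range from the default 'exclusive' quartile method."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


# Hypothetical test scores; the second list replaces the top score
# with a single extreme value.
scores = [4, 5, 5, 6, 6, 6, 7, 7, 8]
with_outlier = [4, 5, 5, 6, 6, 6, 7, 7, 40]

# Variance with divisor n versus divisor n - 1 (the df adjustment).
var_n = pvariance(scores)
var_n_minus_1 = variance(scores)
print("variance, divisor n:", round(var_n, 3))
print("variance, divisor n - 1:", var_n_minus_1)

# Compare how each measure of dispersion reacts to one extreme score.
for label, data in (("original", scores), ("with outlier", with_outlier)):
    print(
        label,
        "| range:", max(data) - min(data),
        "| SD:", round(stdev(data), 2),
        "| IQR:", iqr(data),
    )
```

In this sketch a single extreme score inflates the range and the standard deviation considerably while leaving the interquartile range unchanged, which is the sense in which the interquartile range is the more resistant statistic.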

