Statistical Analysis for Education and Psychology Researchers


median is the average of these two values, which is 18.9. This agrees with our earlier
calculation.
To find the quartiles we use a similar approach, which is described in detail in the
Open University Statistics in Society course text, Unit A1 (Open University Press, 1986).
The method is summarized here. The lower quartile is found by counting up a defined
number of places from the lowest value; this number of places is the depth of the lower
quartile. The depth of the upper quartile is found in the same manner, but by counting
down from the highest value. What we have to find is the correct depth. The depth will be
approximately n/4, where n is the number of values in the distribution, and its precise
value depends on whether n is exactly divisible by 4. A general rule for calculating the
quartiles is:


When n/4 is not an integer, the quartiles are the values whose depth is the
next whole number larger than n/4.
When n/4 is an integer, each quartile is the average of the two values
at depth n/4 and depth (n/4 + 1). The upper quartile is found by counting
down from the highest value and the lower quartile is found by counting
up from the lowest value.
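The depth rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming the data fit in a list; the function name quartiles_by_depth and the sample values are hypothetical, chosen only to exercise both branches of the rule. As the text notes, results from this rule may differ slightly from those produced by statistical packages such as SAS, which use other quartile definitions.

```python
import math

def quartiles_by_depth(values):
    """Return (lower, upper) quartiles using the counting-depth rule.

    Hypothetical helper illustrating the rule in the text; not a library function.
    """
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("need at least one value")
    if n % 4 != 0:
        # n/4 is not an integer: depth is the next whole number larger than n/4
        depth = math.ceil(n / 4)
        lower = data[depth - 1]   # count up `depth` places from the lowest value
        upper = data[n - depth]   # count down `depth` places from the highest value
    else:
        # n/4 is an integer: average the values at depths n/4 and n/4 + 1
        d = n // 4
        lower = (data[d - 1] + data[d]) / 2
        upper = (data[n - d] + data[n - d - 1]) / 2
    return lower, upper

# Hypothetical data: n = 11 (n/4 = 2.75, so depth 3) and n = 8 (n/4 = 2)
print(quartiles_by_depth([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]))  # (2, 5)
print(quartiles_by_depth([1, 2, 3, 4, 5, 6, 7, 8]))           # (2.5, 6.5)
```

Sorting once and indexing by depth mirrors the hand method of counting in from each extreme.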
How do we interpret the interquartile range? How large is large?

The interquartile range is only really meaningful when it is compared with the median
and when the number of values in the distribution is known.
To calculate the quartiles for the data shown in Figure 3.12, we first divide n, that is
114, by 4. This gives 28.5, which is not an integer. We next identify the next whole
number larger than 28.5, that is 29. The quartiles are therefore the 29th values in from
each extreme of the distribution. The lower quartile is 18.6 and the upper quartile is
19.8. This simple method gives results similar to those of more exact methods; the more
precise values calculated in SAS are Q1 = 18.6 and Q3 = 19.5.
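The arithmetic of the worked example can be checked directly. This sketch uses only n = 114 and the quartile values quoted in the text (the underlying Figure 3.12 data are not reproduced here), converts the depth of 29 into positions in the data sorted from lowest to highest, and computes the interquartile range implied by each pair of quartiles.

```python
import math

n = 114                          # number of values in the Figure 3.12 data
quarter = n / 4                  # 28.5, which is not an integer
depth = math.ceil(quarter)       # next whole number larger than n/4

# Convert depths into positions in the data sorted from lowest to highest
lower_position = depth           # 29th value counting up from the lowest
upper_position = n - depth + 1   # 29th value counting down from the highest

# Interquartile ranges from the quartiles quoted in the text
simple_iqr = round(19.8 - 18.6, 1)   # depth method: Q3 = 19.8, Q1 = 18.6
sas_iqr = round(19.5 - 18.6, 1)      # SAS values:   Q3 = 19.5, Q1 = 18.6

print(quarter, depth, lower_position, upper_position)  # 28.5 29 29 86
print(simple_iqr, sas_iqr)                             # 1.2 0.9
```

Rounding to one decimal place matches the precision of the reported quartiles and hides floating-point noise in the subtraction.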
The most widely used measure of dispersion is the standard deviation. This statistic
measures the dispersion of scores around the mean. If all the values in a distribution were
the same, each value would equal the mean: there would be no dispersion, none of the
values would deviate from the mean, and the standard deviation would be zero. The further
the values deviate from the mean, that is, the greater the variation around the mean, the
greater the value of the standard deviation.
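The behaviour described above can be illustrated with Python's standard statistics module, whose stdev function computes a sample standard deviation. The three small data sets are hypothetical and share the same mean of 19, so only their spread around the mean differs.

```python
import statistics

# Hypothetical data sets, all with mean 19, differing only in spread
identical = [19.0, 19.0, 19.0, 19.0]   # every value equals the mean
narrow = [18.5, 19.0, 19.0, 19.5]      # values close to the mean
wide = [17.0, 18.0, 20.0, 21.0]        # values further from the mean

for label, data in [("identical", identical), ("narrow", narrow), ("wide", wide)]:
    mean = statistics.mean(data)
    sd = statistics.stdev(data)        # sample standard deviation
    print(f"{label}: mean = {mean:.1f}, sd = {sd:.3f}")
```

The identical values give a standard deviation of zero, and the standard deviation grows as the values move further from the common mean.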
A sample standard deviation is calculated by first finding the deviation of each score
from the mean, that is, by subtracting the mean from each score. If each deviation is then
squared, the squared deviations are summed, and the sum is divided by the number of
values in the distribution less 1 (n − 1), we have calculated a statistic called the variance.
Unfortunately, the variance is in squared units (remember that we squared the deviations
of each value from the mean) and is therefore not in the same units of measurement as the
data. To return to the original units of measurement we take the square root of the
variance; this new statistic is the standard deviation.
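The steps just described translate directly into code. This is a minimal sketch; the helper name sample_variance_and_sd and the five scores are hypothetical. It follows the text in dividing by n − 1, and the result can be cross-checked against Python's statistics.variance and statistics.stdev, which use the same divisor.

```python
import math

def sample_variance_and_sd(values):
    """Hypothetical helper: sample variance and standard deviation, step by step."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    deviations = [x - mean for x in values]   # subtract the mean from each score
    squared = [d * d for d in deviations]     # square each deviation
    variance = sum(squared) / (n - 1)         # sum, then divide by n - 1
    sd = math.sqrt(variance)                  # back to the original units
    return variance, sd

scores = [12, 15, 11, 18, 14]                 # hypothetical scores, mean 14
var, sd = sample_variance_and_sd(scores)
print(var, round(sd, 3))                      # 7.5 2.739
```

Here the deviations are -2, 1, -3, 4 and 0, their squares sum to 30, and 30 / (5 - 1) = 7.5.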
Instead of explaining at length how to calculate the standard deviation, we can express
these arithmetic manipulations succinctly using a kind of shorthand, or algebraic
notation. If you are unfamiliar with this notation or cannot remember how to use it, a
brief review is presented in Appendix A2. If you have difficulty with the following
calculation you should read through this appendix. The following formulas differ from the explanatory


Initial data analysis 69