title 'FSME vs CAT';
run;
The plot suggests that the data are positively related, that is, as FSME scores
increase so do CAT scores. It is not absolutely clear that the relationship
is linear, although this is plausible. One school, with scores of 23.1 and 35.1, seems as
though it may be an outlier, but with so few points plotted, it is difficult to discern
whether this point is apart from the main cluster of points and the upward trend.
Interpreting such plots is as much an art as a science and it is difficult to be definitive
with so few points plotted.
Given that there is a question about the linearity, there are only ten data points, there
may be an outlier and most importantly the data are ranked (percentages), then
Spearman’s correlation should be used. This example is given to illustrate the steps to be
taken when carrying out a correlational analysis. Knowing that the data are ranked would
certainly suggest a rank order correlation. It is always recommended to plot the data prior
to any significance testing.
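Spearman’s correlation $r_s$ is calculated by applying Pearson’s original formula for r to
the ranked data (Spearman’s rank order correlation, equation 7.1):

$$r_s = \frac{\sum_{i=1}^{n}(R_i-\bar{R})(S_i-\bar{S})}{\sqrt{\sum_{i=1}^{n}(R_i-\bar{R})^2\,\sum_{i=1}^{n}(S_i-\bar{S})^2}} \qquad (7.1)$$

where $R_i$ corresponds to the rank value of an x value ($x_i$) and $S_i$ corresponds to the rank
value of a y value ($y_i$). $\bar{R}$ and $\bar{S}$ are the corresponding means of the rank values. In the
case of ties, average rank values are used.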
There are alternative formulae, for example:

$$r_s = 1 - \frac{6\sum D^2}{n(n^2-1)}$$

where D is the difference between the x and y ranks assigned to each observation and n is the
number of observations. It is seldom mentioned that this formula assumes there are no
ties in the data. The effect of ties in the data is to inflate the correlation and a correction
should be made. It is therefore suggested that equation 7.1 be used as this is applicable
whether or not there are ties. Most computer packages allow for ties in the data and make
the necessary adjustments. In the SAS procedure PROC CORR averaged rank values are
used in the case of tied ranks.
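A short derivation may help to show where the no-ties assumption enters the difference formula. It is only a sketch: it assumes the ranks in each series are exactly 1, 2, ..., n, and writes $D_i = R_i - S_i$ for the rank difference of observation i, with $R_i$ and $S_i$ the x and y ranks.

```latex
\begin{align*}
\bar{R} = \bar{S} &= \frac{n+1}{2}, \qquad
\sum_i (R_i-\bar{R})^2 = \sum_i (S_i-\bar{S})^2 = \frac{n(n^2-1)}{12} \\
\sum_i D_i^2 &= \sum_i \big[(R_i-\bar{R})-(S_i-\bar{S})\big]^2
  = \frac{n(n^2-1)}{6} - 2\sum_i (R_i-\bar{R})(S_i-\bar{S}) \\
r_s &= \frac{\sum_i (R_i-\bar{R})(S_i-\bar{S})}{n(n^2-1)/12}
  = 1 - \frac{6\sum_i D_i^2}{n(n^2-1)}
\end{align*}
```

When ties are present and averaged ranks are used, the mean rank is unchanged but each sum of squares falls below $n(n^2-1)/12$, so the final step no longer holds exactly. This is why the difference formula needs a correction when there are ties, whereas Pearson’s formula applied to the ranks does not.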
The steps in computing rs using formula 7.1 are:
1 Rank each variable (for example, FSME and CAT) separately, giving two series of rank
scores; assign a rank of 1 to the smallest value.
2 For each subject compute the deviations $(R_i - \bar{R})$ and $(S_i - \bar{S})$ ...
These computations are shown in Table 7.2.
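The steps above can be sketched in SAS as follows. This is a minimal illustration rather than the program used for the analysis in the text: the dataset names schools and ranked, the variable names fsme, cat, r_fsme and r_cat, and the data values themselves are all made up for the example.

```sas
/* Hypothetical toy data: NOT the school data discussed in the text. */
/* Two schools share the same fsme value so that a tie occurs.       */
data schools;
   input fsme cat;
   datalines;
5 40
12 38
12 45
20 50
31 47
;
run;

/* Step 1: rank each variable separately. The smallest value gets    */
/* rank 1 and tied values get the average rank (TIES=MEAN).          */
proc rank data=schools out=ranked ties=mean;
   var fsme cat;
   ranks r_fsme r_cat;
run;

/* Apply Pearson's formula to the two series of ranks (equation 7.1) */
proc corr data=ranked pearson nosimple;
   var r_fsme r_cat;
run;

/* Cross-check: request Spearman's correlation from the raw scores   */
proc corr data=schools spearman nosimple;
   var fsme cat;
run;
```

If the two PROC CORR steps report the same coefficient, that is consistent with the description above: Spearman’s correlation is Pearson’s r computed on the ranks, with averaged ranks for tied values.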