Statistical significance is a good thing, but it certainly isn’t everything. Percentage of
variation is an important kind of measure, but it is not very intuitive and may be small in
important situations. The d-family measures of effect sizes have the advantage of presenting a
difference in concrete terms (distance between means in terms of standard deviations).
Odds ratios and risk ratios are very useful when you have a 2 × 2 table, but less so with
more complex or with simpler situations.
10.2 Biserial and Tetrachoric Correlation: Non-Pearson Correlation Coefficients
In considering the point-biserial and phi coefficients, we were looking at data where one or
both variables were measured as a dichotomy. We might even call this a “true dichotomy”
because we often think of those variables as “either-or” variables. A person is a male or a
female, not halfway in between. Those are the coefficients we will almost always calculate
with dichotomous data, and nearly all computer software will calculate those coefficients
by default.
Two other coefficients, which you are likely to see referenced but are most unlikely
to use, are the biserial correlation and the tetrachoric correlation. In earlier editions of
this book I showed how to calculate those coefficients, but there does not seem to be much
point in doing so anymore. I will simply explain how they differ from the coefficients I
have discussed.
As I have said, we usually treat people as male or female, as if they pass or they fail a
test, or as if they are abused or not abused. But we know that those dichotomies, especially
the last two, are somewhat arbitrary. People fail miserably, or barely fail, or barely pass,
and so on. People suffer varying degrees of sexual abuse, and although all abuse is bad,
some is worse than others. If we are willing to take this underlying continuity into account,
we can make an estimate of what the correlation would have been if the variable (or
variables) had been normally distributed instead of dichotomously distributed.
The biserial correlation is the direct analog of the point-biserial correlation, except that
the biserial assumes underlying normality in the dichotomous variable. The tetrachoric
correlation is the direct analog of φ (phi), where we assume underlying normality on both
variables. That is all you really need to know about these two coefficients.
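For readers who want to see the distinction in concrete terms, the following is a minimal sketch, not a recommended analysis. It assumes the standard textbook conversion from the point-biserial to the biserial coefficient, r_b = r_pb √(pq) / y, where p and q are the proportions in the two categories and y is the height of the standard normal curve at the point that divides it into areas p and q. The data, the variable names, and the helper functions are hypothetical and serve only as illustration.

```python
from math import sqrt
from statistics import NormalDist, mean


def pearson_r(x, y):
    """Ordinary Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def biserial_from_point_biserial(r_pb, p):
    """Rescale a point-biserial r, assuming the dichotomy hides a normal variable.

    p is the proportion of cases coded 1 on the dichotomous variable.
    """
    q = 1.0 - p
    z = NormalDist().inv_cdf(p)  # cut point dividing the normal curve into p and q
    y = NormalDist().pdf(z)      # height (ordinate) of the curve at that cut point
    return r_pb * sqrt(p * q) / y


# Hypothetical data: pass (1) / fail (0) on one test, and a score on another measure.
passed = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
score = [52, 61, 48, 66, 58, 70, 55, 63, 74, 68]

r_pb = pearson_r(passed, score)  # point-biserial: Pearson r with a 0/1 variable
r_b = biserial_from_point_biserial(r_pb, mean(passed))

print(f"point-biserial = {r_pb:.3f}, biserial = {r_b:.3f}")
```

Under these assumptions the biserial value is larger in absolute size than the point-biserial, which reflects the idea that forcing a continuous variable into two categories discards information.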
10.3 Correlation Coefficients for Ranked Data
In some experiments, the data naturally occur in the form of ranks. For example, we might
ask judges to rank objects in order of preference under two different conditions, and wish
to know the correlation between the two sets of rankings. Cities are frequently ranked in
terms of livability, and we might want to correlate those rankings with rankings given
10 years later. Usually we are most interested in these correlations when we wish to assess
the reliability of some ranking procedure, though in the case of the city ranking example,
we are interested in the stability of rankings.
A related procedure, which has frequently been recommended in the past, is to rank
sets of measurement data when we have serious reservations about the nature of the
underlying scale of measurement. In this case, we are substituting ranks for raw scores. Although
we could seriously question the necessity of ranking measurement data (for reasons
mentioned in the discussion of measurement scales in Section 1.3 of Chapter 1), this is
nonetheless a fairly common procedure.
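The two situations just described can be sketched in a few lines of code. This is only an illustration under stated assumptions: it applies the ordinary Pearson formula to ranks (the result is commonly known as Spearman's coefficient), it gives tied scores the mean of the ranks they occupy, and the city rankings and raw scores are made up for the example. The function names are hypothetical.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every value tied with the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of sorted positions i..j, counted from 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Ordinary Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Case 1: data that arrive as ranks (hypothetical livability rankings of
# eight cities, now and 10 years later).
rank_then = [1, 2, 3, 4, 5, 6, 7, 8]
rank_later = [2, 1, 4, 3, 6, 5, 8, 7]
r_stability = pearson_r(rank_then, rank_later)

# Case 2: measurement data replaced by ranks before correlating
# (hypothetical raw scores on two measures).
x_raw = [12.1, 15.4, 15.4, 9.8, 20.3, 17.7]
y_raw = [30, 41, 38, 25, 52, 39]
r_ranked = pearson_r(average_ranks(x_raw), average_ranks(y_raw))

print(f"stability of rankings = {r_stability:.3f}")
print(f"correlation of ranked scores = {r_ranked:.3f}")
```

Ranking first means the result depends only on the order of the scores, not on the distances between them, which is the point of the procedure when the underlying scale is in doubt.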