levels) on one or both measures. We may need to be somewhat cautious in our interpretation, and there are some interesting relationships between those correlations and other statistics we have discussed, but the same basic procedure is used for these special cases as we used for the more general case.
Point-Biserial Correlation ($r_{pb}$)
Frequently, variables are measured in the form of a dichotomy, such as male-female, pass-fail, experimental group-control group, and so on. Ignoring for the moment that these variables are seldom measured numerically (a minor problem), it is also quite apparent that they are not measured continuously. There is no way we can assume that a continuous distribution, such as the normal distribution, will represent the obtained scores on the dichotomous variable male-female. If we wish to use r as a measure of relationship between variables, we obviously have a problem, because for r to have certain desirable properties as an estimate of $\rho$, we need to assume at least an approximation of normality in the joint (bivariate) population of X and Y.
The difficulty of measuring X numerically turns out to be trivial for dichotomous variables. If X represents married versus unmarried, for example, then we can legitimately score married as 0 and unmarried as 1, or vice versa. (In fact any two values will do. Thus all married subjects could be given a score of 7 on X, while all unmarried subjects could receive a score of 18, without affecting the correlation in the least. We use 0 and 1, or sometimes 1 and 2, for the simple reason that this makes the arithmetic easier.) Given such a system of quantification, it should be apparent that the sign of the correlation will depend solely on the arbitrary way in which we choose to assign 0 and 1, and is therefore meaningless for most purposes.
If we set aside until the end of the chapter the problem of r as an estimate of $\rho$, things begin to look brighter. For any other purpose, we can proceed as usual to calculate the standard Pearson correlation coefficient (r), although we will label it the point-biserial coefficient ($r_{pb}$). Thus, algebraically, $r_{pb} = r$, where one variable is dichotomous and the other is roughly continuous and more or less normally distributed in arrays.^1 There are special formulae that we could use, but there is nothing to be gained by doing so, and they are just something additional to learn and remember.
Calculating $r_{pb}$
One of the more common questions on statistical discussion groups on the Internet is “Does anyone know of a program that will calculate a point-biserial correlation?” The answer is very simple: any statistical package I know of will calculate the point-biserial correlation, because it is simply Pearson’s r applied to a special kind of data.
As an example of the calculation of the point-biserial correlation, we will use the data in Table 10.1. These are the first 12 cases of male (Sex = 0) weights and the first 15 cases of female (Sex = 1) weights from Exercises 9.31 and 9.32 in Chapter 9. I have chosen unequal numbers of males and females just to show that it is possible to do so. Keep in mind that these are actual self-report data from real subjects.
The scatterplot for these data is given in Figure 10.1, with the regression line superimposed. Fewer than 27 data points are visible simply because some points overlap. Notice that the regression line passes through the mean of each array. Thus, when X = 0, $\hat{Y}$ is the intercept and equals the mean weight for males, and when X = 1, $\hat{Y}$ is the mean
Section 10.1 Point-Biserial Correlation and Phi: Pearson Correlations by Another Name 295
^1 When there is a clear criterion variable and that variable is the dichotomous one, you might wish to consider logistic regression (see Chapter 15).