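For computational purposes, a simple expression for the covariance is given by

$$\mathrm{cov}_{XY} = \frac{\sum XY - \dfrac{\sum X \sum Y}{N}}{N - 1}$$

For the full data set represented in abbreviated form in Table 9.2, the covariance is

$$\mathrm{cov}_{XY} = \frac{10353.66 - \dfrac{(2278)(479.668)}{107}}{106} = \frac{10353.66 - 10211.997}{106} = 1.336$$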
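The arithmetic can be checked with a short Python sketch. The sums (ΣXY = 10353.66, ΣX = 2278, ΣY = 479.668, N = 107) are the ones quoted for the Table 9.2 data; the function names are illustrative only, and the definitional version is included merely to suggest that the two forms agree.

```python
def covariance_from_sums(sum_xy, sum_x, sum_y, n):
    """Computational formula: (sum of XY - (sum of X)(sum of Y)/N) / (N - 1)."""
    return (sum_xy - sum_x * sum_y / n) / (n - 1)


def covariance_from_scores(xs, ys):
    """Definitional formula: sum of cross-products of deviations / (N - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Sums quoted in the text for the full data set (N = 107).
cov_xy = covariance_from_sums(10353.66, 2278, 479.668, 107)
print(round(cov_xy, 3))  # 1.336
```

The computational form needs only running totals, which is why it is convenient for hand or calculator work.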
9.4 The Pearson Product-Moment Correlation Coefficient (r)
What we said about the covariance might suggest that we could use it as a measure of the
degree of relationship between two variables. An immediate difficulty arises, however, because
the absolute value of $\mathrm{cov}_{XY}$ is also a function of the standard deviations of X and Y.
Thus, a value of $\mathrm{cov}_{XY} = 1.336$, for example, might reflect a high degree of correlation
when the standard deviations are small, but a low degree of correlation when the standard
deviations are high. To resolve this difficulty, we divide the covariance by the size of the
standard deviations and make this our estimate of correlation. Thus, we define

$$r = \frac{\mathrm{cov}_{XY}}{s_X s_Y}$$

Since the maximum value of $\mathrm{cov}_{XY}$ can be shown to be $\pm s_X s_Y$, it follows that the limits
on r are $\pm 1.00$. One interpretation of r, then, is that it is a measure of the degree to which
the covariance approaches its maximum.
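A small Python sketch may make the point about scale concrete. The scores below are hypothetical, invented purely for illustration, and the helper names are not from the text. Multiplying one variable by 100 inflates the covariance a hundredfold, whereas r, the covariance divided by the product of the standard deviations, is unchanged and stays within its limits.

```python
from statistics import mean, stdev


def covariance(xs, ys):
    """Sample covariance with N - 1 in the denominator."""
    mean_x, mean_y = mean(xs), mean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)


def pearson_r(xs, ys):
    """r = cov_XY / (s_X * s_Y)."""
    return covariance(xs, ys) / (stdev(xs) * stdev(ys))


# Hypothetical scores, made up for this example only.
x = [2, 4, 5, 7, 9]
y = [1, 3, 2, 6, 8]
y_rescaled = [100 * value for value in y]  # same variable on a larger scale

print(covariance(x, y), covariance(x, y_rescaled))  # covariance depends on scale
print(pearson_r(x, y), pearson_r(x, y_rescaled))    # r does not
```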
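From Table 9.2 and subsequent calculations, we know that $s_X = 12.492$ and $s_Y = 0.202$,
and $\mathrm{cov}_{XY} = 1.336$. Then the correlation between X and Y is given by

$$r = \frac{\mathrm{cov}_{XY}}{s_X s_Y} = \frac{1.336}{(12.492)(0.202)} = .529$$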
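As a quick check on the arithmetic, the same division can be carried out in Python. This sketch assumes the summary values cov_XY = 1.336, s_X = 12.492, and s_Y = 0.202; the variable names are illustrative.

```python
# Summary statistics assumed from the Table 9.2 calculations.
cov_xy = 1.336
s_x = 12.492
s_y = 0.202

max_cov = s_x * s_y   # largest absolute covariance possible for these SDs
r = cov_xy / max_cov  # the covariance as a fraction of that maximum

print(round(max_cov, 3))  # 2.523
print(round(r, 3))        # 0.529
```

The reported value of .529 follows from s_X = 12.492.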
This coefficient must be interpreted cautiously; do not attribute meaning to it that it
does not possess. Specifically, $r = .53$ should not be interpreted to mean that there is 53%
of a relationship (whatever that might mean) between stress and symptoms. The correlation
coefficient is simply a point on the scale between $-1$ and $1$, and the closer it is to either
of those limits, the stronger is the relationship between the two variables. For a more
specific interpretation, we can speak in terms of $r^2$, which will be discussed shortly. It is
important to emphasize again that the sign of the correlation merely reflects the direction
of the relationship and, possibly, the arbitrary nature of the scale. Changing a variable from
“number of items correct” to “number of items incorrect” would reverse the sign of a correlation,
but it would have no effect on its absolute value.
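The effect of reversing a scale can be illustrated with a brief Python sketch. The 10-item test and all of the scores are hypothetical, chosen only to show the sign change; `pearson_r` is an illustrative helper, not a function from the text.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Pearson r: covariance divided by the product of the standard deviations."""
    mean_x, mean_y = mean(xs), mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (stdev(xs) * stdev(ys))


# Hypothetical data: a 10-item test scored two ways, plus a second measure.
N_ITEMS = 10
items_correct = [9, 7, 6, 4, 2]
other_score = [8, 7, 5, 5, 1]
items_incorrect = [N_ITEMS - c for c in items_correct]

r_correct = pearson_r(items_correct, other_score)
r_incorrect = pearson_r(items_incorrect, other_score)
print(r_correct, r_incorrect)  # equal in absolute value, opposite in sign
```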
Adjusted r
Although the correlation we have just computed is the one we normally report, it is not an
unbiased estimate of the correlation coefficient in the population, denoted $\rho$ (rho). To
see why this would be the case, imagine two randomly selected pairs of points—for example,
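$$\mathrm{cov}_{XY} = \frac{\sum XY - \dfrac{\sum X \sum Y}{N}}{N - 1} = \frac{10353.66 - \dfrac{(2278)(479.668)}{107}}{106} = \frac{10353.66 - 10211.997}{106} = 1.336$$

$$|\mathrm{cov}_{XY}| \le s_X s_Y, \qquad r = \frac{\mathrm{cov}_{XY}}{s_X s_Y}, \qquad -1.00 \le r \le 1.00$$

$$s_X = 12.492, \quad s_Y = 0.202, \quad r = \frac{1.336}{(12.492)(0.202)} = .529$$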
252 Chapter 9 Correlation and Regression