Basic Statistics


12.3 THE CORRELATION COEFFICIENT FOR TWO VARIABLES
FROM A SINGLE SAMPLE

When the relation between two variables from a single sample is studied, it often seems
desirable to have some way of measuring the degree of association or correlation
between them. For categorical data in Section 11.2.2, we presented the odds ratio as
a measure of association. Here, for continuous (interval or ratio) data, the most
widely used measure of association is the correlation coefficient r.


12.3.1 Calculation of the Correlation Coefficient

The definition of the correlation coefficient is

\[
r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}}
\]

Before discussing the meaning of r, we shall illustrate its calculation from the
calculations done for the regression line in Table 12.2. From the table, we have
$\sum (X - \bar{X})(Y - \bar{Y}) = 2097.3$, $\sum (X - \bar{X})^2 = 7224.1$, and $\sum (Y - \bar{Y})^2 = 828.9$.
Substituting these numerical values in the equation for r, we have


\[
r = \frac{2097.3}{\sqrt{7224.1(828.9)}}
\]

or

\[
r = \frac{2097.3}{\sqrt{5{,}988{,}056.5}} = \frac{2097.3}{2447.05} = .857
\]




Thus, the correlation between weight and systolic blood pressure for the 10 adult
males is .857.
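
The arithmetic above can be checked with a short Python sketch. The function names `r_from_sums` and `correlation` are illustrative choices, not part of the text, and only the three sums quoted from Table 12.2 are used, since the raw observations are not reproduced in this section.

```python
import math


def r_from_sums(sxy, sxx, syy):
    """Correlation coefficient from the three sums of deviations.

    sxy = sum of (X - Xbar)(Y - Ybar)
    sxx = sum of (X - Xbar)^2
    syy = sum of (Y - Ybar)^2
    """
    if sxx <= 0 or syy <= 0:
        raise ValueError("both variables must show some variation")
    return sxy / math.sqrt(sxx * syy)


def correlation(xs, ys):
    """Correlation coefficient computed directly from paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return r_from_sums(sxy, sxx, syy)


# Sums quoted in the text from Table 12.2 (weight and systolic blood pressure).
r = r_from_sums(2097.3, 7224.1, 828.9)
print(round(r, 3))  # 0.857
```

Given the raw weight and blood pressure values, `correlation(weights, pressures)` would be expected to give the same result up to rounding.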


12.3.2 The Meaning of the Correlation Coefficient


As was the case for the slope coefficient b, the sign of r is determined by its numerator,
$\sum (X - \bar{X})(Y - \bar{Y})$, since the denominator is always positive. Thus, if values of
X greater than $\bar{X}$ occur when values of Y are greater than $\bar{Y}$, and small values of
X occur when small values of Y occur, the value of r will be positive. We then
have a positive relationship. In the example with the 10 males, r is positive, so that
large values of weight are associated with large values of systolic blood pressure. If we had
computed a negative r, high values of Y would occur with low values of X. In the
example of this given earlier, lower vital capacity tends to occur with older age.
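
The parallel with the slope can be made explicit. Assuming the usual least-squares formula for b (it is not restated in this section), the two statistics share the same numerator:

\[
b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2},
\qquad
r = b \sqrt{\frac{\sum (X - \bar{X})^2}{\sum (Y - \bar{Y})^2}}
\]

Because the square root is positive, r and b always have the same sign. With the sums quoted from Table 12.2, this gives approximately

\[
b = \frac{2097.3}{7224.1} \approx .290,
\qquad
r \approx .290 \sqrt{\frac{7224.1}{828.9}} \approx .857
\]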
It can be shown that r always lies between -1 and +1. Indeed, if all the data
points lie precisely on a straight line with a negative slope, the correlation coefficient
is always exactly -1. If all the points lie on a straight line with positive slope, r = +1.
Figure 12.4 illustrates these possibilities.
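
The two limiting cases can also be checked numerically. The sketch below uses made-up points placed exactly on two straight lines; the data and the function name `correlation` are hypothetical and are not taken from Figure 12.4.

```python
import math


def correlation(xs, ys):
    """Correlation coefficient r from deviations about the sample means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical X values, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]

# Points exactly on a line with positive slope: Y = 2 + 3X.
rising = [2.0 + 3.0 * x for x in xs]

# Points exactly on a line with negative slope: Y = 10 - 0.5X.
falling = [10.0 - 0.5 * x for x in xs]

print(math.isclose(correlation(xs, rising), 1.0))    # True
print(math.isclose(correlation(xs, falling), -1.0))  # True
```

The size of the slope does not matter here; only its sign determines whether r is +1 or -1 when the points fall exactly on a line.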
A correlation coefficient of 0 is interpreted as meaning that there is no linear relation
between the two variables. Figure 12.5 illustrates data with zero correlations. Note
