Basic Statistics

THE CORRELATION COEFFICIENT FOR TWO VARIABLES FROM A SINGLE SAMPLE 179

12.3.3 The Population Correlation Coefficient

The observations are assumed to come from a single simple random sample of n
individuals from a population with X and Y bivariately normally distributed. Each
individual in the population has a value of X and a value of Y (weight and systolic
blood pressure, for example). We can think of this population as having a correlation
coefficient just like that of a sample. The population coefficient of correlation will be
denoted by the Greek letter ρ (rho), with r reserved for the sample correlation coefficient.
Just as for other population parameters such as means and slopes, it is possible to
obtain confidence intervals for the population ρ and to compute tests of hypotheses.
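To make the distinction between ρ and r concrete, the sketch below (an illustration, not part of the original text) draws simulated samples from a bivariate normal population with an assumed ρ = 0.6 and computes r with the usual Pearson formula. The helper names `pearson_r` and `bivariate_normal_sample`, the seed, and the population value are all hypothetical choices made for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Sample (Pearson) correlation coefficient r."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def bivariate_normal_sample(n, rho, rng):
    """Draw n (X, Y) pairs of standard normals whose population correlation is rho."""
    xs, ys = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        xs.append(z1)
        # Y = rho*Z1 + sqrt(1 - rho^2)*Z2 has unit variance and corr(X, Y) = rho
        ys.append(rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return xs, ys

rng = random.Random(12345)  # fixed seed so runs are reproducible
rho = 0.6                   # assumed population correlation (hypothetical)
xs, ys = bivariate_normal_sample(5000, rho, rng)
r_large = pearson_r(xs, ys)          # large sample: r should sit close to rho
xs10, ys10 = bivariate_normal_sample(10, rho, rng)
r_small = pearson_r(xs10, ys10)      # n = 10: r can stray far from rho
print(f"rho = {rho}, r (n=5000) = {r_large:.3f}, r (n=10) = {r_small:.3f}")
```

Rerunning the n = 10 draw with different seeds shows how widely r can vary around ρ in small samples, which is why confidence intervals for ρ are worth having.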

12.3.4 Confidence Intervals for the Correlation Coefficient

A correlation coefficient calculated from a small sample may be quite misleading, so
that confidence intervals for the population correlation coefficient are useful. There
are various methods for obtaining such confidence intervals; here only one graphical
method is given.
Table A.6 gives a chart from which 95% confidence intervals can be read with
accuracy sufficient for most purposes. For the weight and systolic blood pressure
example, r = .857. To find the 95% confidence interval, the point .857 is found on
the horizontal scale at the bottom of the chart. We then look directly upward along a
vertical line until we reach a curve labeled 10, the sample size. Next, we read the value
on the vertical scale at the left or right of the chart that corresponds to the point
on the curve labeled 10 just above .857. This value is approximately .50, which is the
lower confidence limit. For the upper confidence limit, we find .857 on the horizontal
scale across the top of the chart and look directly down to the upper curve labeled 10.
Reading across, the approximate point .97 is found, which is the upper confidence
limit. Thus the 95% confidence interval for ρ is approximately .50 < ρ < .97, a
rather wide range. Because the lower and upper confidence limits are both positive,
we conclude that the population correlation coefficient is positive.
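The chart is a graphical device; a common numerical alternative, swapped in here and not the method of this text, is Fisher's z transformation. It treats z = ½ ln[(1 + r)/(1 − r)] as approximately normal with standard error 1/√(n − 3), builds the interval on the z scale, and transforms back with tanh. The function name `fisher_z_ci` is hypothetical, and the approximation assumes bivariate normal data and n > 3.

```python
import math
from statistics import NormalDist

def fisher_z_ci(r, n, level=0.95):
    """Approximate confidence interval for rho via Fisher's z (needs n > 3)."""
    z = math.atanh(r)                               # 0.5 * ln((1 + r) / (1 - r))
    se = 1.0 / math.sqrt(n - 3)                     # approximate standard error of z
    zcrit = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    # Build the interval on the z scale, then map both ends back to the r scale
    return math.tanh(z - zcrit * se), math.tanh(z + zcrit * se)

# Weight and systolic blood pressure example: r = .857, n = 10
lower, upper = fisher_z_ci(0.857, 10)
print(f"95% CI for rho: {lower:.2f} to {upper:.2f}")  # prints "95% CI for rho: 0.49 to 0.97"
```

The result, roughly .49 to .97, agrees closely with the .50 to .97 read from the chart.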
If the sample size is not listed on the chart, conservative intervals can be obtained
by using the next smaller sample size that is given on the chart. Alternatively, we can
interpolate roughly between two of the curves. For example, if our sample size were
11, we could either use a sample size of 10 or interpolate halfway between 10 and
12 in Table A.6. More accurate estimates of the confidence intervals can be obtained
using a method given by Mickey et al. [2004].
The confidence intervals calculated in this section can be used to test H₀: ρ = ρ₀,
where ρ₀ is any specified value between −1 and +1. For the sample of size 10 with
r = .857, we might wish to test H₀: ρ = .7. Since .7 is contained in the 95%
confidence interval +.50 to +.97, the null hypothesis is not rejected at the 5% level.
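The same decision can be reached without an interval by applying Fisher's z transformation (again a substitute for the chart, not the text's own procedure) to form an approximately standard normal statistic z = [atanh(r) − atanh(ρ₀)]·√(n − 3). The helper `fisher_z_test` is a hypothetical name, and ρ₀ must lie strictly between −1 and +1 because atanh is undefined at ±1.

```python
import math
from statistics import NormalDist

def fisher_z_test(r, n, rho0):
    """Approximate two-sided test of H0: rho = rho0 using Fisher's z (needs n > 3)."""
    z_stat = (math.atanh(r) - math.atanh(rho0)) * math.sqrt(n - 3)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z_stat)))  # two-sided normal P value
    return z_stat, p_value

# Test H0: rho = .7 for the sample with r = .857 and n = 10
z_stat, p_value = fisher_z_test(0.857, 10, 0.7)
print(f"z = {z_stat:.2f}, two-sided P = {p_value:.2f}")
print("reject H0 at 5% level" if p_value < 0.05 else "do not reject H0 at 5% level")
```

Here z is about 1.1, well below 1.96, so H₀: ρ = .7 is not rejected, matching the conclusion drawn from the confidence interval.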


12.3.5 Test of Hypothesis That ρ = 0


The most commonly used test is of H₀: ρ = 0, in other words, a test of no association
between X and Y. We can make this test by using the test statistic
$$ t = \frac{r}{\sqrt{(1 - r^2)/(n - 2)}} $$
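As a worked check of the formula, the sketch below computes t for the weight and systolic blood pressure example (r = .857, n = 10). Under H₀ the statistic follows Student's t distribution with n − 2 degrees of freedom; to keep the example free of third-party libraries, the two-sided 5% critical value for 8 df (2.306) is hard-coded from a standard t table rather than computed. The function name `t_for_zero_correlation` is hypothetical.

```python
import math

def t_for_zero_correlation(r, n):
    """t statistic for H0: rho = 0; refer to Student's t with n - 2 df."""
    return r / math.sqrt((1.0 - r ** 2) / (n - 2))

n = 10
t = t_for_zero_correlation(0.857, n)
T_CRIT_8DF = 2.306  # two-sided 5% critical value of t with 8 df (standard t table)
print(f"t = {t:.2f} on {n - 2} df; reject H0 at 5% level: {abs(t) > T_CRIT_8DF}")
```

With t ≈ 4.70, far beyond 2.306, H₀: ρ = 0 would be rejected, consistent with the 95% interval lying entirely above zero.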