The Essentials of Biostatistics for Physicians, Nurses, and Clinicians

100 CHAPTER 7 Correlation, Regression, and Logistic Regression

Here, X̄ and Ȳ are their respective sample means, and X_i and
Y_i are the respective blood pressure readings for the i-th subject.
This formula is the most instructive form because it shows
that r is the ratio of the sample estimate of the covariance between
X and Y to the square root of the product of their sample
variances. To see this, divide the numerator and the denominator by n. In
the denominator, write n as √(n·n), and put one factor of n under the sums
involving X and the other under the sums involving Y. The
denominator is then an estimate of the square root of the product of the
sample variances, and the numerator is a sample estimate of the
covariance.
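The ratio just described can be sketched directly in code. The following is a minimal sketch (the paired blood pressure readings are hypothetical, invented here purely for illustration): the numerator is the sum of cross-products of deviations from the means, and the denominator is the square root of the product of the sums of squared deviations. The factors of n discussed above cancel between numerator and denominator, which is why the code can omit them.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation: the sample covariance divided by
    the square root of the product of the sample variances (the
    common factors of n cancel, so sums of deviations suffice)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    # Numerator: sum of cross-products of deviations (covariance times n).
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    # Denominator: square root of the product of the sums of squares.
    ssx = sum((xi - mx) ** 2 for xi in x)
    ssy = sum((yi - my) ** 2 for yi in y)
    return cov / math.sqrt(ssx * ssy)

# Hypothetical paired blood pressure readings, for illustration only.
systolic = [120, 130, 118, 140, 125, 135]
diastolic = [80, 85, 78, 92, 82, 88]
print(round(pearson_r(systolic, diastolic), 3))  # → 0.996
```

Because the same deviations appear in the numerator and denominator, the result is unchanged by shifting or rescaling either variable, and it always lands in [−1, 1].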
A more complicated computational formula calculates r faster and
is mathematically equivalent to the expression above. Both r and ρ
can take on any value in the interval [−1, 1], and no value outside it.
One common hypothesis test, conducted when the data are
suspected to be correlated, is the test that the population correlation
coefficient ρ = 0 against the two-sided alternative that ρ ≠ 0. This test is the
same as the test that the slope of the regression line is 0. Under the
null hypothesis that ρ = 0, the quantity t = r√(n − 2)/√(1 − r²) has a t-
distribution with n − 2 degrees of freedom.
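As a quick numerical illustration of this test statistic, the quantity above can be computed directly. The sample values below (r = 0.5 observed on n = 27 pairs) are hypothetical, chosen only to make the arithmetic concrete.

```python
import math

def correlation_t_stat(r, n):
    """t statistic for testing H0: rho = 0 against rho != 0.
    Under H0 it follows a t-distribution with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical example: a sample correlation of 0.5 from 27 paired readings.
t = correlation_t_stat(0.5, 27)
print(round(t, 3))  # → 2.887
```

With 25 degrees of freedom, the two-sided 5% critical value of the t-distribution is about 2.06, so in this hypothetical example the observed t of roughly 2.89 would lead us to reject the hypothesis of no correlation.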
In this case, if we reject the null hypothesis of no correlation,
we can conclude that the two variables are related. But this does not
address the question of why they are related. Often, we study relationships
between variables because we suspect a causal link. The simple test
for correlation cannot provide us with information on causation;
sound theory is required to make the causal link. In many situations,
the value of X must at least occur before Y in time for X
to be able to cause Y, and in such situations we can rule out the
possibility that Y causes X. Over the past 20 years, a great deal
of research in statistical modeling has led to advances in finding
models and plausible assumptions under which a significant relationship
can imply a causal relationship. In this branch of statistics, which is
sometimes called causal inference, the names of Pearl, Rubin, and
Robins stand out. Some articles that may be of interest are Robins
(1999), Hernán et al. (2000), and Hernán et al. (2005). There are also
the books Pearl (2000, 2009), Rubin (2006), and van der Laan and
Robins (2003, 2010).
