CHAPTER 7 Correlation, Regression, and Logistic Regression

dependent variable conditional on the independent variables (which can
be discrete or continuous) is a probability p, with possible values on the
interval [0, 1]. Logistic regression is covered separately in Section 7.6.
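
As a quick illustration of why p stays in this interval, the logistic function maps any real-valued score into (0, 1). Here is a minimal sketch in Python (the function name and example inputs are ours, not the book's):

    import math

    def logistic(t):
        # Maps any real t into the open interval (0, 1),
        # which is why logistic regression can model a probability p.
        return 1.0 / (1.0 + math.exp(-t))

    print(logistic(-4.0), logistic(0.0), logistic(4.0))  # about 0.018, 0.5, 0.982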

7.1 RELATIONSHIP BETWEEN TWO VARIABLES AND THE SCATTER PLOT


The Pearson correlation coefficient that we will discuss in Section 7.2
measures linear association. While it may detect some forms of curved
relationships, it is not the best measure for those associations. The
linear association may be positive, as in the equation

Y = 5X − 10.    (7.1)

Here X and Y are related with a positive slope of 5 and a Y-intercept
of −10. We will see that this relationship, with the addition of an
independent random component, will give a positive correlation. This simply
means that as X increases, Y tends to increase. If Equation 7.1 held
exactly, we would drop the word “tends.” However, the addition of a
random component means that if the random component is negative,
the observed value of Y at X = X_1 could be smaller than the observed
value of Y at X_0, where X_0 < X_1.
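
To see this concretely, the following sketch (in Python with numpy; the sample size and noise level are our choices, not the book's) generates data from Equation 7.1 plus an independent random component and confirms a positive correlation:

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0, 10, 50)
    # Equation 7.1 plus an independent random component
    y = 5 * x - 10 + rng.normal(scale=8.0, size=x.size)

    # The Pearson correlation is positive: Y tends to increase with X,
    # even though some pairs of points fall out of order.
    r = np.corrcoef(x, y)[0, 1]
    print(round(r, 3))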
In most cases, the data will not fall
perfectly on a straight line, and so we define the difference Y − Ŷ,
where Ŷ is the value of Y predicted by the fitted line, to be
the residual at X. For example, if the fitted line happens to be
Ŷ = 3.5X + 2, and at X = 2 we observe Y = 8.7, then Y − Ŷ = 8.7 −
(3.5(2) + 2) = 8.7 − 9 = −0.3. So the residual at X = 2 is −0.3.
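
The arithmetic of this example is easy to check in code; a minimal sketch (the helper name is ours):

    def fitted(x):
        # The fitted line from the example: Y-hat = 3.5 X + 2
        return 3.5 * x + 2

    residual = 8.7 - fitted(2)  # observed Y minus fitted Y-hat
    print(residual)  # -0.3, up to floating-point rounding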
For all
the data points (X_i, Y_i), i = 1, 2, . . . , n, we compute the residuals. We
then square the residuals and take their sum; this sum, divided by the
number of observations, is called the mean square error. Note that in this
case, the slope “b” for the fitted line is
3.5, and the intercept “a” is 2. Had we used different values for “b”
and “a,” we would have gotten different residuals and hence a different
mean square error. The method of least squares is a common way to fit
“b” and “a.” It simply amounts to finding the values of “b” and “a” that
make the mean square error the smallest. This minimum is unique in
many instances, and the resulting values for “b” and “a” are called the
least squares estimates of the slope and intercept, respectively. Note
that this minimum will be greater than 0 unless all the points fall exactly
on a straight line.
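
The least squares computation itself can be sketched directly. The formulas below are the standard closed-form least squares estimates (the data values are made up for illustration):

    import numpy as np

    def least_squares(x, y):
        # Standard closed-form least squares estimates:
        #   b = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar) ** 2)
        #   a = y_bar - b * x_bar
        x_bar, y_bar = x.mean(), y.mean()
        b = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
        a = y_bar - b * x_bar
        return b, a

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([4.9, 8.7, 12.8, 15.9, 19.6])
    b, a = least_squares(x, y)
    mse = np.mean((y - (b * x + a)) ** 2)  # mean square error at the minimum
    print(b, a, mse)

Any other choice of “b” and “a” would give a larger mean square error; that is exactly the least squares criterion described above.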
