The Essentials of Biostatistics for Physicians, Nurses, and Clinicians

Solutions to Selected Exercises

variable, then the regression of Y on the predictor vector X = (X1, X2, ..., Xk) is
the probability that Y = 1 given X = x. We let π(x) = E[Y|X = x]. Logistic
regression models g(x), the logit function of π(x), namely g(x) = ln[π(x)/
{1 − π(x)}], with the function g linear in the predictor vector x. Here the
coefficient of each Xi = xi is βi, i = 1, 2, ..., k. It differs from ordinary linear
regression in that E(Y|X = x) is a probability belonging to the interval [0, 1], and
the logit transformation is used to map it to (−∞, ∞). The model is linear like
ordinary linear regression, but only after the logit transformation.
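The logit and its inverse can be sketched in a few lines. A minimal illustration (the coefficients b0 and b1 below are invented for the example, not taken from the text):

```python
import math

def logit(p):
    """Logit transform: maps a probability in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))

def inverse_logit(g):
    """Inverse logit (logistic function): maps the real line back to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-g))

# In logistic regression g(x) = b0 + b1*x1 + ... + bk*xk is linear in x,
# and pi(x) = inverse_logit(g(x)) is the modeled probability that Y = 1.
# Hypothetical coefficients for a single predictor:
b0, b1 = -2.0, 0.8
x = 3.0
g = b0 + b1 * x           # linear predictor, anywhere in (-inf, inf)
p = inverse_logit(g)      # probability, always in (0, 1)
print(round(p, 4))        # modeled P(Y = 1 | x)
print(round(logit(p), 4)) # the logit recovers the linear predictor g
```

The round trip `logit(inverse_logit(g)) == g` is exactly the point of the transformation: the model is linear on the logit scale even though the probability itself is confined to [0, 1].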



  1. What is the definition of the multiple correlation coefficient R2?
    For multiple linear regression, the multiple correlation coefficient R2 is the
    proportion of the variance in Y that is explained by the estimated regression
    equation, that is, the regression sum of squares divided by the total sum of
    squares for Y. It is a measure of goodness of fit, and when R2 = 1 the regression
    equation explains all of the variance, implying that all the data fall exactly on
    the regression line.
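The definition above can be computed directly from residuals. A small sketch (the data are hypothetical):

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SS_res/SS_tot: the proportion of the variance in y
    explained by the fitted values y_hat."""
    y_bar = sum(y) / len(y)
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)          # total sum of squares
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual SS
    return 1.0 - ss_res / ss_tot

# Perfect fit: every observation equals its fitted value, so R^2 = 1.
y = [1.0, 2.0, 3.0, 4.0]
print(r_squared(y, y))                        # 1.0

# Imperfect fit: R^2 falls strictly between 0 and 1.
print(round(r_squared([1.0, 2.0, 3.0, 5.0], [1.0, 2.0, 3.0, 4.0]), 4))
```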

  2. What is the equivalent to R2 in simple linear regression?
    In simple linear regression, the square of the Pearson correlation coefficient is
    analogous to R2. The square of the correlation is the proportion of the variance
    in Y explained by the variable X. When the data fall perfectly on a line the
    correlation equals ±1, and its square equals 1.
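The equivalence can be checked numerically: fit the least-squares line and compare R2 with the squared Pearson correlation. A sketch on hypothetical data:

```python
def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    sxy = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y))
    sxx = sum((xi - xb) ** 2 for xi in x)
    syy = sum((yi - yb) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical data: fit the least-squares line, then compare r^2 with R^2.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]
n = len(x)
xb, yb = sum(x) / n, sum(y) / n
slope = (sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y))
         / sum((xi - xb) ** 2 for xi in x))
intercept = yb - slope * xb
y_hat = [intercept + slope * xi for xi in x]
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
ss_tot = sum((yi - yb) ** 2 for yi in y)
r2 = 1.0 - ss_res / ss_tot
print(round(pearson_r(x, y) ** 2, 6), round(r2, 6))  # the two agree
```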

  3. What is stepwise regression? Why is it used?
    Stepwise regression is a procedure for selecting a subset of a set of proposed
    predictor variables to include in the regression model. At each stage it uses
    criteria to either add a variable or drop a variable, until no remaining variable
    satisfies the add or drop criterion. Stepwise regression is used because often in
    practice we know a set of variables that are related to the response variable,
    but we do not know how correlated they are with each other. When there is
    correlation, a subset of the variables will often predict new values of the
    response better than the full set of variables, because the estimated coefficients
    can be unstable when the predictors are correlated. This is called the
    multicollinearity problem: high correlation means that one of the predictor
    variables, say X1, is nearly expressible as a linear combination of the other
    predictor variables. If the multiple correlation between the other predictors and
    X1 equals 1, the regression coefficients are not unique.
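A minimal sketch of the forward half of stepwise selection, using improvement in R2 as the add criterion (standard packages typically use F-to-enter or p-value criteria instead). The data are simulated so that x1 and x2 are nearly collinear, illustrating why one of the pair is left out:

```python
import numpy as np

def fit_r2(X, y):
    """Least-squares fit with intercept; return R^2."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

def forward_select(X, y, min_gain=0.01):
    """Greedy forward selection: repeatedly add the candidate predictor that
    most improves R^2, stopping when no candidate improves it by min_gain."""
    chosen, best_r2 = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        r2, j = max((fit_r2(X[:, chosen + [j]], y), j) for j in remaining)
        if r2 - best_r2 < min_gain:
            break
        chosen.append(j)
        remaining.remove(j)
        best_r2 = r2
    return chosen, best_r2

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)
y = 2 * x1 + x3 + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

chosen, r2 = forward_select(X, y)
print(chosen)  # one of the collinear pair plus x3; the twin adds nothing
```

Once either of the collinear predictors is in the model, the other offers essentially no additional explanatory power, so the procedure stops with a two-variable subset.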



  1. An experiment was conducted to study the effect of increasing the dosage of
    a certain barbiturate. Three readings were recorded at each dose. Refer to
    Table 7.6.
    (a) Plot the scatter diagram (scatter plot).
    (b) Determine by least squares the simple linear regression line relating
    dosage X to sleeping time Y.
    (c) Provide a 95% two-sided confidence interval for the slope.
    (d) Test that there is no linear relationship at the 0.05 level.
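The computations for parts (b)–(d) can be sketched as follows. The dose and sleeping-time values below are hypothetical stand-ins (three readings per dose), since Table 7.6 is not reproduced here:

```python
import math

# Hypothetical data standing in for Table 7.6: dose (mg) and sleeping time (h),
# three readings per dose. Not the actual table values.
x = [3, 3, 3, 6, 6, 6, 9, 9, 9, 12, 12, 12]
y = [4.1, 4.5, 4.3, 6.0, 6.2, 5.8, 7.9, 8.3, 8.1, 10.0, 9.7, 10.2]

n = len(x)
xb, yb = sum(x) / n, sum(y) / n
sxx = sum((xi - xb) ** 2 for xi in x)
sxy = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y))

# (b) Least-squares slope and intercept.
b1 = sxy / sxx
b0 = yb - b1 * xb

# (c) 95% two-sided confidence interval for the slope.
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
s2 = sum(r * r for r in resid) / (n - 2)   # residual mean square
se_b1 = math.sqrt(s2 / sxx)
t_crit = 2.228                             # t(0.975, 10) from a t-table, df = n - 2
ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)

# (d) Test H0: slope = 0 at the 0.05 level; reject if |t| > 2.228.
t_stat = b1 / se_b1
print(round(b1, 3), [round(c, 3) for c in ci], round(t_stat, 1))
```

With these illustrative data the interval excludes zero and the t statistic far exceeds the critical value, so the no-linear-relationship hypothesis would be rejected; the real Table 7.6 values would of course give their own numbers.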


