Solutions to Selected Exercises 183
variable, then the regression of Y on the predictor vector X = (X1, X2, ..., Xk) is
the probability that Y = 1 given X = x. We let π(x) = E[Y | X = x]. Logistic
regression expresses g(x), the logit function of π(x), namely g(x) = ln[π(x)/
{1 − π(x)}], with the function g linear in the predictor vector x. Here the coefficient
of each Xi = xi is βi, i = 1, 2, ..., k. It differs from ordinary linear regression
in that E(Y | X = x) is a probability belonging to the interval [0, 1], and the logit
transformation is used to map it to (−∞, ∞). The model is linear like ordinary linear
regression, but only after the logit transformation.
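The logit transformation described above can be sketched in a few lines of plain Python (no modeling library needed); the function names here are illustrative, not from the text:

```python
import math

def logit(p):
    """Logit transform: maps a probability in (0, 1) to (-inf, inf)."""
    return math.log(p / (1 - p))

def inv_logit(g):
    """Inverse logit (logistic function): maps a real number back to (0, 1)."""
    return 1 / (1 + math.exp(-g))

# The logit stretches probabilities onto the whole real line,
# so g(x) can be modeled as a linear function of the predictors.
print(logit(0.5))                        # 0.0
print(inv_logit(0.0))                    # 0.5
print(round(inv_logit(logit(0.9)), 6))   # 0.9
```

Note that logit(0.5) = 0, so probabilities above one half map to positive values of g and probabilities below one half map to negative values.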
- What is the definition of the multiple correlation coefficient R²?
For multiple linear regression, R² is the proportion of the variance in Y that is
explained by the estimated regression equation, i.e., the regression sum of squares
divided by the total sum of squares for Y. It is a measure of goodness of fit, and
when R² = 1, the regression equation explains all of the variance, implying that all
the data fall exactly on the fitted regression surface.
- What is the equivalent to R² in simple linear regression?
In simple linear regression, the square of the Pearson correlation coefficient is
analogous to R². The square of the correlation is the proportion of the variance in
Y explained by the variable X. When the data fall perfectly on a line the correlation
equals ±1, and its square equals 1.
- What is stepwise regression? Why is it used?
Stepwise regression is a procedure for selecting a subset of a set of proposed
predictor variables to include in the regression model. It uses criteria to either add
a variable or drop a variable at each stage, stopping when no remaining variable
satisfies the add or drop criterion. Stepwise regression is used because often in
practice we know a set of variables that are related to the response variable, but
we don't know how correlated they are among each other. When there is correlation,
a subset of the variables will often do better at predicting new values for the
response than the full set of variables. This is because the estimated coefficients
can be unstable when there is correlation among the variables. This is called the
multicollinearity problem, because high correlation means that one of the predictor
variables, say X1, is nearly expressible as a linear combination of the other predictor
variables. If the multiple correlation between X1 and the other predictors equals 1,
the regression coefficients are not unique.
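The instability caused by multicollinearity can be illustrated numerically; the following sketch uses NumPy with simulated data (not data from the text), making one predictor an almost exact copy of another:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 1e-4 * rng.normal(size=n)            # x2 is almost a copy of x1
y = 2.0 * x1 + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x1, x2])      # design matrix with intercept

# A huge condition number of X'X signals that the normal equations are
# nearly singular, so the coefficient estimates are numerically unstable.
cond = np.linalg.cond(X.T @ X)
print(f"condition number of X'X: {cond:.3g}")

# Two fits on slightly perturbed responses: the individual coefficients of
# x1 and x2 can swing, yet the fitted values barely change at all.
b_a, *_ = np.linalg.lstsq(X, y, rcond=None)
b_b, *_ = np.linalg.lstsq(X, y + 1e-3 * rng.normal(size=n), rcond=None)
print("fit A coefficients:", np.round(b_a, 2))
print("fit B coefficients:", np.round(b_b, 2))
print("max difference in fitted values:",
      float(np.max(np.abs(X @ b_a - X @ b_b))))
```

Dropping x2 from the model (the kind of choice stepwise selection makes) restores a well-conditioned problem with stable coefficients.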
- An experiment was conducted to study the effect of increasing the dosage of
a certain barbiturate. Three readings were recorded at each dose. Refer to
Table 7.6.
(a) Plot the scatter diagram (scatter plot).
(b) Determine by least squares the simple linear regression line relating
dosage X to sleeping time Y.
(c) Provide a 95% two-sided confidence interval for the slope.
(d) Test that there is no linear relationship at the 0.05 level.
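The calculations for parts (b)-(d) can be sketched in Python. Since Table 7.6 is not reproduced here, the dose and sleeping-time values below are hypothetical placeholders (three readings at each of four doses), not the book's data; the critical value 2.228 is the t quantile for 10 degrees of freedom:

```python
import numpy as np

# Hypothetical stand-in data (Table 7.6 is not reproduced here):
# four doses with three sleeping-time readings each.
x = np.repeat([3.0, 6.0, 9.0, 12.0], 3)           # dosage X
y = np.array([4.2, 4.8, 4.5, 6.1, 6.4, 5.9,
              8.2, 7.9, 8.5, 9.8, 10.3, 10.1])    # sleeping time Y

n = len(x)
xbar, ybar = x.mean(), y.mean()
Sxx = np.sum((x - xbar) ** 2)
Sxy = np.sum((x - xbar) * (y - ybar))

b1 = Sxy / Sxx                         # least-squares slope
b0 = ybar - b1 * xbar                  # least-squares intercept
resid = y - (b0 + b1 * x)
s2 = np.sum(resid ** 2) / (n - 2)      # residual mean square
se_b1 = np.sqrt(s2 / Sxx)              # standard error of the slope

t_crit = 2.228        # t 0.975 quantile, n - 2 = 10 df (from tables;
                      # scipy.stats.t.ppf gives it in general)
ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
t_stat = b1 / se_b1   # test statistic for H0: slope = 0

print(f"fitted line: Y = {b0:.3f} + {b1:.3f} X")
print(f"95% CI for slope: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"t statistic for H0 of no linear relationship: {t_stat:.2f}")
```

Reject the null hypothesis of no linear relationship at the 0.05 level when |t| exceeds the critical value, equivalently when the confidence interval excludes 0.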