The Essentials of Biostatistics for Physicians, Nurses, and Clinicians

Solutions to Selected Exercises

(e) Nonlinear regression
(f) Scatter plot
(g) Slope of the regression line in simple linear regression
(a) Association is a general term for a relationship between two variables. It
includes the Pearson correlation coefficient, Kendall's tau, and Spearman's rho,
among others.
(b) The correlation coefficient usually refers to the Pearson product moment cor-
relation, which is a measure of the strength of the linear association between
two variables. Sometimes Kendall's tau and Spearman's rho are also called
correlations. Spearman's rank correlation measures the degree to which Y
increases as X increases, or Y decreases as X increases. It is 1 when Y is exactly
a monotonically increasing function of X, and −1 when Y is exactly a mono-
tonically decreasing function of X.
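The distinction between linear and monotone association in (b) can be made concrete with a short sketch. The functions below implement the standard Pearson formula and Spearman's rho as the Pearson correlation of the ranks; the data values are invented for illustration, and tie handling is omitted for brevity.

```python
# Sketch: Pearson r vs. Spearman's rho, from the standard formulas.
# Illustrative data only; this ranking helper does not handle ties.

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def ranks(v):
    # Assign rank 1..n by sorted order (no tie correction in this sketch)
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = float(rank)
    return r

def spearman(x, y):
    # Spearman's rho is the Pearson correlation of the ranks
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]            # y = x**2: monotone but not linear
print(round(pearson(x, y), 3))   # strong but below 1 (association not linear)
print(spearman(x, y))            # exactly 1.0 (perfectly monotone)
```

Because y is a monotonically increasing function of x, Spearman's rho is exactly 1 even though Pearson's r is below 1.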
(c) Simple linear regression is the curve relating two variables X and Y when
Y = af(X) + b + e, where a and b are the parameters, and e represents a random
noise component. In this formulation, Y is a linear function of the parameters
a and b, and f is any function of X. The regression function is af(X) + b. If
f(X) = X, Y is linear in X also, but f(X) could also be X^2 or log(X).
(d) Multiple linear regression is similar to simple linear regression except that Y is a
function of two or more variables X1, X2, ..., Xn, where n ≥ 2.
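The least squares fit in (d) can be sketched in a few lines by forming and solving the normal equations (X'X)b = X'Y, which (f) and (g) below note is the closed-form linear regression solution. The data here are made up so that Y = 1 + 2*X1 + 3*X2 exactly, so the fit should recover those coefficients; the naive Gaussian elimination is for illustration only.

```python
# Sketch: multiple linear regression with two predictors, fit by solving
# the normal equations (X'X) b = X'Y. Data invented so Y = 1 + 2*X1 + 3*X2.

def solve(A, b):
    # Naive Gaussian elimination with partial pivoting (small systems only)
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit(rows, y):
    # rows: predictor tuples (X1, X2, ...); a leading 1 gives the intercept
    X = [(1.0,) + tuple(r) for r in rows]
    p = len(X[0])
    XtX = [[sum(X[i][j] * X[i][k] for i in range(len(X))) for k in range(p)]
           for j in range(p)]
    Xty = [sum(X[i][j] * y[i] for i in range(len(X))) for j in range(p)]
    return solve(XtX, Xty)

rows = [(0, 1), (1, 0), (2, 2), (3, 1)]
y = [4, 3, 11, 10]                       # exactly 1 + 2*X1 + 3*X2
b0, b1, b2 = fit(rows, y)
print(round(b0, 6), round(b1, 6), round(b2, 6))  # recovers 1, 2, 3
```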
(e) Nonlinear regression can have Y be a function of one or more variables. It
differs from linear regression in that the model must be nonlinear in the parameters.
So, for example, Y could be exp(b)X^a, or some other complicated expression.
Y = X^a + e is nonlinear. But if the noise term were multiplicative, that is,
Y = X^a * e, then it is transformable to a linear regression, since
ln(Y) = ln(e) + a ln(X). In this case, we can solve by least squares with a zero
intercept restriction. ln(e) is the additive noise term, and Z = ln(Y) has a linear
regression Z = aW + δ, where W = ln(X) and δ = ln(e). The only parameter now
is a, and Z is a linear function of the parameter a. Usually, in nonlinear regression,
iterative procedures are needed for the solution, while in linear regression, the
least squares solution is obtained in closed form by solving equations that are
called the normal equations.
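The log transform described in (e) can be sketched directly: with W = ln(X) and Z = ln(Y), the zero-intercept least squares slope is sum(W*Z)/sum(W*W). The data here are noise-free (e = 1, so ln(e) = 0) and invented purely to show that the transform recovers the exponent.

```python
import math

# Sketch of the transform in (e): if Y = X**a * e with multiplicative noise,
# then ln(Y) = a*ln(X) + ln(e), linear in the single parameter a.
# Zero-intercept least squares slope: sum(W*Z) / sum(W*W).
# Illustrative, noise-free data (e = 1).

def fit_power_exponent(xs, ys):
    ws = [math.log(x) for x in xs]    # W = ln(X)
    zs = [math.log(y) for y in ys]    # Z = ln(Y)
    return sum(w * z for w, z in zip(ws, zs)) / sum(w * w for w in ws)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [x ** 2 for x in xs]             # true exponent a = 2
print(round(fit_power_exponent(xs, ys), 6))  # recovers 2.0
```

With real multiplicative noise, the same formula gives the least squares estimate of a on the log scale, as the answer above describes.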
(f) A scatter plot is a graph of pairs (X, Y) that graphically shows the degree of
relationship between the variables X and Y and is often the first step toward
fitting a model of Y as a function of X.
(g) In simple linear regression, where Y = af(X) + b + e, the parameter a is called the
slope of the regression line. When f(X) = X, the least squares regression line is fit
through the scatter plot of the data. The closer the data points fall to the least
squares line, the higher the correlation between X and Y, and the better the linear
regression line fits the data. The slope of that regression line is the least squares
estimate of a, and the Y intercept of the line is the least squares estimate of b.
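The closed-form estimates in (g) can be sketched from the normal equations for the case f(X) = X: slope = Sxy/Sxx and intercept = mean(Y) − slope*mean(X). The data values below are invented, chosen to lie roughly on y = 2x.

```python
# Sketch: closed-form least squares slope and intercept for simple linear
# regression with f(X) = X. Illustrative data, roughly y = 2x.

def least_squares_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx                  # least squares estimate of a
    return slope, my - slope * mx      # intercept: estimate of b

xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]
a, b = least_squares_line(xs, ys)
print(round(a, 3), round(b, 3))        # slope near 2, intercept near 0
```

Because these estimates come out in closed form, no iteration is needed, which is exactly the contrast with nonlinear regression drawn in (e).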


  1. What is logistic regression? How is it different from ordinary linear
    regression?
    Logistic regression involves a response variable that is binary. The predictor vari-
    ables can be continuous or discrete or a combination of both. Call Y the binary


