Introductory Biostatistics


8.2 MULTIPLE REGRESSION ANALYSIS


The effect of some factor on a dependent or response variable may be
influenced by the presence of other factors because of redundancies or effect
modifications (i.e., interactions). Therefore, to provide a more comprehensive
analysis, it may be desirable to consider a large number of factors and sort out
which ones are most closely related to the dependent variable. In this section
we discuss a multivariate method for this type of risk determination. This
method, which is multiple regression analysis, involves a linear combination of
the explanatory or independent variables, also called covariates; the variables
must be quantitative with particular numerical values for each subject in the
sample. A covariate or independent variable, such as a patient characteristic,
may be dichotomous, polytomous, or continuous (categorical factors will be
represented by dummy variables). Examples of dichotomous covariates are
gender and presence/absence of a certain comorbidity. Polytomous covariates
include race and different grades of symptoms; these can be covered by the use
of dummy variables. Continuous covariates include patient age and blood
pressure. In many cases, data transformations (e.g., taking the logarithm) may be
needed to satisfy the linearity assumption.
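
As a rough illustration of the coding described above, the following Python sketch turns a dichotomous covariate into a single 0/1 variable, a polytomous covariate into dummy variables, and applies a logarithmic transformation to a continuous one. The subject records, the grade labels, the choice of "none" as the reference level, and the use of a base-10 logarithm are all assumptions made for the example; they do not come from the text.

```python
import math


def dummy_code(value, levels):
    """Return 0/1 indicator variables for a categorical value.

    The first entry of `levels` is treated as the reference category,
    so a factor with m levels yields m - 1 dummy variables.
    """
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return [1 if value == level else 0 for level in levels[1:]]


# Hypothetical symptom grades; "none" is the reference level.
GRADES = ["none", "mild", "severe"]

# Hypothetical subjects: (gender, symptom grade, age in years, biomarker level)
subjects = [
    ("F", "none", 54, 10.0),
    ("M", "mild", 61, 100.0),
    ("F", "severe", 47, 1000.0),
]

rows = []
for gender, grade, age, biomarker in subjects:
    male = 1 if gender == "M" else 0  # dichotomous covariate as a single 0/1 value
    # Polytomous covariate as two dummies, then the continuous covariates,
    # with the biomarker on a log10 scale.
    rows.append([male] + dummy_code(grade, GRADES) + [age, math.log10(biomarker)])

for row in rows:
    print(row)
```

Each row is then a fully numerical covariate vector for one subject, which is the form the regression model requires.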


8.2.1 Regression Model with Several Independent Variables


Suppose that we want to consider $k$ independent variables simultaneously. The
simple linear model of Section 8.1 can easily be generalized and expressed as


$$Y_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ji} + \varepsilon_i$$

where $Y_i$ is the value of the response or dependent variable from the $i$th subject;
$\beta_0, \beta_1, \ldots, \beta_k$ are the $k + 1$ unknown parameters ($\beta_0$ is the intercept and the $\beta_j$'s
are the slopes, one for each independent variable); $x_{ji}$ is the value of the $j$th
independent variable ($j = 1$ to $k$) from the $i$th subject ($i = 1$ to $n$); and $\varepsilon_i$ is a
random error term which is distributed as normal with mean zero and variance
$\sigma^2$, so that the mean of $Y_i$ is


$$\mu_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ji}$$
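
To make the arithmetic of the model concrete, here is a minimal Python sketch that evaluates the mean for one subject and then draws a single response by adding a normal error term. The function names, the parameter values, and the covariate values are hypothetical and chosen only to keep the numbers easy to follow.

```python
import random


def mean_response(beta0, betas, x):
    """Return mu_i = beta0 + sum_j betas[j] * x[j] for one subject."""
    if len(betas) != len(x):
        raise ValueError("need one slope per independent variable")
    return beta0 + sum(b * xj for b, xj in zip(betas, x))


def simulate_response(beta0, betas, x, sigma, rng):
    """Draw Y_i = mu_i + e_i, where e_i is normal with mean 0 and variance sigma^2."""
    return mean_response(beta0, betas, x) + rng.gauss(0.0, sigma)


# Hypothetical parameters and covariate values for one subject (k = 3).
beta0 = 2.0
betas = [0.5, -1.0, 3.0]
x_i = [4.0, 1.0, 2.0]

mu_i = mean_response(beta0, betas, x_i)  # 2 + 0.5*4 - 1.0*1 + 3.0*2 = 9.0
rng = random.Random(0)                   # fixed seed so the draw is reproducible
y_i = simulate_response(beta0, betas, x_i, sigma=1.5, rng=rng)

print(mu_i)  # 9.0
print(y_i)   # one simulated response scattered around 9.0
```

The observed response differs from the mean only through the error term, so setting `sigma` to zero in this sketch returns the mean itself.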

The model above is referred to as the multiple linear regression model. It
is multiple because it contains several independent variables. It is still linear
because the independent variables appear only in the first power; this feature is
rather difficult to check because we do not have a scatter diagram to rely on as
in the case of simple linear regression. In addition, the model can be modified
to include higher powers of independent variables as well as their various
products, as we’ll see subsequently.
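
One way to picture the modification mentioned above is to treat a higher power or a product as just another covariate column, so that the same linear combination still applies. The Python sketch below assumes two independent variables and adds a squared term and a product term; the choice of terms, the function names, and the coefficient values are illustrative assumptions, not the book's worked example.

```python
def expand_terms(x1, x2):
    """Return the covariate vector [x1, x2, x1^2, x1*x2] for one subject."""
    return [x1, x2, x1 ** 2, x1 * x2]


def linear_predictor(beta0, betas, terms):
    """Return beta0 + sum_j betas[j] * terms[j]."""
    if len(betas) != len(terms):
        raise ValueError("need one coefficient per term")
    return beta0 + sum(b * t for b, t in zip(betas, terms))


# Hypothetical covariate values and coefficients.
terms = expand_terms(2.0, 3.0)           # [2.0, 3.0, 4.0, 6.0]
betas = [0.5, 1.0, 0.25, -0.5]           # one coefficient per term
mu = linear_predictor(1.0, betas, terms)  # 1 + 1.0 + 3.0 + 1.0 - 3.0 = 3.0

print(terms)
print(mu)  # 3.0
```

The mean is no longer a straight-line function of `x1`, yet it remains a linear combination of the coefficients, which is why such terms fit into the same framework.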

