Introductory Biostatistics

(Chris Devlin) #1

assumed to be normally distributed, and a model is formulated to express the
mean of this normal distribution as a function of potential independent
variables under investigation. The dependent variable is denoted by $Y$, and the study
often involves a number of risk factors or predictor variables: $X_1, X_2, \ldots, X_k$.


8.1 SIMPLE REGRESSION ANALYSIS


In this section we discuss the basic ideas of simple regression analysis when
only one predictor or independent variable is available for predicting the
response of interest. In the interpretation of the primary parameter of the
model, we discuss both scales of measurement, discrete and continuous, even
though in practical applications, the independent variable under investigation is
often on a continuous scale.


8.1.1 Simple Linear Regression Model


Choosing an appropriate model and analytical technique depends on the type
of variable under investigation. In a variety of applications, the dependent
variable of interest is a continuous variable that we can assume may, after an
appropriate transformation, be normally distributed. The regression model
describes the mean of that normally distributed dependent variable $Y$ as a
function of the predictor or independent variable $X$:


$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$

where $Y_i$ is the value of the response or dependent variable from the $i$th pair, $\beta_0$
and $\beta_1$ are the two unknown parameters, $x_i$ is the value of the independent
variable from the $i$th pair, and $\varepsilon_i$ is a random error term which is distributed as
normal with mean zero and variance $\sigma^2$, so that $(\beta_0 + \beta_1 x_i)$ is the mean $\mu_i$ of
$Y_i$. The model above is referred to as the simple linear regression model. It is
simple because it contains only one independent variable. It is linear because
the independent variable appears only in the first power; if we graph the mean
of $Y$ versus $X$, the graph is a straight line with intercept $\beta_0$ and slope $\beta_1$.
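
To make the model concrete, the following is a minimal sketch that simulates data from it, using only the Python standard library. The function name `simulate_simple_linear`, the predictor values, and the parameter values (intercept 100, slope 0.5, error standard deviation 5) are illustrative assumptions, not values taken from the text. For each value of the predictor, the code computes the mean given by the straight line and then adds a normally distributed error with mean zero.

```python
import random


def simulate_simple_linear(xs, beta0, beta1, sigma, seed=0):
    """Simulate one response per predictor value under the simple linear model.

    For each x_i the mean is beta0 + beta1 * x_i, and the observed response
    is that mean plus an error drawn from a normal distribution with mean
    zero and standard deviation sigma.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same draws
    means = [beta0 + beta1 * x for x in xs]
    ys = [mu + rng.gauss(0.0, sigma) for mu in means]
    return means, ys


# Hypothetical predictor values and parameters, chosen only for illustration.
xs = [20, 30, 40, 50, 60]
means, ys = simulate_simple_linear(xs, beta0=100.0, beta1=0.5, sigma=5.0)

for x, mu, y in zip(xs, means, ys):
    print(f"x={x:>3}  mean={mu:6.1f}  observed={y:6.1f}  error={y - mu:+5.1f}")
```

The `mean` column changes by the same amount (slope times the step in the predictor) from one row to the next, which is the straight-line property of the model, while each `observed` value differs from its mean by the random error.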


8.1.2 Scatter Diagram


As mentioned above, and stipulated by the simple linear regression model, if we
graph the mean of $Y$ versus $X$, the graph is a straight line. But that is the line
for the means of $Y$; at each level of $X$, the observed value of $Y$ may exceed or
fall short of its mean. Therefore, when we graph the observed value of $Y$ versus
$X$, the points do not fall perfectly on any line. This is an important
characteristic for statistical relationships, as pointed out in Chapter 2. If we let each pair
of numbers $(x, y)$ be represented by a dot in a diagram with the $x$'s on the

