Introductory Biostatistics


study, the dependent variable Y was defined to have two possible outcomes: (1)
the child uses drugs and (2) the child does not use drugs. Again, these two
outcomes may be coded 1 and 0, respectively.
The examples above, and others, show a wide range of applications in which
the dependent variable is dichotomous and hence may be represented by a
variable taking the value 1 with probability p and the value 0 with probability
1 − p. Such a variable is a point binomial variable, that is, a binomial variable
with n = 1 trial, and the model often used to express the probability p as a
function of potential independent variables under investigation is the logistic
regression model. It should be noted that the regression models of Chapter 8 do
not apply here because linear combinations of independent variables are not
bounded between 0 and 1 as required in the applications above. Instead of
regression models imposed to describe the mean of a normal variate, the logistic
model has been used extensively and successfully in the health sciences to
describe the probability (or risk) of developing a condition (say, a disease)
over a specified time period as a function of certain risk factors X_1, X_2, ..., X_k.
The following is a typical example.
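
The boundedness argument above can be illustrated numerically. The sketch below is only an illustration: the intercept, coefficients, and covariate values are hypothetical and are not estimates from any data set in this chapter. It shows that a linear combination of independent variables can fall outside the interval from 0 to 1, whereas passing the same quantity through the logistic function 1 / (1 + exp(-z)) yields a value that can be read as a probability.

```python
import math


def linear_predictor(coefs, intercept, xs):
    """Return the unbounded linear combination b0 + b1*x1 + ... + bk*xk."""
    return intercept + sum(b * x for b, x in zip(coefs, xs))


def logistic(z):
    """Return 1 / (1 + exp(-z)), which lies between 0 and 1 for any real z."""
    # Two algebraically equivalent branches avoid overflow in exp() for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical intercept and coefficients, chosen only for illustration.
intercept = -1.0
coefs = [0.5, 2.0]

# Hypothetical covariate vectors (x1, x2).
for xs in ([0.0, 0.0], [4.0, 1.0], [-10.0, 0.0]):
    z = linear_predictor(coefs, intercept, xs)
    p = logistic(z)
    print(f"x = {xs}: linear predictor = {z:+.2f}, logistic probability = {p:.4f}")
```

For these made-up inputs the linear predictor takes the values -1, 3, and -6, none of which is a valid probability, while the corresponding logistic values all lie strictly between 0 and 1.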


Example 9.1 When a patient is diagnosed as having cancer of the prostate, an
important question in deciding on treatment strategy for the patient is whether
or not the cancer has spread to neighboring lymph nodes. The question is so
critical in prognosis and treatment that it is customary to operate on the patient
(i.e., perform a laparotomy) for the sole purpose of examining the nodes and
removing tissue samples to examine under the microscope for evidence of cancer.
However, certain variables that can be measured without surgery are predictive
of nodal involvement, and the purpose of the study presented in
Brown (1980) was to examine the data for 53 prostate cancer patients receiving
surgery, to determine which of five preoperative variables are predictive of
nodal involvement. In particular, the principal investigator was interested in
the predictive value of the level of acid phosphatase in blood serum. Table 9.1
presents the complete data set. For each of the 53 patients, there are two
continuous independent variables: age at diagnosis and level of serum acid
phosphatase (× 100; called "acid"), and three binary variables: x-ray reading,
pathology reading (grade) of a biopsy of the tumor obtained by needle before
surgery, and a rough measure of the size and location of the tumor (stage)
obtained by palpation with the fingers via the rectum. For these three binary
independent variables a value of 1 signifies a positive or more serious state and
a 0 denotes a negative or less serious finding. In addition, the sixth column
presents the finding at surgery, the primary binary response or dependent
variable Y, with a value of 1 denoting nodal involvement and a value of 0 denoting
no nodal involvement found at surgery.
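
One way to hold a row of such a data set in code is sketched below. This is only an assumed layout: the class name `PatientRecord`, the field names, and the field order are hypothetical, and the example values are made up for illustration and are not taken from Table 9.1. The sketch encodes the coding convention described above, in which the three binary predictors and the response take only the values 0 and 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientRecord:
    """One patient: five preoperative variables plus the surgical finding.

    Field names and order are assumptions made for this sketch.
    """

    xray: int    # 1 = positive x-ray reading, 0 = negative
    grade: int   # 1 = more serious pathology reading of the biopsy, 0 = less serious
    stage: int   # 1 = more serious size/location by palpation, 0 = less serious
    age: float   # age at diagnosis, in years
    acid: float  # serum acid phosphatase level (x 100)
    nodes: int   # response Y: 1 = nodal involvement found at surgery, 0 = none

    def __post_init__(self):
        # Enforce the 0/1 coding for the binary predictors and the response.
        for name in ("xray", "grade", "stage", "nodes"):
            value = getattr(self, name)
            if value not in (0, 1):
                raise ValueError(f"{name} must be 0 or 1, got {value!r}")


# A made-up record for illustration only; it is NOT a row of Table 9.1.
example = PatientRecord(xray=1, grade=0, stage=1, age=60, acid=55, nodes=1)
print(example)
```

Validating the coding at construction time is a design choice of this sketch; it catches entry errors such as a binary variable recorded as 2 before any model is fitted.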
A careful reading of the data reveals, for example, that a positive x-ray or
an elevated acid phosphatase level generally seems to be associated
with nodal involvement found at surgery. However, the predictive values of other
variables are not clear, and to answer the question, for example, concerning

