ESTROGEN – A dichotomous variable indicating whether the subject ever
(coded 1) or never (coded 0) used estrogen.
SUBTYPE – A three-category polytomous outcome indicating whether the
subject’s histological subtype is Adenocarcinoma (coded 0),
Adenosquamous (coded 1), or Other (coded 2).
AGE – A dichotomous variable indicating whether the subject is within the age
group 50–64 (coded 0) or within the age group 65–79 (coded 1). All 286
subjects are within one of these age groups.
SMK – A dichotomous variable indicating whether the subject is (coded 1) or is
not (coded 0) a current smoker.
- Infant dataset (infant.dat)
This is the dataset that is used to demonstrate GEE modeling. The infant dataset,
discussed in Chaps. 14 and 15, is part of a health intervention study in Brazil (Cannon
et al., 2001). The study involves 168 infants, each of whom has at least five and up to
nine monthly measurements, yielding 1,458 observations in all. There are complete
data on all covariates for 136 of the infants. The outcome of interest is derived from a
weight-for-height standardized score based on the weight-for-height distribution of a
standard population. The outcome is correlated since there are multiple
measurements for each infant. The variables are defined as follows:
IDNO – The subject (infant) identifier. Each subject has up to nine observations.
This is the variable that defines the cluster used for the correlated analysis.
MONTH – A variable taking the values 1 through 9 that indicates the order of an
infant’s monthly measurements. This is the variable that distinguishes
observations within a cluster.
OUTCOME – Dichotomous outcome of interest derived from a weight-for-
height standardized z-score. The outcome is coded 1 if the infant’s z-score
for a particular monthly measurement is less than negative one and
coded 0 otherwise.
BIRTHWGT – A continuous variable that indicates the infant’s birth weight in
grams. This is a time-independent variable, as the infant’s birth weight does
not change over time. The value of the variable is missing for 32 infants.
GENDER – A dichotomous variable indicating whether the infant is male
(coded 1) or female (coded 2).
DIARRHEA – A dichotomous time-dependent variable indicating whether the
infant did (coded 1) or did not (coded 0) have symptoms of diarrhea that
month.
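The coding rules above can be illustrated with a short sketch. The Python below is not part of the dataset or of the software used in the text; it assumes a hypothetical long-format layout with one record per infant per month. The `zscore` field, the helper `code_outcome`, and all data values are invented for illustration, and whether infant.dat stores the raw z-score is not stated here. The sketch applies the OUTCOME rule, groups observations into clusters by IDNO, orders them by MONTH, and flags infants whose time-independent BIRTHWGT is missing.

```python
from collections import defaultdict

# Hypothetical records (values invented): one row per infant per month.
records = [
    {"IDNO": 1, "MONTH": 2, "zscore": -0.3, "BIRTHWGT": 3000},
    {"IDNO": 1, "MONTH": 1, "zscore": -1.4, "BIRTHWGT": 3000},
    {"IDNO": 2, "MONTH": 1, "zscore": -1.0, "BIRTHWGT": None},
    {"IDNO": 2, "MONTH": 2, "zscore": 0.8, "BIRTHWGT": None},
]


def code_outcome(zscore):
    """Return 1 if the z-score is less than negative one, 0 otherwise."""
    return 1 if zscore < -1 else 0


# Apply the OUTCOME coding rule to every monthly measurement.
for row in records:
    row["OUTCOME"] = code_outcome(row["zscore"])

# IDNO defines the cluster; MONTH orders observations within a cluster.
clusters = defaultdict(list)
for row in records:
    clusters[row["IDNO"]].append(row)
for rows in clusters.values():
    rows.sort(key=lambda r: r["MONTH"])

# BIRTHWGT is time-independent, so checking one record per infant suffices.
missing_birthwgt = sorted(
    idno for idno, rows in clusters.items() if rows[0]["BIRTHWGT"] is None
)
```

Note that a z-score of exactly negative one is coded 0 under this rule, since the definition requires the score to be strictly less than negative one.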
- Knee Fracture dataset (kneefr.dat)
This dataset is used to demonstrate how to generate classification tables and receiver
operating characteristic (ROC) curves using logistic regression. The knee fracture
dataset, discussed in Chap. 10, contains information on 348 patients, of whom 45
actually had a knee fracture (Tigges et al., 1999). The goal of the study is to evaluate
whether a patient’s pattern of covariates can be used as a screening test before
performing the X-ray. Since 1.3 million people visit North American emergency
departments annually complaining of blunt knee trauma, the total cost associated
with even a relatively inexpensive test such as a knee radiograph may be substantial.