Logistic Regression: A Self-learning Text, Third Edition (Statistics in the Health Sciences)

CAT – A dichotomous predictor variable indicating high (coded 1) or normal
(coded 0) catecholamine level.
AGE – A continuous variable for age (in years).
CHL – A continuous variable for cholesterol.
SMK – A dichotomous predictor variable indicating whether the subject ever
smoked (coded 1) or never smoked (coded 0).
ECG – A dichotomous predictor variable indicating the presence (coded 1) or
absence (coded 0) of electrocardiogram abnormality.
DBP – A continuous variable for diastolic blood pressure.
SBP – A continuous variable for systolic blood pressure.
HPT – A dichotomous predictor variable indicating the presence (coded 1) or
absence (coded 0) of high blood pressure. HPT is coded 1 if the systolic
blood pressure is greater than or equal to 160 or the diastolic blood
pressure is greater than or equal to 95.
CH and CC – Product terms of CAT × HPT and CAT × CHL, respectively.
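
The HPT rule and the two product terms defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each subject is held in a plain dictionary keyed by the variable names, the function names `derive_hpt` and `add_derived_terms` are hypothetical, and the sample values are invented rather than taken from the dataset.

```python
def derive_hpt(sbp, dbp):
    """Return 1 if SBP >= 160 or DBP >= 95, otherwise 0 (the HPT rule stated above)."""
    return int(sbp >= 160 or dbp >= 95)


def add_derived_terms(record):
    """Return a copy of the record with HPT, CH = CAT*HPT, and CC = CAT*CHL added."""
    out = dict(record)
    out["HPT"] = derive_hpt(out["SBP"], out["DBP"])
    out["CH"] = out["CAT"] * out["HPT"]  # product term CAT x HPT
    out["CC"] = out["CAT"] * out["CHL"]  # product term CAT x CHL
    return out


# Hypothetical subject for illustration only (not a row from the dataset)
subject = {"CAT": 1, "CHL": 200, "SBP": 150, "DBP": 96}
row = add_derived_terms(subject)
```

Here the subject is classified as HPT = 1 because the diastolic pressure meets its cutoff even though the systolic pressure does not; either condition alone is sufficient.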


  2. MI dataset (mi.dat)


This dataset is used to demonstrate conditional logistic regression. The MI dataset is
discussed in Chap. 11. The study is a case-control study that involves 117 subjects in
39 matched strata. Each stratum contains three subjects, one of whom is a case
diagnosed with myocardial infarction while the other two are matched controls. The
variables are defined as follows:
MATCH – A variable indicating the subject’s matched stratum. Each stratum
contains one case and two controls and is matched on age, race, sex, and
hospital status.
PERSON – The subject identifier. Each observation has a unique identifier since
there is one observation per subject.
MI – A dichotomous outcome variable indicating the presence (coded 1) or
absence (coded 0) of myocardial infarction.
SMK – A dichotomous variable indicating whether the subject is (coded 1) or is
not (coded 0) a current smoker.
SBP – A continuous variable for systolic blood pressure.
ECG – A dichotomous predictor variable indicating the presence (coded 1) or
absence (coded 0) of electrocardiogram abnormality.
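
Before fitting a conditional model, it can be useful to confirm the matched structure described above: 39 strata of three subjects each (39 × 3 = 117), with one case and two controls per stratum. The following Python sketch shows one way to do this. It assumes the records have already been read into dictionaries keyed by the variable names; the function name `check_matched_strata` and the sample records are hypothetical.

```python
from collections import defaultdict


def check_matched_strata(records, controls_per_case=2):
    """Return the MATCH values of strata that lack exactly one case
    and the expected number of controls."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [controls, cases]
    for rec in records:
        counts[rec["MATCH"]][rec["MI"]] += 1  # MI is coded 0 or 1
    return sorted(
        match
        for match, (controls, cases) in counts.items()
        if cases != 1 or controls != controls_per_case
    )


# Hypothetical records for illustration only (not rows from mi.dat)
records = [
    {"MATCH": 1, "PERSON": 1, "MI": 1},
    {"MATCH": 1, "PERSON": 2, "MI": 0},
    {"MATCH": 1, "PERSON": 3, "MI": 0},
    {"MATCH": 2, "PERSON": 4, "MI": 1},
    {"MATCH": 2, "PERSON": 5, "MI": 0},
]
incomplete = check_matched_strata(records)
```

In this invented example, stratum 2 is flagged because it has only one control. For a dataset with the structure described above, the returned list would be empty.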



  3. Cancer dataset (cancer.dat)


This dataset is used to demonstrate polytomous and ordinal logistic regression. The
cancer dataset, discussed in Chaps. 12 and 13, is part of a study of cancer survival
(Hill et al., 1995). The study involves 288 women who had been diagnosed with
endometrial cancer. The variables are defined as follows:
ID – The subject identifier. Each observation has a unique identifier since there
is one observation per subject.
GRADE – A three-level ordinal outcome variable indicating tumor grade.
The grades are well differentiated (coded 0), moderately differentiated
(coded 1), and poorly differentiated (coded 2).
RACE – A dichotomous variable indicating whether the race of the subject is
black (coded 1) or white (coded 0).


Appendix: Computer Programs for Logistic Regression
