Epidemiologic framework
$X_1, X_2, \ldots, X_k$ measured at $T_0$
Time: $T_0 \rightarrow T_1$
$X_1, X_2, \ldots, X_k \rightarrow D(0,1)$
$P(D = 1 \mid X_1, X_2, \ldots, X_k)$
DEFINITION
Logistic model:
$$P(D = 1 \mid X_1, X_2, \ldots, X_k) = \frac{1}{1 + e^{-\left(\alpha + \sum \beta_i X_i\right)}}$$
$\alpha$, $\beta_i$: unknown parameters
NOTATION
$P(D = 1 \mid X_1, X_2, \ldots, X_k) = P(\mathbf{X})$
Model formula:
$$P(\mathbf{X}) = \frac{1}{1 + e^{-\left(\alpha + \sum \beta_i X_i\right)}}$$
The logistic model considers the following general
epidemiologic study framework: We have
observed independent variables $X_1$, $X_2$, and so
on up to $X_k$ on a group of subjects, for whom we
have also determined disease status, as either 1
if "with disease" or 0 if "without disease".
We wish to use this information to describe the
probability that the disease will develop during
a defined study period, say $T_0$ to $T_1$, in a
disease-free individual with independent variable values
$X_1$, $X_2$, up to $X_k$, which are measured at $T_0$.
The probability being modeled can be denoted
by the conditional probability statement
$P(D = 1 \mid X_1, X_2, \ldots, X_k)$.
The model is defined as logistic if the expression
for the probability of developing the disease,
given the $X$s, is 1 over 1 plus $e$ to minus
the quantity $\alpha$ plus the sum from $i$ equals 1 to $k$
of $\beta_i$ times $X_i$.
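As a concrete instance of the verbal definition above, the sum can be written out term by term. The choice of $k = 2$ here is only an illustrative assumption; the model itself allows any number of independent variables.

```latex
% General form, with the sum from i = 1 to k made explicit
P(D = 1 \mid X_1, X_2, \ldots, X_k)
  = \frac{1}{1 + e^{-\left(\alpha + \sum_{i=1}^{k} \beta_i X_i\right)}}

% Illustrative special case k = 2 (two independent variables)
P(D = 1 \mid X_1, X_2)
  = \frac{1}{1 + e^{-\left(\alpha + \beta_1 X_1 + \beta_2 X_2\right)}}
```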
The terms $\alpha$ and $\beta_i$ in this model represent
unknown parameters that we need to estimate
based on data obtained on the $X$s and on $D$
(disease outcome) for a group of subjects.
Thus, if we knew the parameters $\alpha$ and the $\beta_i$
and we had determined the values of $X_1$
through $X_k$ for a particular disease-free individual,
we could use this formula to plug in these
values and obtain the probability that this individual
would develop the disease over some
defined follow-up time interval.
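The plug-in step described above can be sketched in a few lines of Python. This is a minimal illustration, not a fitted model: the function name `logistic_probability` and the parameter and covariate values are hypothetical, chosen only to show the arithmetic.

```python
import math
from typing import Sequence


def logistic_probability(alpha: float, betas: Sequence[float], xs: Sequence[float]) -> float:
    """Return 1 / (1 + exp(-(alpha + sum_i beta_i * x_i)))."""
    if len(betas) != len(xs):
        raise ValueError("betas and xs must both have length k")
    # The quantity alpha plus the sum of beta_i times X_i
    linear_sum = alpha + sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + math.exp(-linear_sum))


# Hypothetical values for illustration only (not estimates from any study)
alpha = -3.0          # assumed intercept parameter
betas = [0.5, 1.0]    # assumed coefficients for X1 and X2
xs = [2.0, 1.0]       # one individual's X1 and X2, measured at T0

p = logistic_probability(alpha, betas, xs)
print(f"P(D = 1 | X1, X2) = {p:.4f}")  # prints P(D = 1 | X1, X2) = 0.2689
```

With these assumed values the quantity in the exponent is -3.0 + 0.5(2.0) + 1.0(1.0) = -1.0, so the result is 1 over 1 plus e, or about 0.27.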
For notational convenience, we will denote the
probability statement $P(D = 1 \mid X_1, X_2, \ldots, X_k)$ as
simply $P(\mathbf{X})$, where the bold $\mathbf{X}$ is a shortcut
notation for the collection of variables $X_1$
through $X_k$.
Thus, the logistic model may be written as $P(\mathbf{X})$
equals 1 over 1 plus $e$ to minus the quantity $\alpha$
plus the sum of $\beta_i X_i$.