Logistic model for matched data
includes control of variables not
matched
Stratified analysis inefficient:
Data is discarded
Matched data:
Use conditional ML estimation
(number of parameters large
relative to n)
Pair-matching:
$\widehat{OR}_U = (\widehat{OR}_C)^2$
($\widehat{OR}_U$ is an overestimate)
Principle
Matched analysis ⇒ stratified
analysis
Strata are matched sets, e.g.,
pairs
Strata defined using dummy
(indicator) variables
$E$ = (0, 1) exposure
$C_1, C_2, \ldots, C_p$ control variables
For example, one may match on AGE, RACE,
and SEX, but may also wish to control for
systolic blood pressure and body size, which
may have also been measured but were not
part of the matching.
In the remainder of the presentation, we describe
how to formulate and apply a logistic model to
analyze matched data, which allows for the control
of variables not involved in the matching.
In this situation, a stratified analysis will
usually be less efficient than logistic regression,
because much of one's data must be discarded,
which a modeling approach does not require.
The model that we describe below for matched
data requires the use of conditional ML estimation
for estimating parameters. This is because,
as we shall see, when there are matched data,
the number of parameters in the model is large
relative to the number of observations.
If unconditional ML estimation is used instead
of conditional, the estimated odds ratio will be an overestimate.
In particular, for pair-matching, the estimated
odds ratio using the unconditional approach
will be the square of the estimated odds ratio
obtained from the conditional approach, the
latter being the correct result.
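
The squaring relationship can be checked numerically. The sketch below is an illustration under stated assumptions, not the text's own derivation. The counts `b` and `c` are hypothetical numbers of discordant pairs (case exposed with control unexposed, and the reverse). The unconditional model is assumed to have one intercept per pair, and that intercept is profiled out analytically (for a discordant pair it is maximized at minus one half of beta). Concordant pairs add only a constant to either likelihood and are omitted.

```python
import math


def log_expit(x):
    """Numerically stable log of the logistic function 1 / (1 + exp(-x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def conditional_loglik(beta, b, c):
    # Each discordant pair contributes P(the exposed member is the case) = expit(beta)
    # when the case is exposed, and expit(-beta) when the control is exposed.
    return b * log_expit(beta) + c * log_expit(-beta)


def unconditional_profile_loglik(beta, b, c):
    # Unconditional likelihood with one intercept per pair, after replacing each
    # pair's intercept by its maximizing value (-beta / 2 for discordant pairs).
    return 2 * b * log_expit(beta / 2) + 2 * c * log_expit(-beta / 2)


def argmax_concave(f, lo=-10.0, hi=10.0, tol=1e-9):
    """Golden-section search for the maximizer of a concave function on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    while hi - lo > tol:
        x1 = hi - inv_phi * (hi - lo)
        x2 = lo + inv_phi * (hi - lo)
        if f(x1) < f(x2):
            lo = x1  # the maximum lies to the right of x1
        else:
            hi = x2  # the maximum lies to the left of x2
    return (lo + hi) / 2


# Hypothetical discordant-pair counts, chosen only for illustration.
b, c = 30, 10

beta_c = argmax_concave(lambda t: conditional_loglik(t, b, c))
beta_u = argmax_concave(lambda t: unconditional_profile_loglik(t, b, c))

or_conditional = math.exp(beta_c)
or_unconditional = math.exp(beta_u)

print("conditional OR:  ", round(or_conditional, 3))
print("unconditional OR:", round(or_unconditional, 3))
```

With these counts the conditional estimate equals the familiar ratio of discordant pairs, b/c, and the unconditional estimate equals its square.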
An important principle about modeling
matched data is that such modeling requires
the matched data to be considered in strata. As
described earlier, the strata are the matched
sets, for example, the pairs in a matched pair
design. In particular, the strata are defined
using dummy or indicator variables, which we
will illustrate shortly.
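
As a minimal sketch of what defining strata with dummy variables can look like in practice, the code below builds one 0/1 indicator per matched set. The function name `stratum_indicators`, the pair identifiers, and the choice to treat the last matched set as the reference (so that n sets yield n - 1 indicators) are assumptions made for illustration.

```python
def stratum_indicators(pair_ids):
    """Build 0/1 indicator variables for matched sets.

    Returns (columns, rows): `columns` lists the matched sets that get an
    indicator (every set except the last, which serves as the reference),
    and `rows` holds one list of 0/1 values per subject.
    """
    strata = sorted(set(pair_ids))
    reference = strata[-1]  # assumed convention: last set is the reference
    columns = [s for s in strata if s != reference]
    rows = [[1 if pid == s else 0 for s in columns] for pid in pair_ids]
    return columns, rows


# Hypothetical data: three matched pairs, two subjects per pair.
pair_ids = [1, 1, 2, 2, 3, 3]
columns, rows = stratum_indicators(pair_ids)

for pid, row in zip(pair_ids, rows):
    print(pid, row)
```

Each subject in the reference set has all indicators equal to 0. With many matched pairs the number of indicator columns grows with the number of pairs, which is why the number of parameters becomes large relative to the number of observations.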
In defining a model for a matched analysis, we
consider the special case of a single (0, 1) exposure
variable of primary interest, together with
a collection of control variables $C_1$, $C_2$, and so on
up through $C_p$, to be adjusted in the analysis for
possible confounding and interaction effects.
EXAMPLE
Match on AGE, RACE, SEX
also, control for SBP and BODYSIZE
398 11. Analysis of Matched Data Using Logistic Regression