Influential observations:
Individual data may influence
regression coefficients, e.g.,
outlier
Coefficients may change if
outlier is dropped from analysis
IV. Variable Specification Stage
Define clinically or biologically
meaningful independent
variables
Provide initial model
Specify D, E, C1, C2, ..., Cp based
on:
Study goals
Literature review
Theory
Specify Vs based on:
Prior research or theory
Possible statistical problems
Influential observations refer to data on individuals
that may have a large influence on the
estimated regression coefficients. For example,
an outlier in one or more of the independent
variables may greatly affect one’s results. If a
person with an outlier is dropped from the
data, the estimated regression coefficients
may greatly change from the coefficients
obtained when that person is retained in the
data. Methods for assessing the possibility of
influential observations should be considered
when determining a best model.
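The idea of dropping one person and watching the coefficients change can be sketched as a leave-one-out comparison. The snippet below is only an illustration under simplifying assumptions: it uses an ordinary least-squares slope with a single predictor rather than a logistic model, and the data values and function names (slope, leave_one_out_changes) are hypothetical.

```python
# Illustrative sketch only: simple linear regression with one predictor,
# used here as a stand-in for the regression model under study.

def slope(xs, ys):
    """Least-squares slope of y on x for a single predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def leave_one_out_changes(xs, ys):
    """Change in the slope when each observation is dropped in turn."""
    full = slope(xs, ys)
    changes = []
    for i in range(len(xs)):
        xs_without_i = xs[:i] + xs[i + 1:]
        ys_without_i = ys[:i] + ys[i + 1:]
        changes.append(slope(xs_without_i, ys_without_i) - full)
    return changes

# Hypothetical data: the last person has an outlying x value.
x = [1, 2, 3, 4, 5, 20]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 2.0]

changes = leave_one_out_changes(x, y)
most_influential = max(range(len(x)), key=lambda i: abs(changes[i]))

for i, change in enumerate(changes):
    print(f"drop person {i}: slope changes by {change:+.3f}")
print("most influential person:", most_influential)
```

In this made-up data set, dropping the person with the outlying x value moves the slope far more than dropping anyone else, which is the pattern an influence diagnostic is meant to flag.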
At the variable specification stage, clinically or
biologically meaningful independent variables
are defined in the model to provide the largest
model to be initially considered.
We begin by specifying the D and E variables of
interest together with the set of risk factors C1
through Cp to be considered for control. These
variables are defined and measured by the
investigator based on the goals of one’s study
and a review of the literature and/or biological
theory relating to the study.
Next, we must specify the Vs, which are functions
of the Cs that go into the model as potential
confounders. Generally, we recommend that the
choice of Vs be based primarily on prior
research or theory, with some consideration of
possible statistical problems like multicollinear-
ity that might result from certain choices.
For example, if the Cs are AGE, RACE, and
SEX, one choice for the Vs is the Cs themselves.
Another choice includes AGE, RACE, and SEX
plus more complicated functions such as AGE^2,
AGE × RACE, RACE × SEX, and AGE × SEX.
We would recommend any of the latter four
variables only if prior research or theory sup-
ported their inclusion in the model. Moreover,
even if biologically relevant, such variables
may be omitted from consideration to avoid a
possible collinearity problem.
EXAMPLE
Cs: AGE, RACE, SEX
Vs:
Choice 1: AGE, RACE, SEX
Choice 2: AGE, RACE, SEX, AGE^2,
AGE × RACE, RACE × SEX,
AGE × SEX
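The two choices of Vs in the example can be sketched in code as follows. This is a minimal illustration, assuming RACE and SEX are coded 0/1 and using made-up records; the names build_vs and pearson_r are hypothetical helpers, not part of any library. The correlation at the end hints at why a term such as AGE^2 may raise a collinearity concern.

```python
# Illustrative sketch: constructing the Vs from the Cs (AGE, RACE, SEX).
# Assumption: RACE and SEX are coded 0/1; all data values are made up.

def build_vs(record, include_higher_order=False):
    """Return the Vs for one person under Choice 1 or Choice 2."""
    age, race, sex = record["AGE"], record["RACE"], record["SEX"]
    vs = {"AGE": age, "RACE": race, "SEX": sex}          # Choice 1
    if include_higher_order:                             # Choice 2 additions
        vs["AGE^2"] = age ** 2
        vs["AGE*RACE"] = age * race
        vs["RACE*SEX"] = race * sex
        vs["AGE*SEX"] = age * sex
    return vs

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

people = [
    {"AGE": 30, "RACE": 0, "SEX": 1},
    {"AGE": 40, "RACE": 1, "SEX": 0},
    {"AGE": 50, "RACE": 0, "SEX": 0},
    {"AGE": 60, "RACE": 1, "SEX": 1},
    {"AGE": 70, "RACE": 1, "SEX": 0},
]

choice1 = [build_vs(p) for p in people]
choice2 = [build_vs(p, include_higher_order=True) for p in people]

# AGE and AGE^2 are typically highly correlated over a limited age range.
r = pearson_r([v["AGE"] for v in choice2], [v["AGE^2"] for v in choice2])
print("Choice 1 variables:", list(choice1[0]))
print("Choice 2 variables:", list(choice2[0]))
print(f"correlation between AGE and AGE^2: {r:.3f}")
```

A high correlation like this one does not by itself rule out AGE^2, but it is the kind of statistical consideration that may lead an investigator to omit such a term.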