
part of the correlation that is attributable to that common cause constitutes the “spurious component” of the correlation. If the variables responsible for the “spurious component” of a correlation have been included in the path analysis, the “spurious component” can be estimated; otherwise such a “spurious component” is merged into the “direct effect,” which, despite the connotations of the name, absorbs all omitted indirect effects and all omitted spurious components.
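A minimal Python simulation, not part of the original text and with invented coefficients, illustrates the idea: a common cause Z induces a wholly spurious correlation between X and Y, and controlling for Z removes it.

    # Illustrative simulation (invented numbers): Z is a common cause of X
    # and Y, which do not affect each other at all.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    z = rng.normal(size=n)                # the common cause
    x = 0.8 * z + rng.normal(size=n)      # X depends only on Z
    y = 0.6 * z + rng.normal(size=n)      # Y depends only on Z

    # X and Y correlate (about 0.32 here) even though neither causes the other.
    print(np.corrcoef(x, y)[0, 1])

    # Residualizing both on Z estimates and removes the spurious component:
    # the partial correlation is approximately 0.
    rx = x - np.polyval(np.polyfit(z, x, 1), z)
    ry = y - np.polyval(np.polyfit(z, y, 1), z)
    print(np.corrcoef(rx, ry)[0, 1])

Residualizing X and Y on Z here plays the same role as including Z in the path analysis: the spurious component is estimated rather than absorbed into the “direct effect.”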


“Stepwise” Regression Analysis. The interpretation of regression results can sometimes be facilitated without specifying completely the presumed causal structure among a set of predictors. If the purpose of the analysis is to enhance understanding of the variation in a single dependent variable, and if the various predictors presumed to contribute to that variation can be grouped, for example, into proximate causes and distant causes, a stepwise regression analysis may be useful. Depending on one’s primary interest, one may proceed in two different ways. For example, one may begin by regressing the criterion variable on the distant causes, and then, in a second step, introduce the proximate causes into the regression equation. Comparison of the coefficients at each step will reveal the degree to which the effects of the distant causes are mediated by the proximate causes included in the analysis. Alternatively, one may begin by regressing the criterion variable on the proximate causes, and then introduce the distant causes into the regression equation in a second step. Comparing the coefficients at each step, one can infer the degree to which the first-step regression of the criterion variable on the proximate causes is spurious because of the dependence of both on the distant causes. A stepwise regression analysis may proceed with more than two stages if one wishes to distinguish more than two sets of predictors. One may think of a stepwise regression analysis of this kind as analogous to a path analysis but without a complete specification of the causal structure.
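A brief Python sketch of the first variant, using simulated data in which the labels “distant” and “proximate” are merely hypothetical, makes the comparison of coefficients concrete.

    # Simulated data; "distant" and "proximate" are hypothetical labels.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 50_000
    distant = rng.normal(size=n)                    # e.g., a background variable
    proximate = 0.7 * distant + rng.normal(size=n)  # partly caused by the distant variable
    y = 0.1 * distant + 0.5 * proximate + rng.normal(size=n)

    def ols(y, *predictors):
        # Ordinary least squares; returns intercept followed by slopes.
        X = np.column_stack([np.ones(len(y)), *predictors])
        return np.linalg.lstsq(X, y, rcond=None)[0]

    # Step 1: criterion on the distant cause alone; its coefficient
    # (about 0.45 = 0.1 + 0.5 * 0.7) is its total effect.
    print(ols(y, distant))
    # Step 2: with the proximate cause added, the distant cause's
    # coefficient falls to its direct effect (about 0.1).
    print(ols(y, distant, proximate))

The drop in the distant cause’s coefficient between the two steps is the portion of its effect mediated by the proximate cause.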


Nonadditive Effects in Multiple Regression. In the illustrative regression equations preceding this section, each predictor has appeared only once, and never in a “multiplicative” term. We now consider the following regression equation, which includes such a multiplicative term:


Y = a_{y.12} + b_{y1.2}X_1 + b_{y2.1}X_2 + b_{y.12}X_1X_2    (20)

In this equation, Y is said to be predicted, not simply by an additive combination of X_1 and X_2 but also by their product, X_1X_2. Although it may not be intuitively evident from the equation itself, the presence of a multiplicative effect (i.e., the regression coefficient for the multiplicative term, b_{y.12}, is not 0) implies that the effect of X_1 on Y depends on the level of X_2, and vice versa. This is commonly called an interaction effect (Allison 1977; Blalock 1979; Jaccard, Turrisi, and Wan 1990; Aiken and West 1991; McClendon 1994). The inclusion of multiplicative terms in a regression equation is especially appropriate when there are sound reasons for assuming that the effect of one variable differs for different levels of another variable. For example, if one assumes that the “return to education” (i.e., the annual income added by each additional year of schooling) will be greater for men than for women, this assumption can be explored by including all three predictors: education, gender, and the product of gender and education.
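A hedged Python sketch of equation (20) applied to this example follows; the incomes, returns, and coding of gender are all invented for illustration.

    # Invented income data (in thousands): the return to a year of
    # schooling is 2.0 for women and 3.0 for men.
    import numpy as np

    rng = np.random.default_rng(2)
    n = 20_000
    educ = rng.integers(8, 21, size=n).astype(float)  # years of schooling
    male = rng.integers(0, 2, size=n).astype(float)   # 1 = man, 0 = woman
    income = (5.0 + 2.0 * educ + 1.0 * male + 1.0 * educ * male
              + rng.normal(scale=4.0, size=n))

    # Regression with the multiplicative term, as in equation (20).
    X = np.column_stack([np.ones(n), educ, male, educ * male])
    b = np.linalg.lstsq(X, income, rcond=None)[0]
    print(b)   # roughly [5.0, 2.0, 1.0, 1.0]

The last coefficient, the estimate of b_{y.12}, is about 1.0: the additional income returned by each year of schooling for men relative to women.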
When product terms have been included in a regression equation, the interpretation of the resulting partial regression coefficients may become complex. For example, unless all predictors are “ratio variables” (i.e., variables measured in uniform units from an absolute 0), the inclusion of a product term in a regression equation renders the coefficients for the additive terms uninterpretable (see Allison 1977).
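One common response, recommended in Aiken and West (1991), is to mean-center the predictors before forming the product term. A small self-contained sketch, with invented names and coefficients, shows what centering changes.

    # Invented data in which neither predictor has a meaningful zero.
    import numpy as np

    rng = np.random.default_rng(3)
    n = 50_000
    x1 = rng.normal(10.0, 2.0, size=n)
    x2 = rng.normal(5.0, 1.0, size=n)
    y = 1.0 + 0.5 * x1 + 0.3 * x2 + 0.2 * x1 * x2 + rng.normal(size=n)

    def fit(a, b):
        X = np.column_stack([np.ones(n), a, b, a * b])
        return np.linalg.lstsq(X, y, rcond=None)[0]

    # Raw scores: the coefficient on x1 (about 0.5) is its effect when
    # x2 equals 0, a value far outside the observed data.
    print(fit(x1, x2))
    # Mean-centered: the coefficient on x1 (about 1.5) is its effect at
    # the average x2; the product-term coefficient (about 0.2) is unchanged.
    print(fit(x1 - x1.mean(), x2 - x2.mean()))

Centering does not alter the interaction coefficient itself; it only moves the zero point so that the additive coefficients describe effects at typical, rather than arbitrary, levels of the other predictor.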

SAMPLING VARIATION AND TESTS
AGAINST THE NULL HYPOTHESIS

Descriptions based on incomplete information will be inaccurate because of “sampling variation.” Otherwise stated, different samplings of information will yield different results. This is true of sample regression and correlation coefficients, as it is for other descriptors. Assuming a random selection of observed information, the “shape” of the distribution of such sampling variation is often known by mathematical reasoning, and the magnitude of such variation can be estimated. For example, if the true correlation between X and Y is 0, a series of randomly selected observations will rarely yield a correlation that is precisely 0. Instead, the
observed correlation will fluctuate around 0 in a pattern described by the appropriate sampling distribution.
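A short Python simulation, assuming independent normal X and Y so that the true correlation is exactly 0, illustrates this fluctuation.

    # Draw many samples of size 30 from populations in which X and Y are
    # independent (true correlation 0) and record each sample correlation.
    import numpy as np

    rng = np.random.default_rng(4)
    n, reps = 30, 10_000
    rs = np.array([np.corrcoef(rng.normal(size=n), rng.normal(size=n))[0, 1]
                   for _ in range(reps)])

    print(rs.mean())                  # near 0
    print(rs.std())                   # near 1/sqrt(n - 1), about 0.19 here
    print((np.abs(rs) > 0.3).mean())  # an appreciable share exceed 0.3 in size

A test against the null hypothesis asks, in effect, whether an observed coefficient is larger than such fluctuation would plausibly produce.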