Encyclopedia of Sociology

CORRELATION AND REGRESSION ANALYSIS

"shape" of a normal distribution, and the standard
deviation of that normal sampling distribution—called
the standard error of r—will be


σr = 1 / √(N – 1)          (21)

where σr = the standard error of r (i.e., the standard
deviation of the sampling distribution of r, given
that the true correlation is 0); and N = the sample
size (i.e., the number of randomly selected cases
used in the calculation of r). For example, if the
true correlation were 0, a correlation coefficient
based on a random selection of 400 cases would have
a standard error of approximately 1/20, or .05. An
observed correlation of .15 or greater in absolute
value would thus be at least three standard errors
away from 0 and hence very unlikely to have
appeared simply because of random fluctuations
around a true value of 0. This kind of conclusion is
commonly expressed by saying that the observed
correlation is "significantly different from 0" at a
given level of significance (in this instance, the
level of significance cited could appropriately be
.01). Or the same conclusion may be more simply
(but less precisely) stated by saying that the observed
correlation is "significant."
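As a quick sketch (in Python; the function names are ours, not from the source), the standard error in equation (21) and the three-standard-errors argument for N = 400 can be computed directly:

```python
import math

def r_standard_error(n):
    """Standard error of r when the true correlation is 0
    (equation 21): sigma_r = 1 / sqrt(N - 1)."""
    return 1.0 / math.sqrt(n - 1)

def z_score(r_observed, n):
    """Number of standard errors the observed r lies from 0."""
    return r_observed / r_standard_error(n)

# The worked example from the text: N = 400 randomly selected cases.
print(round(r_standard_error(400), 3))   # approximately 1/20, i.e. 0.05
print(round(z_score(0.15, 400), 1))      # about 3 standard errors from 0
```

An observed r of .15 on 400 cases thus lies roughly three standard errors from 0, matching the text's conclusion of significance at the .01 level.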


The standard error for an unstandardized
bivariate regression coefficient, and for an
unstandardized partial regression coefficient, may
also be estimated (Cohen and Cohen 1983; Kleinbaum,
Kupper, and Muller 1988; Hamilton 1992; McClendon
1994; Fox 1997). Other things being equal, the
standard error for the regression of the criterion
on a given predictor will decrease as (1) the num-
ber of observations (N) increases; (2) the variance
of observed values around predicted values de-
creases; (3) the variance of the predictor increases;
and (4) the correlation between the predictor and
other predictors in the regression equation decreases.
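One common textbook form of this standard error makes the four factors explicit; the sketch below (our notation — exact formulas vary slightly across the cited sources) verifies each one numerically:

```python
import math

def partial_slope_se(resid_var, n, pred_var, r2_other):
    """One common form of the standard error of an unstandardized
    partial regression coefficient:
        SE(b_j) = sqrt( s_e^2 / ((N - 1) * s_j^2 * (1 - R_j^2)) )
    resid_var: variance of observed values around predicted values
    pred_var:  variance of predictor j
    r2_other:  squared multiple correlation of predictor j with
               the other predictors in the equation."""
    return math.sqrt(resid_var / ((n - 1) * pred_var * (1 - r2_other)))

base = partial_slope_se(resid_var=4.0, n=101, pred_var=1.0, r2_other=0.25)
assert partial_slope_se(4.0, 401, 1.0, 0.25) < base  # (1) more observations
assert partial_slope_se(1.0, 101, 1.0, 0.25) < base  # (2) smaller residual variance
assert partial_slope_se(4.0, 101, 4.0, 0.25) < base  # (3) larger predictor variance
assert partial_slope_se(4.0, 101, 1.0, 0.00) < base  # (4) less collinearity
```

The four assertions correspond one-to-one to the four conditions listed above.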


PROBLEMS IN REGRESSION ANALYSIS

Multiple regression is a special case of a very
general and adaptable model of data analysis known
as the general linear model (Cohen 1968; Fennessey
1968; Blalock 1979). Although the assumptions
underlying multiple regression seem relatively de-
manding (see Berry 1993), the technique is re-
markably "robust," which is to say that the technique
yields valid conclusions even when the


assumptions are met only approximately (Bohrnstedt
and Carter 1971). Even so, restricted or biased
samples may lead to conclusions that are mislead-
ing if they are inappropriately generalized. Fur-
thermore, regression results may be misinterpret-
ed if interpretation rests on an implicit causal
model that is misspecified. For this reason it is
advisable to make the causal model explicit, as in
path analysis or structural equation modeling, and
to use regression equations that are appropriate
for the model as specified. "Outliers" and "deviant
cases" (i.e., cases extremely divergent from
most) may have an excessive impact on regression
coefficients, and hence may lead to erroneous
conclusions. (See Berry and Feldman 1985; Fox
1991; Hamilton 1992.) A ubiquitous but still not
widely recognized source of misleading results in
regression analysis is measurement error (both
random and non-random) in the variables (Stouffer
1936; Kahneman 1965; Gordon 1968; Bohrnstedt
and Carter 1971; Fuller and Hidiroglou 1978;
Berry and Feldman 1985). In bivariate correlation
and regression, the effect of measurement error
can be readily anticipated: on the average, random
measurement error in the predicted variable attenu-
ates (moves toward zero) the correlation coeffi-
cient (i.e., the standardized regression coefficient)
but not the unstandardized regression coefficient,
while random measurement error in the predictor
variable will, on the average, attenuate both the
standardized and the unstandardized coefficients.
In multiple regression analysis, the effect of ran-
dom measurement error is more complex. The
unstandardized partial regression coefficient for a
given predictor will be biased, not simply by random
measurement error in that predictor, but also by
random measurement error in other, correlated
predictors. When random measurement
error is entailed in a given predictor, X, that
predictor is not completely controlled in a regres-
sion analysis. Consequently, the unstandardized
partial regression coefficient for every other pre-
dictor that is correlated with X will be biased by
random measurement error in X. Measurement
error may be non-random as well as random, and
anticipating the effect of non-random measure-
ment error on regression results is even more
challenging than anticipating the effect of random
error. Non-random measurement errors may be
correlated errors (i.e., errors that are correlated
with other variables in the system being analyzed),
and therefore they have the potential to distort
greatly the estimates of both standardized and