variables, X and Y, will move the partial, r_xy.z, closer to 0, and in some circumstances a positive bivariate correlation may become negative after controlling for a third variable. When r_xy is positive and the algebraic sign of r_yz differs from the sign of r_xz (so that their product is negative), the partial will be larger than the bivariate correlation, indicating that Z is a suppressor variable, that is, a variable that diminishes the correlation between X and Y unless it is controlled. Further discussion of partial correlation and its interpretation can be found in Simon (1954); Mueller, Schuessler, and Costner (1977); and Blalock (1979).
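
For reference, this sign behavior can be read off the standard formula for the first-order partial correlation:

$$r_{xy.z} = \frac{r_{xy} - r_{xz}\,r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}}$$

The denominator is always positive, so whether the partial is larger or smaller than r_xy turns on the product r_xz r_yz subtracted in the numerator.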


Any correlation between two sets of regression residuals is called a partial correlation coefficient. The illustration immediately above is called a first-order partial, meaning that one and only one variable has been held constant. A second-order partial means that two variables have been held constant. More generally, an nth-order partial is one in which precisely n variables have been "controlled" or held constant by statistical adjustment.
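
As a concrete illustration of this definition, here is a minimal NumPy sketch (the data and variable names are illustrative assumptions, not drawn from the article) that computes a first-order partial, r_xy.z, as the correlation between two sets of regression residuals:

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=500)
x = 0.6 * z + rng.normal(size=500)   # X depends on Z
y = 0.6 * z + rng.normal(size=500)   # Y depends on Z

def residuals(a, b):
    """Residuals from the bivariate least-squares regression of a on b."""
    slope = np.cov(a, b, bias=True)[0, 1] / np.var(b)
    intercept = a.mean() - slope * b.mean()
    return a - (intercept + slope * b)

# r_xy.z: correlate the residuals of X on Z with the residuals of Y on Z.
r_xy_z = np.corrcoef(residuals(x, z), residuals(y, z))[0, 1]
print(r_xy_z)  # near 0 here, since X and Y are related only through Z
```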


When only one of the variables being correlated is a regression residual (e.g., X is correlated with the residuals of Y on Z), the correlation is called a part correlation. Although part correlations are rarely used, they are appropriate when it seems implausible to residualize one of the two variables being correlated. Generally, a part correlation is smaller in absolute value than the corresponding partial correlation.
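
A part correlation can be sketched in the same way, except that only one of the two variables is residualized. The snippet below (again with illustrative data) residualizes Y on Z and correlates the result with the raw X scores:

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.normal(size=500)
x = 0.6 * z + rng.normal(size=500)
y = 0.5 * x + 0.6 * z + rng.normal(size=500)

# Residualize Y on Z only; X is left as observed.
slope = np.cov(y, z, bias=True)[0, 1] / np.var(z)
y_resid = y - (y.mean() + slope * (z - z.mean()))

part = np.corrcoef(x, y_resid)[0, 1]  # part (semipartial) correlation
print(part)
# The partial would residualize X on Z as well; in general |part| <= |partial|.
```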


MULTIPLE REGRESSION

Earned income level is influenced not simply by
one’s education but also by work experience, skills
developed outside of school and work, the prevail-
ing compensation for the occupation or profes-
sion in which one works, the nature of the regional
economy where one is employed, and numerous
other factors. Hence it should not be surprising
that education alone does not predict income with
high accuracy. The deviations between actual in-
come and income predicted on the basis of educa-
tion are presumably due to the influence of all the
other factors that have an effect, great or small, on
one’s income level. By including some of these
other variables as additional predictors, the accu-
racy of prediction should be increased. Otherwise
stated, one expects to predict Y better using both X_1 and X_2 (assuming both influence Y) than with either of these alone.

A regression equation including more than a
single predictor of Y is called a multiple regression
equation. For two predictors, the multiple regres-
sion equation is:

$$\hat{Y} = a_{y.12} + b_{y1.2} X_1 + b_{y2.1} X_2 \qquad (12)$$

where Ŷ = the least squares prediction of Y based on X_1 and X_2; a_{y.12} = the Y intercept (i.e., the predicted value of Y when both X_1 and X_2 are 0); b_{y1.2} = the (unstandardized) regression slope of Y on X_1, holding X_2 constant; and b_{y2.1} = the (unstandardized) regression slope of Y on X_2, holding X_1 constant. In multiple regression analysis, the predicted variable (Y in equation 12) is commonly known as the criterion variable, and the X's are called predictors. As in a bivariate regression equation (equation 2), one assumes both rectilinearity and homoscedasticity, and one finds the Y intercept (a_{y.12} in equation 12) and the regression slopes (one for each predictor; they are b_{y1.2} and b_{y2.1} in equation 12) that best fit by the criterion of least squares. The b's, or regression slopes, are partial regression coefficients. The correlation between the resulting regression predictions (Ŷ) and the observed values of Y is called the multiple correlation coefficient, symbolized by R.
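
The following NumPy sketch fits equation 12 by least squares and then computes R as the correlation between the predictions Ŷ and the observed Y. The data-generating setup is an illustrative assumption, not an example from the article:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)                     # e.g., education
x2 = 0.3 * x1 + rng.normal(size=n)          # e.g., a correlated second predictor
y = 2.0 + 1.5 * x1 + 0.8 * x2 + rng.normal(size=n)

# Design matrix with a constant column for the intercept a_{y.12}.
X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b1, b2 = coef                            # a_{y.12}, b_{y1.2}, b_{y2.1}

y_hat = X @ coef
R = np.corrcoef(y_hat, y)[0, 1]             # multiple correlation coefficient
print(a, b1, b2, R)
```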

In contemporary applications of multiple re-
gression, the partial regression coefficients are
typically the primary focus of attention. These
coefficients describe the regression of the criteri-
on variable on each predictor, holding constant all
other predictors in the equation. The b’s in equa-
tion 12 are unstandardized coefficients. The analo-
gous multiple regression equation for all variables
expressed in standardized form is

$$\hat{Z}_y = b^*_{y1.2} Z_1 + b^*_{y2.1} Z_2 \qquad (13)$$

where Ẑ_y = the regression prediction for the "z measure" of Y, given X_1 and X_2; Z_1 = the standard deviate of X_1; Z_2 = the standard deviate of X_2; b*_{y1.2} = the standardized slope of Y on X_1, holding X_2 constant; and b*_{y2.1} = the standardized slope of Y on X_2, holding X_1 constant.
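
Equation 13 can be sketched numerically as well: standardizing every variable and fitting without an intercept (the intercept is 0 when all variables have mean 0) yields the standardized slopes directly. The data are again illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
x1 = rng.normal(size=n)
x2 = 0.3 * x1 + rng.normal(size=n)
y = 2.0 + 1.5 * x1 + 0.8 * x2 + rng.normal(size=n)

zscore = lambda v: (v - v.mean()) / v.std()    # convert to standard deviates
Z = np.column_stack([zscore(x1), zscore(x2)])  # Z_1 and Z_2

beta, *_ = np.linalg.lstsq(Z, zscore(y), rcond=None)
print(beta)  # b*_{y1.2} and b*_{y2.1}
# Equivalently, each b* equals the unstandardized b times s_x / s_y.
```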

The standardized regression coefficients in an
equation with two predictors may be calculated
from the bivariate correlations as follows:
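
With r_y1, r_y2, and r_12 denoting the three bivariate correlations among Y, X_1, and X_2, the standard expressions are:

$$b^*_{y1.2} = \frac{r_{y1} - r_{y2}\,r_{12}}{1 - r_{12}^2} \qquad\qquad b^*_{y2.1} = \frac{r_{y2} - r_{y1}\,r_{12}}{1 - r_{12}^2}$$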