variables, X and Y, will move the partial, r_xy.z, closer to 0, and in some circumstances a positive bivariate correlation may become negative after controlling for a third variable. When r_xy is positive and the algebraic sign of r_yz differs from the sign of r_xz (so that their product is negative), the partial will be larger than the bivariate correlation, indicating that Z is a suppressor variable, that is, a variable that diminishes the correlation between X and Y unless it is controlled. Further discussion of partial correlation and its interpretation can be found in Simon (1954); Mueller, Schuessler, and Costner (1977); and Blalock (1979).
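
For reference, this sign behavior can be read off the standard formula for the first-order partial correlation:

$$r_{xy.z} = \frac{r_{xy} - r_{xz}\,r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}}$$

The denominator is always positive, so whether the partial is larger or smaller than r_xy turns on the product r_xz r_yz subtracted in the numerator.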


Any correlation between two sets of regression residuals is called a partial correlation coefficient. The illustration immediately above is called a first-order partial, meaning that one and only one variable has been held constant. A second-order partial means that two variables have been held constant. More generally, an nth-order partial is one in which precisely n variables have been "controlled" or held constant by statistical adjustment.
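
As a concrete illustration of this definition, here is a minimal NumPy sketch (the data and variable names are illustrative assumptions, not drawn from the article) that computes a first-order partial, r_xy.z, as the correlation between two sets of regression residuals:

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=500)
x = 0.6 * z + rng.normal(size=500)   # X depends on Z
y = 0.6 * z + rng.normal(size=500)   # Y depends on Z

def residuals(a, b):
    """Residuals from the bivariate least-squares regression of a on b."""
    slope = np.cov(a, b, bias=True)[0, 1] / np.var(b)
    intercept = a.mean() - slope * b.mean()
    return a - (intercept + slope * b)

# r_xy.z: correlate the residuals of X on Z with the residuals of Y on Z.
r_xy_z = np.corrcoef(residuals(x, z), residuals(y, z))[0, 1]
print(r_xy_z)  # near 0 here, since X and Y are related only through Z
```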


When only one of the variables being correlated is a regression residual (e.g., X is correlated with the residuals of Y on Z), the correlation is called a part correlation. Although part correlations are rarely used, they are appropriate when it seems implausible to residualize one of the two variables being correlated. Generally, a part correlation is smaller in absolute value than the corresponding partial correlation.
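
A part correlation can be sketched in the same way, except that only one of the two variables is residualized. The snippet below (again with illustrative data) residualizes Y on Z and correlates the result with the raw X scores:

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.normal(size=500)
x = 0.6 * z + rng.normal(size=500)
y = 0.5 * x + 0.6 * z + rng.normal(size=500)

# Residualize Y on Z only; X is left as observed.
slope = np.cov(y, z, bias=True)[0, 1] / np.var(z)
y_resid = y - (y.mean() + slope * (z - z.mean()))

part = np.corrcoef(x, y_resid)[0, 1]  # part (semipartial) correlation
print(part)
# The partial would residualize X on Z as well; in general |part| <= |partial|.
```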


MULTIPLE REGRESSION

Earned income level is influenced not simply by
one’s education but also by work experience, skills
developed outside of school and work, the prevail-
ing compensation for the occupation or profes-
sion in which one works, the nature of the regional
economy where one is employed, and numerous
other factors. Hence it should not be surprising
that education alone does not predict income with
high accuracy. The deviations between actual in-
come and income predicted on the basis of educa-
tion are presumably due to the influence of all the
other factors that have an effect, great or small, on
one’s income level. By including some of these
other variables as additional predictors, the accu-
racy of prediction should be increased. Otherwise
stated, one expects to predict Y better using both X_1 and X_2 (assuming both influence Y) than with either of these alone.

A regression equation including more than a
single predictor of Y is called a multiple regression
equation. For two predictors, the multiple regres-
sion equation is:

$$\hat{Y} = a_{y.12} + b_{y1.2} X_1 + b_{y2.1} X_2 \qquad (12)$$

where Ŷ = the least squares prediction of Y based on X_1 and X_2; a_{y.12} = the Y intercept (i.e., the predicted value of Y when both X_1 and X_2 are 0); b_{y1.2} = the (unstandardized) regression slope of Y on X_1, holding X_2 constant; and b_{y2.1} = the (unstandardized) regression slope of Y on X_2, holding X_1 constant. In multiple regression analysis, the predicted variable (Y in equation 12) is commonly known as the criterion variable, and the X's are called predictors. As in a bivariate regression equation (equation 2), one assumes both rectilinearity and homoscedasticity, and one finds the Y intercept (a_{y.12} in equation 12) and the regression slopes (one for each predictor; they are b_{y1.2} and b_{y2.1} in equation 12) that best fit by the criterion of least squares. The b's, or regression slopes, are partial regression coefficients. The correlation between the resulting regression predictions (Ŷ) and the observed values of Y is called the multiple correlation coefficient, symbolized by R.
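
The following NumPy sketch fits equation 12 by least squares and then computes R as the correlation between the predictions Ŷ and the observed Y. The data-generating setup is an illustrative assumption, not an example from the article:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)                     # e.g., education
x2 = 0.3 * x1 + rng.normal(size=n)          # e.g., a correlated second predictor
y = 2.0 + 1.5 * x1 + 0.8 * x2 + rng.normal(size=n)

# Design matrix with a constant column for the intercept a_{y.12}.
X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b1, b2 = coef                            # a_{y.12}, b_{y1.2}, b_{y2.1}

y_hat = X @ coef
R = np.corrcoef(y_hat, y)[0, 1]             # multiple correlation coefficient
print(a, b1, b2, R)
```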

In contemporary applications of multiple re-
gression, the partial regression coefficients are
typically the primary focus of attention. These
coefficients describe the regression of the criteri-
on variable on each predictor, holding constant all
other predictors in the equation. The b’s in equa-
tion 12 are unstandardized coefficients. The analo-
gous multiple regression equation for all variables
expressed in standardized form is

$$\hat{Z}_y = b^*_{y1.2} Z_1 + b^*_{y2.1} Z_2 \qquad (13)$$

where Ẑ_y = the regression prediction for the "z measure" of Y, given X_1 and X_2; Z_1 = the standard deviate of X_1; Z_2 = the standard deviate of X_2; b*_{y1.2} = the standardized slope of Y on X_1, holding X_2 constant; and b*_{y2.1} = the standardized slope of Y on X_2, holding X_1 constant.
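
Equation 13 can be sketched numerically as well: standardizing every variable and fitting without an intercept (the intercept is 0 when all variables have mean 0) yields the standardized slopes directly. The data are again illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
x1 = rng.normal(size=n)
x2 = 0.3 * x1 + rng.normal(size=n)
y = 2.0 + 1.5 * x1 + 0.8 * x2 + rng.normal(size=n)

zscore = lambda v: (v - v.mean()) / v.std()    # convert to standard deviates
Z = np.column_stack([zscore(x1), zscore(x2)])  # Z_1 and Z_2

beta, *_ = np.linalg.lstsq(Z, zscore(y), rcond=None)
print(beta)  # b*_{y1.2} and b*_{y2.1}
# Equivalently, each b* equals the unstandardized b times s_x / s_y.
```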

The standardized regression coefficients in an
equation with two predictors may be calculated
from the bivariate correlations as follows:
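
With r_y1, r_y2, and r_12 denoting the three bivariate correlations among Y, X_1, and X_2, the standard expressions are:

$$b^*_{y1.2} = \frac{r_{y1} - r_{y2}\,r_{12}}{1 - r_{12}^2} \qquad\qquad b^*_{y2.1} = \frac{r_{y2} - r_{y1}\,r_{12}}{1 - r_{12}^2}$$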