Encyclopedia of Sociology

ANALYSIS OF VARIANCE AND COVARIANCE

first “dummy coded.” The results will be consistent
with those obtained from an analysis of variance
and covariance, but can also be interpreted
within a regression framework.


Dummy coding is a procedure in which a separate
dichotomous variable is created for each category
of a nominal-level variable. For example, in
a study of the effects of racial experience, the
variable for race can take several values that imply
no order or degree. Some of the categories might
be white, black, Latino, Indian, Asian, and other.
Because these categories imply no order or degree,
the variable for race cannot be entered directly
into a linear regression analysis.


The alternative is to create five dummy variables,
one for each race category except one.
Each of these variables indicates whether or not
the respondent belongs to a particular race category.
For example, the first variable might be for the
category “white,” where a value of 0 is assigned if the
person is of any race other than white, and a value of 1
is assigned if the person is white. Similarly, separate
variables would be created to identify membership
in the black, Latino, Indian, and Asian
groups. A dummy variable is not created for the
“other” category because its values are completely
determined by values on the other dummy variables
(e.g., all persons with the racial category
“other” will have a score of 0 on all of the dummy
variables). This determination is illustrated in the
table below.
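
The coding scheme described above can be sketched in a few lines of Python. This is only an illustration: the function name dummy_code and the lowercase category labels are assumptions made for the example, not part of the original text.

```python
# Category labels follow the example in the text; "other" is the omitted
# (reference) category, so it receives no dummy variable of its own.
CATEGORIES = ["white", "black", "latino", "indian", "asian"]  # D1..D5


def dummy_code(race):
    """Return the values of D1..D5 for one respondent's race category."""
    return [1 if race == category else 0 for category in CATEGORIES]


# Hypothetical respondents, used only to show the coding.
for race in ["white", "latino", "other", "asian"]:
    print(race, dummy_code(race))
```

A respondent coded “other” receives 0 on every dummy variable, which is why a sixth variable would add no information.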


Since there are only two categories or values
for each variable, the variables can be said to have
interval-level characteristics and can be entered
into a single regression equation such as the
following:


Y = a + b_1 D_1 + b_2 D_2 + b_3 D_3 + b_4 D_4 + b_5 D_5    (9)

where Y is the score on the dependent variable, a is
the constant or Y-intercept, b_1, b_2, b_3, b_4, and b_5 are
the regression coefficients representing the effects
of each category of race on the dependent variable
(relative to the omitted “other” category), and
D_1, D_2, D_3, D_4, and D_5 are dummy
variables representing separate categories of race.
If a person is black, then his or her predicted Y
score would be equal to a + b_2, since D_2 would have
a value of 1 (b_2 D_2 = b_2 × 1 = b_2) and D_1, D_3, D_4, and
D_5 would all be 0 (e.g., b_1 D_1 = b_1 × 0 = 0). The effect
of race would then be the addition of each of the
dummy variable effects. Other dummy or interval-level
variables could then be included in the analysis
and their effects could be interpreted as
“controlling for” the effects of race.

Values on Dummy Variables

Respondent's Race   D_1  D_2  D_3  D_4  D_5
White                1    0    0    0    0
Black                0    1    0    0    0
Latino               0    0    1    0    0
Indian               0    0    0    1    0
Asian                0    0    0    0    1
Other                0    0    0    0    0

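
As a rough illustration of how equation (9) yields a predicted score, the sketch below evaluates it for two respondents. The intercept and coefficient values are invented for the example and do not come from any data set; the function name predict is likewise an assumption.

```python
# Illustrative values only: the intercept and coefficients are made up
# for this sketch and are not estimates from real data.
a = 50.0                          # intercept: predicted Y for the omitted "other" category
b = [4.0, -2.0, 1.5, 0.5, 3.0]    # b1..b5 for white, black, Latino, Indian, Asian


def predict(dummies):
    """Evaluate Y = a + b1*D1 + ... + b5*D5 for one row of dummy values."""
    return a + sum(coef * d for coef, d in zip(b, dummies))


black = [0, 1, 0, 0, 0]   # only D2 equals 1
other = [0, 0, 0, 0, 0]   # omitted category: every dummy equals 0

print(predict(black))     # equals a + b2
print(predict(other))     # equals a
```

Because at most one dummy variable equals 1 for any respondent, every term but one drops out of the sum.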
Estimating analysis of variance models. The
use of dummy variables in regression makes it
possible to estimate analysis of variance models as
well. In the example above, the value of a is equal
to the mean score on the dependent variable for
those who had a score of “other” on the race
variable (i.e., the omitted category). The mean
scores for the other racial groups can then be
calculated by adding the appropriate b value
(regression coefficient) to the a value. For example,
the mean for the Latino group would be a + b_3. In
addition, the squared multiple correlation
coefficient (R^2) is equivalent to the measure of
association used in analysis of variance (eta squared,
η^2), and the F test for statistical significance is
also equivalent to the one computed using conventional
analysis of variance procedures. This general model can be
further extended by adding terms to
the prediction equation for other control variables
measured on continuous scales. In effect, such an
analysis is equivalent to an analysis of covariance.
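
The equivalence described above can be checked numerically on a small example. The sketch below is a minimal illustration under stated assumptions: the scores are invented, only three categories are used to keep it short, and the helper functions solve, ols, and mean are written for this example rather than taken from any statistical package. It fits the dummy-variable regression by ordinary least squares and compares the result with the group means and with eta squared.

```python
def solve(A, y):
    """Solve the linear system A x = y by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, y)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


def mean(values):
    return sum(values) / len(values)


# Made-up scores for three groups; "other" is the omitted category.
scores = {"white": [10.0, 12.0, 14.0], "black": [7.0, 9.0], "other": [4.0, 5.0, 6.0]}
coded = ["white", "black"]  # categories that receive a dummy variable

# Each row of X holds a constant term followed by the dummy variables.
X, y = [], []
for group, values in scores.items():
    for value in values:
        X.append([1.0] + [1.0 if group == c else 0.0 for c in coded])
        y.append(value)

a, b_white, b_black = ols(X, y)

# R^2 from the regression residuals and eta^2 from the group means.
grand = mean(y)
ss_total = sum((yi - grand) ** 2 for yi in y)
predicted = [a + b_white * row[1] + b_black * row[2] for row in X]
ss_resid = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
r_squared = 1 - ss_resid / ss_total
ss_between = sum(len(v) * (mean(v) - grand) ** 2 for v in scores.values())
eta_squared = ss_between / ss_total

print(a, a + b_white, a + b_black)   # compare with the three group means
print(r_squared, eta_squared)        # the two measures should agree
```

Appending a column for a continuous control variable to each row of X would give the analysis of covariance form of the same model.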

APPLICABILITY

In sociological studies, the researcher is rarely able
to manipulate the stimulus (or independent variable)
and tends to be more interested in behavior
in natural settings than in controlled experimental
settings. As a result, randomization of
preexisting differences through random assignment
of subjects to experimental and control groups
is not possible, and physical control over more
immediate outside influences on behavior cannot
be attained. In sociological studies, “other things”
are rarely equal and must be ruled out as possible
alternative explanations for group differences
through “statistical control.” This statistical control
is best accomplished through correlational