This model is multiplicative rather than additive and doesn't look much like the analysis of
variance model.^4 But if you recall your high school algebra, you will remember that
products become sums when you take logs. Thus, we can convert the preceding expression to
$$\ln(F_{ij}) = \ln(\hat{\tau}) + \ln(\hat{\tau}_i^V) + \ln(\hat{\tau}_j^F)$$
and we have something that very closely resembles the analysis of variance model.
We can then confuse everyone a little more by substituting the symbol $\lambda$ (lambda) to
represent the natural log of $\tau$ and have
$$\ln(F_{ij}) = \lambda + \lambda_i^V + \lambda_j^F$$
which is an additive linear expression directly analogous to the model we had for the
analysis of variance. This model is linear in the logs, hence the name log-linear models.
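The equivalence of the multiplicative and additive forms can be checked numerically. The following is a minimal sketch, not taken from the text: the effect values in `tau`, `tau_v`, and `tau_f` are made-up illustrative numbers, and the function names are hypothetical.

```python
import math

# Hypothetical multiplicative effects (illustrative values, not from the text).
tau = 20.0            # overall geometric mean of the expected frequencies
tau_v = [2.0, 0.5]    # row (V) effects
tau_f = [4.0, 0.25]   # column (F) effects


def expected_multiplicative(i, j):
    """Expected frequency F_ij from the multiplicative model."""
    return tau * tau_v[i] * tau_f[j]


# Each lambda term is the natural log of the corresponding tau term.
lam = math.log(tau)
lam_v = [math.log(t) for t in tau_v]
lam_f = [math.log(t) for t in tau_f]


def log_expected_additive(i, j):
    """ln(F_ij) from the additive (log-linear) form of the same model."""
    return lam + lam_v[i] + lam_f[j]


# The largest disagreement between the two forms over all four cells.
max_gap = max(
    abs(math.log(expected_multiplicative(i, j)) - log_expected_additive(i, j))
    for i in range(2)
    for j in range(2)
)
print(max_gap)
```

The two forms should agree up to floating-point rounding, which is all that "linear in the logs" claims.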
To summarize, in the analysis of variance we modeled expected cell means as the sum
of the grand mean and row and column treatment effects. In log-linear models we model
the log of expected cell frequencies as the sum of the logs of the overall geometric mean
and the row and column effects. The arithmetic is slightly different and we are modeling
different things, but the logic is the same.
Given my new notation, I can now go back and characterize the separate models by
their underlying equations. The models are numbered in the order of their presentation, and
were shown in column two of Table 17.4.
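1. Equiprobability model: $\ln(F_{ij}) = \lambda$
2. Conditional equiprobability model 1: $\ln(F_{ij}) = \lambda + \lambda_i^V$
3. Conditional equiprobability model 2: $\ln(F_{ij}) = \lambda + \lambda_j^F$
4. Mutual independence model: $\ln(F_{ij}) = \lambda + \lambda_i^V + \lambda_j^F$
5. Saturated model: $\ln(F_{ij}) = \lambda + \lambda_i^V + \lambda_j^F + \lambda_{ij}^{VF}$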
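To see what each of these models implies for the expected frequencies, here is a small sketch using the standard closed-form fitted values for a two-way table. The counts in `observed` are hypothetical, the function name `fitted_frequencies` is invented for illustration, and the code assumes that V indexes the rows and F the columns.

```python
# Hypothetical 2 x 2 table of observed frequencies (rows = V, columns = F).
observed = [[30, 10], [30, 30]]


def fitted_frequencies(table):
    """Return the expected frequencies F_ij under each of the five models."""
    n_rows, n_cols = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return {
        # lambda only: every cell gets the same expected frequency.
        "equiprobability": [
            [n / (n_rows * n_cols) for _ in range(n_cols)] for _ in range(n_rows)
        ],
        # lambda + row effect: each row total is spread evenly over columns.
        "conditional equiprobability 1": [
            [row_totals[i] / n_cols for _ in range(n_cols)] for i in range(n_rows)
        ],
        # lambda + column effect: each column total is spread evenly over rows.
        "conditional equiprobability 2": [
            [col_totals[j] / n_rows for j in range(n_cols)] for _ in range(n_rows)
        ],
        # lambda + row + column effects: the usual chi-square expected values.
        "mutual independence": [
            [row_totals[i] * col_totals[j] / n for j in range(n_cols)]
            for i in range(n_rows)
        ],
        # All terms including the interaction: reproduces the data exactly.
        "saturated": [[float(cell) for cell in row] for row in table],
    }


models = fitted_frequencies(observed)
for name, expected in models.items():
    print(name, expected)
```

Each added term lets the expected frequencies match one more feature of the data: first the grand total, then one set of marginal totals, then both, and finally every cell.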
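The interaction term in the saturated model ($\lambda_{ij}^{VF}$) is defined as what is left
unexplained when we fit model 4 (the mutual independence model) above. Thus,
$$\lambda_{ij}^{VF} = \ln(f_{ij}) - \lambda - \lambda_i^V - \lambda_j^F = \ln(f_{ij}) - \ln(F_{ij})$$
This is a model in which every expected frequency is forced to be exactly equal to the
corresponding obtained frequency, and $\chi^2$ will be exactly 0.00. A saturated model always fits the data perfectly.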
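The definition of the interaction term, and the claim that the saturated model fits perfectly, can be illustrated with a short sketch. The table below is hypothetical, the names are invented for the example, and the Pearson statistic is used here as an assumption about which chi-square is meant.

```python
import math

# Hypothetical 2 x 2 table of observed frequencies f_ij.
observed = [[30, 10], [30, 30]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequencies F_ij under the mutual independence model (model 4).
independence = [[r * c / n for c in col_totals] for r in row_totals]

# Interaction term: what model 4 leaves unexplained, ln(f_ij) - ln(F_ij).
interaction = [
    [math.log(f) - math.log(F) for f, F in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, independence)
]


def pearson_chi_square(obs, exp):
    """Pearson chi-square: sum over cells of (f - F)^2 / F."""
    return sum(
        (f - F) ** 2 / F
        for obs_row, exp_row in zip(obs, exp)
        for f, F in zip(obs_row, exp_row)
    )


chi_sq_independence = pearson_chi_square(observed, independence)
# Under the saturated model the expected frequencies are the observed ones.
chi_sq_saturated = pearson_chi_square(observed, observed)

print(interaction)
print(chi_sq_independence, chi_sq_saturated)
```

Because the saturated model sets every expected frequency equal to its observed frequency, every term in the sum is zero, whatever the data.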
Whereas in the analysis of variance we usually set up the complete model and test for
interaction, the highest-order interaction in the log-linear analysis is not tested directly. The
interaction model in the $R \times C$ case is basically the model that we adopt if the simpler
mutual independence model (also called the additive model) does not fit.
17.3 Testing Models
The central issue in log-linear analysis is the issue of choosing an optimal model to fit the
data. In a normal chi-square test on a two-dimensional contingency table we just jump in,
posit what I have called the additive model, and, if we reject it, conclude that an interaction
term is necessary because the variables are not independent. I have done some of that here,
but I have shown you five possible models instead of one or two. That would certainly be
unnecessary if we were just interested in the two-variable case where we have only one
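$$\ln(F_{ij}) = \ln(\hat{\tau}) + \ln(\hat{\tau}_i^V) + \ln(\hat{\tau}_j^F)$$
$$\ln(F_{ij}) = \lambda + \lambda_i^V + \lambda_j^F$$
$$\ln(F_{ij}) = \lambda$$
$$\ln(F_{ij}) = \lambda + \lambda_i^V$$
$$\ln(F_{ij}) = \lambda + \lambda_j^F$$
$$\ln(F_{ij}) = \lambda + \lambda_i^V + \lambda_j^F$$
$$\ln(F_{ij}) = \lambda + \lambda_i^V + \lambda_j^F + \lambda_{ij}^{VF}$$
$$\lambda_{ij}^{VF} = \ln(f_{ij}) - \lambda - \lambda_i^V - \lambda_j^F = \ln(f_{ij}) - \ln(F_{ij})$$
$$\chi^2$$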
(^4) In the analysis of variance we have variation within cells and can thus calculate an error term. In log-linear
models we are working with cell frequencies and will not have an error term. Therefore our models will have
nothing comparable to $e_{ijk}$.