Statistical Methods for Psychology

first set. We then apply the regression coefficients obtained from that sample to the
data in the other sample to obtain predicted values of Y (Ŷcv) on a cross-validation sample.
Our interest then focuses on the question of the relationship between Y and Ŷcv in the new
subsample. If the regression equations have any reasonable level of validity, then the cross-
validation correlation (Rcv—the correlation between Y and the Ŷcv predicted from the other
sample's regression equation) should be high. If they do not, our solution does not amount to
much. R²cv will in almost all cases be less than R², because R² depends on a regression
equation tailored for that set of data. Essentially, we have an equation that does its best to
account for every bump and wiggle (including sampling error) in the data. We should not
be surprised when it does not do as well in accounting for different bumps and wiggles in a
different set of data. However, substantial differences between R² and R²cv are an indication
that our solution lacks appreciable validity.
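The cross-validation procedure described above can be sketched in a few lines. This is a minimal illustration with simulated data (the variable names and the two-predictor model are my own, chosen only for the example):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate two samples from the same population: Y = 2*X1 + X2 + noise
def make_sample(n):
    X = rng.normal(size=(n, 2))
    y = 2 * X[:, 0] + X[:, 1] + rng.normal(size=n)
    return X, y

X1, y1 = make_sample(100)   # derivation sample
X2, y2 = make_sample(100)   # cross-validation sample

# Fit the regression (with intercept) on the first sample only
A1 = np.column_stack([np.ones(len(y1)), X1])
b = np.linalg.lstsq(A1, y1, rcond=None)[0]

# R^2 in the derivation sample
yhat1 = A1 @ b
R2 = 1 - np.sum((y1 - yhat1) ** 2) / np.sum((y1 - y1.mean()) ** 2)

# Apply the SAME coefficients to the second sample and
# correlate Y with the predicted values (R_cv)
A2 = np.column_stack([np.ones(len(y2)), X2])
yhat2 = A2 @ b
Rcv = np.corrcoef(y2, yhat2)[0, 1]

print(R2, Rcv ** 2)   # R^2_cv is typically somewhat below R^2
```

Because the coefficients were tailored to the first sample's bumps and wiggles, R²cv will usually shrink relative to R²; a large gap between the two is the warning sign discussed above.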

Missing Observations


Missing data are often a problem in regression analyses, and a number of alternative
methods have been devised to deal with them. The most common approach is simply to
delete all cases not having complete data on the variables being investigated. This is called
listwise (or casewise) deletion, because when an observation is missing we delete the
whole case.
A second approach, which is available in SPSS but is deliberately not available in
many programs, is called pairwise deletion. Here we use whatever data are at hand. If
the 13th subject has data on both X and Y, then that subject is included in the calculation
of rXY. But if subject 13 does not have a score on Z, that subject is not included in the cal-
culation of rXZ or rYZ. Once the complete intercorrelation matrix has been computed us-
ing pairwise deletion, the rest of the regression solution follows directly from that
matrix.
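The two deletion strategies can be shown side by side. This is a minimal sketch with made-up data (the toy values and function name are mine):

```python
import numpy as np

# Toy data with missing values (np.nan): columns are X, Y, Z
data = np.array([
    [1.0, 2.0, 1.5],
    [2.0, 3.5, np.nan],   # Z missing for this case
    [3.0, 4.0, 2.5],
    [4.0, 6.5, 4.0],
    [5.0, 7.0, np.nan],   # Z missing for this case
    [6.0, 9.0, 5.5],
])

# Listwise (casewise) deletion: drop every row with any missing value
complete = data[~np.isnan(data).any(axis=1)]
r_listwise = np.corrcoef(complete, rowvar=False)

# Pairwise deletion: for each pair of variables, use every case that
# has values on BOTH variables in that pair
def pairwise_corr(d):
    k = d.shape[1]
    r = np.eye(k)
    for i in range(k):
        for j in range(i + 1, k):
            ok = ~np.isnan(d[:, i]) & ~np.isnan(d[:, j])
            r[i, j] = r[j, i] = np.corrcoef(d[ok, i], d[ok, j])[0, 1]
    return r

r_pairwise = pairwise_corr(data)

# Under pairwise deletion r_XY uses all 6 cases; under listwise
# deletion every correlation is based on only the 4 complete cases.
```

Note that the two matrices disagree precisely because different correlations in the pairwise matrix are based on different subsets of cases, which is the source of the trouble discussed next.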
Both of these solutions have their problems. Listwise deletion may result in relatively
low sample sizes, and, if the data are not missing completely at random, in samples that are
not a fair reflection of the population from which they were presumably sampled. Pairwise
deletion, on the other hand, can result in an intercorrelation matrix that does not resemble
the matrix that we would have if we had complete data on all cases. In fact, pairwise dele-
tion can result in an "impossible" intercorrelation matrix. It is well known that given rXY
and rXZ, the correlation between Y and Z must fall within certain limits. But if we keep
changing the data that go into the correlations, we could obtain an rYZ that is inconsistent
with the other two correlations. When we then try to use such an inconsistent matrix, we
find ourselves in serious trouble.
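Those limits can be made explicit: given rXY and rXZ, rYZ must lie between rXY·rXZ ± √((1 − rXY²)(1 − rXZ²)); equivalently, a legitimate correlation matrix must have no negative eigenvalues. A quick check (the numbers are my own illustration):

```python
import numpy as np

def ryz_limits(rxy, rxz):
    """Range of r_YZ consistent with given r_XY and r_XZ."""
    half = np.sqrt((1 - rxy**2) * (1 - rxz**2))
    return rxy * rxz - half, rxy * rxz + half

# With r_XY = r_XZ = .90, r_YZ must be at least .62
lo, hi = ryz_limits(0.9, 0.9)

# An "impossible" matrix: r_YZ = 0 falls outside [lo, hi]
R = np.array([[1.0, 0.9, 0.9],
              [0.9, 1.0, 0.0],
              [0.9, 0.0, 1.0]])
eigvals = np.linalg.eigvalsh(R)
# A negative eigenvalue means no complete data set could
# ever have produced this intercorrelation matrix
```

Pairwise deletion can produce exactly such a matrix, because each correlation is computed on a different subset of cases; a regression program handed that matrix will fail or give nonsense.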
In recent years considerable attention has focused on imputing additional values to
take the place of missing values. There are a large number of ways that this can be done,
but perhaps the easiest to see, though certainly not the best, is regression imputation. In re-
gression imputation you run a regression, using the observations you have, to predict one
variable from values of the other variables, perhaps using listwise deletion. When you
have created your regression equation you then plug in the subject's scores on existing
variables and predict that person's score on the missing variable. In this way you can sys-
tematically replace all of the missing data. You can then run your analysis on the com-
plete data set. I want to stress that I do not recommend this particular approach, but
I present it because it gives you a sense of the approaches that I do recommend. The im-
portant point is to see that the data that we have are used to make intelligent estimates of
the observations that we don't have. A much more complete treatment of missing data
is available in Howell (2008) and at http://www.uvm.edu/~dhowell/StatPages/
More_Stuff/Missing_Data/Missing.html.
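The regression-imputation procedure just described can be sketched as follows. This is a deliberately bare-bones illustration with made-up data and, as stressed above, not a recommended procedure:

```python
import numpy as np

# Toy data: columns are X, Y, Z; Z is missing for one case
data = np.array([
    [1.0, 2.0,  3.1],
    [2.0, 3.0,  5.0],
    [3.0, 5.0,  8.2],
    [4.0, 6.0, np.nan],   # Z missing for this case
    [5.0, 8.0, 13.1],
])

missing = np.isnan(data[:, 2])
complete = data[~missing]           # listwise deletion for the fit

# Step 1: regress Z on X and Y using the complete cases
A = np.column_stack([np.ones(len(complete)), complete[:, :2]])
coef = np.linalg.lstsq(A, complete[:, 2], rcond=None)[0]

# Step 2: plug the incomplete case's existing X and Y scores into
# the equation to predict its missing Z score
A_miss = np.column_stack([np.ones(missing.sum()), data[missing, :2]])
data[missing, 2] = A_miss @ coef

# data is now "complete" and the analysis can be run on all cases
```

The key idea, as in the better methods, is that the data we do have are used to estimate the observations we don't; the weakness of this simple version is that every imputed value falls exactly on the regression surface, understating the variability in the data.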


550 Chapter 15 Multiple Regression


