Statistical Methods for Psychology

first set. We then apply the regression coefficients obtained from that sample to the
data in the other sample to obtain predicted values of Y (Ŷcv) on a cross-validation sample.
Our interest then focuses on the question of the relationship between Y and Ŷcv in the new
subsample. If the regression equations have any reasonable level of validity, then the cross-
validation correlation (Rcv—the correlation between Y and the Ŷcv predicted from the other
sample's regression equation) should be high. If they do not, our solution does not amount to
much. R²cv will in almost all cases be less than R², because R² depends on a regression
equation tailored for that set of data. Essentially, we have an equation that does its best to
account for every bump and wiggle (including sampling error) in the data. We should not
be surprised when it does not do as well in accounting for different bumps and wiggles in a
different set of data. However, substantial differences between R² and R²cv are an indication
that our solution lacks appreciable validity.
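The cross-validation procedure described above can be sketched in a few lines. This is a minimal illustration with simulated data (the variable names and the two-predictor model are my own, chosen only for the example):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate two samples from the same population: Y = 2*X1 + X2 + noise
def make_sample(n):
    X = rng.normal(size=(n, 2))
    y = 2 * X[:, 0] + X[:, 1] + rng.normal(size=n)
    return X, y

X1, y1 = make_sample(100)   # derivation sample
X2, y2 = make_sample(100)   # cross-validation sample

# Fit the regression (with intercept) on the first sample only
A1 = np.column_stack([np.ones(len(y1)), X1])
b = np.linalg.lstsq(A1, y1, rcond=None)[0]

# R^2 in the derivation sample
yhat1 = A1 @ b
R2 = 1 - np.sum((y1 - yhat1) ** 2) / np.sum((y1 - y1.mean()) ** 2)

# Apply the SAME coefficients to the second sample and
# correlate Y with the predicted values (R_cv)
A2 = np.column_stack([np.ones(len(y2)), X2])
yhat2 = A2 @ b
Rcv = np.corrcoef(y2, yhat2)[0, 1]

print(R2, Rcv ** 2)   # R^2_cv is typically somewhat below R^2
```

Because the coefficients were tailored to the first sample's bumps and wiggles, R²cv will usually shrink relative to R²; a large gap between the two is the warning sign discussed above.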

Missing Observations


Missing data are often a problem in regression analyses, and a number of alternative
methods have been devised to deal with them. The most common approach is simply to
delete all cases not having complete data on the variables being investigated. This is called
listwise (or casewise) deletion, because when an observation is missing we delete the
whole case.
A second approach, which is available in SPSS but is deliberately not available in
many programs, is called pairwise deletion. Here we use whatever data are at hand. If
the 13th subject has data on both X and Y, then that subject is included in the calculation
of rXY. But if subject 13 does not have a score on Z, that subject is not included in the cal-
culation of rXZ or rYZ. Once the complete intercorrelation matrix has been computed us-
ing pairwise deletion, the rest of the regression solution follows directly from that
matrix.
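The two deletion strategies can be shown side by side. This is a minimal sketch with made-up data (the toy values and function name are mine):

```python
import numpy as np

# Toy data with missing values (np.nan): columns are X, Y, Z
data = np.array([
    [1.0, 2.0, 1.5],
    [2.0, 3.5, np.nan],   # Z missing for this case
    [3.0, 4.0, 2.5],
    [4.0, 6.5, 4.0],
    [5.0, 7.0, np.nan],   # Z missing for this case
    [6.0, 9.0, 5.5],
])

# Listwise (casewise) deletion: drop every row with any missing value
complete = data[~np.isnan(data).any(axis=1)]
r_listwise = np.corrcoef(complete, rowvar=False)

# Pairwise deletion: for each pair of variables, use every case that
# has values on BOTH variables in that pair
def pairwise_corr(d):
    k = d.shape[1]
    r = np.eye(k)
    for i in range(k):
        for j in range(i + 1, k):
            ok = ~np.isnan(d[:, i]) & ~np.isnan(d[:, j])
            r[i, j] = r[j, i] = np.corrcoef(d[ok, i], d[ok, j])[0, 1]
    return r

r_pairwise = pairwise_corr(data)

# Under pairwise deletion r_XY uses all 6 cases; under listwise
# deletion every correlation is based on only the 4 complete cases.
```

Note that the two matrices disagree precisely because different correlations in the pairwise matrix are based on different subsets of cases, which is the source of the trouble discussed next.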
Both of these solutions have their problems. Listwise deletion may result in relatively
low sample sizes, and, if the data are not missing completely at random, in samples that are
not a fair reflection of the population from which they were presumably sampled. Pairwise
deletion, on the other hand, can result in an intercorrelation matrix that does not resemble
the matrix that we would have if we had complete data on all cases. In fact, pairwise dele-
tion can result in an "impossible" intercorrelation matrix. It is well known that given rXY
and rXZ, the correlation between Y and Z must fall within certain limits. But if we keep
changing the data that go into the correlations, we could obtain an rYZ that is inconsistent
with the other two correlations. When we then try to use such an inconsistent matrix, we
find ourselves in serious trouble.
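Those limits can be made explicit: given rXY and rXZ, rYZ must lie between rXY·rXZ ± √((1 − rXY²)(1 − rXZ²)); equivalently, a legitimate correlation matrix must have no negative eigenvalues. A quick check (the numbers are my own illustration):

```python
import numpy as np

def ryz_limits(rxy, rxz):
    """Range of r_YZ consistent with given r_XY and r_XZ."""
    half = np.sqrt((1 - rxy**2) * (1 - rxz**2))
    return rxy * rxz - half, rxy * rxz + half

# With r_XY = r_XZ = .90, r_YZ must be at least .62
lo, hi = ryz_limits(0.9, 0.9)

# An "impossible" matrix: r_YZ = 0 falls outside [lo, hi]
R = np.array([[1.0, 0.9, 0.9],
              [0.9, 1.0, 0.0],
              [0.9, 0.0, 1.0]])
eigvals = np.linalg.eigvalsh(R)
# A negative eigenvalue means no complete data set could
# ever have produced this intercorrelation matrix
```

Pairwise deletion can produce exactly such a matrix, because each correlation is computed on a different subset of cases; a regression program handed that matrix will fail or give nonsense.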
In recent years considerable attention has focused on imputing additional values to
take the place of missing values. There are a large number of ways that this can be done,
but perhaps the easiest to see, though certainly not the best, is regression imputation. In re-
gression imputation you run a regression, using the observations you have, to predict one
variable from values of the other variables, perhaps using listwise deletion. When you
have created your regression equation you then plug in the subject's scores on existing
variables and predict that person's score on the missing variable. In this way you can sys-
tematically replace all of the missing data. You can then run your analysis on the com-
plete data set. I want to stress that I do not recommend this particular approach, but
I present it because it gives you a sense of the approaches that I do recommend. The im-
portant point is to see that the data that we have are used to make intelligent estimates of
the observations that we don't have. A much more complete treatment of missing data
is available in Howell (2008) and at http://www.uvm.edu/~dhowell/StatPages/
More_Stuff/Missing_Data/Missing.html.
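The regression-imputation procedure just described can be sketched as follows. This is a deliberately bare-bones illustration with made-up data and, as stressed above, not a recommended procedure:

```python
import numpy as np

# Toy data: columns are X, Y, Z; Z is missing for one case
data = np.array([
    [1.0, 2.0,  3.1],
    [2.0, 3.0,  5.0],
    [3.0, 5.0,  8.2],
    [4.0, 6.0, np.nan],   # Z missing for this case
    [5.0, 8.0, 13.1],
])

missing = np.isnan(data[:, 2])
complete = data[~missing]           # listwise deletion for the fit

# Step 1: regress Z on X and Y using the complete cases
A = np.column_stack([np.ones(len(complete)), complete[:, :2]])
coef = np.linalg.lstsq(A, complete[:, 2], rcond=None)[0]

# Step 2: plug the incomplete case's existing X and Y scores into
# the equation to predict its missing Z score
A_miss = np.column_stack([np.ones(missing.sum()), data[missing, :2]])
data[missing, 2] = A_miss @ coef

# data is now "complete" and the analysis can be run on all cases
```

The key idea, as in the better methods, is that the data we do have are used to estimate the observations we don't; the weakness of this simple version is that every imputed value falls exactly on the regression surface, understating the variability in the data.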


550 Chapter 15 Multiple Regression


