Statistical Methods for Psychology

In addition, the percentage of students taking the SAT varies drastically from state to state, with 81% of the students in Connecticut taking it and only 4% of the students in Utah. The states with the lowest percentages tend to be in the Midwest, and those with the highest tend to be in the Northeast. In states where only a small percentage of the students take the exam, those students are most likely to be the best students, who have their eyes on being admitted to the best schools. These are students who are likely to do well. In Massachusetts and Connecticut, where most of the students take the SAT (the less able as well as the more able), the poorer students are going to lower the state average relative to states whose best students are mainly the ones being tested. If this is true, we would expect to see a negative relationship between the percentage of students taking the exam and the state’s mean score. This is what we see when we look at the correlation between SAT and LogPctSAT and at the scatterplot in the lower right of Figure 15.1.
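The selection argument above can be illustrated with a small simulation. This is only a sketch under strong assumptions that are not in the text: every hypothetical state has the same distribution of student ability (mean 1000, standard deviation 100, both invented), and the students who take the exam are exactly the top-scoring fraction. The function names are made up for the illustration.

```python
import random


def mean_of_top_fraction(scores_desc, fraction):
    """Mean score of the top `fraction` of students (scores sorted high to low)."""
    k = max(1, round(len(scores_desc) * fraction))
    takers = scores_desc[:k]
    return sum(takers) / k


def selection_demo(n_students=20000, seed=0):
    """Mean score of test takers for several hypothetical participation rates."""
    rng = random.Random(seed)
    # Assumption: identical ability distribution in every "state" (invented values).
    scores = sorted((rng.gauss(1000, 100) for _ in range(n_students)), reverse=True)
    # Assumption: only the strongest students take the exam.
    return {pct: mean_of_top_fraction(scores, pct / 100) for pct in (4, 20, 40, 60, 81)}


if __name__ == "__main__":
    for pct, mean_score in selection_demo().items():
        print(f"{pct:>3}% taking the exam -> mean score {mean_score:.0f}")
```

Under these assumptions the mean falls steadily as the percentage taking the exam rises, even though the underlying students are identical in every case. Real states will not select this cleanly, so the sketch shows only the direction of the effect, not its size.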

Looking at One Predictor While Controlling for Another

The question that now arises is what would happen if we used both variables (Expend and LogPctSAT) simultaneously as predictors of the SAT score. What this really means, though it may not be immediately obvious, is that we will look at the relationship between Expend and SAT controlling for LogPctSAT. (We will also look at the relationship between LogPctSAT and SAT controlling for Expend.) When I say that we are controlling for LogPctSAT I mean that we are looking at the relationship while holding LogPctSAT constant. Imagine that we had many thousands of states instead of only 50. Imagine also that we could pull out a collection of states that had exactly the same percentage of students taking the SAT, e.g., 60%. Then we could look at only those states and compute the correlation and regression coefficient for predicting SAT from Expend. Then we could draw another sample of states, perhaps those with 40% of their students taking the exam. Again we could correlate Expend and SAT for only those states and compute a regression coefficient. Notice that I have calculated two correlations and two regression coefficients here, each with PctSAT held constant at a specific value (40% or 60%). Because we are only imagining that we had thousands of states, we can go further and imagine that we repeated this process many times, with PctSAT held at a specific value each time. For each of those analyses we would obtain a regression coefficient for the relationship between Expend and SAT, and an average of those many regression coefficients will be very close to the overall regression coefficient that we will shortly examine. The same is true if we averaged the correlations. (Without introducing a more complex model we are assuming that whatever the relationship between SAT and Expend, it is the same for each level of PctSAT.)
Because in our imaginary exercise each correlation is based on a sample with a fixed value of LogPctSAT, each correlation is independent of LogPctSAT. In other words, if every state included in one of our correlations had 35% of its students taking the SAT, then LogPctSAT doesn’t vary and it can’t have an effect on the relationship between Expend and SAT. That means that our correlation and regression coefficient between those two variables have controlled for LogPctSAT. Obviously we don’t have thousands of states; we have only 50, and that number is not likely to get much larger. However, that does not stop us from mathematically estimating what we would obtain if we could carry out the imaginary exercise that I just explained. And that is exactly what multiple regression is all about.
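The imaginary exercise can be tried on simulated data. The following is a minimal sketch in Python, not the analysis reported in this chapter. Everything numeric in it is an assumption made for illustration: the six PctSAT levels, the use of the natural log for LogPctSAT, the built-in slope of 10 for Expend and -80 for LogPctSAT, and the tendency for high-participation states to spend more. The function names are hypothetical as well.

```python
import math
import random


def simple_slope(x, y):
    """Least-squares slope for predicting y from a single predictor x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def partial_slopes(x1, x2, y):
    """Slopes (b1, b2) for y = b0 + b1*x1 + b2*x2, from the normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def imaginary_exercise(n_per_level=2000, seed=1):
    """Simulate thousands of 'states' and compare three slopes for Expend."""
    rng = random.Random(seed)
    pct_levels = [5, 10, 20, 40, 60, 80]  # hypothetical PctSAT values
    expend, log_pct, sat, within = [], [], [], []
    for pct in pct_levels:
        lp = math.log(pct)  # assumption: natural log
        e_grp, s_grp = [], []
        for _ in range(n_per_level):
            # Assumption: states with more test takers tend to spend more.
            e = 4.0 + 0.8 * lp + rng.gauss(0, 1)
            # Assumption: true slopes are +10 for Expend and -80 for LogPctSAT.
            s = 1100 + 10.0 * e - 80.0 * lp + rng.gauss(0, 30)
            e_grp.append(e)
            s_grp.append(s)
        # Slope of SAT on Expend with PctSAT held constant at this level.
        within.append(simple_slope(e_grp, s_grp))
        expend += e_grp
        sat += s_grp
        log_pct += [lp] * n_per_level
    b_expend, _b_logpct = partial_slopes(expend, log_pct, sat)
    return {
        "ignoring_pct": simple_slope(expend, sat),
        "mean_within_level": sum(within) / len(within),
        "multiple_regression": b_expend,
    }


if __name__ == "__main__":
    for name, value in imaginary_exercise().items():
        print(f"{name}: {value:.2f}")
```

With these invented values, the slope that ignores PctSAT comes out negative, while the average of the within-level slopes and the multiple regression slope for Expend both land close to the built-in value of 10 and close to each other. The sketch also builds in the assumption noted above, that the relationship between SAT and Expend is the same at every level of PctSAT.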

The Multiple Regression Equation

There are ways to think about multiple regression other than fixing the level of one or more variables, but before I discuss those I will go ahead and run a multiple regression on these data. I used SPSS to do so, and the results are shown in Exhibit 15.1. I specifically

