Figure 15.1 Histograms, Q-Q plots, and scatter plots of the variables used in this example
[Figure 15.1 panels: histograms and normal Q-Q plots (sample quantiles against theoretical quantiles) for Expend, P/T ratio, combined SAT, and PctSAT; scatterplots of SAT against Expend, P/T ratio, PctSAT, and LogPctSAT.]
the ACT unless they are applying to prestigious schools on either coast, such as Harvard,
Princeton, Berkeley, or Stanford. This is certainly an overly sweeping generalization, but
it will become important shortly.
Before we consider the regression solution itself, we need to look at the distribution of
each variable. These are shown for several variables as histograms, Q-Q plots, and
scatterplots in Figure 15.1. It is clear from these plots that our variables are not exactly
normally distributed, but the criterion variable and three of the predictors are fairly
well behaved. The distribution of the percentage of students taking the SAT is
definitely bimodal, reflecting the fact that in a given state each test is taken either by
most students or by few. In addition, the relationship between PctSAT and SAT score is
curvilinear, in part reflecting that bimodality. The distribution improves slightly when we
take a natural log (log_e) transformation of PctSAT, and the relationship of the transformed
variable with SAT is more linear. The scatterplot of the transformed variable (LogPctSAT)
against SAT is shown in the lower right of the figure. We will use this transformed variable
instead of PctSAT itself because it makes an important point, though its distribution is still
decidedly bimodal. The combined SAT score shows a wide distribution.
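
To see why a log transformation can straighten a curvilinear relationship of this kind, consider the minimal sketch below. It does not use the data analyzed in this chapter. The values are invented for illustration, and the names `pct_sat`, `sat`, `log_pct`, and `pearson_r`, the intercept and slope, and the alternating "noise" term are all assumptions. The sketch builds a bimodal percentage variable, makes the outcome decline with the natural log of that percentage, and compares the linear correlation before and after transforming the predictor.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical bimodal predictor: 25 "low participation" states (4% to 16%)
# and 25 "high participation" states (50% to 80%). Invented values only.
pct_sat = [4 + 0.5 * i for i in range(25)] + [50 + 1.25 * i for i in range(25)]

# Hypothetical outcome: falls with ln(pct_sat), plus a fixed alternating
# +/-10 disturbance so that the example is fully deterministic.
noise = [10 if i % 2 == 0 else -10 for i in range(len(pct_sat))]
sat = [1150 - 70 * math.log(p) + e for p, e in zip(pct_sat, noise)]

# Natural-log (log_e) transformation of the predictor.
log_pct = [math.log(p) for p in pct_sat]

r_raw = pearson_r(pct_sat, sat)   # linear fit to the curved relationship
r_log = pearson_r(log_pct, sat)   # linear fit after the transformation

print(f"r(PctSAT, SAT)    = {r_raw:.3f}")
print(f"r(LogPctSAT, SAT) = {r_log:.3f}")
```

Because the invented outcome was constructed to be linear in the log of the predictor, the second correlation is larger in absolute value than the first. The transformed predictor is still bimodal, just as in the text: taking logs changes the spacing of the values but does not merge the two clusters.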