the ACT unless they are applying to prestigious schools on either coast, such as Harvard,
Princeton, Berkeley, or Stanford. This is certainly an overly sweeping generalization, but
it will become important shortly.

Figure 15.1 Histograms, Q-Q plots, and scatterplots of the variables used in this example (panels: Expend, P/T Ratio, SAT combined, PctSAT, Log(PctSAT); scatterplots of SAT against Expend, P/T ratio, PctSAT, and LogPctSAT)
Before we consider the regression solution itself, we need to look at the distribution of
each variable. These are shown for several variables as histograms, Q-Q plots, and
scatterplots in Figure 15.1. The plots show that not all of our variables are normally
distributed. The criterion variable and three of the predictors are fairly well distributed,
but the distribution of the percentage of students taking the SAT (PctSAT) is definitely
bimodal, reflecting the fact that each test is taken either by most students in a state or by
few. In addition, the relationship between PctSAT and SAT score is curvilinear, in part
reflecting that bimodality. The distribution becomes slightly better when we take a natural
log (log_e) transformation of PctSAT, and the relationship of the transformed variable with
SAT is more linear; that scatterplot is shown in the lower right of the figure. We will use
this transformed variable (LogPctSAT) instead of PctSAT itself because it makes an important
point, though its distribution is still decidedly bimodal. The combined SAT score, our
criterion, is spread over a wide range of values from state to state.