342 Statistical Methods
only 10.5% of the variation in the calculus score. Another way of saying this
is that using Calc HS as a predictor reduces the sum of squared errors by 10.5%,
compared with using just the mean calculus score as a predictor.
Note that the p value for this correlation is 0.003 (cell H16), which is less
than 0.05, so the correlation is significant at the 5% significance level.
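The link between r and the sum of squared errors can be checked numerically. The following is a minimal sketch on made-up numbers (not the calculus data). It computes r, fits a least-squares line, and confirms that 1 - SSE(line)/SSE(mean) equals r squared. The helper names pearson_r and sse_reduction are hypothetical. The last line squares r = 0.324 to show where the figure of about 10.5% comes from.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def sse_reduction(x, y):
    """Fraction by which the least-squares line cuts the SSE versus predicting mean(y)."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    sse_mean = sum((b - my) ** 2 for b in y)
    sse_line = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return 1 - sse_line / sse_mean

# Hypothetical predictor and score values, for illustration only
x = [55, 70, 82, 64, 90, 75, 60, 88]
y = [62, 70, 75, 58, 80, 66, 71, 77]

print(round(pearson_r(x, y) ** 2, 6), round(sse_reduction(x, y), 6))  # the two agree
print(round(0.324 ** 2, 3))  # 0.105, i.e., about 10.5%
```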
Even though taking high school calculus and the later college calculus score
are significantly correlated, you cannot conclude that taking calculus in
high school causes a better grade in college. The stronger math students
tend to take calculus in high school, and these students also do well in
college. Only if students could be assigned to classes fairly (so that the
students in high school calculus were no better or worse than the others)
could the correlation be interpreted in terms of causation.
Correlation with a Two-Valued Variable
You might reasonably wonder about using Calc HS here. After all, it assumes
only the two values 0 and 1. Does the correlation between Calc and Calc HS
make sense? The positive correlation of 0.324 indicates that if the student
has taken calculus in high school, the student is more likely to have a high
calculus grade.
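Pearson's formula applies unchanged to a 0/1 variable; in that case r is often called the point-biserial correlation. As a sketch, assuming population (divide-by-n) standard deviations, r equals (mean of group 1 - mean of group 0) / SD(y) x sqrt(p(1 - p)), where p is the fraction of cases coded 1. So a positive r means the group coded 1 has the higher mean. The code checks that identity on made-up values; the function and variable names are illustrative, not from the text.

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def point_biserial(flags, y):
    """r computed from group means; flags must be 0/1, uses the population SD of y."""
    ones = [b for a, b in zip(flags, y) if a == 1]
    zeros = [b for a, b in zip(flags, y) if a == 0]
    p = len(ones) / len(y)
    return (mean(ones) - mean(zeros)) / pstdev(y) * (p * (1 - p)) ** 0.5

# Made-up values: 1 = took calculus in high school, 0 = did not
calc_hs = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
calc = [82, 70, 75, 90, 64, 73, 68, 61, 77, 85]

print(round(pearson_r(calc_hs, calc), 6), round(point_biserial(calc_hs, calc), 6))
```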
Another categorical variable in this correlation matrix is Gender Code,
which has a significant negative correlation with Alg2 Grade (r = -0.446,
p value = 0.000) and with HS Rank (r = -0.319, p value = 0.004). Recall that
in the gender code, 0 = female and 1 = male. A negative correlation here
means that females tended to have higher grades in second-year algebra and
were ranked higher in high school.
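The sign of such a correlation depends entirely on which group is coded 1. The following sketch uses a hypothetical 0/1 code and made-up grades (not the book's data). It shows that r is negative when the group coded 0 has the higher mean, and that reversing the coding (1 - code) flips the sign without changing the size.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical data: 0 = female, 1 = male; grades on a 4-point scale
code = [0, 0, 0, 0, 1, 1, 1, 1]
grade = [3.7, 3.3, 4.0, 3.0, 3.0, 2.7, 3.3, 2.3]

r = pearson_r(code, grade)
r_recoded = pearson_r([1 - c for c in code], grade)  # now 0 = male, 1 = female

mean_female = mean(g for c, g in zip(code, grade) if c == 0)
mean_male = mean(g for c, g in zip(code, grade) if c == 1)
print(f"r = {r:.3f}, recoded r = {r_recoded:.3f}")
print(f"mean (code 0) = {mean_female:.3f}, mean (code 1) = {mean_male:.3f}")
```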
Adjusting Multiple p Values with Bonferroni
The second matrix in Figure 8-24 gives the p values for the correlations.
Except for Gender, all of the correlations with Calc are significant at the 5%
level because all the p values are less than 0.05.
Some statisticians believe that the p values should be adjusted for the
number of tests, because conducting several hypothesis tests raises the
probability of rejecting at least one true null hypothesis above 5%. The
Bonferroni approach to this problem is to multiply the p value in each test
by the total number of tests conducted. With this approach, the probability
of rejecting one or more of the true null hypotheses is at most 5%.
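Two quick calculations show why the adjustment matters. Assume the tests are independent, which is only an approximation, since correlations computed from the same students generally are not. Under that assumption, the chance of at least one false rejection among m true null hypotheses, each tested at level alpha, is 1 - (1 - alpha)^m. The Bonferroni guarantee itself does not need independence: the probability of a union of events is at most the sum of their probabilities. The sketch evaluates the formula for six tests with and without the Bonferroni cutoff. It then applies the adjustment, multiplying by the number of tests and capping at 1 (a common convention), to a list of six p values. That list is hypothetical apart from 0.003 and 0.020. The helper names are illustrative.

```python
def fwer_independent(alpha_per_test, m):
    """P(at least one false rejection) for m independent tests of true nulls."""
    return 1 - (1 - alpha_per_test) ** m

def bonferroni(pvalues):
    """Bonferroni-adjusted p values: each p times the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

print(round(fwer_independent(0.05, 6), 4))      # roughly 0.26 without adjustment
print(round(fwer_independent(0.05 / 6, 6), 4))  # just under 0.05 with the 0.05/6 cutoff

# Hypothetical list of six p values (only 0.003 and 0.020 appear in the text)
pvals = [0.003, 0.020, 0.001, 0.010, 0.030, 0.200]
adjusted = bonferroni(pvals)
print([round(p, 3) for p in adjusted])
print([p < 0.05 for p in adjusted])  # same verdicts as comparing raw p with 0.05/6
```

Comparing each adjusted p value with 0.05 gives the same decisions as comparing each raw p value with 0.05/6.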
Let’s apply this approach to correlations of Calc with the other variables.
Because there are six correlations, the Bonferroni approach would have us
multiply each p value by 6 (equivalent to decreasing the p value required
for statistical significance to 0.05/6 = 0.0083). Alg2 Grade has a p value of
0.020, and because 6 × 0.020 = 0.120, the correlation is no longer significant
from this point of view. Instead of focusing on the individual correlation