Statistical Methods for Psychology


Testing the Significance of R^2


We have seen how to ask whether each of the variables is making a significant contribution
to the prediction of Y by testing its regression coefficient (b_j). But perhaps a question that
should be asked first is, "Does the set of variables taken together predict Y at better-than-chance
levels?" I suggest that this question has priority because there is little point in looking
at individual variables if no overall relationship exists.

The easiest way to test the overall relationship between Y and X_1, X_2, ..., X_p is to test
the multiple correlation coefficient for statistical significance. This amounts to testing
H_0: R* = 0, where R* represents the correlation coefficient in the population. By the nature
of our test, it is actually easier to test R^2 than R, but that amounts to the same thing.
The test on R^2 is recognizable as a simple extension of the test given in Chapter 9 when
we had only one predictor. In this case we have p predictors and

F = \frac{(N - p - 1)R^2}{p(1 - R^2)}

is distributed as the standard F distribution on p and N - p - 1 degrees of freedom. (With
only one predictor this F statistic reduces to the familiar (N - 2)r^2/(1 - r^2).) For our data,
N = 50, p = 2, and R^2 = .886. Then

F = \frac{(50 - 2 - 1)(.886)}{2(.114)} = \frac{47(.886)}{.228} = 182.64⁴

This is the same F as that given in the summary table in Exhibit 15.1. An F of 182.64 on 2
and 47 df is obviously significant beyond p = .05, and we can therefore reject H_0: R* = 0
and conclude that we can predict at better-than-chance levels. (The printout shows the
probability associated with this F under H_0 to 3 decimal places as 0.000.)
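The computation above is straightforward to reproduce. A minimal Python sketch (the function name is my own, not from the text):

```python
def f_for_r_squared(r2, n, p):
    """F test of H0: R* = 0 for a multiple correlation.

    F = (N - p - 1) * R^2 / (p * (1 - R^2)),
    distributed as F on (p, N - p - 1) df under H0.
    """
    return (n - p - 1) * r2 / (p * (1 - r2))

# The worked example from the text: N = 50, p = 2 predictors, R^2 = .886
F = f_for_r_squared(0.886, n=50, p=2)
print(round(F, 2))  # 182.64
```

Looking up the associated probability requires the F distribution on (p, N − p − 1) df; for example, `scipy.stats.f.sf(F, p, n - p - 1)` if SciPy is available.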

Sample Sizes


As you can tell from the formula for an adjusted R^2 and from the preceding formula
for F, our estimate of the correlation depends on both the size of the sample (N) and the
number of predictors (p). People often assume that if there is no relation between the criterion
and the predictors, R^2 should come out near 0. In fact, the expected value of R^2 for
random data is p/(N - 1).
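The claim that R^2 is expected to be about p/(N − 1) even when nothing at all is going on can be checked by simulation. A quick sketch of my own (assuming NumPy is available), fitting an ordinary least-squares regression to purely random data and averaging R^2 over many replications:

```python
# Monte Carlo check that E[R^2] is about p/(N - 1) for random data
import numpy as np

rng = np.random.default_rng(0)
N, p, reps = 50, 2, 2000

r2s = []
for _ in range(reps):
    # p random predictors plus an intercept column
    X = np.column_stack([np.ones(N), rng.standard_normal((N, p))])
    y = rng.standard_normal(N)  # no true relationship with X
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_tot = (y - y.mean()) @ (y - y.mean())
    r2s.append(1 - (resid @ resid) / ss_tot)

print(round(float(np.mean(r2s)), 3))  # close to p/(N - 1) = 2/49, about .041
```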
Thus, with 2 predictors, 50 cases, and no true relationship between the predictors and
the criterion, an R^2 = .04 would be the expected value, not 0. So it is important that we have
a relatively large sample size. A rule of thumb that has been kicking around for years is that
we should have at least 10 observations for every predictor. Harris (1985) points out, however,
that he knows of no empirical evidence supporting this rule. It certainly fails in the
extreme, because no one would be satisfied with 10 observations and 1 predictor. Harris
advocates an alternative rule dealing not with the ratio of p to N, but with their difference.
His rule is that N should exceed p by at least 50. Others have suggested the slightly more
liberal N ≥ p + 40. Whereas these two rules relate directly to the reliability of a correlation
coefficient, Cohen, Cohen, West, and Aiken (2003) approach the problem from the direction
of statistical power. They show that in the one-predictor case, to have power = .80
for a population correlation of .30 would require N = 124. With 5 predictors, a population


15.6 The Multiple Correlation Coefficient 533

⁴ Here, as elsewhere, what you might calculate with a calculator will differ from the answers I give because of
rounding error. Computer software uses far more significant digits than it prints out, and the answers are
themselves more accurate.
