Data Analysis with Microsoft Excel: Updated for Office 2007


Statistical Methods


Notice the Gender Code coefficient, 2.627, which shows the effect of
gender if the other variables are held constant. Because the males are coded 1
and the females are coded 0, if the regression model is true, a male student
will score 2.627 points higher than a female student, even when the
backgrounds of both students are equivalent (equivalent in terms of the predictor
variables in the model).
Whether you can trust that conclusion depends partly on whether the
coefficient for Gender Code is significant. To judge that, you need to know
how precisely the value of the coefficient has been estimated.
You can do this by examining the estimated standard deviations of the
coefficients, displayed in the Standard Error column.

t Tests for the Coefficients

The t Stat column shows the ratio of each coefficient to its standard
error. If the population coefficient is 0, then this ratio has the t distribution
with degrees of freedom n - p - 1 = 80 - 6 - 1 = 73. Here n is the number of
cases (80) and p is the number of predictors (6). The next column, P value, is
the corresponding p value: the probability of a t value this large or larger in
absolute value. For example, the t value for Alg Place is 3.092, so the
probability of a t this large or larger in absolute value is about .003. The
coefficient is significant at the 5% level because this is less than .05. In terms of
hypothesis testing, you would reject the null hypothesis that the coefficient
is 0 at the 5% level and accept the alternative hypothesis. This is a two-tailed
test (it rejects the null hypothesis for either large positive or large negative
values of t), so your alternative hypothesis is that the coefficient is not zero.
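The calculation behind the P value column can be sketched outside Excel. The following Python fragment is only an illustration, not the method Excel uses internally: it integrates the t density numerically with Simpson's rule, and the function names t_pdf and two_tailed_p are hypothetical helpers introduced here. The only inputs taken from the text are n = 80, p = 6, and the t value 3.092 for Alg Place.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|), using Simpson's rule on the density over [0, |t|].

    steps must be even. This is a numerical approximation, chosen so the
    sketch needs only the standard library.
    """
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3          # P(0 <= T <= |t|)
    return 1 - 2 * central           # both tails together

n, p = 80, 6                         # cases and predictors, from the text
df = n - p - 1                       # 73 degrees of freedom
p_value = two_tailed_p(3.092, df)    # t Stat for Alg Place, from the text

print(df, round(p_value, 4))
```

The result should agree with the value of about .003 quoted above. In Excel 2007 the same two-tailed probability can be obtained with the worksheet function TDIST, for example =TDIST(3.092, 73, 2).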
Notice that only the coefficients for Alg Place and Calc HS are significant.
This suggests that you not devote a lot of effort to interpreting the others. In
particular, it would not be appropriate to assume from the regression that
male students perform better than equally qualified female students.
The range F17:G23 indicates the 95% confidence intervals for each of the
coefficients. You are 95% confident that having calculus in high school is
associated with an increase in the calculus score of at least 2.233 points and
not more than 12.151 points in this particular regression equation.
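The confidence interval and the t test are two views of the same calculation. The sketch below assumes the interval has the standard form, coefficient plus or minus a t critical value times the standard error, and works backward from the printed bounds 2.233 and 12.151. The coefficient, standard error, and t ratio it produces are back-calculated approximations (the bounds are rounded), not values read from the worksheet, and the helper names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|) by Simpson's rule over [0, |t|]; steps must be even."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)

def t_critical(alpha, df):
    """Bisection search for the t whose two-tailed probability is alpha."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_tailed_p(mid, df) > alpha:
            lo = mid                 # tail area still too big: move right
        else:
            hi = mid
    return (lo + hi) / 2

lower, upper = 2.233, 12.151         # 95% bounds for Calc HS, from the text
df = 80 - 6 - 1                      # n - p - 1 = 73

coef = (lower + upper) / 2           # interval is centered on the coefficient
half_width = (upper - lower) / 2     # equals t_crit * standard error
t_crit = t_critical(0.05, df)        # two-tailed 5% critical value
std_err = half_width / t_crit        # back-calculated, approximate
t_stat = coef / std_err              # back-calculated, approximate

print(round(coef, 3), round(t_crit, 3), round(std_err, 3), round(t_stat, 2))
```

Because the interval lies entirely above 0, the back-calculated t ratio exceeds the critical value, which is the same statement as saying the Calc HS coefficient is significant at the 5% level.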
Is it strange that the ACT math score is nowhere near significant here,
even though this test is supposed to be a strong indication of mathematics
achievement? Looking back at the correlation matrix in Chapter 8, you
can see that it has correlation 0.353 with Calc, which is highly significant
(p = .001). Why is it not significant here? The answer involves other
variables that contain some of the same information. In using the t distribution
to test the significance of the ACT Math term, you are testing whether you
can get away with deleting this term. If the other predictors can take up the
slack and provide most of its information, then the test says that this term
is not significant and therefore is not needed in the model. If each of the
