9.4. Multiple Regression


In this excerpt, we have a number of summary statistics that give us information about the model. As you can see from the printout above, we have, for each variable, the regression coefficient β, the standard error of the regression coefficient se(β), and the R^2 value.


Using this information, we can put all of the regression coefficients together to make our model. In this example, our regression equation would be Ŷ = −121.66 + 1.51X + 12.53Z. Each of these coefficients tells us something about the relationship between the predictor variable and the predicted outcome. The temperature coefficient of 1.51 tells us that for every 1.0 degree increase in temperature, we predict an increase of about 1.5 ounces of water consumed if we hold practice time constant. Similarly, the practice time coefficient of 12.53 tells us that with every 10 minute increase in practice time (the units of Z), we predict players to consume about 12.5 additional ounces of water if we hold the temperature constant.
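To make the fitted equation concrete, here is a minimal Python sketch that turns the coefficients above into a prediction function; the function and variable names are our own, and the temperature and practice-time units follow the example.

    # Fitted model from the printout: Y-hat = -121.66 + 1.51*X + 12.53*Z
    def predict_water(temperature, practice_time):
        """Predicted ounces of water consumed for a given temperature (X) and practice time (Z)."""
        return -121.66 + 1.51 * temperature + 12.53 * practice_time

    # A 1.0 degree rise in temperature moves the prediction by the X coefficient:
    print(predict_water(91, 20) - predict_water(90, 20))  # approximately 1.51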


With an R^2 of 0.99, we can conclude that approximately 99% of the variance in the outcome variable (Y) can be explained by the variance in the combined predictor variables. Notice that the adjusted R^2 is only slightly different from the unadjusted R^2. This is due to the relatively small number of observations and the small number of predictor variables. With an R^2 of 0.99, we can conclude that almost all of the variance in water consumption is attributed to the variance in temperature and practice time.
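Both statistics can be computed directly from the observed and fitted values. Here is a short sketch of the two formulas; the raw data are not shown in this excerpt, so y and y_hat below stand in for whatever the actual observations and fitted values are.

    import numpy as np

    def r_squared(y, y_hat):
        """R^2 = 1 - SS_residual / SS_total."""
        ss_res = np.sum((y - y_hat) ** 2)
        ss_tot = np.sum((y - np.mean(y)) ** 2)
        return 1 - ss_res / ss_tot

    def adjusted_r_squared(y, y_hat, k):
        """Penalize R^2 for the number of predictors k, given n observations."""
        n = len(y)
        return 1 - (1 - r_squared(y, y_hat)) * (n - 1) / (n - k - 1)

With few observations and few predictors, the penalty term (n − 1)/(n − k − 1) stays close to 1, which is why the adjusted R^2 here differs only slightly from the unadjusted value.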


Testing for Significance to Evaluate a Hypothesis, the Standard Error of a Coefficient and Constructing Confidence Intervals


When we perform multiple regression analysis, we are essentially trying to determine whether our predictor variables explain the variation in the outcome variable (Y). When we put together our final model, we are looking at whether the variables explain most of the variation (R^2) and whether this R^2 value is statistically significant. We can use technological tools to conduct a hypothesis test of the significance of this R^2 value and to construct confidence intervals around these results.
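For a single coefficient, the usual interval is β ± t* · se(β), using n − k − 1 degrees of freedom for t*. A hedged sketch follows; the standard error and sample size in the usage line are placeholders, since the excerpt does not show them.

    from scipy import stats

    def coefficient_ci(beta, se_beta, n, k, level=0.95):
        """Confidence interval for one regression coefficient."""
        df = n - k - 1                          # residual degrees of freedom
        t_star = stats.t.ppf((1 + level) / 2, df)
        return beta - t_star * se_beta, beta + t_star * se_beta

    # Hypothetical: temperature coefficient 1.51 with an assumed se(beta) of 0.12
    print(coefficient_ci(1.51, 0.12, n=7, k=2))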


Hypothesis Testing


When we conduct a hypothesis test, we test the null hypothesis that the multiple R value in the population equals zero (H0 : Rpop = 0). Under this scenario, the predicted or fitted values would all be very close to the mean, and the deviations (Ŷ − Ȳ), and therefore the sum of squares, would be very small (close to 0). Therefore, we want to calculate a test statistic (in this case the F statistic) that measures whether the predictor variables, taken together, explain more variance than we would expect by chance. If this test statistic is beyond the critical values and the null hypothesis is rejected, we can conclude that there is a nonzero relationship between the criterion variable (Y) and the predictor variables. When we reject the null hypothesis, we can say something to the effect of "The probability that R^2 = XX would have occurred by chance if the null hypothesis were true is less than .05 (or .10, .01, etc.)." As mentioned, we can use computer programs to determine the F statistic and its significance.
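In practice the F statistic can be computed straight from R^2: it compares the variance explained per predictor to the unexplained variance per residual degree of freedom. A minimal sketch, using scipy's F distribution for the p-value:

    from scipy import stats

    def overall_f_test(r2, n, k):
        """F statistic and p-value for H0: Rpop = 0, with k predictors and n observations."""
        df1, df2 = k, n - k - 1
        f_stat = (r2 / df1) / ((1 - r2) / df2)
        p_value = stats.f.sf(f_stat, df1, df2)  # upper-tail probability of F
        return f_stat, p_value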


Let's take a look at the example above and interpret the F value. We see that we have a very high R^2 value of 0.99, which means that almost all of the variance in the outcome variable (water consumption) can be explained by the predictor variables (practice time and temperature). Our ANOVA (ANalysis Of VAriance) table tells us that we have a calculated F statistic of 313.17, which has an associated probability value of 4.03E−05 (0.0000403). This means that the probability that 0.99 of the variance would have occurred by chance if the null hypothesis were true (i.e., none of the variance explained) is 0.0000403. In other words, it is highly unlikely that this much explained variance arose by chance.
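The degrees of freedom are not shown in this excerpt, but an F of 313.17 evaluated with 2 and 4 degrees of freedom (that is, k = 2 predictors and n = 7 observations, which is our inference rather than a value from the printout) reproduces the reported probability almost exactly:

    from scipy import stats

    # df1 = 2 predictors, df2 = 4 residual df (n = 7 assumed; inferred from the p-value)
    print(stats.f.sf(313.17, 2, 4))  # about 4.03e-05, matching the ANOVA table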
