CK-12 Probability and Statistics - Advanced

9.3. Inferences about Regression http://www.ck12.org


$$z = \frac{Y - \hat{Y}}{s_{Y \cdot X}}$$

Since we have a predicted value for every value of X, the Y values associated with a given X are assumed to follow a
normal distribution. This conditional distribution has a mean equal to the predicted value on the regression line and
a standard error, which we found to be equal to 0.56. In short, the conditional distribution is used to determine the
percentage of Y values that are associated with a specific value of X.


Example:


Using our example above, if a student scored a 5 on the short test, what is the probability that they would have a
score of 5 or greater on the long physical fitness test?


Solution:


From the regression equation Ŷ = 0.635X + 1.22, we find that the predicted score for X = 5 is Ŷ = 4.40. Consider
the conditional distribution of Y scores for X = 5. Under our assumption, this distribution is normally distributed
around the predicted value (4.40) and has a standard error of 0.56.


Therefore, to find the percentage of Y scores of 5 or greater, we use the general formula and find that:


$$z = \frac{Y - \hat{Y}}{s_{Y \cdot X}} = \frac{5 - 4.40}{0.56} = 1.07$$


Using the z-distribution table, we find that the area to the right of a z score of 1.07 is 0.1423. Therefore, we can
conclude that the proportion of Y scores of 5 or greater, given a score of X = 5 on the short test, is 0.1423 or 14.23%.
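
The calculation in this example can be checked with a few lines of Python. This is only a minimal sketch: the function names are made up for illustration, the inputs (5, 4.40 and 0.56) are the rounded values quoted in the solution, and the normal tail area is computed with the standard library's `math.erfc` rather than read from a printed table.

```python
import math

def conditional_z(y, y_hat, s_yx):
    """z score of an observed Y within the conditional distribution centered on y_hat."""
    return (y - y_hat) / s_yx

def upper_tail(z):
    """Area to the right of z under the standard normal curve."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Values from the worked example: Y = 5, predicted score 4.40, standard error 0.56.
z = conditional_z(5, 4.40, 0.56)
z_table = round(z, 2)          # a printed z table is indexed to two decimals
p_table = upper_tail(z_table)  # tail area for the rounded z score
p_exact = upper_tail(z)        # tail area without rounding z first

print(f"z = {z_table:.2f}, P(Y >= 5 | X = 5) = {p_table:.4f}")
```

With z rounded to two decimals, as when reading a table, the result should agree with the value of about 0.1423 found above. Skipping the rounding changes the answer only slightly.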


Confidence Intervals


Similar to hypothesis testing for samples and populations, we can also build a confidence interval around our
regression results. This helps us answer questions like “If the predictor value was equal to X, what are the likely
values for Y?” The confidence interval gives us a range of scores that has a certain percent probability of including
the score that we are after.


The standard error of a predicted score is smaller when the predicted values lie close to the actual values, and it
increases as X deviates from its mean. This means that the weaker a predictor the regression line is, the larger the
standard error of the predicted score will be. The standard error of a predicted score is calculated using the
formula:


$$s_{\hat{Y}} = s_{Y \cdot X} \sqrt{1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{SS_X}}$$
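
To see how this standard error grows as X moves away from its mean, the formula can be sketched in Python. The function name `se_predicted` is hypothetical, and the values of n, the mean of X, and the sum of squares of X below are made up purely for illustration. Only the standard error of estimate, 0.56, comes from the example above.

```python
import math

def se_predicted(s_yx, n, x, x_bar, ss_x):
    """Standard error of a predicted score at the predictor value x."""
    return s_yx * math.sqrt(1 + 1 / n + (x - x_bar) ** 2 / ss_x)

# s_yx = 0.56 is taken from the text; n, x_bar and ss_x are hypothetical values.
n, x_bar, ss_x = 10, 5.0, 40.0
for x in (5.0, 7.0, 9.0):
    print(f"X = {x}: s_yhat = {se_predicted(0.56, n, x, x_bar, ss_x):.3f}")
```

The smallest value occurs at X equal to the mean, where the last term under the square root is zero, and the standard error increases symmetrically on either side of the mean.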

The general formula for the confidence interval for a predicted score is:


$$CI = \hat{Y} \pm (t_{cv} \, s_{\hat{Y}})$$

where:
