AP Statistics 2017


The scatterplot (see page 94) leads us to believe that the form of this relationship is linear. This,
together with r = 0.864 for these data, leads us to say that we have a strong, positive, linear association
between the variables. Suppose we wanted to predict the score of a person who studied for 2.75 hours. If we
knew we were working with a linear model, that is, a line that seemed to fit the data well, we would feel
confident about using the equation of the line to make such a prediction. We are looking for a line of best
fit. We want to find a regression line: a line that can be used for predicting response values from
explanatory values. In this situation, we would use the regression line to predict the exam score for a
person who studied 2.75 hours.
The line we are looking for is called the least-squares regression line. We could draw a variety of
lines on our scatterplot trying to determine which has the best fit. Let ŷ be the predicted value of y for a
given value of x. Then y – ŷ (the actual value minus the predicted value) represents the error in
prediction. We want our line to minimize errors in prediction, so we might first think that Σ(y – ŷ) would
be a good measure. However, positive and negative errors cancel: for any line through the point (x̄, ȳ),
including the line we are looking for, Σ(y – ŷ) = 0, so this sum cannot tell a good line from a bad one.
To get around this problem, we use Σ(y – ŷ)². This expression will vary with different lines and is
sensitive to the fit of the line. That is, Σ(y – ŷ)² is small when the linear fit is good and large when
it is not.
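To see why the plain sum of errors fails as a measure while the sum of squared errors does not, here is a small Python sketch. The hours and scores below are invented illustrative values, not the data from page 94, and errors is a helper name of our own. Every line drawn through (x̄, ȳ) gives a sum of errors of zero, whatever its slope, while the sum of squared errors changes from line to line.

```python
# Hypothetical (hours studied, exam score) pairs, invented for illustration only.
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
scores = [62, 65, 71, 70, 78, 84, 85]

x_bar = sum(hours) / len(hours)
y_bar = sum(scores) / len(scores)


def errors(slope):
    """Errors y - y_hat for the line through (x_bar, y_bar) with the given slope."""
    intercept = y_bar - slope * x_bar
    return [y - (intercept + slope * x) for x, y in zip(hours, scores)]


for slope in (0.0, 5.0, 8.0, 12.0):
    e = errors(slope)
    total = sum(e)                   # cancels to zero (up to rounding) for every slope
    squared = sum(v * v for v in e)  # changes from line to line
    print(f"slope {slope:4.1f}: sum of errors = {total:8.4f}, "
          f"sum of squared errors = {squared:7.2f}")
```

The sums of errors print as zero up to floating-point rounding (a value such as -0.0000 can appear), whereas the squared-error column is smallest for the slope of 8, which is close to the least-squares slope for these invented values.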
The least-squares regression line (LSRL) is the line that minimizes the sum of squared errors. If
ŷ = a + bx is the LSRL, then a and b are the values that make Σ(y – ŷ)² as small as possible.
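The standard closed-form solution, b = Σ(x – x̄)(y – ȳ) / Σ(x – x̄)² and a = ȳ – b x̄, can be checked numerically. The Python sketch below uses a small set of invented data (not the book's data set); least_squares_line and sse are hypothetical helper names. It fits the line, confirms that nudging a or b only increases the sum of squared errors, and then uses the line to predict a score for 2.75 hours of study.

```python
# Hypothetical (hours studied, exam score) pairs, invented for illustration only.
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
scores = [62, 65, 71, 70, 78, 84, 85]


def least_squares_line(xs, ys):
    """Return (a, b) for y_hat = a + b*x that minimizes the sum of squared errors."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)  # zero only if every x is the same
    b = sxy / sxx
    a = y_bar - b * x_bar  # the LSRL passes through (x_bar, y_bar)
    return a, b


def sse(xs, ys, a, b):
    """Sum of squared errors, Σ(y - y_hat)^2, for the line y_hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


a, b = least_squares_line(hours, scores)
best = sse(hours, scores, a, b)

# Moving the intercept or slope away from the least-squares values only increases the SSE.
for da, db in [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.5), (0.0, -0.5), (0.3, -0.3)]:
    assert sse(hours, scores, a + da, b + db) > best

print(f"y_hat = {a:.3f} + {b:.3f}x   (SSE = {best:.3f})")
print(f"predicted score for 2.75 hours: {a + b * 2.75:.1f}")
```

Because the sum of squared errors is a bowl-shaped (convex quadratic) function of a and b whenever the x-values are not all equal, any nonzero nudge raises it, which is why the checks pass. For real work, a library routine such as numpy.polyfit(x, y, 1) returns the same fit, with the slope listed first and the intercept second.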


Digression for calculus students only: It should be clear that trying to find a and b for the line
ŷ = a + bx that minimizes Σ(y – ŷ)² is a typical calculus problem. The difference is that, since
Σ(y – ŷ)² is a function of two variables, a and b, finding its minimum requires multivariable calculus.
That is, you need to be beyond first-year calculus to derive the results that follow.
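For readers who want to see the digression carried out, here is a sketch of the standard derivation. We write Q(a, b), a label of our own, for the sum of squared errors Σ(y – ŷ)² with ŷ = a + bx, and set both partial derivatives to zero. The first condition is exactly Σ(y – ŷ) = 0, the cancellation noted earlier; the last equality uses Σ x̄(y – ȳ) = 0 and Σ x̄(x – x̄) = 0, and it requires that the x-values are not all equal.

```latex
Q(a, b) = \sum (y - a - bx)^2

\frac{\partial Q}{\partial a} = -2 \sum (y - a - bx) = 0
  \quad\Longrightarrow\quad a = \bar{y} - b\,\bar{x}

\frac{\partial Q}{\partial b} = -2 \sum x\,(y - a - bx) = 0
  \quad\Longrightarrow\quad \sum x\,\bigl[(y - \bar{y}) - b\,(x - \bar{x})\bigr] = 0

b = \frac{\sum x\,(y - \bar{y})}{\sum x\,(x - \bar{x})}
  = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}
```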