AP Statistics 2017


For these data, it turns out that r = .999986.
Is the increase in the number of ministers responsible for the increase in imported rum? Some cynics
might want to believe so, but the real reason is that the population was increasing from 1860 to 1940, so
the area needed more ministers, and more people drank more rum.
In this example, there was a lurking variable, increasing population, that we didn’t consider when
we did the correlation and that caused both of these variables to change the way they did. We will look
more at lurking variables in the next chapter, but in the meantime remember, always remember, that
correlation does not imply causation.
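
To see how a lurking variable can produce a correlation near 1 between two quantities that do not affect each other, here is a minimal sketch in Python. Every number in it is hypothetical and is not the ministers-and-rum data discussed above; the assumed mechanism is that each quantity is a multiple of the population plus a small made-up deviation.

```python
from math import sqrt

def correlation(x, y):
    """Pearson correlation coefficient r of two equal-length lists."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical population values (in thousands), one per time period.
population = [10, 14, 19, 25, 32, 40]

# Assumed mechanism: each quantity depends only on population, never on
# the other quantity. The small deviations are invented for illustration.
ministers = [2 * p + d for p, d in zip(population, [1, -1, 0, 2, -2, 1])]
rum = [30 * p + d for p, d in zip(population, [-20, 15, 5, -10, 20, -5])]

r = correlation(ministers, rum)
print(round(r, 4))  # very close to 1
```

Neither list is computed from the other, yet r comes out very close to 1 because both lists track the population.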


Lines of Best Fit


When we discussed correlation, we learned that it didn’t matter which variable we called x and which
variable we called y: the correlation r is the same. That is, there is no explanatory and response
variable, just two variables that may or may not vary linearly. In this section, once we’ve determined
the strength of the linear relationship between the two variables, we will be more interested in
predicting the value of one variable (the response) based on the value of the other variable (the
explanatory). In this situation, called linear regression, it matters greatly which variable we call x
and which one we call y.
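
The contrast between correlation and regression can be checked numerically. The sketch below uses a small hypothetical data set (the names hours and scores and all of the values are invented for illustration, not taken from the study mentioned in this chapter) and the standard least-squares formulas, slope = Sxy / Sxx and intercept = ybar - slope * xbar. Swapping the variables leaves r unchanged but gives a different fitted line.

```python
from math import sqrt

def sums_of_squares(x, y):
    """Return (mean of x, mean of y, Sxx, Syy, Sxy) for two equal-length lists."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return mx, my, sxx, syy, sxy

def correlation(x, y):
    """Pearson correlation coefficient r; symmetric in x and y."""
    _, _, sxx, syy, sxy = sums_of_squares(x, y)
    return sxy / sqrt(sxx * syy)

def least_squares_line(x, y):
    """Return (intercept, slope) of the line that predicts y from x."""
    mx, my, sxx, _, sxy = sums_of_squares(x, y)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope

# Hypothetical data: hours studied and test score for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

# Correlation does not care which variable is called x.
print(correlation(hours, scores), correlation(scores, hours))

# Regression does: predicting score from hours is a different line
# from predicting hours from score.
print(least_squares_line(hours, scores))
print(least_squares_line(scores, hours))
```

With these values, the line predicting score from hours has slope 6, so solving it for hours would give slope 1/6, about 0.167. The line fitted with score as the explanatory variable has slope 60/374, about 0.160. The two lines are not inverses of each other, which is why the choice of x and y matters in regression.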


Least-Squares Regression Line


Recall again the data from the study that looked at hours studied versus score on test:
