AP Statistics 2017


or more simply, the correlation coefficient, denoted by the letter r. The correlation coefficient is a
measure of the strength of the linear relationship between two variables as well as an indicator of the
direction of the linear relationship (whether the variables are positively or negatively associated).
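If we have a sample of size n of paired data, say (x, y), and assuming that we have computed summary
statistics for x and y (means and standard deviations), the correlation coefficient r is defined as follows:

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$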


Because the terms after the summation symbol are nothing more than the z-scores of the individual x
and y values, an easy way to remember this definition is:

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} z_{x_i} z_{y_i} $$
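
As a rough illustration, the z-score form of the definition can be sketched in Python. The function name `correlation` and the data values are assumptions made up for this sketch; they are not the hours-studied data from the text, so the result will not be 0.864.

```python
import math

def correlation(xs, ys):
    """Sample correlation r: sum of z-score products divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (divide by n - 1, not n).
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when a variable has no spread")
    # z-scores of the individual x and y values.
    z_x = [(x - mean_x) / s_x for x in xs]
    z_y = [(y - mean_y) / s_y for y in ys]
    return sum(zx * zy for zx, zy in zip(z_x, z_y)) / (n - 1)

# Hypothetical paired data (invented for illustration only).
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 75, 80]
r = correlation(hours, scores)
print(round(r, 3))
```

Computing the z-scores explicitly mirrors the way the definition is stated; a statistics library would normally be used in practice.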


example: Earlier in the section, we saw some data for hours studied and the corresponding scores
on an exam. It can be shown that, for these data, r = 0.864 and the scatterplot appears roughly
linear. Together, this indicates a strong positive linear relationship between hours studied and
exam score. That is, students who studied more hours tended to earn higher exam scores.

The correlation coefficient r has a number of properties you should be familiar with:


• –1 ≤ r ≤ 1. If r = –1 or r = 1, the points all lie on a line.
• Although there are no hard-and-fast rules about how strong a correlation is based on its numerical
value, the following guidelines might help you categorize r:


• If r > 0, it indicates that the variables are positively associated. If r < 0, it indicates that the variables
are negatively associated.
• If r = 0, it indicates that there is no linear association that would allow us to predict y from x. It does
not mean that there is no relationship, just not a linear one.
• It does not matter which variable you call x and which variable you call y; r will be the same. In other
words, r depends only on the paired points, not the ordered pairs.
• r does not depend on the units of measurement. In the previous example, convert “hours studied” to
“minutes studied” and r would still equal 0.864.
• r is not resistant to extreme values because it is based on the mean. A single extreme value can have a
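
Several of the properties listed above can be checked numerically. The sketch below uses an algebraically equivalent sums-of-deviations form of r and hypothetical data invented for illustration (not the data set from the text); the helper name `correlation` and all values are assumptions.

```python
import math

def correlation(xs, ys):
    """Pearson's r via sums of deviations (equivalent to the z-score form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired data (invented for illustration only).
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 75, 80]
r = correlation(hours, scores)

# Swapping the roles of x and y leaves r unchanged.
r_swapped = correlation(scores, hours)

# Changing units (hours -> minutes) leaves r unchanged.
minutes = [60 * h for h in hours]
r_minutes = correlation(minutes, scores)

# r = 0 does not rule out a relationship: y = x^2 is perfectly
# predictable from x, yet there is no linear association here.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]
r_parabola = correlation(xs, ys)

# r is not resistant: one extreme point changes it substantially.
r_outlier = correlation(hours + [6], scores + [20])

print(r, r_swapped, r_minutes, r_parabola, r_outlier)
```

In this made-up data set, adding the single point (6, 20) is enough to turn a strong positive correlation into a negative one.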
