CK-12 Probability and Statistics - Advanced


http://www.ck12.org Chapter 9. Regression and Correlation


Correlation Coefficients


While examining scatterplots gives us some idea about the relationship between two variables, we use a statistic
called the correlation coefficient to give us a more precise measurement of that relationship. The correlation
coefficient is an index that describes the relationship between two variables and can take on values between
−1.0 and +1.0. We can tell a lot from a correlation coefficient, including:



  • A positive correlation coefficient (0.10, 0.56, etc.) indicates a positive correlation.

  • A negative correlation coefficient (−0.32, −0.82, etc.) indicates a negative correlation.

  • The absolute value of the coefficient indicates the magnitude or the strength of the relationship. The closer the
    absolute value of the coefficient is to 1, the stronger the relationship. For example, a correlation coefficient
    of 0.20 indicates that there is not much of a relationship between the variables, while a coefficient of −0.90
    indicates that there is a strong linear relationship.

  • The value of a perfect positive correlation is 1.0, while the value of a perfect negative correlation is −1.0.

  • When there is no linear relationship between two variables, the correlation coefficient is 0.


The most often used correlation coefficient is the Pearson product-moment correlation coefficient, or the linear
correlation, which is symbolized by the letter r. To understand how this coefficient is calculated, let’s suppose that
there is a positive relationship between two variables (X and Y). If a subject has a score on X that is above the mean,
we expect them to have a score on Y that is above the mean as well. Pearson developed his correlation coefficient
by computing the sum of cross products, that is, by multiplying the two scores (X and Y) for each subject and then
adding these cross products across the individuals. Then, he divided this sum by the number of subjects minus one.
In short, this coefficient is the mean of the cross products of scores.


Because Pearson was measuring the relationship between two variables, he used standard scores (z-scores, t-scores,
etc.) when determining the coefficient. Therefore, the formula for this coefficient is:


$$r_{xy} = \frac{\sum z_x z_y}{n - 1}$$

In other words, the coefficient is expressed as the sum of the cross products of the standard z-scores divided by the
number of degrees of freedom.
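
The calculation described above can be sketched in a few lines of Python. This is only an illustration, not part of the original text: the function name pearson_r and the sample scores are hypothetical. It assumes that the z-scores are computed with the sample standard deviation (the one that divides by n − 1), so that the result stays between −1.0 and +1.0.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Return the sum of z-score cross products divided by n - 1.

    Hypothetical helper for illustration. Assumes paired scores and the
    sample standard deviation (n - 1 in the denominator) for the z-scores.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 scores")

    mean_x, mean_y = mean(xs), mean(ys)
    sd_x, sd_y = stdev(xs), stdev(ys)  # a constant variable gives 0 here, so r is undefined

    # Convert each raw score to a standard score (z-score).
    z_x = [(x - mean_x) / sd_x for x in xs]
    z_y = [(y - mean_y) / sd_y for y in ys]

    # Sum the cross products and divide by the degrees of freedom.
    cross_products = [a * b for a, b in zip(z_x, z_y)]
    return sum(cross_products) / (len(xs) - 1)


# Made-up scores for five subjects on X and Y.
x_scores = [1, 2, 3, 4, 5]
y_scores = [2, 1, 4, 3, 5]
print(round(pearson_r(x_scores, y_scores), 2))  # approximately 0.8
```

With these made-up scores, subjects who are above the mean on X tend to be above the mean on Y as well, so most cross products are positive and r comes out positive. Reversing the order of the Y scores would make the coefficient negative.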
