Introduction to Probability and Statistics for Engineers and Scientists


2.6 Paired Data Sets and the Sample Correlation Coefficient


Properties of r


  1. −1 ≤ r ≤ 1

  2. If for constants a and b, with b > 0,

         y_i = a + b x_i,  i = 1, ..., n

     then r = 1.

  3. If for constants a and b, with b < 0,

         y_i = a + b x_i,  i = 1, ..., n

     then r = −1.

  4. If r is the sample correlation coefficient for the data pairs (x_i, y_i), i = 1, ..., n, then it
     is also the sample correlation coefficient for the data pairs

         (a + b x_i, c + d y_i),  i = 1, ..., n

     provided that b and d are both positive or both negative.
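
The last property can be checked directly from the definition of r. The following is a sketch of that check, assuming the usual definition of the sample correlation coefficient as the sum of the products of the deviations from the sample means, divided by (n − 1) times the product of the sample standard deviations s_x and s_y (the definition is not restated on this page). The symbols u_i and v_i are introduced here only to name the transformed data.

```latex
\begin{align*}
u_i &= a + b x_i, \qquad v_i = c + d y_i, \qquad i = 1, \ldots, n \\
\bar{u} &= a + b\bar{x}, \qquad \bar{v} = c + d\bar{y} \\
u_i - \bar{u} &= b\,(x_i - \bar{x}), \qquad v_i - \bar{v} = d\,(y_i - \bar{y}) \\
s_u &= |b|\, s_x, \qquad s_v = |d|\, s_y \\
r_{uv} &= \frac{\sum_{i=1}^{n} (u_i - \bar{u})(v_i - \bar{v})}{(n-1)\, s_u s_v}
        = \frac{bd}{|b|\,|d|} \cdot
          \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_x s_y}
        = \frac{bd}{|bd|}\, r_{xy}
\end{align*}
```

The factor bd/|bd| equals 1 exactly when b and d have the same sign, which is the stated condition. If b and d have opposite signs the factor is −1, so the correlation keeps its magnitude but changes sign.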

Property 1 says that the sample correlation coefficient r is always between −1 and +1.
Property 2 says that r will equal +1 when there is a straight-line (also called a linear) relation
between the paired data such that large y values are attached to large x values. Property 3
says that r will equal −1 when the relation is linear and large y values are attached to small
x values. Property 4 states that the value of r is unchanged when a constant is added to each
of the x variables (or to each of the y variables) or when each x variable (or each y variable)
is multiplied by a positive constant. This property implies that r does not depend on the
units chosen to measure the data. For instance, the sample correlation coefficient
between a person’s height and weight does not depend on whether the height is measured
in feet or in inches nor whether the weight is measured in pounds or in kilograms. Also, if
one of the values in the pair is temperature, then the sample correlation coefficient is the
same whether it is measured in Fahrenheit or in Celsius.
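
The four properties can be checked numerically. The following is a minimal sketch, not a definitive implementation: `sample_correlation` is a helper written for this illustration, and every data value (heights, weights, temperatures, readings) is made up purely to exercise the properties.

```python
import math


def sample_correlation(x, y):
    """Sample correlation coefficient r of the pairs (x[i], y[i]).

    The (n - 1) factors in the numerator and denominator cancel,
    so only sums of deviations from the means are needed.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    # Raises ZeroDivisionError if all x values (or all y values) are equal.
    return s_xy / math.sqrt(s_xx * s_yy)


# Properties 2 and 3: an exact line gives r = +1 (b > 0) or r = -1 (b < 0).
x = [1, 2, 3, 4, 5]
r_up = sample_correlation(x, [3 + 2 * xi for xi in x])
r_down = sample_correlation(x, [3 - 2 * xi for xi in x])

# Property 4: changing units does not change r.
# Hypothetical heights (inches) and weights (pounds), for illustration only.
heights_in = [62.0, 65.0, 68.0, 70.0, 73.0, 75.0]
weights_lb = [120.0, 140.0, 155.0, 170.0, 180.0, 210.0]
heights_ft = [h / 12 for h in heights_in]           # inches -> feet
weights_kg = [w * 0.45359237 for w in weights_lb]   # pounds -> kilograms
r_inches_pounds = sample_correlation(heights_in, weights_lb)
r_feet_kg = sample_correlation(heights_ft, weights_kg)

# Fahrenheit -> Celsius adds a constant and multiplies by a positive one.
temps_f = [50.0, 59.0, 68.0, 77.0, 86.0, 95.0]
readings = [12.0, 15.0, 14.0, 19.0, 22.0, 21.0]     # hypothetical paired values
temps_c = [(f - 32) * 5 / 9 for f in temps_f]
r_fahrenheit = sample_correlation(temps_f, readings)
r_celsius = sample_correlation(temps_c, readings)

print(r_up, r_down)
print(r_inches_pounds, r_feet_kg)
print(r_fahrenheit, r_celsius)
```

Because the arithmetic is done in floating point, the paired values may differ in the last few digits, so a comparison such as `math.isclose(r_inches_pounds, r_feet_kg)` is safer than `==`.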
The absolute value of the sample correlation coefficient r (that is, |r|, its value without
regard to its sign) is a measure of the strength of the linear relationship between the x and
the y values of a data pair. A value of |r| equal to 1 means that there is a perfect linear
relation; that is, a straight line can pass through all the data points (x_i, y_i), i = 1, ..., n.
A value of |r| of around 0.8 means that the linear relation is relatively strong; although there
is no straight line that passes through all of the data points, there is one that is “close” to
them all. A value for |r| of around 0.3 means that the linear relation is relatively weak.
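
To see how |r| tracks the amount of scatter about a line, the sketch below starts from points lying exactly on a line and pushes them away from it by increasing amounts. Everything here is an assumption made for illustration: the line y = 1 + 2x, the deviation `pattern`, the scale values, and the helper names. The pattern was chosen to sum to zero and to be uncorrelated with x, so that only the amount of scatter changes.

```python
import math


def sample_correlation(x, y):
    """Sample correlation coefficient r of the pairs (x[i], y[i])."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


x = [1, 2, 3, 4, 5, 6, 7, 8]
# Illustrative deviations from the line: they sum to zero and are
# uncorrelated with x, so scaling them changes only the scatter.
pattern = [1, -1, -1, 1, 1, -1, -1, 1]


def r_with_scatter(scale):
    """r for the data y_i = 1 + 2*x_i + scale * pattern_i."""
    y = [1 + 2 * xi + scale * e for xi, e in zip(x, pattern)]
    return sample_correlation(x, y)


r_perfect = r_with_scatter(0.0)   # every point on the line: r = 1
r_strong = r_with_scatter(3.5)    # moderate scatter: r close to 0.8
r_weak = r_with_scatter(14.5)     # heavy scatter: r close to 0.3

for label, r in [("perfect", r_perfect), ("strong", r_strong), ("weak", r_weak)]:
    print(f"{label:8s} r = {r:.3f}")
```

In all three data sets the underlying line is the same; only the size of the deviations from it differs, and |r| falls as the deviations grow.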
