Introduction to Probability and Statistics for Engineers and Scientists


Chapter 2: Descriptive Statistics


be a negative value. Because the same statements are true about the $y$ deviations, we can
conclude the following:


When large values of the $x$ variable tend to be associated with large values
of the $y$ variable and small values of the $x$ variable tend to be associated
with small values of the $y$ variable, then the signs, either positive or
negative, of $x_i - \bar{x}$ and $y_i - \bar{y}$ will tend to be the same.
Now, if $x_i - \bar{x}$ and $y_i - \bar{y}$ both have the same sign (either positive or negative), then
their product $(x_i - \bar{x})(y_i - \bar{y})$ will be positive. Thus, it follows that when large $x$ values
tend to be associated with large $y$ values and small $x$ values are associated with small $y$
values, then $\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$ will tend to be a large positive number. [In fact, not
only will all the products have a positive sign when large (small) $x$ values are paired with
large (small) $y$ values, but it also follows from a mathematical result known as Hardy’s
lemma that the largest possible value of the sum of paired products will be obtained when
the largest $x_i - \bar{x}$ is paired with the largest $y_i - \bar{y}$, the second largest $x_i - \bar{x}$ is paired with
the second largest $y_i - \bar{y}$, and so on.] In addition, it similarly follows that when large values
of $x_i$ tend to be paired with small values of $y_i$, then the signs of $x_i - \bar{x}$ and $y_i - \bar{y}$ will be
opposite and so $\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$ will be a large negative number.
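The sign behavior of the sum of products of deviations, and the maximal pairing described by Hardy's lemma, can be checked numerically. The following is a minimal sketch on made-up data; the function name `sum_of_deviation_products` and the sample values are illustrative assumptions and do not come from the text.

```python
from itertools import permutations


def sum_of_deviation_products(xs, ys):
    """Return the sum over i of (x_i - x_bar) * (y_i - y_bar) for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))


# Made-up illustrative data (an assumption, not taken from the text).
xs = [1, 2, 3, 4, 5]
ys_same_order = [10, 20, 30, 40, 50]   # large x paired with large y
ys_reversed = ys_same_order[::-1]      # large x paired with small y

same = sum_of_deviation_products(xs, ys_same_order)
opposite = sum_of_deviation_products(xs, ys_reversed)

# Try every possible pairing of the y values with the fixed x values.
all_sums = [
    sum_of_deviation_products(xs, list(p))
    for p in permutations(ys_same_order)
]

print(same, opposite)                 # positive sum, then negative sum
print(max(all_sums) == same)          # the sorted pairing attains the maximum
print(min(all_sums) == opposite)      # the reversed pairing attains the minimum
```

Enumerating all pairings is feasible only for very small $n$; it is used here purely to illustrate the claim, not as a practical method.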
To determine what it means for $\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$ to be “large,” we standardize
this sum first by dividing by $n - 1$ and then by dividing by the product of the two sample
standard deviations. The resulting statistic is called the sample correlation coefficient.


Definition

Let $s_x$ and $s_y$ denote, respectively, the sample standard deviations of the $x$ values and the
$y$ values. The sample correlation coefficient, call it $r$, of the data pairs $(x_i, y_i)$, $i = 1, \ldots, n$,
is defined by


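$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_x s_y}
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$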

When $r > 0$ we say that the sample data pairs are positively correlated, and when $r < 0$ we
say that they are negatively correlated.
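As a hedged illustration of the definition, the sketch below computes $r$ from both forms of the formula. The two forms agree because each sample standard deviation is the square root of the corresponding sum of squared deviations divided by $n - 1$. The function name `sample_correlation` and the data are assumptions made for this example only.

```python
import math


def sample_correlation(xs, ys):
    """Return r computed from both forms of the definition, as a pair.

    First form:  sum of deviation products / ((n - 1) * s_x * s_y)
    Second form: sum of deviation products / sqrt(Sxx * Syy)
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")

    # Sample standard deviations use the n - 1 divisor.
    s_x = math.sqrt(sxx / (n - 1))
    s_y = math.sqrt(syy / (n - 1))

    r_first = sxy / ((n - 1) * s_x * s_y)
    r_second = sxy / math.sqrt(sxx * syy)
    return r_first, r_second


# Made-up illustrative data (an assumption, not taken from the text).
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

r_first, r_second = sample_correlation(xs, ys)
print(r_first, r_second)  # both approximately 0.8, so the pairs are positively correlated
```

The second form avoids computing the standard deviations separately, which is why the two results can differ by floating-point rounding; comparing them with `math.isclose` rather than `==` is the safer check.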


The following are properties of the sample correlation coefficient.