Introduction to Probability and Statistics for Engineers and Scientists


2.6 Paired Data Sets and the Sample Correlation Coefficient


FIGURE 2.13 A scatter diagram. [Vertical axis: number of defects; horizontal axis: temperature.]


A question of interest concerning paired data sets is whether large x values tend to be
paired with large y values, and small x values with small y values; if this is not the case,
then we might question whether large values of one of the variables tend to be paired
with small values of the other. A rough answer to these questions can often be provided
by the scatter diagram. For instance, Figure 2.13 indicates that there appears to be some
connection between high temperatures and large numbers of defective items. To obtain
a quantitative measure of this relationship, we now develop a statistic that attempts to
measure the degree to which larger x values go with larger y values and smaller x values
with smaller y values.
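Before the statistic is developed, the informal question above (do the larger x values tend to come with the larger y values?) can be checked roughly by hand. The sketch below is only an illustration and is not the statistic the text goes on to define: it sorts the pairs by x, splits them into a smaller-x half and a larger-x half, and compares the average y value in each half. The data values and the function name mean_y_by_x_half are hypothetical, invented for this example; they are not the data plotted in Figure 2.13.

```python
from statistics import mean

# Hypothetical (temperature, defects) pairs, invented for illustration only;
# they are NOT the data behind Figure 2.13.
pairs = [(21, 9), (22, 12), (24, 14), (25, 20),
         (27, 22), (29, 27), (30, 31), (31, 30)]


def mean_y_by_x_half(pairs):
    """Return (mean y over the smaller-x half, mean y over the larger-x half).

    With an odd number of pairs the middle pair is left out of both halves.
    Requires at least two pairs.
    """
    ordered = sorted(pairs, key=lambda p: p[0])  # order the pairs by x value
    half = len(ordered) // 2
    lower_y = [y for _, y in ordered[:half]]
    upper_y = [y for _, y in ordered[len(ordered) - half:]]
    return mean(lower_y), mean(upper_y)


low_mean, high_mean = mean_y_by_x_half(pairs)
print("mean y for smaller x values:", low_mean)
print("mean y for larger x values: ", high_mean)
```

For this made-up data the average y in the larger-x half is clearly higher than in the smaller-x half, which is the pattern a scatter diagram such as Figure 2.13 suggests visually. A comparison of this kind is crude, since it depends on where the split is made, which is one reason a single summary statistic is preferable.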
Suppose that the data set consists of the paired values (x_i, y_i), i = 1, ..., n. To obtain
a statistic that can be used to measure the association between the individual values of a
set of paired data, let x̄ and ȳ denote the sample means of the x values and the y values,
respectively. For data pair i, consider x_i − x̄, the deviation of its x value from the sample
mean, and y_i − ȳ, the deviation of its y value from the sample mean. Now if x_i is a large
x value, then it will be larger than the average value of all the x's, so the deviation x_i − x̄
will be a positive value. Similarly, when x_i is a small x value, then the deviation x_i − x̄ will
