Descriptive Statistics 341
This problem is apparent from the following illustration. Suppose we have two variables, $x$ and $y$, whose covariance $\operatorname{cov}(x, y)$ takes a certain value. A linear transformation of at least one variable, say $ax + b$, will generally change the value of the covariance because of the following property of the covariance:
$$\operatorname{cov}(ax + b, y) = a \operatorname{cov}(x, y)$$
This does not mean, however, that the transformed variable is more or less correlated with $y$ than $x$ was. Because the covariance is sensitive to such transformations, it is not a suitable measure of the degree of correlation.
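The scaling property can be checked numerically with a short Python sketch. The helper `cov` is written here for illustration and is not a library function. It assumes the empirical 1/n normalization; equation (A.11) may use a different factor, but the property holds either way. The data are made up.

```python
def cov(x, y):
    # Empirical covariance with 1/n normalization (assumed for illustration).
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
a, b = 10.0, 3.0

# Linear transformation ax + b of the first variable.
x_t = [a * xi + b for xi in x]

print(cov(x, y))    # 1.6
print(cov(x_t, y))  # 16.0: scaled by a, while the shift b has no effect
```

The shift $b$ drops out because every deviation from the mean becomes $(ax_i + b) - (a\bar{x} + b) = a(x_i - \bar{x})$. The degree of association between the two variables is unchanged, yet the covariance grows tenfold.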
This shortcoming of the covariance can be circumvented by dividing the joint variation, as defined by equation (A.11), by the product of the standard deviations of the component variables. The resulting measure is the Pearson correlation coefficient, or simply the correlation coefficient, defined by
$$r_{x,y} = \frac{\operatorname{cov}(x, y)}{s_x \cdot s_y} \tag{A.12}$$
where the covariance is divided by the product of the standard deviations of $x$ and $y$. As a consequence of the Cauchy-Schwarz inequality, $r_{x,y}$ always lies between $-1$ and $1$ for any bivariate quantitative data, provided that neither variable is constant (so that $s_x, s_y > 0$). Unlike the covariance, therefore, the correlation coefficient (A.12) allows us to compare different data sets. Generally, we distinguish between $r_{x,y} < 0$ (negative correlation), $r_{x,y} = 0$ (no correlation), and $r_{x,y} > 0$ (positive correlation) to indicate the possible direction of joint behavior.
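Equation (A.12) can be sketched in Python as follows. The helpers `mean`, `cov`, `std`, and `pearson_r` are our own rather than a library API. We assume the 1/n normalization for both the covariance and the standard deviation. Because the same factor appears in the numerator and the denominator, it cancels, so the choice does not matter as long as it is applied consistently.

```python
import math

def mean(v):
    return sum(v) / len(v)

def cov(x, y):
    # Empirical covariance with 1/n normalization (assumed; it cancels below).
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / len(x)

def std(v):
    # Standard deviation with the same 1/n normalization as cov().
    return math.sqrt(cov(v, v))

def pearson_r(x, y):
    # Equation (A.12): covariance divided by the product of standard deviations.
    sx, sy = std(x), std(y)
    if sx == 0 or sy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov(x, y) / (sx * sy)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(round(pearson_r(x, [2.0, 1.0, 4.0, 3.0, 5.0]), 6))  # 0.8: positive correlation
print(round(pearson_r(x, [5.0, 4.0, 3.0, 2.0, 1.0]), 6))  # -1.0: perfect negative correlation
print(round(pearson_r(x, [4.0, 1.0, 0.0, 1.0, 4.0]), 6))  # 0.0: no correlation
```

The third case shows that $r_{x,y}$ measures only linear association. Here $y = (x - 3)^2$ is completely determined by $x$, yet $r_{x,y} = 0$.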
In contrast to the covariance, the correlation coefficient is invariant with respect to linear transformations $ax + b$ with $a > 0$; that is, it is scale invariant. (For $a < 0$, only its sign changes.) For example, if we transform $x$ into $ax + b$ with $a > 0$, we still have
$$r_{ax+b,\,y} = \frac{\operatorname{cov}(ax + b, y)}{s_{ax+b} \cdot s_y} = \frac{a \operatorname{cov}(x, y)}{a\, s_x \cdot s_y} = r_{x,y}, \qquad a > 0.$$
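The invariance can be checked numerically with the Python sketch below. It uses made-up data, and `pearson_r` is a helper written for illustration, not a library function. The sketch also shows the case of a negative slope. There the step $s_{ax+b} = a\,s_x$ no longer holds, because in general $s_{ax+b} = |a|\,s_x$.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    # The 1/n (or 1/(n - 1)) factors cancel in the ratio, so plain sums suffice.
    return sxy / math.sqrt(sxx * syy)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson_r(x, y)
r_pos = pearson_r([10.0 * xi + 3.0 for xi in x], y)   # a = 10 > 0
r_neg = pearson_r([-10.0 * xi + 3.0 for xi in x], y)  # a = -10 < 0

print(round(r, 6), round(r_pos, 6), round(r_neg, 6))  # 0.8 0.8 -0.8
```

A positive slope leaves the coefficient unchanged. A negative slope preserves its magnitude but reverses the direction of the correlation.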
contingency coefficient
So far, we have only been able to measure the correlation of quantitative data. To extend the analysis to data of any type, we introduce another measure, the so-called chi-square test statistic, denoted by $\chi^2$. Using relative frequencies, the chi-square test statistic is defined by
$$\chi^2 = n \sum_{i=1}^{r} \sum_{j=1}^{s} \frac{\left( f_{x,y}(v_i, w_j) - f_x(v_i)\, f_y(w_j) \right)^2}{f_x(v_i)\, f_y(w_j)} \tag{A.13}$$
An analogous formula can be used for absolute frequencies.
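Equation (A.13) can be illustrated with a small Python sketch. It defines two helpers of our own, `chi_square` and `chi_square_absolute`, which are not a library API. Both take a contingency table of counts whose rows index the values $v_i$ of $x$ and whose columns index the values $w_j$ of $y$. The first follows (A.13) with relative frequencies. The second uses one common absolute-frequency form, $\chi^2 = \sum_{i}\sum_{j} (n_{ij} - n_{i\cdot} n_{\cdot j}/n)^2 / (n_{i\cdot} n_{\cdot j}/n)$. In this form, $n_{ij}$ is the number of observed pairs $(v_i, w_j)$, and $n_{i\cdot}$ and $n_{\cdot j}$ are the row and column totals. This notation is assumed here rather than taken from the text. The sketch also assumes that every row and column total is positive.

```python
def chi_square(table):
    # (A.13) with relative frequencies; table[i][j] counts pairs (v_i, w_j).
    n = sum(sum(row) for row in table)
    r, s = len(table), len(table[0])
    f = [[table[i][j] / n for j in range(s)] for i in range(r)]  # f_{x,y}(v_i, w_j)
    fx = [sum(f[i]) for i in range(r)]                           # marginal f_x(v_i)
    fy = [sum(f[i][j] for i in range(r)) for j in range(s)]      # marginal f_y(w_j)
    total = 0.0
    for i in range(r):
        for j in range(s):
            product = fx[i] * fy[j]  # joint frequency expected under independence
            total += (f[i][j] - product) ** 2 / product
    return n * total

def chi_square_absolute(table):
    # Same statistic from absolute frequencies: expected count = row total * column total / n.
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    return sum(
        (table[i][j] - row_tot[i] * col_tot[j] / n) ** 2 / (row_tot[i] * col_tot[j] / n)
        for i in range(len(table))
        for j in range(len(table[0]))
    )

dependent = [[20, 10], [10, 20]]
independent = [[10, 20], [20, 40]]

print(round(chi_square(dependent), 4), round(chi_square_absolute(dependent), 4))
print(round(chi_square(independent), 4))
```

For the made-up table `dependent`, both functions return about 6.667 (that is, 100/15). For `independent`, every joint frequency equals the product of its marginals, so the statistic is 0 up to rounding.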