342 The Basics of Financial Econometrics
The intuition behind equation (A.13) is to measure the average squared
deviations of the joint frequencies from what they would be in case of inde-
pendence. When the components are, in fact, independent, then the chi-square
test statistic is zero. In any other case, however, we again cannot make an
unambiguous comparison between different data sets, because the value of
the chi-square test statistic depends on the data size n.
For increasing n, the statistic can grow beyond any bound such that there is
no theoretical maximum. The solution to this problem is given by the Pearson
contingency coefficient or simply contingency coefficient defined by
$$C = \sqrt{\frac{\chi^2}{n + \chi^2}} \qquad \text{(A.14)}$$
By the definition given in equation (A.14), the contingency coefficient
satisfies 0 ≤ C < 1. Consequently, it assumes values that are strictly less
than one but may become arbitrarily close to one. This is still not
satisfactory for our purpose of designing a measure that can uniquely
determine the respective degrees of dependence of different data sets.
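The dependence of the chi-square test statistic on n, and the way the contingency coefficient removes it, can be illustrated with a minimal Python sketch. It assumes that equation (A.13) has the standard Pearson form, that is, the sum over all cells of (observed − expected)² / expected, where the expected frequency under independence is the product of the marginal frequencies divided by n. The function names and the two example tables are hypothetical and chosen only for illustration.

```python
import math


def chi_square(table):
    """Chi-square statistic of a two-way table of absolute joint frequencies.

    Assumes the standard Pearson form and that every row and column
    has a positive total (otherwise an expected frequency would be zero).
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Joint frequency we would see if the components were independent
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def contingency_coefficient(table):
    """Pearson contingency coefficient C = sqrt(chi2 / (n + chi2)), as in (A.14)."""
    n = sum(sum(row) for row in table)
    chi2 = chi_square(table)
    return math.sqrt(chi2 / (n + chi2))


# Hypothetical data: the second table has the same proportions as the
# first but ten times as many observations.
small = [[30, 10], [10, 30]]
large = [[10 * value for value in row] for row in small]

print(chi_square(small), contingency_coefficient(small))
print(chi_square(large), contingency_coefficient(large))
```

In this example the chi-square statistic grows by the same factor as n, whereas the contingency coefficient is the same for both tables and remains below one.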
There is another coefficient that can be used based on the following. Sup-
pose we have bivariate data in which the value set of the first component
variable contains r different values and the value set of the second component
variable contains s different values. In the extreme case of total dependence of
x and y, each variable will assume a certain value if and only if the other vari-
able assumes a particular corresponding value. Hence, we have k = min{r,s}
unique pairs that occur with positive frequency whereas any other combina-
tion does not occur at all (i.e., has zero frequency). Then one can show that
$$C = \sqrt{\frac{k-1}{k}}$$
such that, generally, $0 \le C \le \sqrt{(k-1)/k} < 1$. Now, the standardized coefficient
can be given by
$$C_{\text{corr}} = \sqrt{\frac{k}{k-1}} \cdot C \qquad \text{(A.15)}$$
which is called the corrected contingency coefficient, with
$0 \le C_{\text{corr}} \le 1$. With the measures given in equations (A.13),
(A.14), and (A.15), that is, the chi-square test statistic, the contingency
coefficient, and the corrected contingency coefficient, we can determine the
degree of dependence for any type of data.
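As a rough numerical check of the two extreme cases, the following Python sketch computes both the contingency coefficient and its corrected version for a totally dependent table and for an independent one. It assumes the standard Pearson form of the chi-square statistic for equation (A.13), takes r and s to be the numbers of rows and columns of the frequency table, and requires k = min{r, s} to be at least 2. The function name and the example tables are hypothetical.

```python
import math


def contingency_coefficients(table):
    """Return (C, C_corr) for a two-way table of absolute joint frequencies.

    Assumes every row and column has a positive total and that
    k = min{r, s} >= 2, so that the correction factor is defined.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # under independence
            chi2 += (observed - expected) ** 2 / expected

    c = math.sqrt(chi2 / (n + chi2))         # contingency coefficient, (A.14)
    k = min(len(table), len(table[0]))       # k = min{r, s}
    c_corr = math.sqrt(k / (k - 1)) * c      # corrected coefficient, (A.15)
    return c, c_corr


# Total dependence: each value of x occurs with exactly one value of y.
dependent = [[25, 0, 0], [0, 40, 0], [0, 0, 35]]
# Independence: every joint frequency equals the product of its marginals / n.
independent = [[10, 20], [30, 60]]

print(contingency_coefficients(dependent))
print(contingency_coefficients(independent))
```

For the totally dependent table with k = 3, C equals the square root of 2/3 and the corrected coefficient equals one, up to floating-point rounding. For the independent table both coefficients are zero.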