STATISTICS
31.4.6 Population covariance Cov[x, y] and correlation Corr[x, y]
So far we have assumed that each of our $N$ independent samples consists of
a single number $x_i$. Let us now extend our discussion to a situation in which
each sample consists of two numbers $x_i$, $y_i$, which we may consider as being
drawn randomly from a two-dimensional population $P(x, y)$. In particular, we
now consider estimators for the population covariance Cov[x, y] and for the
correlation Corr[x, y].
When $\mu_x$ and $\mu_y$ are known, an appropriate estimator of the population covariance is
$$\widehat{\mathrm{Cov}}[x, y] = \overline{xy} - \mu_x\mu_y = \left(\frac{1}{N}\sum_{i=1}^{N} x_i y_i\right) - \mu_x\mu_y. \qquad (31.59)$$
This estimator is unbiased since
$$E\left[\widehat{\mathrm{Cov}}[x, y]\right] = \frac{1}{N} E\left[\sum_{i=1}^{N} x_i y_i\right] - \mu_x\mu_y = E[x_i y_i] - \mu_x\mu_y = \mathrm{Cov}[x, y].$$
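The unbiasedness of the known-means estimator can be checked numerically. The following is a minimal sketch, not part of the original text: the function name `cov_known_means`, the bivariate normal population and all of its parameters are illustrative assumptions.

```python
import numpy as np

def cov_known_means(x, y, mu_x, mu_y):
    """Estimator (31.59): mean of the products x_i*y_i minus mu_x*mu_y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.mean(x * y) - mu_x * mu_y

# Assumed population for illustration: bivariate normal with Cov[x, y] = 0.8.
rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
cov = np.array([[2.0, 0.8],
                [0.8, 1.0]])

N, trials = 5, 20000
# samples has shape (trials, N, 2): many independent samples of size N.
samples = rng.multivariate_normal(mu, cov, size=(trials, N))

# Apply the estimator to every sample of size N at once.
estimates = np.mean(samples[..., 0] * samples[..., 1], axis=1) - mu[0] * mu[1]

# The average over many samples should lie close to the true covariance.
print("mean of estimates:", estimates.mean(), "true value:", cov[0, 1])
```

Even for a sample size as small as N = 5, the average of the estimates should agree with the population covariance to within the Monte Carlo scatter, as expected for an unbiased estimator.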
Alternatively, if $\mu_x$ and $\mu_y$ are unknown, it is natural to replace $\mu_x$ and $\mu_y$ in
(31.59) by the sample means $\bar{x}$ and $\bar{y}$ respectively, in which case we recover the
sample covariance $V_{xy} = \overline{xy} - \bar{x}\bar{y}$ discussed in subsection 31.2.4. This estimator
is biased, but an unbiased estimator of the population covariance is obtained by
forming
$$\widehat{\mathrm{Cov}}[x, y] = \frac{N}{N-1} V_{xy}. \qquad (31.60)$$
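As an illustration of (31.60), the sketch below computes the sample covariance and its bias-corrected version for a small made-up data set. The function names and the data values are assumptions chosen for the example. The comparison relies on `numpy.cov`, which normalises by N − 1 by default and by N when `ddof=0` is passed.

```python
import numpy as np

def sample_covariance(x, y):
    """Sample covariance V_xy = mean(x*y) - mean(x)*mean(y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.mean(x * y) - np.mean(x) * np.mean(y)

def unbiased_covariance(x, y):
    """Estimator (31.60): N/(N-1) times the sample covariance."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two samples")
    return n / (n - 1) * sample_covariance(x, y)

# Made-up data, for illustration only.
x = [1.0, 2.0, 4.0, 7.0]
y = [1.0, 3.0, 3.0, 9.0]

v_xy = sample_covariance(x, y)
cov_hat = unbiased_covariance(x, y)

print("V_xy            :", v_xy)
print("N/(N-1) * V_xy  :", cov_hat)
print("numpy.cov       :", np.cov(x, y)[0, 1])
```

For these four points the biased sample covariance is 6.5, and multiplying by N/(N − 1) = 4/3 reproduces the off-diagonal element returned by `numpy.cov`.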
Calculate the expectation value of the sample covariance $V_{xy}$ for a sample of size $N$.
The sample covariance is given by
$$V_{xy} = \left(\frac{1}{N}\sum_i x_i y_i\right) - \left(\frac{1}{N}\sum_i x_i\right)\left(\frac{1}{N}\sum_j y_j\right).$$
Thus its expectation value is given by
$$E[V_{xy}] = \frac{1}{N} E\left[\sum_i x_i y_i\right] - \frac{1}{N^2} E\left[\left(\sum_i x_i\right)\left(\sum_j y_j\right)\right]$$
= E[x_i y_i] − (1/N^2) E[ ∑_i x_i y_i + ∑_{i,j; j≠i} x_i y_j