31.4 SOME BASIC ESTIMATORS
Since the number of terms in the double sum on the RHS is $N(N-1)$, we have
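\[
\begin{aligned}
E[V_{xy}] &= E[x_i y_i] - \frac{1}{N^2}\left(N E[x_i y_i] + N(N-1) E[x_i y_j]\right) \\
&= E[x_i y_i] - \frac{1}{N^2}\left(N E[x_i y_i] + N(N-1) E[x_i] E[y_j]\right) \\
&= E[x_i y_i] - \frac{1}{N}\left(E[x_i y_i] + (N-1)\mu_x \mu_y\right) \\
&= \frac{N-1}{N}\,\mathrm{Cov}[x, y],
\end{aligned}
\]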
where we have used the fact that, since the samples are independent,
$E[x_i y_j] = E[x_i]E[y_j]$ for $i \neq j$.
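The result above can be checked exactly on a small discrete example. The following is a minimal sketch, assuming a hypothetical toy population of four equally likely $(x, y)$ pairs and samples of size $N = 3$ drawn independently with replacement; the names `v_xy`, `expected_v_xy` and `population` are illustrative and do not come from the text. It enumerates every possible sample, averages $V_{xy}$ over them using exact rational arithmetic, and compares the result with $\frac{N-1}{N}\mathrm{Cov}[x, y]$.

```python
from fractions import Fraction
from itertools import product


def v_xy(pairs):
    """Return V_xy = mean(x*y) - mean(x)*mean(y) for a sequence of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(Fraction(x) for x, _ in pairs) / n
    mean_y = sum(Fraction(y) for _, y in pairs) / n
    mean_xy = sum(Fraction(x) * Fraction(y) for x, y in pairs) / n
    return mean_xy - mean_x * mean_y


def expected_v_xy(population, n):
    """Exact E[V_xy] over all samples of size n drawn independently
    (with replacement) and uniformly from the population."""
    samples = list(product(population, repeat=n))
    return sum(v_xy(sample) for sample in samples) / len(samples)


# Hypothetical toy population: each pair is equally likely.
population = [(0, 0), (1, 1), (2, 1), (3, 5)]

# With uniform weights, V_xy of the whole population is the population covariance.
cov = v_xy(population)

n = 3
lhs = expected_v_xy(population, n)   # E[V_xy] by exhaustive enumeration
rhs = Fraction(n - 1, n) * cov       # (N - 1)/N * Cov[x, y]

print(lhs, rhs)
```

Exhaustive enumeration is practical only for very small populations and sample sizes, since the number of samples grows as the population size raised to the power $N$; for anything larger a Monte Carlo average would be the natural substitute.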
It is possible to obtain expressions for the variances of the estimators (31.59)
and (31.60) but these quantities depend upon higher moments of the population
P(x, y) and are extremely lengthy to calculate.
Whether the means $\mu_x$ and $\mu_y$ are known or unknown, an estimator of the
population correlation Corr[x, y] is given by
\[
\widehat{\mathrm{Corr}}[x, y] = \frac{\widehat{\mathrm{Cov}}[x, y]}{\hat{\sigma}_x \hat{\sigma}_y}, \tag{31.61}
\]
where $\widehat{\mathrm{Cov}}[x, y]$, $\hat{\sigma}_x$ and $\hat{\sigma}_y$ are the appropriate estimators of the
population covariance and standard deviations. Although this estimator is only
asymptotically unbiased, i.e. for large $N$, it is widely used because of its
simplicity. Once again the variance of the estimator depends on the higher
moments of $P(x, y)$ and is difficult to calculate.
In the case in which the means $\mu_x$ and $\mu_y$ are unknown, a suitable (but biased)
estimator is
\[
\widehat{\mathrm{Corr}}[x, y] = \frac{N}{N-1}\,\frac{V_{xy}}{s_x s_y} = \frac{N}{N-1}\, r_{xy}, \tag{31.62}
\]
where $s_x$ and $s_y$ are the sample standard deviations of the $x_i$ and $y_i$ respectively
and $r_{xy}$ is the sample correlation. In the special case when the parent population
$P(x, y)$ is Gaussian, it may be shown that, if $\rho = \mathrm{Corr}[x, y]$,
\begin{align}
E[r_{xy}] &= \rho - \frac{\rho(1-\rho^2)}{2N} + O(N^{-2}), \tag{31.63} \\
V[r_{xy}] &= \frac{1}{N}(1-\rho^2)^2 + O(N^{-2}), \tag{31.64}
\end{align}
from which the expectation value and variance of the estimator $\widehat{\mathrm{Corr}}[x, y]$ may
be found immediately.
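As an illustration of (31.62) to (31.64), the sketch below computes the sample correlation $r_{xy}$ and the estimator $\frac{N}{N-1} r_{xy}$ for a small data set, and evaluates the leading-order Gaussian expressions for the mean and variance of that estimator. It assumes that $s_x$ and $s_y$ are the $1/N$-normalised sample standard deviations, consistent with $V_{xy}$; the function names and the data values are hypothetical, and the $O(N^{-2})$ terms are simply dropped.

```python
import math


def sample_correlation(xs, ys):
    """Return r_xy = V_xy / (s_x * s_y), with all sample moments normalised by 1/N."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    v_xy = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    s_x = math.sqrt(sum(x * x for x in xs) / n - mean_x ** 2)
    s_y = math.sqrt(sum(y * y for y in ys) / n - mean_y ** 2)
    return v_xy / (s_x * s_y)


def corr_estimate(xs, ys):
    """Estimator (31.62): N/(N-1) times the sample correlation."""
    n = len(xs)
    return n / (n - 1) * sample_correlation(xs, ys)


def gaussian_moments_of_estimate(rho, n):
    """Leading-order mean and variance of the estimator (31.62) for a
    Gaussian parent population, from (31.63) and (31.64) with the
    O(N^-2) terms neglected."""
    mean_r = rho - rho * (1 - rho ** 2) / (2 * n)
    var_r = (1 - rho ** 2) ** 2 / n
    scale = n / (n - 1)
    # E[c r] = c E[r] and V[c r] = c^2 V[r] for a constant c = N/(N-1).
    return scale * mean_r, scale ** 2 * var_r


# Hypothetical data set of N = 5 pairs.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 3.0, 2.0, 5.0, 4.0]

r = sample_correlation(xs, ys)
est = corr_estimate(xs, ys)
mean_est, var_est = gaussian_moments_of_estimate(0.5, 100)

print(r, est, mean_est, var_est)
```

Because the estimator is a constant multiple of $r_{xy}$, its mean scales by $N/(N-1)$ and its variance by $[N/(N-1)]^2$, which is all that the last function uses.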
We note finally that our discussion may be extended, without significant
alteration, to the general case in which each data item consists of $n$ numbers
$x_i, y_i, \ldots, z_i$.