IV. Covariance and Correlation
Covariance and correlation are
measures of relationships between
variables.
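Covariance
Population:
$$\operatorname{cov}(X, Y) = E\left[(X - \mu_x)(Y - \mu_y)\right]$$
Sample:
$$\widehat{\operatorname{cov}}(X, Y) = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$$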
Correlation
Population:
$$\rho_{xy} = \frac{\operatorname{cov}(X, Y)}{\sigma_x \sigma_y}$$
Sample:
$$r_{xy} = \frac{\widehat{\operatorname{cov}}(X, Y)}{s_x s_y}$$
Correlation:
Standardized covariance
Scale free
$X_1$ = height (in feet)
$X_2$ = height (in inches)
$Y$ = weight
$\operatorname{cov}(X_2, Y) = 12 \operatorname{cov}(X_1, Y)$
BUT
$r_{x_2 y} = r_{x_1 y}$
In the sections that follow, we provide an overview of the mathematical
foundation of the GEE approach. We begin by developing some of the ideas
that underlie correlated analyses, including covariance and correlation.
Covariance and correlation are measures that express relationships between
two variables. The covariance of X and Y in a population is defined as the
expected value, or average, of the product of X minus its mean ($\mu_x$) and
Y minus its mean ($\mu_y$). With sample data, the covariance is estimated
using the sample formula given above, where $\bar{X}$ and $\bar{Y}$ are the
sample means in a sample of size n.
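As a minimal sketch of the sample formula, the following Python function computes the covariance with the (n - 1) divisor. The function name `sample_covariance` and the height and weight values are hypothetical, chosen only for illustration.

```python
def sample_covariance(xs, ys):
    """Estimate cov(X, Y) from paired observations using the (n - 1) divisor."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n  # sample mean of X
    y_bar = sum(ys) / n  # sample mean of Y
    # Sum of cross-products of deviations from the sample means
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / (n - 1)


# Hypothetical data: heights in feet and weights in pounds
heights_ft = [5.0, 5.5, 6.0, 6.5]
weights_lb = [120.0, 150.0, 160.0, 190.0]
cov_hw = sample_covariance(heights_ft, weights_lb)
print(cov_hw)
```

For these made-up values the cross-products sum to 55, so the estimate is 55/3, in units of feet times pounds.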
The correlation of X and Y in a population, often denoted by the Greek
letter rho ($\rho$), is defined as the covariance of X and Y divided by the
product of the standard deviation of X (i.e., $\sigma_x$) and the standard
deviation of Y (i.e., $\sigma_y$). The corresponding sample correlation,
usually denoted as $r_{xy}$, is calculated by dividing the sample covariance
by the product of the sample standard deviations (i.e., $s_x$ and $s_y$).
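The sample correlation can be sketched in the same way: divide the sample covariance by the product of the sample standard deviations. The function name `sample_correlation` and the data are hypothetical, and the sketch assumes neither variable is constant (otherwise a standard deviation would be zero).

```python
import math


def sample_correlation(xs, ys):
    """Estimate r_xy as the sample covariance over the product of sample SDs."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))  # sample SD of X
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))  # sample SD of Y
    return cov_xy / (s_x * s_y)


# Hypothetical data: heights in feet and weights in pounds
heights_ft = [5.0, 5.5, 6.0, 6.5]
weights_lb = [120.0, 150.0, 160.0, 190.0]
r_hw = sample_correlation(heights_ft, weights_lb)
print(r_hw)
```

Because the same (n - 1) divisor appears in the covariance and in both variances, it cancels, and the result always lies between -1 and 1.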
The correlation is a standardized measure of covariance in which the units
of X and Y are the standard deviations of X and Y, respectively. The actual
units used for the value of variables affect measures of covariance but not
measures of correlation, which are scale-free. For example, the covariance
between height and weight will increase by a factor of 12 if the measure of
height is converted from feet to inches, but the correlation between height
and weight will remain unchanged.
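The feet-to-inches example can be checked numerically. The sketch below uses hypothetical height and weight values and two illustrative helper functions; it rescales height by 12 and compares the covariance and the correlation before and after.

```python
import math


def sample_covariance(xs, ys):
    """Sample covariance with the (n - 1) divisor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def sample_correlation(xs, ys):
    """Sample covariance divided by the product of the sample SDs."""
    s_x = math.sqrt(sample_covariance(xs, xs))  # cov(X, X) is the variance of X
    s_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)


# Hypothetical data
heights_ft = [5.0, 5.5, 6.0, 6.5]           # X1: height in feet
heights_in = [12 * h for h in heights_ft]   # X2: the same heights in inches
weights_lb = [120.0, 150.0, 160.0, 190.0]   # Y: weight

cov_ft = sample_covariance(heights_ft, weights_lb)
cov_in = sample_covariance(heights_in, weights_lb)
r_ft = sample_correlation(heights_ft, weights_lb)
r_in = sample_correlation(heights_in, weights_lb)

print(cov_in / cov_ft)  # ratio of covariances after rescaling height
print(r_ft, r_in)       # correlations before and after rescaling
```

The covariance scales by 12 because each deviation of height from its mean is multiplied by 12. The standard deviation of height is multiplied by 12 as well, so the factor cancels in the correlation.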
500 14. Logistic Regression for Correlated Data: GEE