we can represent the variance of the ith data time series $X_i$ as a weighted
sum of the eigenvalues (each eigenvalue is equal to the variance of the
corresponding principal component) as follows:
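\[
\begin{aligned}
\operatorname{var}(X_1) &= \lambda_1 V_{11}^2 + \lambda_2 V_{12}^2 + \cdots + \lambda_8 V_{18}^2 \\
&\;\;\vdots \\
\operatorname{var}(X_i) &= \lambda_1 V_{i1}^2 + \lambda_2 V_{i2}^2 + \cdots + \lambda_8 V_{i8}^2 \\
&\;\;\vdots \\
\operatorname{var}(X_8) &= \lambda_1 V_{81}^2 + \lambda_2 V_{82}^2 + \cdots + \lambda_8 V_{88}^2
\end{aligned}
\tag{12.17}
\]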
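The variance decomposition can be checked numerically. The following is a minimal sketch, assuming NumPy and simulated data (the series, the seed, and all variable names are hypothetical and are not the data of our illustration). It verifies that the variance of each series equals the sum of the eigenvalues weighted by the squared eigenvector elements, and that each eigenvalue equals the variance of its principal component.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 500 observations of 8 correlated series (not the book's data).
T, N = 500, 8
X = rng.standard_normal((T, N)) @ rng.standard_normal((N, N))
X = X - X.mean(axis=0)                 # center each series

cov = np.cov(X, rowvar=False)          # 8 x 8 sample covariance matrix
eigvals, V = np.linalg.eigh(cov)       # eigenvalues ascending, eigenvectors in columns of V

# Principal components: column j of PC is PC_j, so that X_i = sum_j PC_j * V[i, j].
PC = X @ V
pc_variances = PC.var(axis=0, ddof=1)  # should reproduce the eigenvalues

# Weighted sum of eigenvalues: var(X_i) = sum_j lambda_j * V[i, j]**2
var_from_eigs = (V ** 2) @ eigvals
var_direct = np.diag(cov)

variances_match = np.allclose(var_from_eigs, var_direct)
eigs_are_pc_variances = np.allclose(pc_variances, eigvals)
print(variances_match, eigs_are_pc_variances)
```

Note that `np.linalg.eigh` returns the eigenvalues in ascending order, so in this sketch the components with the largest variance are the last columns of `PC`.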
Step 5: Using Only Principal Components with Largest Variances
From equation (12.16), we see that in our illustration the smallest and the
largest eigenvalues differ by more than two orders of magnitude (a factor
greater than 100), and that the magnitude of the eigenvalues decays rapidly
after the first three eigenvalues. Therefore, we can represent the data
approximately using only a reduced number of principal components, namely
those that have the largest variance. Equivalently, this means using only
those principal components that correspond to the largest eigenvalues.
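To make the idea of a reduced representation concrete, here is a minimal sketch, again assuming NumPy and simulated data (the three-factor structure, the noise level, and the variable names are assumptions made only for illustration). It keeps the components with the largest eigenvalues, rebuilds the data from them, and reports how much of the total variance they retain.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 8 series driven mostly by 3 common factors plus small noise.
T, N, k = 500, 8, 4
factors = rng.standard_normal((T, 3))
loadings = rng.standard_normal((3, N))
X = factors @ loadings + 0.1 * rng.standard_normal((T, N))
X = X - X.mean(axis=0)                                 # center each series

eigvals, V = np.linalg.eigh(np.cov(X, rowvar=False))   # ascending order
PC = X @ V                                             # one principal component per column

# Keep the k components with the largest eigenvalues.
keep = np.argsort(eigvals)[-k:]
dropped = np.argsort(eigvals)[:-k]

# Approximation: X_i is rebuilt as the sum over kept j of PC_j * V[i, j].
X_approx = PC[:, keep] @ V[:, keep].T
residual = X - X_approx

share_kept = eigvals[keep].sum() / eigvals.sum()       # fraction of total variance retained
residual_variance = np.trace(np.cov(residual, rowvar=False))

print(f"variance share kept by {k} components: {share_kept:.4f}")
print(f"total residual variance: {residual_variance:.6f}")
print(f"sum of dropped eigenvalues: {eigvals[dropped].sum():.6f}")
```

The total variance of the residual equals the sum of the dropped eigenvalues, which is why a rapid decay of the eigenvalues justifies discarding the corresponding components.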
Suppose we use only the four principal components with the largest
eigenvalues, which in our illustration are $PC_5$ through $PC_8$. We can
write the following approximate representation:
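\[
\begin{aligned}
X_1 &\approx PC_5 V_{15} + \cdots + PC_8 V_{18} \\
&\;\;\vdots \\
X_i &\approx PC_5 V_{i5} + \cdots + PC_8 V_{i8} \\
&\;\;\vdots \\
X_8 &\approx PC_5 V_{85} + \cdots + PC_8 V_{88}
\end{aligned}
\tag{12.18}
\]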
or
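\[
\begin{aligned}
X_1 &= PC_5 V_{15} + \cdots + PC_8 V_{18} + e_1 \\
&\;\;\vdots \\
X_i &= PC_5 V_{i5} + \cdots + PC_8 V_{i8} + e_i \\
&\;\;\vdots \\
X_8 &= PC_5 V_{85} + \cdots + PC_8 V_{88} + e_8
\end{aligned}
\tag{12.19}
\]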
where $e_i$ represents the approximation error for the ith series. The error
terms are linear combinations of the first four principal components.
Therefore, they are orthogonal to the last four principal components but, in
general, they will be mutually correlated. To see this point, consider, for
example,
$X_1 = PC_5 V_{15} + \cdots + PC_8 V_{18} + e_1$ and $X_8 = PC_5 V_{85} + \cdots + PC_8 V_{88} + e_8$