Factor Analysis and Principal Components Analysis 259
For example, if we estimate a linear regression using MATLAB’s regress
function
X1 = α1 PC5 + α2 PC6 + α3 PC7 + α4 PC8 + ε
we obtain the following estimate for the regression coefficients:
α1 = –0.0951
α2 = –0.3155
α3 = –0.6800
α4 = 0.2347
If we look at the matrix V, we see that the estimated regression coefficients
are equal to V15, V16, V17, V18, as in equation (12.15).
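The same check can be reproduced in NumPy on synthetic data (the book's dataset is not available here, so the series below are simulated): after demeaning, the OLS coefficients from regressing the first series on the last four principal components coincide with the first row of the eigenvector matrix restricted to those columns. Note that `numpy.linalg.eigh`, like MATLAB's `eig`, returns eigenvalues in ascending order, so the PCs with the largest eigenvalues are the last columns.

```python
import numpy as np

rng = np.random.default_rng(0)
# Eight correlated synthetic series (stand-ins for the book's data)
X = rng.standard_normal((500, 8)) @ rng.standard_normal((8, 8))
X = X - X.mean(axis=0)  # demean so the PCs are exactly uncorrelated in-sample

Sigma = np.cov(X, rowvar=False)
D, V = np.linalg.eigh(Sigma)  # eigenvalues ascending; eigenvectors in columns of V
PC = X @ V                    # principal components

# OLS regression of the first series on PC5..PC8 (columns 4..7, zero-based)
alpha, *_ = np.linalg.lstsq(PC[:, 4:8], X[:, 0], rcond=None)

# The coefficients match V15, V16, V17, V18 (first row of V, last four columns)
print(np.allclose(alpha, V[0, 4:8]))
```

Because the demeaned PCs are mutually orthogonal in-sample, regressing on any subset of them recovers exactly the corresponding entries of V, which is why the `regress` output in the text matches the eigenvector matrix.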
The eight panels in Figure 12.2 illustrate the approximation obtained
with four principal components.
The Process of PCA
The example we have provided in this section can be generalized to any set
of data. We can therefore establish the following general process for PCA.
Given a set of data formed by time series of the same length, with a nonsingular covariance matrix, PCA involves the following steps:
- Compute the covariance matrix ΣX of the data.
- Compute the eigenvalues D and eigenvectors V of the covariance matrix ΣX.
- Compute the principal components PC by multiplying the data by the eigenvectors: PC = XV.
- Look at how the eigenvalues decay (i.e., look at the plot of their magnitude).
- Choose a (small) number of PCs corresponding to the largest eigenvalues.
- Represent data approximately as weighted sums of these PCs.
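The steps above can be sketched end to end in NumPy. This is a minimal illustration on simulated data (two common factors driving eight series, plus noise), not the book's example; all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic data: 8 series of equal length driven by 2 common factors plus noise
factors = rng.standard_normal((500, 2))
loadings = rng.standard_normal((2, 8))
X = factors @ loadings + 0.1 * rng.standard_normal((500, 8))
X = X - X.mean(axis=0)

# Step 1: compute the covariance matrix of the data
Sigma = np.cov(X, rowvar=False)
# Step 2: compute its eigenvalues and eigenvectors (ascending order)
D, V = np.linalg.eigh(Sigma)
# Step 3: compute the principal components
PC = X @ V
# Step 4: look at how the eigenvalues decay
print(D[::-1])  # two eigenvalues dominate the rest
# Step 5: choose a small number of PCs with the largest eigenvalues
k = 2
# Step 6: represent the data approximately as weighted sums of these PCs
X_hat = PC[:, -k:] @ V[:, -k:].T
print(np.abs(X - X_hat).max())  # reconstruction error is on the order of the noise
```

With only two retained components the reconstruction error is driven by the noise term, illustrating why inspecting the eigenvalue decay is the natural criterion for choosing k.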
Differences between Factor Analysis and PCA
There are clearly similarities between factor analysis and PCA. In both
cases, data are parsimoniously represented as a weighted sum of a (generally
small) number of factors or principal components. But there are also three
important differences that we can summarize as follows:
- PCA is a data-reduction technique; that is, it is a parsimonious representation that can be applied to any data set with a nonsingular covariance