$$
X_{(n,p)} =
\begin{pmatrix}
x_{1,1} & x_{1,2} & \cdots & x_{1,p} \\
x_{2,1} & x_{2,2} & \cdots & x_{2,p} \\
\vdots  & \vdots  & \ddots & \vdots  \\
x_{n,1} & x_{n,2} & \cdots & x_{n,p}
\end{pmatrix}
$$

with one row per observation ($1, 2, \ldots, n$) and one column per variable ($X_1, X_2, \ldots, X_p$).
The vector of means $\bar{\mathbf{x}}' = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_p)$ and the vector of standard deviations $\mathbf{s}' = (s_1, s_2, \ldots, s_p)$, as well as the matrices of covariances $S = (s_{i,j})$ and correlations $R = (r_{i,j})$, can be calculated.
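As a minimal sketch of these calculations (assuming NumPy is available; the data matrix here is randomly generated purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))        # n = 50 observations of p = 4 variables

x_bar = X.mean(axis=0)              # vector of means (x̄_1, ..., x̄_p)
s = X.std(axis=0, ddof=1)           # vector of standard deviations (s_1, ..., s_p)
S = np.cov(X, rowvar=False)         # covariance matrix S = (s_ij)
R = np.corrcoef(X, rowvar=False)    # correlation matrix R = (r_ij)
```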
For this data matrix, the most widely used unsupervised methods are Principal Components Analysis (PCA) and/or Factor Analysis (FA), which attempt to reduce the dimensionality of the data and study the interrelations between variables and between observations, and Cluster Analysis (CA), which searches for clusters of observations or variables (Krzanowski 1988; Cela 1994; Afifi and Clark 1996). Before applying these techniques, the variables are usually first standardised ($X_i \to X_i^*$) to achieve a mean of 0 and unit variance.
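A sketch of this standardisation (illustrative data as before, assuming NumPy); note that the covariance matrix of the standardised variables coincides with the correlation matrix of the originals:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))

# Standardise each variable: mean 0, unit variance (X_i -> X_i*)
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# cov(X*) equals the correlation matrix R of the original variables
assert np.allclose(np.cov(X_std, rowvar=False),
                   np.corrcoef(X, rowvar=False))
```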
13.3.2.1 Principal Components Analysis (PCA)
The main objective of this technique is to reduce the dimensionality of the data without losing important information, starting from the correlations between variables, in order to explore the relationships between variables and between observations. The aim is to obtain $p$ new variables $(Y_1, Y_2, \ldots, Y_p)$, which we will call principal components, that are: (1) normalised linear combinations of the original variables ($Y_i = a_{1,i}X_1 + a_{2,i}X_2 + \cdots + a_{p,i}X_p$, with $\sum_k a_{k,i}^2 = 1$); (2) mutually uncorrelated ($\operatorname{cov}(Y_i, Y_j) = 0 \;\; \forall\, i \neq j$); (3) of progressively decreasing variance ($\operatorname{var}(Y_1) \geq \operatorname{var}(Y_2) \geq \cdots \geq \operatorname{var}(Y_p)$); and (4) such that the total variance ($V_T$) coincides with that of the original variables ($\sum_{i=1}^{p} \operatorname{var}(Y_i) = \sum_{i=1}^{p} \operatorname{var}(X_i) = V_T$) (Afifi and Azen 1979).
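These four properties can be checked numerically with an eigendecomposition of $R$. The sketch below (same illustrative setup as above, assuming NumPy) builds $A$ from the eigenvectors of $R$ and verifies normalisation, uncorrelatedness, decreasing variances and preservation of the total variance:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
X_std = (X - X.mean(0)) / X.std(0, ddof=1)

R = np.corrcoef(X, rowvar=False)
eigvals, A = np.linalg.eigh(R)          # eigh: R is symmetric
order = np.argsort(eigvals)[::-1]       # sort eigenvalues in decreasing order
eigvals, A = eigvals[order], A[:, order]

Y = X_std @ A                           # principal components Y = X* A

# (1) each column of A is normalised: sum_k a_{k,i}^2 = 1
assert np.allclose((A**2).sum(axis=0), 1.0)
# (2) components are uncorrelated: cov(Y_i, Y_j) = 0 for i != j
C = np.cov(Y, rowvar=False)
assert np.allclose(C - np.diag(np.diag(C)), 0.0)
# (3) decreasing variances, with var(Y_j) = lambda_j
assert np.allclose(np.diag(C), eigvals)
# (4) total variance preserved: sum var(Y_i) = sum var(X_i*) = p
assert np.isclose(np.diag(C).sum(), X.shape[1])
```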
The aim is, therefore, to obtain the matrix transformation $Y_{(n,p)} = X_{(n,p)} A_{(p,p)}$. If the original variables are standardised beforehand (most programmes do this), the coefficients $(a_{i,j})$ are determined from the eigenvalues and eigenvectors of the correlation matrix $R_{(p,p)} = (r_{i,j})$: the $j$-th column $\vec{a}_j$ of the matrix $A_{(p,p)} = (a_{i,j})$ is the eigenvector associated with the $j$-th greatest eigenvalue $\lambda_j$, so that $\operatorname{var}(Y_j) = \lambda_j$ and $\sum_{i=1}^{p} \operatorname{var}(Y_i) = \sum_{i=1}^{p} \operatorname{var}(X_i^*) = V_T = p$, the matrix $A_{(p,p)}$ being orthogonal ($A_{(p,p)}^{-1} = A_{(p,p)}^{t}$). In many applications, if the first $q$ components $(Y_1, \ldots, Y_q)$ together explain a high percentage of the total variance, e.g. $\frac{\lambda_1 + \lambda_2 + \cdots + \lambda_q}{V_T} \cdot 100\% > 80\%$, and if $q$ is much smaller than $p$, the dimensionality of the original data will have been reduced with the loss of only a small proportion of non-essential information. This number $q$ of components usually corresponds to the number of eigenvalues $\lambda_i > 1$.
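Both selection rules, the cumulative percentage of explained variance (the 80% threshold is the example value quoted above) and the $\lambda_i > 1$ rule, can be sketched as follows (assuming NumPy, with the same illustrative data):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))

R = np.corrcoef(X, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]  # lambda_1 >= ... >= lambda_p

p = X.shape[1]
explained = 100 * np.cumsum(eigvals) / p        # V_T = p for standardised data

# Rule 1: smallest q whose components together explain > 80% of V_T
q_var = int(np.searchsorted(explained, 80.0) + 1)
# Rule 2: number of eigenvalues greater than 1
q_kaiser = int((eigvals > 1).sum())

print(f"cumulative % variance: {np.round(explained, 1)}")
print(f"q (>80% variance) = {q_var}, q (lambda_i > 1) = {q_kaiser}")
```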
From a geometric perspective, the transformation $Y_{(n,q)} = X_{(n,p)} A_{(p,q)}$ corresponds to an orthogonal rotation of the coordinate axes in the directions of maximum variance, and since the $A_{(p,p)}$ matrix is orthogonal