13 Statistical Techniques for the Interpretation of Analytical Data 695
nal, it is also possible to consider the equation $X_{(n,p)} = Y_{(n,q)} A^{t}_{(q,p)}$, which
corresponds to the factorial model of principal components.
The results of PCA are (1) the number $q$ of principal components; (2) the matrix
$Y_{(n,q)} = X_{(n,p)} A_{(p,q)} = (y_{i,j})$, containing the scores of the $n$ observations
on the $q$ new variables $(Y_1, Y_2, \ldots, Y_q)$; and (3) the matrix
$A_{(p,q)} = (a_{i,j})$, which describes the contribution of the original variables
$(X_1, X_2, \ldots, X_p)$ to the definition of the $q$ components. The coefficients
$a_{i,j}$ are usually rescaled so that they equal the correlations between the principal
components and the original variables, which helps to identify the variables that define
each component. The two-dimensional plot of the $n$ observations in the plane defined by
the first two principal components is commonly used to explore or confirm possible
clusters of observations and to detect possible outliers.
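As an illustration of the three outputs listed above, the following Python/NumPy sketch computes them for a small synthetic data set. It assumes that the variables are standardised before the analysis and that the components are extracted from the correlation matrix. The function name pca and the simulated data are hypothetical and serve only to make the example runnable.

```python
import numpy as np

def pca(X, q):
    """PCA of an (n, p) data matrix, assuming standardised variables.

    Returns the scores Y (n, q), the coefficients A (p, q), the q largest
    eigenvalues and the variable-component correlations (p, q).
    """
    # Standardise each column (mean 0, sample standard deviation 1).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)           # (p, p) correlation matrix
    eigval, eigvec = np.linalg.eigh(R)         # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1][:q]       # indices of the q largest
    lam, A = eigval[order], eigvec[:, order]
    Y = Z @ A                                  # scores of the n observations
    # For standardised variables: corr(X_i, Y_j) = a_ij * sqrt(lambda_j).
    corr = A * np.sqrt(lam)
    return Y, A, lam, corr

# Synthetic data (assumption): 50 observations, 5 variables, 2 latent directions.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.3 * rng.normal(size=(50, 5))

Y, A, lam, corr = pca(X, q=2)
print(np.round(corr, 2))   # correlations between variables and components
```

The first two columns of Y are the coordinates that would be used for the two-dimensional plot of the observations.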
13.3.2.2 Factor Analysis (FA)
FA is a more general method for describing the dependence structure of the $p$
variables $(X_1, X_2, \ldots, X_p)$ in terms of $q$ other, unobserved variables, called
factors, that are assumed to be responsible for the original ones; it reduces the
dimension of the data when $q$ is much smaller than $p$. The orthogonal factorial
model assumes, for each standardised original variable $X^{*}_i$, the following model:
$X^{*}_i = b_{1,i} F_1 + b_{2,i} F_2 + \ldots + b_{q,i} F_q + \varepsilon_i$, where
$(F_1, F_2, \ldots, F_q)$ are the $q$ common factors, which are uncorrelated;
$(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_p)$ are the specific factors of each
variable $X_i$, which are also uncorrelated; and $b_{j,i}$ is the loading of factor $F_j$
on variable $X_i$. It is also assumed that the common and specific factors are independent
of each other and have a mean of 0, and that the common factors have a variance of one.
The factorial model, in matrix form, is
$X^{*}_{(n,p)} = F_{(n,q)} B_{(q,p)} + E_{(n,p)}$, where $X^{*}_{(n,p)}$ is the matrix of
standardised observations, $F_{(n,q)}$ is the matrix with the coordinates (scores) of the
observations on the $q$ factors, $B_{(q,p)}$ is the matrix with the loadings of the factors
on the $p$ original variables, and $E_{(n,p)}$ is the matrix of model errors. The results
of FA are the number of factors $q$, the score matrix $F_{(n,q)}$ and the loading matrix
$B_{(q,p)}$. The plot of the observations in the plane defined by the first two factors
reveals possible clusters of observations and the presence or absence of outliers. From a
geometrical perspective, the aim is to find the subspace that best fits the $n$ points in
the space of the variables $(X_1, X_2, \ldots, X_p)$, that is, the one that minimises the
sum of the squared norms of the $n$ row vectors of the matrix $E_{(n,p)}$.
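The structure of the orthogonal factorial model can be checked numerically. The sketch below simulates data from the matrix form of the model and compares the sample correlation matrix with the one implied by the loadings. The loading values are hypothetical, and the specific variances are set to one minus the communality of each variable so that every simulated variable has unit variance; both choices are assumptions made for this example only.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 20000, 4, 2

# Hypothetical loadings B (q, p): B[j, i] is the loading of factor j on variable i.
B = np.array([[0.9, 0.8, 0.1, 0.2],
              [0.1, 0.3, 0.8, 0.7]])

communality = (B ** 2).sum(axis=0)   # variance of each variable due to common factors
psi = 1.0 - communality              # specific variances (assumed), so var(X*_i) = 1

F = rng.normal(size=(n, q))                   # uncorrelated common factors, variance 1
E = rng.normal(size=(n, p)) * np.sqrt(psi)    # uncorrelated specific factors
X_star = F @ B + E                            # matrix form of the factorial model

R_model = B.T @ B + np.diag(psi)              # correlation matrix implied by the model
R_sample = np.corrcoef(X_star, rowvar=False)  # correlation matrix of simulated data
max_gap = np.abs(R_sample - R_model).max()

print(np.round(communality, 2))
print(round(float(max_gap), 3))
```

With a large number of simulated observations the two correlation matrices should agree closely, which shows that the off-diagonal correlations between the variables are carried entirely by the common factors.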
The factorial model in principal components,
$X^{*}_{(n,p)} = Y_{(n,q)} A^{t}_{(q,p)} + E_{(n,p)}$, which is frequently used, takes the
principal components as factors ($F_{(n,q)} = Y_{(n,q)}$) and the transposed matrix of
components as the loading (saturation) matrix ($B_{(q,p)} = A^{t}_{(q,p)}$). This fulfils
all the previous requirements because $A_{(p,p)}$ is an orthogonal matrix. The number of
factors (components) can be determined from the eigenvalues
or by means of a cross-validation procedure (Brereton 1990; Cela 1994).
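A minimal sketch of the factorial model in principal components follows. The text does not state which eigenvalue rule is used, so the example assumes one common choice, retaining the components whose eigenvalue exceeds 1. The synthetic data and variable names are likewise assumptions. The sketch also computes the residual matrix and its sum of squares for the retained components.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 60, 6
latent = rng.normal(size=(n, 2))
X = latent @ rng.normal(size=(2, p)) + 0.4 * rng.normal(size=(n, p))

# Standardised observations (X* in the text).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

eigval, eigvec = np.linalg.eigh(np.corrcoef(X, rowvar=False))
order = np.argsort(eigval)[::-1]                 # sort eigenvalues in descending order
eigval, A_full = eigval[order], eigvec[:, order] # A_full is the orthogonal (p, p) matrix

# Assumed selection rule: keep components with eigenvalue greater than 1.
q = max(1, int((eigval > 1.0).sum()))
explained = eigval[:q].sum() / p                 # fraction of total variance retained

A = A_full[:, :q]            # components matrix A (p, q)
Y = Z @ A                    # factors taken as principal components, F = Y
E = Z - Y @ A.T              # residual matrix of the model X* = Y A^t + E
residual_ss = (E ** 2).sum() # sum of squared norms of the rows of E

print(q, round(float(explained), 3), round(float(residual_ss), 3))
```

In this setting the residual sum of squares equals (n - 1) times the sum of the discarded eigenvalues, so the eigenvalues show directly how much is lost for each choice of q.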
Sometimes, to define the contribution of the factors to the variables more clearly, the
$q$ extracted factors are rotated ($B^{*}_{(q,p)} = Q_{(q,q)} B_{(q,p)}$, with $Q_{(q,q)}$
an orthogonal matrix); the Varimax rotation is the most frequently used (Afifi and Azen 1979).
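The effect of a Varimax rotation can be illustrated with a short sketch. It uses a common SVD-based iteration for the raw Varimax criterion, which is not necessarily the procedure described by the cited authors. The loadings are stored as a p x q array (variables in rows, that is, the transpose of the loading matrix of the text), and the rotation is an orthogonal q x q matrix acting on the factor axes. The function names and the example loadings are hypothetical.

```python
import numpy as np

def varimax_criterion(L):
    """Sum over factors of the variance of the squared loadings."""
    return (L ** 2).var(axis=0).sum()

def varimax(L, max_iter=200, tol=1e-8):
    """Rotate a (p, q) loading matrix towards the Varimax criterion.

    Returns the rotated loadings L @ Q and the orthogonal (q, q) matrix Q.
    """
    p, q = L.shape
    Q = np.eye(q)
    d_old = 0.0
    for _ in range(max_iter):
        Lr = L @ Q
        # Matrix proportional to the gradient of the criterion at Lr.
        G = L.T @ (Lr ** 3 - Lr * ((Lr ** 2).sum(axis=0) / p))
        u, s, vt = np.linalg.svd(G)
        Q = u @ vt                       # orthogonal matrix closest to G
        d_new = s.sum()
        # Stop when the sum of singular values no longer increases.
        if d_old != 0.0 and d_new < d_old * (1.0 + tol):
            break
        d_old = d_new
    return L @ Q, Q

# Hypothetical simple-structure loadings, deliberately rotated by 30 degrees.
L0 = np.array([[0.9, 0.0], [0.8, 0.0], [0.0, 0.7], [0.0, 0.6]])
theta = np.deg2rad(30.0)
rot = np.array([[np.cos(theta), np.sin(theta)],
                [-np.sin(theta), np.cos(theta)]])
L = L0 @ rot

L_rot, Q = varimax(L)
print(np.round(L_rot, 3))
print(round(float(varimax_criterion(L)), 4), round(float(varimax_criterion(L_rot)), 4))
```

Because Q is orthogonal, the communality of each variable (the row sum of squared loadings) is unchanged by the rotation; only the way it is shared among the factors changes, which is what makes the rotated loadings easier to interpret.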