Wine Chemistry and Biochemistry

13 Statistical Techniques for the Interpretation of Analytical Data 695

nal, it is also possible to consider the equation $X_{(n,p)} = Y_{(n,q)} A^{t}_{(q,p)}$, which corresponds to the factorial model of principal components.

The results of PCA are (1) the number $q$ of principal components; (2) the matrix $Y_{(n,q)} = X_{(n,p)} A_{(p,q)} = (y_{i,j})$, with the scores of the $n$ observations in the $q$ new variables $(Y_1, Y_2, \ldots, Y_q)$; and (3) the matrix $A_{(p,q)} = (a_{i,j})$, with information about the contribution of the original variables $(X_1, X_2, \ldots, X_p)$ to the definition of the $q$ components. The coefficients $(a_{i,j})$ are usually transformed so that they correspond to the correlations between the principal components and the original variables, which helps to identify the variables that define each principal component. The two-dimensional representation of the $n$ observations in the plane defined by the first two principal components is usually used to explore or confirm possible clusters of observations and to detect possible outliers.
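The three PCA results listed above can be illustrated with a minimal NumPy sketch. It assumes that the variables are standardised and that the components are obtained from the eigen-decomposition of the correlation matrix; under that assumption, the correlation between variable $X_i$ and component $Y_j$ is $a_{i,j}\sqrt{\lambda_j}$, where $\lambda_j$ is the $j$-th eigenvalue. The function name `pca_scores_and_correlations` is hypothetical and does not come from the text.

```python
import numpy as np

def pca_scores_and_correlations(X, q):
    """Sketch of PCA on standardised data (an assumption, not the book's code).

    Returns the scores Y (n x q), the components matrix A (p x q), the
    variable/component correlations (p x q), and all eigenvalues.
    """
    X = np.asarray(X, dtype=float)
    # Standardise each variable to mean 0 and unit sample variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Eigen-decomposition of the correlation matrix (symmetric, so eigh).
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    # eigh returns ascending eigenvalues; reorder to descending.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    A = eigvecs[:, :q]                 # A(p,q): coefficients a_ij
    Y = Z @ A                          # Y(n,q) = X(n,p) A(p,q): scores
    corr = A * np.sqrt(eigvals[:q])    # a_ij * sqrt(lambda_j)
    return Y, A, corr, eigvals
```

Plotting the first column of `Y` against the second gives the two-dimensional representation used to look for clusters and outliers.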

13.3.2.2 Factor Analysis (FA)

FA is a more general method for describing the dependence structure of the $p$ variables $(X_1, X_2, \ldots, X_p)$ in terms of $q$ non-observed variables, called factors, that are assumed to be responsible for the original ones; it reduces the dimension of the data if $q$ is much smaller than $p$. The orthogonal factorial model assumes, for each original standardised variable $X^*_i$, the following model: $X^*_i = b_{1,i}F_1 + b_{2,i}F_2 + \ldots + b_{q,i}F_q + \varepsilon_i$, where $(F_1, F_2, \ldots, F_q)$ are the $q$ common factors, which are uncorrelated; $(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_p)$ are the specific factors of each variable $X_i$, which are also uncorrelated; and $b_{j,i}$ are the loadings of the factors. It is also assumed that the common and specific factors are independent and have a mean of 0 and a variance of 1. The factorial model, in matrix form, is $X^*_{(n,p)} = F_{(n,q)} B_{(q,p)} + E_{(n,p)}$, where $X^*_{(n,p)}$ is the matrix of standardised observations, $F_{(n,q)}$ is the matrix with the coordinates of the observations on the $q$ factors, $B_{(q,p)}$ is the matrix with the loadings of the factors on the $p$ original variables, and $E_{(n,p)}$ is the matrix of the model errors. The number of factors ($q$), the scores matrix ($F_{(n,q)}$), and the loadings matrix ($B_{(q,p)}$) are the results of FA. The graphical representation of the observations in the plane defined by the first two factors reveals possible clusters of observations and the presence or absence of outliers. From a geometrical perspective, the aim is to find the subspace that best fits the $n$ points in the space of the variables $(X_1, X_2, \ldots, X_p)$, so as to minimise the sum of the moduli of the $n$ row vectors of the matrix $E_{(n,p)}$.
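The matrix form of the orthogonal factorial model can be made concrete by simulating data from it. The sketch below is illustrative only: the loadings in `B`, the sample size, and the function name `simulate_factor_model` are hypothetical. As an additional assumption, each unit-variance specific factor is scaled by $\sqrt{1 - \sum_j b_{j,i}^2}$ so that every simulated variable is standardised; the correlations between different variables should then approach the off-diagonal entries of $B^{t}B$.

```python
import numpy as np

def simulate_factor_model(n, B, seed=0):
    """Draw n observations from X* = F B + E (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    B = np.asarray(B, dtype=float)              # loadings B(q,p)
    q, p = B.shape
    communality = (B ** 2).sum(axis=0)          # variance due to common factors
    if np.any(communality >= 1.0):
        raise ValueError("each column of B needs a squared norm below 1")
    psi = 1.0 - communality                     # assumed specific variances
    F = rng.standard_normal((n, q))             # uncorrelated common factors
    # Unit-variance specific factors, scaled so that Var(X*_i) = 1 (assumption).
    E = rng.standard_normal((n, p)) * np.sqrt(psi)
    return F @ B + E, F, E

# Hypothetical loadings: two factors, five variables.
B = np.array([[0.8, 0.7, 0.6, 0.0, 0.0],
              [0.0, 0.0, 0.3, 0.7, 0.8]])
X_star, F, E = simulate_factor_model(20000, B)

implied = B.T @ B                               # correlations implied by the model
np.fill_diagonal(implied, 1.0)
sample = np.corrcoef(X_star, rowvar=False)
max_gap = np.abs(sample - implied).max()        # small for large n
```

In practice the direction is reversed: only `X_star` is observed, and FA estimates the number of factors, the scores, and the loadings from it.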

The factorial model in principal components, $X^*_{(n,p)} = Y_{(n,q)} A^{t}_{(q,p)} + E_{(n,p)}$, which is frequently used, takes the principal components as factors ($F_{(n,q)} = Y_{(n,q)}$) and the transposed components matrix as the saturations (loadings) matrix ($B_{(q,p)} = A^{t}_{(q,p)}$); this fulfils all the previous requirements since $A_{(p,p)}$ is an orthogonal matrix. The number of factors (components) can be determined from the eigenvalues or by means of a cross-validation procedure (Brereton 1990; Cela 1994).
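A minimal sketch of the factorial model in principal components follows. The text says only that the number of factors can be chosen "from the eigenvalues"; the eigenvalue-greater-than-one rule used here when `q` is not supplied is one common convention and is an assumption, not necessarily the criterion intended by the cited references. The function name `pc_factor_model` is hypothetical. The residual is measured by the sum of squared row norms of $E$, the usual least-squares form; with that measure, the residual equals $(n-1)$ times the sum of the discarded eigenvalues.

```python
import numpy as np

def pc_factor_model(X, q=None):
    """Fit X* = Y A^t + E with principal components as factors (sketch)."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardised X*
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]                  # descending eigenvalues
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    if q is None:
        # Assumed rule: keep components whose eigenvalue exceeds 1.
        q = max(1, int((eigvals > 1.0).sum()))
    A = eigvecs[:, :q]
    F = Z @ A          # factors are the principal components: F = Y
    B = A.T            # saturations matrix: B = A^t
    E = Z - F @ B      # model errors
    return q, F, B, E, eigvals
```

Because the retained columns of $A$ are orthonormal, `F @ B` is the orthogonal projection of the standardised data onto the subspace spanned by the first `q` components, which matches the geometrical description of finding the best-fitting subspace.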

Sometimes, for a better definition of the contribution of the factors to the variables, it is possible to rotate the $q$ extracted factors ($B^*_{(q,p)} = B_{(q,p)} Q_{(p,p)}$); the Varimax rotation is the most frequently used (Afifi and Azen 1979).
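The Varimax idea can be sketched with the widely used SVD-based iteration for the raw criterion (the sum, over factors, of the variance of the squared loadings). This is a generic illustration, not the procedure of the cited reference. Note one deliberate difference from the notation in the text: the sketch follows the usual software convention of storing the loadings as a $p \times q$ matrix (that is, $B^{t}$) and applying a $q \times q$ orthogonal rotation matrix to it. The function names and the example loadings are hypothetical.

```python
import numpy as np

def varimax_criterion(L):
    """Sum over factors of the variance of the squared loadings."""
    return float((L ** 2).var(axis=0).sum())

def varimax(L, max_iter=100, tol=1e-6):
    """Rotate loadings L (p variables x q factors); returns (L @ Q, Q)."""
    L = np.asarray(L, dtype=float)
    p, q = L.shape
    Q = np.eye(q)                      # start from no rotation
    d_old = 0.0
    for _ in range(max_iter):
        Lr = L @ Q
        # Gradient-like matrix of the raw Varimax criterion.
        G = L.T @ (Lr ** 3 - Lr * (Lr ** 2).sum(axis=0) / p)
        u, s, vt = np.linalg.svd(G)
        Q = u @ vt                     # closest orthogonal matrix to G
        d = s.sum()
        if d_old != 0.0 and d / d_old < 1.0 + tol:
            break
        d_old = d
    return L @ Q, Q

# Hypothetical example: a simple structure deliberately rotated by 30 degrees.
angle = np.deg2rad(30.0)
rot = np.array([[np.cos(angle), -np.sin(angle)],
                [np.sin(angle),  np.cos(angle)]])
simple = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
                   [0.0, 0.9], [0.0, 0.8], [0.0, 0.7]])
L = simple @ rot
L_rot, Q = varimax(L)
```

Since the rotation matrix is orthogonal, the communality of each variable (the row sum of squared loadings) is unchanged; only the way the explained variance is shared among the factors changes, which is what makes the rotated factors easier to interpret.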
