Wine Chemistry and Biochemistry


13 Statistical Techniques for the Interpretation of Analytical Data 695


nal, it is also possible to consider the equation $X_{(n,p)} = Y_{(n,q)} A^{t}_{(q,p)}$, which corresponds to the factorial model of principal components.


The results of PCA are (1) the number $q$ of principal components; (2) the matrix $Y_{(n,q)} = X_{(n,p)} A_{(p,q)} = (y_{i,j})$, with the scores of the $n$ observations on the $q$ new variables $(Y_1, Y_2, \ldots, Y_q)$; and (3) the matrix $A_{(p,q)} = (a_{i,j})$, with information about the contribution of the original variables $(X_1, X_2, \ldots, X_p)$ to the definition of the $q$ components. The coefficients $(a_{i,j})$ are usually transformed so that they correspond to the correlations between the principal components and the original variables, which helps to identify the variables that define each principal component. The two-dimensional representation of the $n$ observations in the plane defined by the first two principal components is usually used to explore or confirm possible clusters of observations and to detect possible outliers.
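
The three PCA outputs listed above can be illustrated with a minimal NumPy sketch. It assumes that PCA is carried out on the correlation matrix of standardised variables and that the coefficients are converted to correlations by multiplying each column of $A$ by the square root of its eigenvalue. The function name `pca_correlation` and the simulated data are hypothetical and serve only as an illustration.

```python
import numpy as np

def pca_correlation(X, q):
    """Sketch of PCA on standardised data (assumption: correlation-matrix PCA).

    Returns the scores Y (n, q), the coefficients A (p, q), the q largest
    eigenvalues, and the correlations between variables and components.
    """
    X = np.asarray(X, dtype=float)
    # Standardise each variable to mean 0 and unit sample variance.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(Xs, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    A = eigvecs[:, :q]                        # coefficients a_ij
    Y = Xs @ A                                # scores, Y = X A
    corr = A * np.sqrt(eigvals[:q])           # correlation of X_i with Y_j
    return Y, A, eigvals[:q], corr

# Hypothetical data: four variables driven by two latent sources.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))
X = np.column_stack([
    latent[:, 0] + 0.3 * rng.normal(size=50),
    latent[:, 0] + 0.3 * rng.normal(size=50),
    latent[:, 1] + 0.3 * rng.normal(size=50),
    latent[:, 1] + 0.3 * rng.normal(size=50),
])
Y, A, eigvals, corr = pca_correlation(X, q=2)
```

Plotting the two columns of `Y` against each other gives the two-dimensional representation used to look for clusters and outliers.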


13.3.2.2 Factor Analysis (FA)


FA is a more general method for describing the dependence structure of the $p$ variables $(X_1, X_2, \ldots, X_p)$ in terms of $q$ other, non-observed variables, called factors, which are assumed to be responsible for the original ones; it reduces the dimension of the data when $q$ is much smaller than $p$. The orthogonal factorial model assumes, for each original standardised variable $X^{*}_i$, the following model: $X^{*}_i = b_{1,i}F_1 + b_{2,i}F_2 + \ldots + b_{q,i}F_q + \varepsilon_i$, where $(F_1, F_2, \ldots, F_q)$ are the $q$ common factors, which are uncorrelated; $(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_p)$ are the specific factors of each variable $X_i$, which are also uncorrelated; and the $b_{j,i}$ are the loadings of the factors. It is also assumed that the common and specific factors are independent and have a mean of 0 and a variance of one. The factorial model, in matrix form, is $X^{*}_{(n,p)} = F_{(n,q)} B_{(q,p)} + E_{(n,p)}$, where $X^{*}_{(n,p)}$ is the matrix of standardised observations, $F_{(n,q)}$ is the matrix with the coordinates of the observations on the $q$ factors, $B_{(q,p)}$ is the matrix with the loadings of the factors on the $p$ original variables, and $E_{(n,p)}$ is the matrix of the model errors. The number of factors ($q$), the score matrix ($F_{(n,q)}$), and the loading matrix ($B_{(q,p)}$) are the results of FA. The graphical representation of the observations in the plane defined by the first two factors reveals possible clusters of observations and the presence or absence of outliers. From a geometrical perspective, the aim is to find the subspace that best fits the $n$ points in the space of the variables $(X_1, X_2, \ldots, X_p)$, that is, the one that minimises the sum of the moduli of the $n$ row vectors of the matrix $E_{(n,p)}$.
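
As an illustration of the matrix form of the model, the following sketch simulates data from $X^{*} = FB + E$ and compares the sample covariance matrix with the one implied by the loadings. The loading values are hypothetical. The specific part is rescaled so that each simulated variable has unit variance; this is an assumption of the sketch, made only to keep the simulated variables standardised.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 200_000, 4, 2

# Hypothetical loadings B (q, p): row j holds the loadings of factor j.
B = np.array([[0.8, 0.7, 0.1, 0.2],
              [0.1, 0.2, 0.9, 0.6]])

# Common factors: mean 0, variance 1, uncorrelated.
F = rng.standard_normal((n, q))

# Part of each variable's variance explained by the common factors.
communality = (B ** 2).sum(axis=0)

# Assumption of this sketch: the specific part is scaled so that each
# simulated variable has unit variance (standardised variables).
E = rng.standard_normal((n, p)) * np.sqrt(1.0 - communality)

X_star = F @ B + E                       # matrix form of the factorial model

S = np.cov(X_star, rowvar=False)         # sample covariance of the variables
implied = B.T @ B + np.diag(1.0 - communality)
```

Under these assumptions the off-diagonal entries of `S` approach those of `B.T @ B`, which shows that the dependence between the variables is carried entirely by the common factors.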


The factorial model in principal components, $X^{*}_{(n,p)} = Y_{(n,q)} A^{t}_{(q,p)} + E_{(n,p)}$, which is frequently used, takes the principal components as factors ($F_{(n,q)} = Y_{(n,q)}$) and the transposed components matrix as the loadings (saturations) matrix ($B_{(q,p)} = A^{t}_{(q,p)}$); this fulfils all the previous requirements since $A_{(p,p)}$ is an orthogonal matrix. The number of factors (components) can be determined from the eigenvalues or by means of a cross-validation procedure (Brereton 1990; Cela 1994).
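
A minimal sketch of the principal-components form of the model follows. It keeps the first $q$ components and computes the error matrix $E$. The rule used here to choose $q$ (keep the eigenvalues greater than 1) is only one common eigenvalue-based convention and is an assumption of the sketch, not necessarily the procedure of the cited references; the data are simulated.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 60, 5

# Hypothetical data: two underlying sources plus noise.
X = rng.standard_normal((n, 2)) @ rng.standard_normal((2, p))
X = X + 0.2 * rng.standard_normal((n, p))
X_star = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Eigen-decomposition of the correlation matrix, sorted in descending order.
eigvals, A_full = np.linalg.eigh(np.corrcoef(X_star, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, A_full = eigvals[order], A_full[:, order]

# Assumed selection rule: keep components with eigenvalue > 1 (at least one).
q = max(1, int(np.sum(eigvals > 1.0)))

A = A_full[:, :q]            # components matrix A (p, q)
Y = X_star @ A               # scores, used as factors: F = Y
E = X_star - Y @ A.T         # model errors: X* = Y A^t + E

residual_ss = np.sum(E ** 2)                  # squared length of the error rows
discarded = (n - 1) * eigvals[q:].sum()       # variance left in dropped components
```

In this construction the residual sum of squares equals $(n-1)$ times the sum of the discarded eigenvalues, which is why the eigenvalues can guide the choice of $q$.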


Sometimes, to define more clearly the contribution of the factors to the variables, the $q$ extracted factors can be rotated ($B^{*}_{(q,p)} = B_{(q,p)} Q_{(p,p)}$); the Varimax rotation is the most frequently used (Afifi and Azen 1979).
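
The idea of a Varimax rotation can be sketched with a commonly used iterative scheme based on the singular value decomposition. The sketch stores the loadings as a $p \times q$ array `L` (the transpose of $B$) and applies a $q \times q$ orthogonal matrix `Q`. This layout, the function names and the example loadings are assumptions made for illustration, and statistical packages may add refinements such as row normalisation.

```python
import numpy as np

def varimax_criterion(L):
    """Sum over factors of the variance of the squared loadings."""
    return np.sum(np.var(L ** 2, axis=0))

def varimax(L, max_iter=100, tol=1e-8):
    """Rotate the loadings L (p, q) with an orthogonal Q (q, q).

    Sketch of an SVD-based iteration; returns the rotated loadings and Q.
    """
    p, q = L.shape
    Q = np.eye(q)
    previous = 0.0
    for _ in range(max_iter):
        rotated = L @ Q
        # Gradient of the Varimax criterion (up to a constant factor).
        col_ss = np.sum(rotated ** 2, axis=0)
        gradient = rotated ** 3 - rotated @ np.diag(col_ss) / p
        U, s, Vt = np.linalg.svd(L.T @ gradient)
        Q = U @ Vt                      # closest orthogonal matrix
        current = s.sum()
        if previous != 0.0 and current < previous * (1.0 + tol):
            break
        previous = current
    return L @ Q, Q

# Hypothetical loadings with a simple structure, blurred by a 30 degree rotation.
simple = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
                   [0.0, 0.9], [0.0, 0.8], [0.0, 0.7]])
theta = np.deg2rad(30.0)
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
L = simple @ mix
L_rot, Q = varimax(L)
```

Because `Q` is orthogonal, the rotation leaves the communality of each variable (the row sums of squared loadings) unchanged; only the way that variance is shared among the factors changes.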
