$$
X_{(n,p)} =
\begin{pmatrix}
x_{1,1} & x_{1,2} & \cdots & x_{1,p} \\
x_{2,1} & x_{2,2} & \cdots & x_{2,p} \\
\vdots  & \vdots  & \ddots & \vdots  \\
x_{n,1} & x_{n,2} & \cdots & x_{n,p}
\end{pmatrix}
$$

with one row per observation ($1, 2, \ldots, n$) and one column per variable ($X_1, X_2, \ldots, X_p$).
The vector of means $\bar{\mathbf{x}}' = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_p)$ and the vector of standard deviations $\mathbf{s}' = (s_1, s_2, \ldots, s_p)$, as well as the matrices of covariances $S = (s_{i,j})$ and correlations $R = (r_{i,j})$, can be calculated.
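As a minimal sketch of these calculations (assuming NumPy is available; the data matrix here is randomly generated purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))        # n = 50 observations of p = 4 variables

x_bar = X.mean(axis=0)              # vector of means (x̄_1, ..., x̄_p)
s = X.std(axis=0, ddof=1)           # vector of standard deviations (s_1, ..., s_p)
S = np.cov(X, rowvar=False)         # covariance matrix S = (s_ij)
R = np.corrcoef(X, rowvar=False)    # correlation matrix R = (r_ij)
```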
For this data matrix, the most widely used unsupervised methods are Principal Components Analysis (PCA) and/or Factor Analysis (FA), which attempt to reduce the dimensionality of the data and study the interrelations between variables and between observations, and Cluster Analysis (CA), which searches for clusters of observations or variables (Krzanowski 1988; Cela 1994; Afifi and Clark 1996). Before applying these techniques, the variables are usually first standardised ($X_i \to X_i^*$) to achieve a mean of 0 and unit variance.
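A sketch of this standardisation (illustrative data as before, assuming NumPy); note that the covariance matrix of the standardised variables coincides with the correlation matrix of the originals:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))

# Standardise each variable: mean 0, unit variance (X_i -> X_i*)
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# cov(X*) equals the correlation matrix R of the original variables
assert np.allclose(np.cov(X_std, rowvar=False),
                   np.corrcoef(X, rowvar=False))
```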
13.3.2.1 Principal Components Analysis (PCA)
The main objective of this technique is to reduce the dimensionality of the data without losing important information, starting from the correlations between variables, in order to explore the relationships between variables and between observations. The aim is to obtain $p$ new variables $(Y_1, Y_2, \ldots, Y_p)$, which we will call principal components, that are: (1) normalised linear combinations of the original variables ($Y_i = a_{1,i}X_1 + a_{2,i}X_2 + \cdots + a_{p,i}X_p$, with $\sum_k a_{k,i}^2 = 1$); (2) mutually uncorrelated ($\operatorname{cov}(Y_i, Y_j) = 0 \;\; \forall\, i \neq j$); (3) of progressively decreasing variance ($\operatorname{var}(Y_1) \geq \operatorname{var}(Y_2) \geq \cdots \geq \operatorname{var}(Y_p)$); and (4) such that the total variance ($V_T$) coincides with that of the original variables ($\sum_{i=1}^{p} \operatorname{var}(Y_i) = \sum_{i=1}^{p} \operatorname{var}(X_i) = V_T$) (Afifi and Azen 1979).
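These four properties can be checked numerically with an eigendecomposition of $R$. The sketch below (same illustrative setup as above, assuming NumPy) builds $A$ from the eigenvectors of $R$ and verifies normalisation, uncorrelatedness, decreasing variances and preservation of the total variance:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
X_std = (X - X.mean(0)) / X.std(0, ddof=1)

R = np.corrcoef(X, rowvar=False)
eigvals, A = np.linalg.eigh(R)          # eigh: R is symmetric
order = np.argsort(eigvals)[::-1]       # sort eigenvalues in decreasing order
eigvals, A = eigvals[order], A[:, order]

Y = X_std @ A                           # principal components Y = X* A

# (1) each column of A is normalised: sum_k a_{k,i}^2 = 1
assert np.allclose((A**2).sum(axis=0), 1.0)
# (2) components are uncorrelated: cov(Y_i, Y_j) = 0 for i != j
C = np.cov(Y, rowvar=False)
assert np.allclose(C - np.diag(np.diag(C)), 0.0)
# (3) decreasing variances, with var(Y_j) = lambda_j
assert np.allclose(np.diag(C), eigvals)
# (4) total variance preserved: sum var(Y_i) = sum var(X_i*) = p
assert np.isclose(np.diag(C).sum(), X.shape[1])
```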
The aim is, therefore, to obtain the matrix transformation $Y_{(n,p)} = X_{(n,p)} A_{(p,p)}$. If the original variables are standardised beforehand (most programmes do this), the coefficients $(a_{i,j})$ are determined from the eigenvalues and eigenvectors of the correlation matrix $R_{(p,p)} = (r_{i,j})$: the $j$-th column $\vec{a}_j$ of the matrix $A_{(p,p)} = (a_{i,j})$ is the eigenvector associated with the $j$-th greatest eigenvalue $\lambda_j$, so that $\operatorname{var}(Y_j) = \lambda_j$ and $\sum_{i=1}^{p} \operatorname{var}(Y_i) = \sum_{i=1}^{p} \operatorname{var}(X_i^*) = V_T = p$, the matrix $A_{(p,p)}$ being orthogonal ($A_{(p,p)}^{-1} = A_{(p,p)}^{t}$). In many applications, if the first $q$ components $(Y_1, \ldots, Y_q)$ together explain a high percentage of the total variance, e.g. $\frac{\lambda_1 + \lambda_2 + \cdots + \lambda_q}{V_T} \cdot 100\% > 80\%$, and if $q$ is much smaller than $p$, the dimensionality of the original data will have been reduced with the loss of only a small proportion of non-essential information. This number $q$ of components usually corresponds to the number of eigenvalues $\lambda_i > 1$.
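Both selection rules, the cumulative percentage of explained variance (the 80% threshold is the example value quoted above) and the $\lambda_i > 1$ rule, can be sketched as follows (assuming NumPy, with the same illustrative data):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))

R = np.corrcoef(X, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]  # lambda_1 >= ... >= lambda_p

p = X.shape[1]
explained = 100 * np.cumsum(eigvals) / p        # V_T = p for standardised data

# Rule 1: smallest q whose components together explain > 80% of V_T
q_var = int(np.searchsorted(explained, 80.0) + 1)
# Rule 2: number of eigenvalues greater than 1
q_kaiser = int((eigvals > 1).sum())

print(f"cumulative % variance: {np.round(explained, 1)}")
print(f"q (>80% variance) = {q_var}, q (lambda_i > 1) = {q_kaiser}")
```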
From a geometric perspective, the transformation $Y_{(n,q)} = X_{(n,p)} A_{(p,q)}$ corresponds to an orthogonal rotation of the coordinate axes in the directions of maximum variance, and since the $A_{(p,p)}$ matrix is orthogonal