

are Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), SIMCA (Soft Independent Modelling of Class Analogy) and kNN (k-Nearest Neighbour) methods. Neural networks are a more recent approach to classifying samples into one of the k known groups. In addition, Canonical Variate Analysis (CVA) can be used to achieve maximum graphical differentiation of the k groups, and Multivariate Analysis of Variance (MANOVA) to test for differences between the groups. Descriptive values of the variables in the groups will permit their characterization.
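
As an illustration of the simplest of these supervised methods, the following sketch classifies samples with kNN in Python using scikit-learn; the bundled wine data set, the standardisation step and the choice of five neighbours are assumptions made only for the example.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative data: p chemical variables measured on samples from k known groups.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Assign each sample to the majority class among its five nearest neighbours,
# after standardising the variables so that no single variable dominates the distance.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X_train, y_train)
print("Correct classification rate on the test set:", knn.score(X_test, y_test))
```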


13.3.3.1 Discriminant Analysis


This supervised classification method, the most widely used, assumes a multivariate normal distribution for the variables in each population, $(X_1, \ldots, X_p) \sim N(\vec{\mu}_i, \Sigma_i)$, and calculates the classification functions by minimising the probability of incorrect classification of the observations of the training group (a Bayesian-type rule). If multivariate normality is accepted together with equality of the k covariance matrices, $(X_1, \ldots, X_p) \sim N(\vec{\mu}_i, \Sigma)$, Linear Discriminant Analysis (LDA) calculates k linear classification functions, one for each group,


$$\left\{\, d_i = c_i + \sum_{j=1}^{p} a_{i,j} X_j \,\right\}_{i=1,\ldots,k},$$


that would permit samples of the training group to be classified according to the assignation rule: the sample is assigned to the group with the highest score, $(x_1, x_2, \ldots, x_p) \in W_i$ if $d_i(x_1, x_2, \ldots, x_p) = \max_{j=1,\ldots,k} \{ d_j(x_1, x_2, \ldots, x_p) \}$.
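
A minimal sketch of this assignation rule, assuming the constants $c_i$ and the coefficients $a_{i,j}$ have already been estimated (the numerical values below are purely illustrative):

```python
import numpy as np

# Illustrative linear classification functions for k = 3 groups and p = 2 variables:
# d_i(x) = c_i + sum_j a_ij x_j
c = np.array([-1.0, -2.5, -0.8])          # constants c_i
A = np.array([[0.9, 1.4],                 # coefficients a_ij, one row per group
              [1.8, 0.2],
              [0.3, 2.1]])

def assign(x):
    """Assign a sample x = (x_1, ..., x_p) to the group W_i with the highest score d_i(x)."""
    d = c + A @ x
    return int(np.argmax(d))

sample = np.array([1.2, 0.7])
print("Sample assigned to group", assign(sample) + 1)
```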
If the p variables have a high discriminant power, the percentage of correct classification will be high and the assignation rule can be applied to new samples. The most important results are: the classification functions, the classification of the n samples, the posterior probabilities ($\{ e^{d_i} / \sum_j e^{d_j} \}_{i=1,2,\ldots,k}$), the classification matrix with the percentage of correct assignment of the samples for validation purposes, and the classification of the samples in the test set. The leave-one-out cross-validation procedure can also be used to validate the classification process. Stepwise Linear Discriminant Analysis (SLDA) provides these same results but uses fewer variables, selecting in each step the variable that most favours discrimination of the k groups. If the covariance matrices are unequal, $(X_1, \ldots, X_p) \sim N(\vec{\mu}_i, \Sigma_i)$, Quadratic Discriminant Analysis (QDA) can be used to obtain quadratic functions to classify the samples.
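
These outputs can be obtained from standard statistical software; the sketch below is one possible realisation with scikit-learn, where the bundled wine data set stands in for a training group (the posterior probabilities, classification matrix, leave-one-out validation and the LDA/QDA choice follow the text; everything else is an assumption of the example).

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_wine(return_X_y=True)

# LDA assumes equal covariance matrices; QDA allows group-specific covariance matrices.
lda = LinearDiscriminantAnalysis().fit(X, y)
qda = QuadraticDiscriminantAnalysis().fit(X, y)
print("LDA correct classification rate (training set):", lda.score(X, y))
print("QDA correct classification rate (training set):", qda.score(X, y))

# Posterior probabilities e^{d_i} / sum_j e^{d_j} for the first five samples.
print(np.round(lda.predict_proba(X[:5]), 3))

# Classification matrix (rows: true groups, columns: assigned groups).
print(confusion_matrix(y, lda.predict(X)))

# Leave-one-out cross-validation of the LDA classification rule.
loo = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=LeaveOneOut()).mean()
print(f"Leave-one-out correct classification rate: {loo:.3f}")
```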


13.3.3.2 SIMCA Method


The SIMCA method defines a factorial model with $a_i$ principal components for each of the k groups, starting from the corresponding matrix of standardised data,
$$\left\{\, X^{*(i)}_{(n_i,p)} = F^{(i)}_{(n_i,a_i)} B^{(i)}_{(a_i,p)} + E^{(i)}_{(n_i,p)} \,\right\}_{i=1,\ldots,k},$$
and uses these k models to assign the samples to each of the groups. The observation $\vec{w}$ is assigned in relation to its degree of fit to each model, comparing the error of fit to each class with the mean fit error of the observations of the class. The results include the table of classification of the observations and the graphical representation of the degree of fit of the samples to each pair of classes, known as the Coomans plot.
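
SIMCA is rarely available as a single routine in general-purpose libraries, so the sketch below only outlines the idea under simplifying assumptions: one PCA model is fitted per class with scikit-learn, and a new observation is compared with each class through its residual (fit) error relative to the mean fit error of that class's training observations; the two components per class and the minimum-ratio assignment are illustrative choices, not prescriptions from the text.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# One standardisation + PCA model per class (a_i = 2 components, purely illustrative).
models = {}
for i in np.unique(y):
    Xi = X[y == i]
    scaler = StandardScaler().fit(Xi)
    Zi = scaler.transform(Xi)
    pca = PCA(n_components=2).fit(Zi)
    # Mean residual (fit) error of the training observations of class i.
    resid = Zi - pca.inverse_transform(pca.transform(Zi))
    mean_error = np.sqrt((resid ** 2).sum(axis=1)).mean()
    models[i] = (scaler, pca, mean_error)

def fit_error(w, scaler, pca):
    """Residual distance of observation w to one class model."""
    z = scaler.transform(w.reshape(1, -1))
    r = z - pca.inverse_transform(pca.transform(z))
    return float(np.sqrt((r ** 2).sum()))

# Compare the fit error of a new observation w with the mean fit error of each class
# and assign w to the class whose model it fits best.
w = X[0]
ratios = {i: fit_error(w, s, p) / e for i, (s, p, e) in models.items()}
print("Relative fit errors per class:", ratios)
print("Observation assigned to class:", min(ratios, key=ratios.get))
```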
