

are Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), SIMCA (Soft Independent Modelling of Class Analogy) and kNN (k-Nearest Neighbour) methods. Neural networks are a more recent approach to classifying samples into one of the k known groups. In addition, Canonical Variate Analysis (CVA) can be used to achieve maximum graphical differentiation of the k groups, and Multivariate Analysis of Variance (MANOVA) to test for differences between the groups. Descriptive values of the variables in the groups will permit their characterization.
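
As an illustration of the simplest of these supervised methods, the following sketch classifies samples with kNN in Python using scikit-learn; the bundled wine data set, the standardisation step and the choice of five neighbours are assumptions made only for the example.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative data: p chemical variables measured on samples from k known groups.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Assign each sample to the majority class among its five nearest neighbours,
# after standardising the variables so that no single variable dominates the distance.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X_train, y_train)
print("Correct classification rate on the test set:", knn.score(X_test, y_test))
```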


13.3.3.1 Discriminant Analysis


This supervised classification method, the most widely used, assumes a multivariate normal distribution for the variables in each population, $(X_1, \ldots, X_p) \sim N(\vec{\mu}_i, \Sigma_i)$, and calculates the classification functions by minimising the probability of incorrect classification of the observations of the training group (a Bayesian-type rule). If multivariate normality is accepted together with equality of the k covariance matrices, $(X_1, \ldots, X_p) \sim N(\vec{\mu}_i, \Sigma)$, Linear Discriminant Analysis (LDA) calculates k linear classification functions, one for each group,


$$\left\{\, d_i = c_i + \sum_{j=1}^{p} a_{i,j} X_j \,\right\}_{i=1,\ldots,k},$$


that would permit samples of the training group to be classified according to the assignation rule: the sample is assigned to the group with the highest score, $(x_1, x_2, \ldots, x_p) \in W_i$ if $d_i(x_1, x_2, \ldots, x_p) = \max_{j=1,\ldots,k} \{ d_j(x_1, x_2, \ldots, x_p) \}$.
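
A minimal sketch of this assignation rule, assuming the constants $c_i$ and the coefficients $a_{i,j}$ have already been estimated (the numerical values below are purely illustrative):

```python
import numpy as np

# Illustrative linear classification functions for k = 3 groups and p = 2 variables:
# d_i(x) = c_i + sum_j a_ij x_j
c = np.array([-1.0, -2.5, -0.8])          # constants c_i
A = np.array([[0.9, 1.4],                 # coefficients a_ij, one row per group
              [1.8, 0.2],
              [0.3, 2.1]])

def assign(x):
    """Assign a sample x = (x_1, ..., x_p) to the group W_i with the highest score d_i(x)."""
    d = c + A @ x
    return int(np.argmax(d))

sample = np.array([1.2, 0.7])
print("Sample assigned to group", assign(sample) + 1)
```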
If the p variables have a high discriminant power, the percentage of correct classification will be high and the assignation rule can be applied to new samples. The most important results are: the classification functions, the classification of the n samples, the posterior probabilities ($\{ e^{d_i} / \sum_j e^{d_j} \}_{i=1,2,\ldots,k}$), the classification matrix with the percentage of correct assignment of the samples for validation purposes, and the classification of the samples in the test set. The leave-one-out cross-validation procedure can also be used to validate the classification process. Stepwise Linear Discriminant Analysis (SLDA) provides these same results but uses fewer variables, selecting in each step the variable that most favours discrimination of the k groups. If the covariance matrices are unequal, $(X_1, \ldots, X_p) \sim N(\vec{\mu}_i, \Sigma_i)$, Quadratic Discriminant Analysis (QDA) can be used to obtain quadratic functions to classify the samples.
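
These outputs can be obtained from standard statistical software; the sketch below is one possible realisation with scikit-learn, where the bundled wine data set stands in for a training group (the posterior probabilities, classification matrix, leave-one-out validation and the LDA/QDA choice follow the text; everything else is an assumption of the example).

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_wine(return_X_y=True)

# LDA assumes equal covariance matrices; QDA allows group-specific covariance matrices.
lda = LinearDiscriminantAnalysis().fit(X, y)
qda = QuadraticDiscriminantAnalysis().fit(X, y)
print("LDA correct classification rate (training set):", lda.score(X, y))
print("QDA correct classification rate (training set):", qda.score(X, y))

# Posterior probabilities e^{d_i} / sum_j e^{d_j} for the first five samples.
print(np.round(lda.predict_proba(X[:5]), 3))

# Classification matrix (rows: true groups, columns: assigned groups).
print(confusion_matrix(y, lda.predict(X)))

# Leave-one-out cross-validation of the LDA classification rule.
loo = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=LeaveOneOut()).mean()
print(f"Leave-one-out correct classification rate: {loo:.3f}")
```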


13.3.3.2 SIMCA Method


The SIMCA method defines a factorial model with $a_i$ principal components for each of the k groups, starting from the corresponding matrix of standardised data,
$$\left\{\, X^{*(i)}_{(n_i,p)} = F^{(i)}_{(n_i,a_i)} B^{(i)}_{(a_i,p)} + E^{(i)}_{(n_i,p)} \,\right\}_{i=1,\ldots,k},$$
and uses these k models to assign the samples to each of the groups. The observation $\vec{w}$ is assigned in relation to its degree of fit to each model, comparing the error of fit to each class with the mean fit error of the observations of the class. The results include the table of classification of the observations and the graphical representation of the degree of fit of the samples to each pair of classes, known as the Coomans plot.
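
SIMCA is rarely available as a single routine in general-purpose libraries, so the sketch below only outlines the idea under simplifying assumptions: one PCA model is fitted per class with scikit-learn, and a new observation is compared with each class through its residual (fit) error relative to the mean fit error of that class's training observations; the two components per class and the minimum-ratio assignment are illustrative choices, not prescriptions from the text.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# One standardisation + PCA model per class (a_i = 2 components, purely illustrative).
models = {}
for i in np.unique(y):
    Xi = X[y == i]
    scaler = StandardScaler().fit(Xi)
    Zi = scaler.transform(Xi)
    pca = PCA(n_components=2).fit(Zi)
    # Mean residual (fit) error of the training observations of class i.
    resid = Zi - pca.inverse_transform(pca.transform(Zi))
    mean_error = np.sqrt((resid ** 2).sum(axis=1)).mean()
    models[i] = (scaler, pca, mean_error)

def fit_error(w, scaler, pca):
    """Residual distance of observation w to one class model."""
    z = scaler.transform(w.reshape(1, -1))
    r = z - pca.inverse_transform(pca.transform(z))
    return float(np.sqrt((r ** 2).sum()))

# Compare the fit error of a new observation w with the mean fit error of each class
# and assign w to the class whose model it fits best.
w = X[0]
ratios = {i: fit_error(w, s, p) / e for i, (s, p, e) in models.items()}
print("Relative fit errors per class:", ratios)
print("Observation assigned to class:", min(ratios, key=ratios.get))
```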
