Wine Chemistry and Biochemistry

702 P.J. Martín-Álvarez

13.3.3.3 kNN Method

This is a non-parametric method that calculates the distance matrix between all $n$ observations and uses the following assignment rule: "sample $\vec{w}$ is assigned to the group most represented among the nearest $k$ observations". Generally $k$ is odd, and the size of the groups is also taken into account.
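The assignment rule above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by the author: it assumes Euclidean distance, a plain majority vote, and no correction for group size; the function name `knn_classify` and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_classify(train_X, train_y, w, k=3):
    """Assign sample w to the group most represented among its k nearest
    training observations (Euclidean distance is an assumption here)."""
    # Distance from w to every training observation, paired with its group label
    dists = [(math.dist(x, w), label) for x, label in zip(train_X, train_y)]
    dists.sort(key=lambda pair: pair[0])
    # Majority vote among the k nearest neighbours; an odd k avoids ties
    # when there are only two groups
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Toy training set: two groups, two variables (made-up values)
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["A", "A", "A", "B", "B", "B"]

predicted = knn_classify(train_X, train_y, (1.1, 1.0), k=3)
```

With unequal group sizes, the votes could be weighted by the inverse of each group's size; that refinement is left out of the sketch.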

13.3.3.4 Canonical Variate Analysis (CVA)

To obtain the best $q$-dimensional ($q = \min(p, k-1)$) graphical representation of the observations of the training group, by maximizing the differences between the $k$ groups, $q$ new variables are obtained ($Y_1, \ldots, Y_q$), called canonical variables. These are linear combinations of the original variables, uncorrelated with one another, and they maximize the differences between the groups through the transformation $Y_{(n,q)} = X_{(n,p)} A_{(p,q)}$. The following results are obtained by applying this technique: the transformation matrix ($A_{(p,q)}$), the coordinates of the observations in the canonical variables ($Y_{(n,q)}$) for their graphical representation, the coordinates of the centroids for each group, and the structure matrix with the correlations between the canonical and original variables. The distances to the centroids can be used to assign new samples.
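The text does not spell out how the transformation matrix is obtained. In the usual formulation, given here as an assumption about the method rather than a statement of the author's procedure, the columns of $A_{(p,q)}$ are eigenvectors of $W^{-1}B$, where $W$ and $B$ are the within-group and between-group sums-of-squares-and-products matrices. The symbols $W$, $B$, $\vec{a}_j$, $\lambda_j$, $\vec{x}_{ij}$ and $n_i$ are introduced only for this sketch:

```latex
\begin{aligned}
W &= \sum_{i=1}^{k} \sum_{j=1}^{n_i}
     (\vec{x}_{ij} - \bar{\vec{x}}_i)(\vec{x}_{ij} - \bar{\vec{x}}_i)' \\
B &= \sum_{i=1}^{k} n_i\,
     (\bar{\vec{x}}_i - \bar{\vec{x}})(\bar{\vec{x}}_i - \bar{\vec{x}})' \\
W^{-1} B\, \vec{a}_j &= \lambda_j \vec{a}_j,
     \qquad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_q \\
A_{(p,q)} &= [\vec{a}_1, \ldots, \vec{a}_q],
     \qquad Y_{(n,q)} = X_{(n,p)} A_{(p,q)}
\end{aligned}
```

Here $\vec{x}_{ij}$ is observation $j$ of group $i$, $\bar{\vec{x}}_i$ is the centroid of group $i$, $\bar{\vec{x}}$ is the overall centroid, and $n_i$ is the size of group $i$. Each eigenvalue $\lambda_j$ is the ratio of between-group to within-group variation along the corresponding canonical variable.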

13.3.3.5 MANOVA

Assuming a multivariate normal distribution, with the same covariance matrix in each of the populations, $(X_1, X_2, \ldots, X_p) \sim N(\vec{\mu}_i, \Sigma)$, the multivariate analysis of variance (MANOVA) for a single factor with $k$ levels (an extension of single-factor ANOVA to the case of $p$ variables) permits the equality of the $k$ mean vectors in $p$ variables to be tested ($H_0 \equiv \vec{\mu}_1 = \vec{\mu}_2 = \ldots = \vec{\mu}_k$), where $\vec{\mu}_i' = (\bar{\mu}_{i1}, \bar{\mu}_{i2}, \ldots, \bar{\mu}_{ip})$ is the mean vector of the $p$ variables in population $W_i$. The statistic used in the comparison is Wilks' $\Lambda$, the value of which can be approximated by another statistic with an F-distribution. If the calculated value is greater than the tabulated value, the null hypothesis of equality of the $k$ mean vectors must be rejected. To establish whether the variables can distinguish each pair of groups, a statistic with an F-distribution with $p$ and $n-p-k+1$ df is used. It is based on the square of Mahalanobis' distance between the centroids and permits the equality of each pair of mean vectors to be tested ($H_0 \equiv \vec{\mu}_i = \vec{\mu}_j$) (Afifi and Azen 1979; Martín-Álvarez 2000).
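As an illustration of the statistic named above, the following sketch computes Wilks' $\Lambda$ using its usual definition, the ratio of the determinant of the within-group sums-of-squares-and-products matrix to that of the total matrix. The definition is the standard one and is assumed here rather than taken from the text; the function names and the toy data are hypothetical, and the conversion to an F statistic is not shown.

```python
def mean_vector(rows):
    """Column means of a list of observations."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def scatter_matrix(rows, center):
    """Sums of squares and cross-products of the rows about a given centre."""
    p = len(center)
    S = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[j] - center[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                S[a][b] += d[a] * d[b]
    return S


def determinant(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    n = len(A)
    det = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(A[r][c]))
        if abs(A[pivot][c]) < 1e-12:
            return 0.0
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            det = -det  # a row swap changes the sign
        det *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for j in range(c, n):
                A[r][j] -= f * A[c][j]
    return det


def wilks_lambda(groups):
    """Wilks' Lambda = det(W) / det(T) for a list of groups of observations."""
    all_rows = [r for g in groups for r in g]
    p = len(all_rows[0])
    T = scatter_matrix(all_rows, mean_vector(all_rows))  # total scatter
    W = [[0.0] * p for _ in range(p)]                    # pooled within-group scatter
    for g in groups:
        Sg = scatter_matrix(g, mean_vector(g))
        for a in range(p):
            for b in range(p):
                W[a][b] += Sg[a][b]
    return determinant(W) / determinant(T)


# Toy data (made up): g1 and g2 share the same centroid, g3 is g1 shifted away
g1 = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
g2 = [(1.0, 0.0), (1.0, 2.0), (0.0, 1.0), (2.0, 1.0)]
g3 = [(10.0, 10.0), (12.0, 10.0), (10.0, 12.0), (12.0, 12.0)]

lam_same = wilks_lambda([g1, g2])   # centroids coincide
lam_apart = wilks_lambda([g1, g3])  # centroids far apart
```

Values of $\Lambda$ close to 1 indicate that the group centroids barely differ, while values close to 0 indicate well-separated groups, which is why small values lead to rejection of the null hypothesis.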

We have used SLDA to select the variables most useful to differentiate groups of wine samples (Cabezudo et al. 1986; Martín-Álvarez et al. 1987, 1988; Pozo-Bayón et al. 2001, 2003a,b; Monagas et al. 2007), to distinguish between grape, apple and pineapple juice (Dizy et al. 1992), to classify commercial orange juices (Simó et al. 2004), and to differentiate whisky brands (Herranz et al. 1989). SIMCA and kNN were used to differentiate Spanish wines (Martín-Álvarez et al. 1987) and whisky brands (Martín-Álvarez et al. 1988). LDA, QDA, CVA and kNN were used to differentiate gin brands (Martín-Álvarez and Herranz 1991) and rum brands (Herranz et al. 1990). As an example, Tables 13.16 to 13.22 show the results of
