702 P.J. Martín-Álvarez
13.3.3.3 kNN Method
This is a non-parametric method that calculates the distance matrix between all $n$
observations and uses the following assignment rule: “sample $\vec{w}$ is assigned to the
group most represented among the $k$ nearest observations”. Generally $k$ is odd, and
the size of the groups is also taken into account.
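The assignment rule can be illustrated with a minimal sketch. It assumes Euclidean distance (the text does not name a metric), and the optional weighting of each vote by the inverse of its group's size is only one possible way of taking group size into account. The function name `knn_assign`, its parameters and the toy data are hypothetical.

```python
import math
from collections import Counter

def knn_assign(train, labels, w, k=3, weight_by_size=False):
    """Assign sample w to the group most represented among its k nearest
    training observations (Euclidean distance is an assumption)."""
    # Sort the training observations by distance to w and keep the k nearest.
    neighbours = sorted(zip(train, labels),
                        key=lambda t: math.dist(t[0], w))[:k]
    sizes = Counter(labels)  # number of training observations per group
    votes = Counter()
    for _, g in neighbours:
        # Optionally down-weight votes from large groups (illustrative choice).
        votes[g] += (1.0 / sizes[g]) if weight_by_size else 1.0
    return max(votes, key=votes.get)

# Hypothetical toy data: two variables, two groups.
train = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0),
         (3.0, 3.2), (3.1, 2.9), (2.9, 3.0)]
labels = ["A", "A", "A", "B", "B", "B"]

result = knn_assign(train, labels, (1.1, 1.0), k=3)
print(result)
```

An odd $k$ avoids tied votes when there are only two groups; with more groups ties remain possible, and this sketch then returns the tied group whose neighbour was nearest.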
13.3.3.4 Canonical Variate Analysis (CVA)
To obtain the best $q$-dimensional ($q = \min(p, k-1)$) graphical representation
of the observations of the training group, one that maximizes the differences between
the $k$ groups, $q$ new variables $(Y_1, \ldots, Y_q)$, called canonical variables, are
obtained. They are uncorrelated linear combinations of the original variables that
maximize the differences between the groups, given by the transformation
$Y_{(n,q)} = X_{(n,p)} A_{(p,q)}$. Applying this technique yields the following results:
the transformation matrix ($A_{(p,q)}$), the coordinates of the observations on the
canonical variables ($Y_{(n,q)}$) for their graphical representation, the coordinates of
the centroid of each group, and the structure matrix with the correlations between the
canonical and original variables. The distances to the centroids can be used to assign
new samples.
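One standard way to obtain the transformation matrix, given here as a sketch and not necessarily the exact formulation the author uses, is through the within-group and between-group sums-of-squares-and-products matrices. The symbols $W$, $B$, $\bar{x}_i$, $\bar{x}$, $n_i$, $a_j$ and $\lambda_j$ are notation introduced for this illustration.

```latex
% x_{ij}: j-th observation of group i; \bar{x}_i: centroid of group i;
% \bar{x}: overall centroid; n_i: size of group i; n = \sum_i n_i.
\begin{align*}
W &= \sum_{i=1}^{k} \sum_{j=1}^{n_i}
     (x_{ij} - \bar{x}_i)(x_{ij} - \bar{x}_i)' , &
B &= \sum_{i=1}^{k} n_i\,(\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})' , \\
W^{-1} B\, a_j &= \lambda_j\, a_j , &
\lambda_1 &\ge \lambda_2 \ge \dots \ge \lambda_q > 0 , \\
A_{(p,q)} &= (a_1, \dots, a_q) , &
Y_{(n,q)} &= X_{(n,p)}\, A_{(p,q)} .
\end{align*}
```

Because $B$ is built from $k$ centroids that satisfy one linear constraint, its rank is at most $k-1$, so $W^{-1}B$ has at most $\min(p, k-1)$ non-zero eigenvalues. This is why no more than $q = \min(p, k-1)$ canonical variables can be obtained.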
13.3.3.5 MANOVA
Assuming a multivariate normal distribution with the same covariance matrix
in each of the populations, $(X_1, X_2, \ldots, X_p) \sim N(\vec{\mu}_i, \Sigma)$, the multivariate analysis
of variance (MANOVA) for a single factor with $k$ levels (the extension of single-factor
ANOVA to the case of $p$ variables) tests the equality of the $k$ mean
vectors in $p$ variables ($H_0 \equiv \vec{\mu}_1 = \vec{\mu}_2 = \ldots = \vec{\mu}_k$), where
$\vec{\mu}_i' = (\bar{\mu}_1^i, \bar{\mu}_2^i, \ldots, \bar{\mu}_p^i)$ is the mean vector of the $p$ variables in population $W_i$. The
statistic used in the comparison is Wilks' $\Lambda$, whose value can be approximated
by another statistic with an F-distribution. If the calculated F value is greater than
the tabulated value, the null hypothesis of equality of the $k$ mean vectors must be
rejected. To establish whether the variables can distinguish each pair of groups, a
statistic with an F-distribution with $p$ and $n-p-k+1$ df is used, based on the
square of Mahalanobis' distance between the centroids, which tests the equality of
each pair of mean vectors ($H_0 \equiv \vec{\mu}_i = \vec{\mu}_j$) (Afifi and Azen 1979;
Martín-Álvarez 2000).
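The two statistics can be sketched numerically for the special case of $p = 2$ variables, which keeps the matrix algebra explicit. The sketch assumes the usual definitions: Wilks' $\Lambda = |W| / |T|$, with $W$ the pooled within-group and $T$ the total sums-of-squares-and-products matrix, and a pairwise statistic $F = \frac{n-p-k+1}{p\,(n-k)} \cdot \frac{n_i n_j}{n_i + n_j} \cdot D_{ij}^2$, where $D_{ij}^2$ is the squared Mahalanobis distance between centroids computed with the pooled covariance matrix $W/(n-k)$. All function names and the toy data are hypothetical, and the conversion of $\Lambda$ to an F value is not included.

```python
def mean_vector(rows):
    """Column means of a list of (x1, x2) observations."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def scatter(rows, center):
    """2x2 sums-of-squares-and-products matrix of rows about center."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = (r[0] - center[0], r[1] - center[1])
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    return s

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def within_scatter(groups):
    """Pooled within-group matrix W: sum of each group's scatter matrix."""
    W = [[0.0, 0.0], [0.0, 0.0]]
    for g in groups:
        S = scatter(g, mean_vector(g))
        for a in range(2):
            for b in range(2):
                W[a][b] += S[a][b]
    return W

def wilks_lambda(groups):
    """Wilks' Lambda = |W| / |T|; values near 0 suggest different means."""
    all_rows = [r for g in groups for r in g]
    T = scatter(all_rows, mean_vector(all_rows))
    return det2(within_scatter(groups)) / det2(T)

def pairwise_f(groups, i, j):
    """Squared Mahalanobis distance between centroids i and j, the
    corresponding F statistic, and its degrees of freedom (p, n-p-k+1)."""
    p, k = 2, len(groups)
    n = sum(len(g) for g in groups)
    W = within_scatter(groups)
    S = [[W[a][b] / (n - k) for b in range(2)] for a in range(2)]  # pooled cov
    d = det2(S)
    S_inv = [[S[1][1] / d, -S[0][1] / d], [-S[1][0] / d, S[0][0] / d]]
    mi, mj = mean_vector(groups[i]), mean_vector(groups[j])
    diff = (mi[0] - mj[0], mi[1] - mj[1])
    D2 = sum(diff[a] * S_inv[a][b] * diff[b]
             for a in range(2) for b in range(2))
    ni, nj = len(groups[i]), len(groups[j])
    F = (n - p - k + 1) / (p * (n - k)) * (ni * nj / (ni + nj)) * D2
    return D2, F, (p, n - p - k + 1)

# Hypothetical toy data: k = 3 groups, 4 observations each, p = 2 variables.
groups = [
    [(1.0, 2.0), (1.5, 1.8), (0.9, 2.4), (1.2, 2.1)],
    [(3.0, 1.0), (3.4, 1.3), (2.8, 0.7), (3.1, 1.1)],
    [(2.0, 4.0), (2.3, 3.7), (1.8, 4.4), (2.1, 4.1)],
]
lam = wilks_lambda(groups)
D2, F, df = pairwise_f(groups, 0, 1)
print(lam, D2, F, df)
```

The returned F value would then be compared with the tabulated F value for the reported degrees of freedom, as described above.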
We have used SLDA to select the variables most useful to differentiate groups
of wine samples (Cabezudo et al. 1986; Martín-Álvarez et al. 1987, 1988; Pozo-Bayón
et al. 2001, 2003a,b; Monagas et al. 2007), to distinguish between grape,
apple and pineapple juice (Dizy et al. 1992), to classify commercial orange juices
(Simó et al. 2004), and to differentiate whisky brands (Herranz et al. 1989). SIMCA
and kNN were used to differentiate Spanish wines (Martín-Álvarez et al. 1987)
and whisky brands (Martín-Álvarez et al. 1988). LDA, QDA, CVA and kNN were
used to differentiate gin brands (Martín-Álvarez and Herranz 1991) and rum brands
(Herranz et al. 1990). As an example, Tables 13.16 to 13.22 show the results of