Wine Chemistry and Biochemistry


702 P.J. Martín-Álvarez


13.3.3.3 kNN Method


This is a non-parametric method that calculates the distance matrix between all $n$ observations and uses the following assignment rule: "sample $\vec{w}$ is assigned to the group most represented among the nearest $k$ observations". Generally $k$ is odd (which avoids ties when only two groups are compared), and the size of the groups is also taken into account.
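A minimal Python sketch of this assignment rule follows. It is an illustration, not the procedure used in the cited work. Euclidean distance is assumed, and only the distances from the new sample to the training observations (one row of the distance matrix) are computed. The hypothetical `weight_by_group_size` flag shows just one possible way of taking group size into account: each group's vote count is divided by its number of training samples.

```python
import math
from collections import Counter


def knn_classify(train_x, train_y, w, k=3, weight_by_group_size=False):
    """Assign sample w to the group most represented among its k nearest
    training observations (Euclidean distance is an assumption)."""
    # Distances from w to every training observation, nearest first.
    ranked = sorted(
        ((math.dist(x, w), label) for x, label in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in ranked[:k])
    if weight_by_group_size:
        # Hypothetical correction: express each vote count as a fraction of the
        # group's size, so larger groups are not favoured merely for being larger.
        sizes = Counter(train_y)
        scores = {g: votes[g] / sizes[g] for g in votes}
    else:
        scores = dict(votes)
    # On a tie, max() returns the group that appears first among the
    # neighbours, i.e. the tied group with the closest member.
    return max(scores, key=scores.get)


# Usage: four samples of group A and one of group B.
train_x = [(0, 0), (0, 1), (1, 0), (1, 1), (3, 3)]
train_y = ["A", "A", "A", "A", "B"]
print(knn_classify(train_x, train_y, (2.5, 2.5), k=3))  # A (2 of 3 neighbours)
print(knn_classify(train_x, train_y, (2.5, 2.5), k=3, weight_by_group_size=True))  # B
```

An odd $k$ rules out ties only between two groups. With three or more groups a tie is still possible, and the sketch then falls back to the tied group with the closest member.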


13.3.3.4 Canonical Variate Analysis (CVA)


To obtain the best $q$-dimensional graphical representation of the observations in the training set, with $q = \min(p, k-1)$, by maximizing the differences between the $k$ groups, $q$ new variables $(Y_1, \ldots, Y_q)$, called canonical variables, are obtained. They are linear combinations of the original variables. They are uncorrelated with one another, and they maximize the differences between the groups through the transformation $Y_{(n,q)} = X_{(n,p)} A_{(p,q)}$.

Applying this technique yields the following results:

- the transformation matrix $A_{(p,q)}$;
- the coordinates of the observations on the canonical variables, $Y_{(n,q)}$, used for their graphical representation;
- the coordinates of the centroid of each group;
- the structure matrix, containing the correlations between the canonical and the original variables.

The distances to the centroids can be used to assign new samples.
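A minimal pure-Python sketch of the computation follows. It assumes the usual formulation in which the columns of $A$ are the leading eigenvectors of $W^{-1}B$, where $W$ and $B$ are the within-group and between-group sums-of-squares-and-cross-products matrices. The helper names (`scatter_matrices`, `jacobi_eigh`, `canonical_variates`, `project`, `nearest_centroid`) are hypothetical, and in practice a statistics package would be used instead. The problem is made symmetric with $W^{-1/2}$ and solved by Jacobi rotations. The vectors are scaled so that $A'WA = I$. Other programs scale differently, for example to unit pooled within-group variance, which changes the units of $Y$ but not the directions.

```python
import math


def scatter_matrices(groups):
    """Within-group (W) and between-group (B) sums of squares and cross-products."""
    rows = [r for g in groups for r in g]
    p, n = len(rows[0]), len(rows)
    grand = [sum(c) / n for c in zip(*rows)]
    W = [[0.0] * p for _ in range(p)]
    B = [[0.0] * p for _ in range(p)]
    for g in groups:
        m = [sum(c) / len(g) for c in zip(*g)]
        for i in range(p):
            for j in range(p):
                W[i][j] += sum((r[i] - m[i]) * (r[j] - m[j]) for r in g)
                B[i][j] += len(g) * (m[i] - grand[i]) * (m[j] - grand[j])
    return W, B


def jacobi_eigh(S, sweeps=50):
    """Eigenvalues and eigenvectors (columns of V) of a symmetric matrix."""
    n = len(S)
    A = [row[:] for row in S]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < 1e-12:
                    continue
                # Rotation angle that zeroes the off-diagonal element A[p][q].
                theta = 0.5 * math.atan2(2 * A[p][q], A[q][q] - A[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # A <- A J (columns p and q)
                    A[k][p], A[k][q] = c * A[k][p] - s * A[k][q], s * A[k][p] + c * A[k][q]
                for k in range(n):  # A <- J' A (rows p and q)
                    A[p][k], A[q][k] = c * A[p][k] - s * A[q][k], s * A[p][k] + c * A[q][k]
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    V[k][p], V[k][q] = c * V[k][p] - s * V[k][q], s * V[k][p] + c * V[k][q]
    return [A[i][i] for i in range(n)], V


def canonical_variates(groups):
    """Eigenvalues and transformation matrix A(p, q), with q = min(p, k - 1)."""
    W, B = scatter_matrices(groups)
    p, k = len(W), len(groups)
    q = min(p, k - 1)
    # W^(-1/2) from the eigen-decomposition of W (assumed positive definite).
    wv, U = jacobi_eigh(W)
    W_inv_sqrt = [[sum(U[i][m] * U[j][m] / math.sqrt(wv[m]) for m in range(p))
                   for j in range(p)] for i in range(p)]
    # C = W^(-1/2) B W^(-1/2) is symmetric and has the same eigenvalues as W^-1 B.
    C = [[sum(W_inv_sqrt[i][a] * B[a][b] * W_inv_sqrt[b][j]
              for a in range(p) for b in range(p))
          for j in range(p)] for i in range(p)]
    vals, V = jacobi_eigh(C)
    order = sorted(range(p), key=lambda m: vals[m], reverse=True)[:q]
    # a = W^(-1/2) v solves B a = lambda W a, and A' W A = I.
    A = [[sum(W_inv_sqrt[i][m] * V[m][j] for m in range(p)) for j in order]
         for i in range(p)]
    return [vals[j] for j in order], A


def project(X, A):
    """Y(n, q) = X(n, p) A(p, q): scores of the observations on the canonical variables."""
    return [[sum(x[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]
            for x in X]


def nearest_centroid(y, centroids):
    """Index of the group whose centroid (in canonical space) is closest to y."""
    return min(range(len(centroids)), key=lambda g: math.dist(y, centroids[g]))


# Usage with three small groups measured on p = 2 variables (so q = 2).
groups = [
    [(1.0, 2.0), (2.0, 3.5), (1.5, 2.2), (2.2, 2.9)],
    [(4.0, 1.0), (5.1, 1.8), (4.6, 0.7), (5.0, 1.5)],
    [(3.0, 5.0), (3.6, 6.1), (2.7, 5.5), (3.3, 5.8)],
]
eigenvalues, A = canonical_variates(groups)
centroids = project([[sum(c) / len(g) for c in zip(*g)] for g in groups], A)
```

The sign of each canonical vector is arbitrary, so plots made with different programs may appear mirrored.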


13.3.3.5 MANOVA


Assume that in each of the populations the variables follow a multivariate normal distribution with the same covariance matrix, $(X_1, X_2, \ldots, X_p) \sim N(\vec{\mu}_i, \Sigma)$. Under this assumption, the multivariate analysis of variance (MANOVA) for a single factor with $k$ levels makes it possible to test the equality of the $k$ mean vectors in $p$ variables, $H_0 \equiv \vec{\mu}_1 = \vec{\mu}_2 = \cdots = \vec{\mu}_k$. MANOVA is the extension of single-factor ANOVA to the case of $p$ variables. Here $\vec{\mu}_i' = (\bar{\mu}^i_1, \bar{\mu}^i_2, \ldots, \bar{\mu}^i_p)$ is the mean vector of the $p$ variables in population $W_i$.

The statistic used in the comparison is Wilks' $\Lambda$, whose value can be transformed into another statistic that approximately follows an F-distribution. If the calculated value is greater than the tabulated value at the chosen significance level, the null hypothesis of equality of the $k$ mean vectors must be rejected.

To establish whether the variables can distinguish each pair of groups, a statistic with an F-distribution with $p$ and $n-p-k+1$ degrees of freedom is used. It is based on the square of Mahalanobis' distance between the centroids, and it allows the equality of each pair of mean vectors to be tested, $H_0 \equiv \vec{\mu}_i = \vec{\mu}_j$ (Afifi and Azen 1979; Martín-Álvarez 2000).
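The two tests can be sketched in pure Python under the stated normality and equal-covariance assumptions. Wilks' $\Lambda = |W| / |W + B|$ is converted to F with Rao's approximation, which is exact when $\min(p, k-1) \le 2$. The pairwise statistic is taken to be $F = \frac{n-k-p+1}{p(n-k)} \cdot \frac{n_i n_j}{n_i + n_j} D^2_{ij}$, with $D^2_{ij}$ computed from the pooled covariance matrix $W/(n-k)$. This is a standard form consistent with the $p$ and $n-p-k+1$ degrees of freedom given in the text, but it is an assumption about the exact statistic used there. The function names are hypothetical. P-values are not computed, because in the text the F values are compared with tabulated ones.

```python
import math


def scatter(groups):
    """Within-group (W) and between-group (B) SSCP matrices, plus group means."""
    rows = [r for g in groups for r in g]
    p, n = len(rows[0]), len(rows)
    grand = [sum(c) / n for c in zip(*rows)]
    W = [[0.0] * p for _ in range(p)]
    B = [[0.0] * p for _ in range(p)]
    means = []
    for g in groups:
        m = [sum(c) / len(g) for c in zip(*g)]
        means.append(m)
        for i in range(p):
            for j in range(p):
                W[i][j] += sum((r[i] - m[i]) * (r[j] - m[j]) for r in g)
                B[i][j] += len(g) * (m[i] - grand[i]) * (m[j] - grand[j])
    return W, B, means


def eliminate(M, b):
    """Gaussian elimination with partial pivoting: returns (det(M), x with M x = b)."""
    n = len(M)
    T = [list(row) + [b[i]] for i, row in enumerate(M)]
    det = 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(T[r][c]))
        if piv != c:
            T[c], T[piv] = T[piv], T[c]
            det = -det
        det *= T[c][c]
        for r in range(c + 1, n):
            f = T[r][c] / T[c][c]
            for j in range(c, n + 1):
                T[r][j] -= f * T[c][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (T[i][n] - sum(T[i][j] * x[j] for j in range(i + 1, n))) / T[i][i]
    return det, x


def wilks_test(groups):
    """Wilks' Lambda = |W| / |W + B| with Rao's F approximation and its df."""
    W, B, _ = scatter(groups)
    p, k, n = len(W), len(groups), sum(len(g) for g in groups)
    T = [[W[i][j] + B[i][j] for j in range(p)] for i in range(p)]
    lam = eliminate(W, [0.0] * p)[0] / eliminate(T, [0.0] * p)[0]
    vh, ve = k - 1, n - k  # hypothesis and error degrees of freedom
    den = p**2 + vh**2 - 5
    t = math.sqrt((p**2 * vh**2 - 4) / den) if den > 0 else 1.0
    df1 = p * vh
    df2 = (ve + vh - (p + vh + 1) / 2) * t - (p * vh - 2) / 2
    root = lam ** (1 / t)
    return lam, (1 - root) / root * df2 / df1, df1, df2


def pairwise_f(groups, i, j):
    """F for H0: mu_i = mu_j from the squared Mahalanobis distance between centroids."""
    W, _, means = scatter(groups)
    p, k, n = len(W), len(groups), sum(len(g) for g in groups)
    S = [[w / (n - k) for w in row] for row in W]  # pooled covariance matrix
    d = [a - b for a, b in zip(means[i], means[j])]
    D2 = sum(dm * xm for dm, xm in zip(d, eliminate(S, d)[1]))  # d' S^-1 d
    ni, nj = len(groups[i]), len(groups[j])
    F = (n - k - p + 1) / (p * (n - k)) * (ni * nj / (ni + nj)) * D2
    return D2, F, p, n - k - p + 1
```

With a single variable and two groups, both statistics reduce to the one-way ANOVA F (the square of the pooled two-sample t statistic), which makes a quick sanity check.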


We have used SLDA to select the variables most useful for differentiating groups of wine samples (Cabezudo et al. 1986; Martín-Álvarez et al. 1987, 1988; Pozo-Bayón et al. 2001, 2003a,b; Monagas et al. 2007), to distinguish between grape, apple and pineapple juice (Dizy et al. 1992), to classify commercial orange juices (Simó et al. 2004), and to differentiate whisky brands (Herranz et al. 1989). SIMCA and kNN were used to differentiate Spanish wines (Martín-Álvarez et al. 1987) and whisky brands (Martín-Álvarez et al. 1988). LDA, QDA, CVA and kNN were used to differentiate gin brands (Martín-Álvarez and Herranz 1991) and rum brands (Herranz et al. 1990). As an example, Tables 13.16 to 13.22 show the results of
