

13.3.4.1 Canonical Correlation Analysis (CCA)


In order to measure the association between the two sets of variables, CCA calculates $m$ new variables ($m = \min(p, q)$) in each block ($F_1, \ldots, F_m$ and $S_1, \ldots, S_m$), called canonical variables, that are linear combinations of the original variables ($F_{(n,m)} = X_{(n,p)} A_{(p,m)}$ and $S_{(n,m)} = Y_{(n,q)} B_{(q,m)}$) and have the largest possible correlation ($\mathrm{corr}(F_1, S_1) \geq \ldots \geq \mathrm{corr}(F_m, S_m)$). The results obtained with this statistical technique are the transformation matrices ($A_{(p,m)}$, $B_{(q,m)}$), the score matrices ($F_{(n,m)}$, $S_{(n,m)}$), and the canonical correlation values ($R_i = \mathrm{corr}(F_i, S_i)$) together with their statistical significances. Inspection of the successive columns of the matrices $A_{(p,m)}$ and $B_{(q,m)}$ makes it possible to establish which variables are most correlated with each canonical variable. It is also possible to obtain the scatter plot of $F_1$ versus $S_1$. However, this method cannot be used to predict values of the variables of the Y-block, and it requires $n > p + q$. CCA was used to examine the linear relationship between chemical composition and foam characteristics of wine and cava samples (Pueyo et al. 1995).
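
As an illustration of these quantities, the following minimal sketch computes canonical variables and canonical correlations on simulated data; the block sizes, the simulated data and the use of scikit-learn are assumptions made here for illustration and are not taken from the cited study.

```python
# A minimal sketch of CCA on simulated data; block names and sizes are
# illustrative assumptions, not values from the cited study.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n, p, q = 60, 6, 3                            # n observations, p X-variables, q Y-variables
X = rng.normal(size=(n, p))                   # X-block (e.g. chemical composition)
Y = X[:, :q] + 0.5 * rng.normal(size=(n, q))  # Y-block correlated with part of X (e.g. foam)

m = min(p, q)                                 # number of canonical variable pairs
cca = CCA(n_components=m).fit(X, Y)
F, S = cca.transform(X, Y)                    # score matrices F(n,m) and S(n,m)

# Canonical correlations R_i = corr(F_i, S_i), ordered from largest to smallest
R = [np.corrcoef(F[:, i], S[:, i])[0, 1] for i in range(m)]
print(np.round(R, 3))

# The rotation matrices correspond (up to centring and scaling) to A(p,m) and
# B(q,m): their columns show which original variables load on each canonical variable.
print(cca.x_rotations_.shape, cca.y_rotations_.shape)   # (p, m), (q, m)
```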


13.3.4.2 Multivariate Regression


The aim of this technique is to predict values of the response or dependent variables ($Y_1, \ldots, Y_q$) as a function of the predictive, or independent, variables ($X_1, X_2, \ldots, X_p$) by applying a mathematical model $Y_j = f(X_1, X_2, \ldots, X_p)$ that will be estimated using the $n$ observations of the calibration set, $\{(x_{i,1}, x_{i,2}, \ldots, x_{i,p}, y_{i,1}, \ldots, y_{i,q})\}_{i=1,\ldots,n}$. These observations may have been selected by a fixed or randomised experimental design.
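
As a schematic sketch of this calibration-and-prediction setting (the data layout and the choice of a linear model as the fitted function $f$ are assumptions made purely for illustration), the calibration observations are arranged as an $n \times p$ predictor matrix and an $n \times q$ response matrix, the model is fitted on them, and it is then used to predict the responses of new samples.

```python
# A schematic sketch of estimating Yj = f(X1,...,Xp) from a calibration set and
# predicting new samples; the data and the linear choice of f are assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n, p, q = 40, 5, 2
X_cal = rng.normal(size=(n, p))                                          # calibration observations of X1..Xp
Y_cal = X_cal @ rng.normal(size=(p, q)) + 0.1 * rng.normal(size=(n, q))  # corresponding Y1..Yq

model = LinearRegression().fit(X_cal, Y_cal)  # estimate f from the calibration set
X_new = rng.normal(size=(3, p))               # new samples whose responses are unknown
Y_pred = model.predict(X_new)                 # predicted values of Y1..Yq
print(Y_pred.shape)                           # (3, q)
```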


Multiple Linear Regression (MLR)


MLR assumes for the observed value of each random dependent variable the following linear model: $y_i = \beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \ldots + \beta_p x_{i,p} + \varepsilon_i$, where the $\beta_i$ are the unknown parameters and the $\varepsilon_i$ are independent error variables with normal distribution ($\varepsilon_i \sim N(0, \sigma)$). If we assume that $(x_{i,1}, x_{i,2}, \ldots, x_{i,p})$ are fixed values of the independent variables $(X_1, X_2, \ldots, X_p)$, then the $y_i$ values have a normal distribution with a common standard deviation ($y_i \sim N(\beta_0 + \sum_{j=1}^{p} \beta_j x_{i,j}, \sigma)$). Using the ordinary least squares (OLS) procedure, which minimizes the sum of squared errors ($\sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_{i,1} - \beta_2 x_{i,2} - \ldots - \beta_p x_{i,p})^2$), the estimated linear model (regression equation) is $\hat{y}_i = b_0 + b_1 x_{i,1} + b_2 x_{i,2} + \ldots + b_p x_{i,p}$. The regression coefficients $b_i$, estimators of the parameters $\beta_i$, can be calculated according to


$$
\vec{b} = (X^t X)^{-1} X^t \vec{y}, \quad \text{where} \quad
\vec{b} = \begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_p \end{pmatrix}, \;
\vec{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \;
X = \begin{pmatrix}
1 & x_{1,1} & \cdots & x_{1,p} \\
1 & x_{2,1} & \cdots & x_{2,p} \\
\vdots & \vdots & \ddots & \vdots \\
1 & x_{n,1} & \cdots & x_{n,p}
\end{pmatrix},
$$


provided that the matrix $X^t X$ is not singular. Among the estimators of $\beta_i$ that are linear unbiased functions of the observations $y_i$, the OLS estimators are those with minimum variance (Gauss–Markov theorem).
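
As a numerical sketch of the OLS computation just described (the simulated data and coefficient values are illustrative assumptions), the coefficient vector $\vec{b}$ can be obtained from the normal equations or, equivalently and more stably, with a least-squares solver.

```python
# A minimal sketch of OLS estimation, b = (X^t X)^(-1) X^t y, on simulated data.
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 4
X0 = rng.normal(size=(n, p))                  # predictor values x_{i,1..p}
beta = np.array([1.5, -2.0, 0.0, 0.7, 3.0])   # "true" (beta0, beta1, ..., betap), illustrative
X = np.column_stack([np.ones(n), X0])         # design matrix with a column of ones for the intercept
y = X @ beta + rng.normal(scale=0.3, size=n)  # y_i = beta0 + sum_j betaj*x_{i,j} + eps_i

# Normal-equations solution (requires X^t X to be non-singular)
b_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Numerically preferable equivalent: least-squares solver
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(b_normal, b_lstsq))         # True
print(np.round(b_normal, 2))                  # estimates (b0, b1, ..., bp)
```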
