

13.3.4.1 Canonical Correlation Analysis (CCA)


In order to measure the association between the two sets of variables, CCA calculates $m$ new variables ($m = \min(p, q)$) in each block ($F_1, \ldots, F_m$ and $S_1, \ldots, S_m$), called canonical variables, that are linear combinations of the original variables ($F_{(n,m)} = X_{(n,p)} A_{(p,m)}$ and $S_{(n,m)} = Y_{(n,q)} B_{(q,m)}$) and have the largest possible correlation ($\mathrm{corr}(F_1, S_1) \geq \ldots \geq \mathrm{corr}(F_m, S_m)$). The results obtained with this statistical technique are the transformation matrices ($A_{(p,m)}$, $B_{(q,m)}$), the score matrices ($F_{(n,m)}$, $S_{(n,m)}$), and the canonical correlation values ($R_i = \mathrm{corr}(F_i, S_i)$) together with their statistical significances. Inspection of the successive columns of the matrices $A_{(p,m)}$ and $B_{(q,m)}$ makes it possible to establish which variables are most correlated with each canonical variable. It is also possible to obtain the scatter plot of $F_1$ versus $S_1$. However, this method cannot be used to predict values of the variables of the Y-block, and it requires $n > p + q$. CCA was used to examine the linear relationship between chemical composition and foam characteristics of wine and cava samples (Pueyo et al. 1995).
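
As an illustration of these quantities, the following minimal sketch computes canonical variables and canonical correlations on simulated data; the block sizes, the simulated data and the use of scikit-learn are assumptions made here for illustration and are not taken from the cited study.

```python
# A minimal sketch of CCA on simulated data; block names and sizes are
# illustrative assumptions, not values from the cited study.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n, p, q = 60, 6, 3                            # n observations, p X-variables, q Y-variables
X = rng.normal(size=(n, p))                   # X-block (e.g. chemical composition)
Y = X[:, :q] + 0.5 * rng.normal(size=(n, q))  # Y-block correlated with part of X (e.g. foam)

m = min(p, q)                                 # number of canonical variable pairs
cca = CCA(n_components=m).fit(X, Y)
F, S = cca.transform(X, Y)                    # score matrices F(n,m) and S(n,m)

# Canonical correlations R_i = corr(F_i, S_i), ordered from largest to smallest
R = [np.corrcoef(F[:, i], S[:, i])[0, 1] for i in range(m)]
print(np.round(R, 3))

# The rotation matrices correspond (up to centring and scaling) to A(p,m) and
# B(q,m): their columns show which original variables load on each canonical variable.
print(cca.x_rotations_.shape, cca.y_rotations_.shape)   # (p, m), (q, m)
```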


13.3.4.2 Multivariate Regression


The aim of this technique is to predict values of the response or dependent variables ($Y_1, \ldots, Y_q$) as a function of the predictive, or independent, variables ($X_1, X_2, \ldots, X_p$) by applying a mathematical model $Y_j = f(X_1, X_2, \ldots, X_p)$ that will be estimated using the $n$ observations of the calibration set, $\{(x_{i,1}, x_{i,2}, \ldots, x_{i,p}, y_{i,1}, \ldots, y_{i,q})\}_{i=1,\ldots,n}$. These observations may have been selected by a fixed or randomised experimental design.
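
As a schematic sketch of this calibration-and-prediction setting (the data layout and the choice of a linear model as the fitted function $f$ are assumptions made purely for illustration), the calibration observations are arranged as an $n \times p$ predictor matrix and an $n \times q$ response matrix, the model is fitted on them, and it is then used to predict the responses of new samples.

```python
# A schematic sketch of estimating Yj = f(X1,...,Xp) from a calibration set and
# predicting new samples; the data and the linear choice of f are assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n, p, q = 40, 5, 2
X_cal = rng.normal(size=(n, p))                                          # calibration observations of X1..Xp
Y_cal = X_cal @ rng.normal(size=(p, q)) + 0.1 * rng.normal(size=(n, q))  # corresponding Y1..Yq

model = LinearRegression().fit(X_cal, Y_cal)  # estimate f from the calibration set
X_new = rng.normal(size=(3, p))               # new samples whose responses are unknown
Y_pred = model.predict(X_new)                 # predicted values of Y1..Yq
print(Y_pred.shape)                           # (3, q)
```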


Multiple Linear Regression (MLR)


MLR assumes for the observed value of each random dependent variable the following linear model: $y_i = \beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \ldots + \beta_p x_{i,p} + \varepsilon_i$, where the $\beta_i$ are the unknown parameters and the $\varepsilon_i$ are independent error variables with normal distribution ($\varepsilon_i \sim N(0, \sigma)$). If we assume that $(x_{i,1}, x_{i,2}, \ldots, x_{i,p})$ are fixed values of the independent variables $(X_1, X_2, \ldots, X_p)$, then the $y_i$ values have a normal distribution with a common standard deviation ($y_i \sim N(\beta_0 + \sum_{j=1}^{p} \beta_j x_{i,j}, \sigma)$). Using the ordinary least squares (OLS) procedure, which minimizes the sum of squared errors ($\sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_{i,1} - \beta_2 x_{i,2} - \ldots - \beta_p x_{i,p})^2$), the estimated linear model (regression equation) is $\hat{y}_i = b_0 + b_1 x_{i,1} + b_2 x_{i,2} + \ldots + b_p x_{i,p}$. The regression coefficients $b_i$, estimators of the parameters $\beta_i$, can be calculated according to


$$
\vec{b} = (X^t X)^{-1} X^t \vec{y}, \quad \text{where} \quad
\vec{b} = \begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_p \end{pmatrix}, \;
\vec{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \;
X = \begin{pmatrix}
1 & x_{1,1} & \cdots & x_{1,p} \\
1 & x_{2,1} & \cdots & x_{2,p} \\
\vdots & \vdots & \ddots & \vdots \\
1 & x_{n,1} & \cdots & x_{n,p}
\end{pmatrix},
$$


provided that the matrix $X^t X$ is not singular. Among the estimators of $\beta_i$ that are linear unbiased functions of the observations $y_i$, the OLS estimators are those with minimum variance (Gauss–Markov theorem).
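
As a numerical sketch of the OLS computation just described (the simulated data and coefficient values are illustrative assumptions), the coefficient vector $\vec{b}$ can be obtained from the normal equations or, equivalently and more stably, with a least-squares solver.

```python
# A minimal sketch of OLS estimation, b = (X^t X)^(-1) X^t y, on simulated data.
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 4
X0 = rng.normal(size=(n, p))                  # predictor values x_{i,1..p}
beta = np.array([1.5, -2.0, 0.0, 0.7, 3.0])   # "true" (beta0, beta1, ..., betap), illustrative
X = np.column_stack([np.ones(n), X0])         # design matrix with a column of ones for the intercept
y = X @ beta + rng.normal(scale=0.3, size=n)  # y_i = beta0 + sum_j betaj*x_{i,j} + eps_i

# Normal-equations solution (requires X^t X to be non-singular)
b_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Numerically preferable equivalent: least-squares solver
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(b_normal, b_lstsq))         # True
print(np.round(b_normal, 2))                  # estimates (b0, b1, ..., bp)
```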
