13.3.4.1 Canonical Correlation Analysis (CCA)
In order to measure the association between the two sets of variables, CCA calculates $m$ new variables ($m = \min(p, q)$) in each block ($F_1, \ldots, F_m$, $S_1, \ldots, S_m$), called canonical variables, which are linear combinations of the original variables ($F_{(n,m)} = X_{(n,p)} A_{(p,m)}$ and $S_{(n,m)} = Y_{(n,q)} B_{(q,m)}$) and which have the largest possible correlation ($\operatorname{corr}(F_1, S_1) \geq \ldots \geq \operatorname{corr}(F_m, S_m)$). The results obtained with this statistical technique are: the transformation matrices ($A_{(p,m)}$, $B_{(q,m)}$), the score matrices ($F_{(n,m)}$, $S_{(n,m)}$), and the canonical correlation values ($R_i = \operatorname{corr}(F_i, S_i)$) together with their statistical significances. Inspection of the successive columns of the matrices $A_{(p,m)}$ and $B_{(q,m)}$ makes it possible to establish which variables are most correlated with each canonical variable. It is also possible to obtain the scatter plot of $F_1$ vs. $S_1$. However, this method cannot be used to predict values of the variables of the $Y$-block, and it requires $n > p + q$. CCA was used to examine the linear relationship between the chemical composition and the foam characteristics of wine and cava samples (Pueyo et al. 1995).
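As an illustrative sketch (with simulated data, not the analysis of Pueyo et al. 1995; all names and dimensions are placeholders), the canonical variables and canonical correlations can be computed in Python with scikit-learn, whose CCA implementation centres and scales the blocks by default:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# Simulated blocks: n = 50 samples, p = 4 X-variables, q = 3 Y-variables.
rng = np.random.default_rng(0)
n, p, q = 50, 4, 3
X = rng.normal(size=(n, p))
Y = 0.5 * X[:, :q] + rng.normal(scale=0.5, size=(n, q))

m = min(p, q)                          # number of canonical variables
cca = CCA(n_components=m)
F, S = cca.fit_transform(X, Y)         # score matrices F(n,m) and S(n,m)

# Transformation matrices A(p,m) and B(q,m): their columns define the
# linear combinations of the (centred, scaled) original variables.
A, B = cca.x_rotations_, cca.y_rotations_

# Canonical correlations R_i = corr(F_i, S_i), in decreasing order;
# a scatter plot of F[:, 0] vs S[:, 0] shows the strongest association.
R = [np.corrcoef(F[:, i], S[:, i])[0, 1] for i in range(m)]
print(np.round(R, 3))
```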
13.3.4.2 Multivariate Regression
The aim of this technique is to predict values of the response, or dependent, variables ($Y_1, \ldots, Y_q$) as a function of the predictive, or independent, variables ($X_1, X_2, \ldots, X_p$), by applying a mathematical model $Y_j = f(X_1, X_2, \ldots, X_p)$ that is estimated using $n$ observations of the calibration set, $\{(x_{i,1}, x_{i,2}, \ldots, x_{i,p}, y_{i,1}, \ldots, y_{i,q})\}_{i=1,\ldots,n}$. These observations may have been selected by a fixed or randomised experimental design.
Multiple Linear Regression (MLR)
MLR assumes for the observed value of each random dependent variable the following linear model: $y_i = \beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \ldots + \beta_p x_{i,p} + \varepsilon_i$, where the $\beta_i$ are the unknown parameters and the $\varepsilon_i$ are independent error variables with normal distribution ($\varepsilon_i \sim N(0, \sigma)$). If we assume that $(x_{i,1}, x_{i,2}, \ldots, x_{i,p})$ are fixed values of the independent variables ($X_1, X_2, \ldots, X_p$), then the $y_i$ values will have a normal distribution with a common standard deviation ($y_i \sim N(\beta_0 + \sum_{j=1}^{p} \beta_j x_{i,j},\ \sigma)$). Using the ordinary least squares (OLS) procedure, which minimizes the sum of squared errors ($\sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_{i,1} - \beta_2 x_{i,2} - \ldots - \beta_p x_{i,p})^2$), the estimated linear
model (regression equation) is $\hat{y}_i = b_0 + b_1 x_{i,1} + b_2 x_{i,2} + \ldots + b_p x_{i,p}$. Setting the partial derivatives of this sum of squares with respect to each coefficient equal to zero yields the normal equations $X^t X \vec{b} = X^t \vec{y}$; hence the regression coefficients $b_i$, estimators of the parameters $\beta_i$, can be calculated, provided that the matrix $X^t X$ is not singular, according to
$$
\vec{b} = (X^t X)^{-1} X^t \vec{y}, \qquad
\vec{b} = \begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_p \end{pmatrix}, \quad
\vec{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad
X = \begin{pmatrix}
1 & x_{1,1} & \ldots & x_{1,p} \\
1 & x_{2,1} & \ldots & x_{2,p} \\
\vdots & \vdots & \ddots & \vdots \\
1 & x_{n,1} & \ldots & x_{n,p}
\end{pmatrix}.
$$
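As a brief numerical sketch (with hypothetical data; the variable names are illustrative), these coefficients can be computed in Python with NumPy, either by the normal-equations formula above or, more stably, with a least-squares solver:

```python
import numpy as np

# Hypothetical calibration set: n = 30 observations, p = 3 predictors.
rng = np.random.default_rng(1)
n, p = 30, 3
X_raw = rng.normal(size=(n, p))
beta_true = np.array([2.0, 1.0, -0.5, 3.0])   # beta_0, beta_1, ..., beta_p
y = beta_true[0] + X_raw @ beta_true[1:] + rng.normal(scale=0.3, size=n)

# Design matrix X: a leading column of ones provides the intercept b0.
X = np.column_stack([np.ones(n), X_raw])

# b = (X^t X)^(-1) X^t y, valid when X^t X is not singular.
b = np.linalg.inv(X.T @ X) @ X.T @ y

# Numerically preferable equivalent: solve the least-squares problem directly.
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ b                 # fitted values ŷ_i
residuals = y - y_hat         # estimates of the errors ε_i
print(np.round(b, 3), np.round(b_lstsq, 3))
```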
Among the estimators of $\beta_i$ which are