13 Statistical Techniques for the Interpretation of Analytical Data 709
dure (RMSEP), which are defined by the equations

$$\mathrm{RMSEC}=\sqrt{\frac{\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2}{n}}$$

and

$$\mathrm{RMSEP}=\sqrt{\frac{\sum_{i=1}^{n}\left(y_i-\hat{y}_{(i)}\right)^2}{n}},$$

where $y_i$ is the observed value of $Y$, $\hat{y}_i$ is the predicted value, $\hat{y}_{(i)}$ is the predicted value when the regression model is constructed without sample $i$, and $n$ is the number of samples. RMSEP is a measure of the model's ability to predict the values of the response variables in new samples.
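As an illustration of the two definitions, the following is a minimal NumPy sketch for an ordinary least-squares MLR model with an intercept. RMSEC uses the model fitted on all $n$ samples; RMSEP refits the model $n$ times, each time leaving out sample $i$ to obtain $\hat{y}_{(i)}$ (leave-one-out). The function names `fit_mlr`, `rmsec` and `rmsep_loo` and the synthetic data are hypothetical and only serve the example.

```python
import numpy as np


def fit_mlr(X, y):
    """Least-squares MLR with an intercept; X is (n, p). Returns (b0, b)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]


def rmsec(X, y):
    """Root mean square error of calibration: model fitted on all n samples."""
    b0, b = fit_mlr(X, y)
    y_hat = b0 + X @ b
    return np.sqrt(np.sum((y - y_hat) ** 2) / len(y))


def rmsep_loo(X, y):
    """Leave-one-out RMSEP: y_hat_(i) comes from a model built without sample i."""
    n = len(y)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i           # drop sample i from the calibration set
        b0, b = fit_mlr(X[keep], y[keep])
        errors[i] = y[i] - (b0 + X[i] @ b)  # prediction error for the left-out sample
    return np.sqrt(np.sum(errors ** 2) / n)


# Usage on synthetic data (three hypothetical predictors, 20 samples).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 3))
y_demo = 1.0 + X_demo @ np.array([0.5, -1.0, 2.0]) + rng.normal(scale=0.3, size=20)
print(f"RMSEC = {rmsec(X_demo, y_demo):.3f}, RMSEP = {rmsep_loo(X_demo, y_demo):.3f}")
```

For least-squares MLR the leave-one-out residual of sample $i$ is never smaller in magnitude than its ordinary residual, so RMSEP is always at least as large as RMSEC; a large gap between them suggests over-fitting.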
In many cases, only a subset of the $p$ variables $X$ can be used without a serious loss of predictive ability. Such a subset can be chosen with the forward stepwise regression (SWMLR) procedure, which at each step adds the predictor that most increases the explained variation and then checks whether a previously selected predictor can be removed (the F-statistic thresholds for entering and for removing variables must be fixed beforehand), or with the best subsets regression procedure.
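The selection logic can be sketched in NumPy as follows. This is a minimal illustration, not the implementation of any particular statistics package: the partial F-statistic for one predictor is $(\mathrm{RSS}_{\text{without}}-\mathrm{RSS}_{\text{with}})/(\mathrm{RSS}_{\text{with}}/\mathrm{df})$, and the thresholds `f_enter=4.0` and `f_remove=3.9` are illustrative assumptions (the text only says they must be fixed). Keeping `f_remove` below `f_enter` prevents a predictor from being removed in the same step in which it entered. The `best_subset` helper is a hypothetical exhaustive search that, for a fixed subset size `k`, returns the columns with the smallest residual sum of squares (equivalently, the largest $R^2$); comparing subsets of different sizes would need another criterion, such as adjusted $R^2$ or RMSEP.

```python
import numpy as np
from itertools import combinations


def rss(X, y, cols):
    """Residual sum of squares of the fit of y on an intercept plus X[:, cols]."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ coef
    return float(r @ r)


def partial_f(rss_small, rss_big, df_big):
    """Partial F for one extra predictor; the tiny floor avoids division by zero."""
    return (rss_small - rss_big) / (max(rss_big, 1e-12) / df_big)


def stepwise_mlr(X, y, f_enter=4.0, f_remove=3.9, max_steps=100):
    """Forward stepwise MLR; returns the indices of the selected columns of X."""
    n, p = X.shape
    selected = []
    for _ in range(max_steps):
        changed = False
        # Entry step: add the candidate with the largest partial F if it reaches f_enter.
        df = n - len(selected) - 2  # residual df after adding one more predictor
        candidates = [j for j in range(p) if j not in selected]
        if candidates and df > 0:
            rss_cur = rss(X, y, selected)
            f_in = {j: partial_f(rss_cur, rss(X, y, selected + [j]), df) for j in candidates}
            best = max(f_in, key=f_in.get)
            if f_in[best] >= f_enter:
                selected.append(best)
                changed = True
        # Removal step: drop the selected predictor with the smallest partial F if below f_remove.
        if selected:
            rss_full = rss(X, y, selected)
            df = n - len(selected) - 1
            f_out = {j: partial_f(rss(X, y, [k for k in selected if k != j]), rss_full, df)
                     for j in selected}
            worst = min(f_out, key=f_out.get)
            if f_out[worst] < f_remove:
                selected.remove(worst)
                changed = True
        if not changed:
            break
    return selected


def best_subset(X, y, k):
    """Best subsets regression for size k: the column subset with the smallest RSS."""
    return min(combinations(range(X.shape[1]), k), key=lambda c: rss(X, y, list(c)))


# Usage on synthetic data: only columns 1 and 4 carry signal.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 6))
y_demo = 2.0 + 1.5 * X_demo[:, 1] - 1.0 * X_demo[:, 4] + rng.normal(scale=0.5, size=40)
print(stepwise_mlr(X_demo, y_demo), best_subset(X_demo, y_demo, 2))
```

The exhaustive search evaluates every subset of size `k`, so it becomes expensive as $p$ grows; stepwise selection is cheaper but is not guaranteed to find the best subset.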
Applications
MLR was used to investigate the factors influencing growth and tyramine production (Marcobal et al. 2006a). SWMLR, PCR and PLS regression have been used to predict the aging time of wine samples from their volatile composition (Pérez-Coello et al. 1999) and from their nitrogenous fraction (Moreno-Arribas et al. 1998). MLR was applied to determine the proportion of each fruit present in mixtures of grape and apple juices (Dizy et al. 1992). SWMLR has been used to predict the CIELAB variables using the colorimetric indices as possible predictor variables (Monagas et al. 2006b), to predict the foam characteristics of sparkling wines (Moreno-Arribas et al. 2000), and to identify the phenolic compound that provided the best predictive model of the antioxidant capacity (Monagas et al. 2005). PLS was used to model quantitative relationships between foam characteristics and the chemical composition of cava samples (Pueyo et al. 1995). Table 13.24 shows the results of applying PLS regression to predict aging times in wines from five amino acids in the peptide fraction < 700 Da (Moreno-Arribas et al. 1998), obtained with The Unscrambler program version 7.8 (CAMO PROCESS AS, http://www.camo.no/).
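To show how statistics of the kind reported in Table 13.24 (NC, $R^2$, RMSEC, RMSEP) arise, here is a minimal sketch of single-response PLS regression (PLS1, NIPALS algorithm) in NumPy. It is not The Unscrambler's implementation; the assumptions are that the predictors are mean-centred but not scaled, that cross-validation is leave-one-out, and that NC is taken as the number of components with the smallest RMSEP (software packages may apply more conservative rules). Function names and data are hypothetical, and `max_comp` should not exceed the rank of the centred $X$.

```python
import numpy as np


def pls1_fit(X, y, ncomp):
    """PLS1 by NIPALS on mean-centred data; returns (intercept, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean           # deflated X block and y residuals
    W, P, q = [], [], []
    for _ in range(ncomp):
        w = E.T @ f                         # weights: covariance of X columns with y
        w /= np.linalg.norm(w)
        t = E @ w                           # score vector
        tt = t @ t
        p = E.T @ t / tt                    # X loading
        qa = (f @ t) / tt                   # y loading
        E = E - np.outer(t, p)              # deflate X
        f = f - qa * t                      # deflate y
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)     # B = W (P'W)^-1 q, for the original X
    return y_mean - x_mean @ b, b


def loo_rmsep(X, y, ncomp):
    """Leave-one-out RMSEP for a PLS1 model with ncomp components."""
    n = len(y)
    err = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        b0, b = pls1_fit(X[keep], y[keep], ncomp)
        err[i] = y[i] - (b0 + X[i] @ b)
    return np.sqrt(np.mean(err ** 2))


def select_pls(X, y, max_comp):
    """Choose NC by minimum LOO RMSEP, refit on all samples, report the statistics."""
    rmsep = [loo_rmsep(X, y, a) for a in range(1, max_comp + 1)]
    nc = int(np.argmin(rmsep)) + 1
    b0, b = pls1_fit(X, y, nc)
    resid = y - (b0 + X @ b)
    return {
        "NC": nc,
        "intercept": b0,
        "coef": b,
        "R2": 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2),
        "RMSEC": np.sqrt(np.mean(resid ** 2)),
        "RMSEP": rmsep[nc - 1],
    }


# Usage on synthetic data: 25 samples, 5 hypothetical predictors.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(25, 5))
y_demo = 10 + X_demo @ np.array([1.0, -0.5, 0.8, 0.0, 0.3]) + rng.normal(scale=1.0, size=25)
res = select_pls(X_demo, y_demo, max_comp=5)
print(res["NC"], round(res["R2"], 3), round(res["RMSEC"], 2), round(res["RMSEP"], 2))
```

With as many components as there are (linearly independent) predictors, PLS1 reproduces the ordinary least-squares MLR coefficients; selecting fewer components by cross-validation trades a little calibration fit for better prediction of new samples.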
Figure 13.7 shows the predicted values obtained using the fitted regression equation
Table 13.24 Results of PLS regression for prediction of the aging times in wines from five amino acids in the peptide fraction < 700 Da

Regression coefficients:
  Intercept                      4.13513
  Aspartic acid + asparagine     1.14990
  Threonine                     -0.77765
  α-Alanine                     -0.89973
  γ-Aminobutyric acid           -0.69257
  Ornithine                      1.07405

Statistics:
  NC       2
  R^2      0.968
  RMSEC    1.67
  RMSEP    1.94

NC = number of components selected by cross-validation, R^2 = determination coefficient, RMSEC = root mean square error of calibration, RMSEP = root mean square error of prediction