Wine Chemistry and Biochemistry


P.J. Martín-Álvarez


both unbiased and linear functions of the yi values, the OLS estimators have the smallest variance.


The most important results obtained using MLR are:


• The regression coefficients (bi) and their standard errors, the confidence interval for βi, and the values of the t-statistic to test the null hypothesis H0 ≡ βi = 0 and their associated probabilities

• The ANOVA table, with the decomposition of the total variation of the yi values with respect to their mean (SStot) into two parts (SStot = SSreg + SSres): the variation explained by the regression model (SSreg) and the unexplained variation (SSres), and the F-value (Fcal = (SSreg/p) / (SSres/(n−p−1))) with p and n−p−1 df, to test the global hypothesis H0 ≡ β1 = ... = βp = 0 vs H1 ≡ not all of the coefficients are equal to zero (F-test for overall significance)

• The calculated values (ŷi = b0 + b1 xi,1 + b2 xi,2 + ... + bp xi,p), the residuals (ei = yi − ŷi), and the graphical representation of the calculated (ŷi) vs. observed (yi) values

• The statistics about the goodness of fit: the coefficient of determination R² = SSreg/SStot = 1 − SSres/SStot; the adjusted coefficient of determination R²adj = 1 − (SSres/(n−p−1))/(SStot/(n−1)); the multiple correlation coefficient (R = √R²), which is the correlation coefficient between the observed and calculated values of Y; and the standard error of estimation, s = √(SSres/(n−p−1)) = √(∑i=1..n (yi − ŷi)²/(n−p−1)), whose square s² is an unbiased estimator of σ²
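The quantities listed above can be sketched numerically; the following is a minimal illustration in Python with NumPy, using hypothetical data (the variable names mirror the notation of the text and are not from the original):

```python
import numpy as np

# Hypothetical data: n observations, p predictors
rng = np.random.default_rng(0)
n, p = 30, 2
X = rng.normal(size=(n, p))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=n)

# OLS estimation: b = (b0, b1, ..., bp), with an intercept column prepended
Xd = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(Xd, y, rcond=None)

y_hat = Xd @ b                      # calculated values ŷi
e = y - y_hat                       # residuals ei
SStot = np.sum((y - y.mean())**2)   # total variation
SSres = np.sum(e**2)                # unexplained variation
SSreg = SStot - SSres               # variation explained by the model

F = (SSreg / p) / (SSres / (n - p - 1))          # F-test for overall significance
R2 = SSreg / SStot                               # coefficient of determination
R2adj = 1 - (SSres / (n - p - 1)) / (SStot / (n - 1))  # adjusted R²
s = np.sqrt(SSres / (n - p - 1))                 # standard error of estimation
```

In practice Fcal would be compared against the F distribution with p and n−p−1 df, and the residuals e would be inspected graphically to check the error hypotheses discussed next.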
To validate the estimated model, the hypotheses established on the errors (independence, equality of variances and normal distribution) must be confirmed. When there are genuine repeated runs in the data, the lack-of-fit F-test can be used.
Although the OLS estimation provides the smallest variance, the presence of multicollinearity among the X variables can give rise to unreliable predictions of Y, and in this case the parameters βi can also be estimated using other procedures such as:


• Principal Components Regression (PCR), which first carries out PCA on the (X1, X2, ..., Xp) variables and then uses the scores of the q first uncorrelated PCs as independent variables (ŷi = b*0 + b*1 PCi,1 + b*2 PCi,2 + ... + b*q PCi,q), or

• Partial Least Squares (PLS) regression, an alternative to PCR that uses q other components, also linear combinations of the X variables, but calculated so that they have a high correlation with the response variable Y

The results with PCR and PLS regression include the number of PCs obtained by the leave-one-out cross-validation procedure, the values of the regression coefficients for the X variables, the value of R², the root mean square error of calibration (RMSEC) and the root mean square error of prediction by the cross-validation proce-
