Wine Chemistry and Biochemistry


690 P.J. Martín-Álvarez


To assess the statistical validity of the fitted regression equation, the ANOVA principle can be applied. It splits the variation in the $Y$s about their mean, $SS_{tot} = \sum_{i=1}^{n} (y_i - \bar{y})^2$, into an explainable component due to the fitted regression model, $SS_{reg} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$, and an unexplainable component due to the error, $SS_{res} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, so that $SS_{tot} = SS_{reg} + SS_{res}$. A measure of the precision of the fit is the coefficient of determination, $R^2 = SS_{reg}/SS_{tot}$, which measures the proportion of the total variation about the mean $\bar{y}$ explained by the regression ($0 \le R^2 \le 1$). It is often expressed as a percentage ($0 \le R^2(\%) \le 100$) and should be close to 1. Another measure of precision is the standard deviation of the residuals, or standard error of estimate, $s = \sqrt{SS_{res}/(n-2)}$, which is the estimate of the standard deviation $\sigma$ and should be as small as possible. These sums of squares, their corresponding degrees of freedom (df), $(n-1) = (1) + (n-2)$, and the mean squares, $MSS_{reg} = SS_{reg}/1$ and $MSS_{res} = SS_{res}/(n-2)$, are presented in the ANOVA table for regression. Under $H_0$, the statistic $F_{cal} = MSS_{reg}/MSS_{res}$ follows an F-distribution with 1 and $n-2$ df, and it can be used to test the null hypothesis $H_0 \equiv \beta_1 = 0$ vs $H_1 \equiv \beta_1 \neq 0$ (F test of the linear model). For a fixed significance level $\alpha$, if $P < \alpha$ the null hypothesis is rejected and the fitted equation appears statistically valid. If $P > \alpha$, $H_0$ is not rejected and a constant model would be accepted for $Y$ ($Y = \beta_0 + \varepsilon$). It is also possible to test the null hypotheses $H_0 \equiv \beta_1 = 0$ and $H_0 \equiv \beta_0 = 0$ using the statistics $t_{cal} = b_1/s_{b_1}$ and $t_{cal} = b_0/s_{b_0}$. Under the respective null hypotheses, these follow a t-distribution with $n-2$ df. Here $s_{b_1}$ and $s_{b_0}$ are the standard errors of $b_1$ and $b_0$, respectively: $s_{b_1} = s/\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}$ and $s_{b_0} = s\sqrt{\sum_{i=1}^{n} x_i^2 \,/\, \left(n \sum_{i=1}^{n} (x_i - \bar{x})^2\right)}$. Alternatively, one can calculate the confidence intervals for the parameters, $\beta_1 \in \left(b_1 \pm t_{1-\alpha/2,\,n-2}\, s_{b_1}\right)$ and $\beta_0 \in \left(b_0 \pm t_{1-\alpha/2,\,n-2}\, s_{b_0}\right)$.
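The quantities in the ANOVA table and the parameter tests can be computed directly from their definitions. The following Python sketch uses the standard library only. It fits the line by least squares and returns the sums of squares, $R^2$, $s$, $F_{cal}$, both $t_{cal}$ statistics and the standard errors. The function names `simple_regression_anova` and `conf_int` and the example data are hypothetical and chosen for illustration. $P$ values and the critical value $t_{1-\alpha/2,n-2}$ must come from tables or a statistics library (for example SciPy's `scipy.stats.f.sf` and `scipy.stats.t.ppf`), so the sketch takes the critical value as an input rather than computing it.

```python
import math


def simple_regression_anova(x, y):
    """Least-squares fit of Y = b0 + b1*X plus the ANOVA and t statistics.

    A direct transcription of the textbook formulas (hypothetical helper),
    not an optimized or numerically hardened routine.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]

    # ANOVA decomposition: SS_tot = SS_reg + SS_res
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    ss_reg = sum((yh - y_bar) ** 2 for yh in y_hat)
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    if ss_res == 0:
        raise ValueError("perfect fit: residual variance is zero")

    mss_reg = ss_reg / 1            # regression: 1 df
    mss_res = ss_res / (n - 2)      # residuals: n - 2 df
    s = math.sqrt(mss_res)          # standard error of estimate

    # Standard errors of the slope and of the intercept
    s_b1 = s / math.sqrt(sxx)
    s_b0 = s * math.sqrt(sum(xi ** 2 for xi in x) / (n * sxx))

    return {
        "b0": b0, "b1": b1,
        "ss_tot": ss_tot, "ss_reg": ss_reg, "ss_res": ss_res,
        "r2": ss_reg / ss_tot,
        "s": s,
        "f_cal": mss_reg / mss_res,   # compare with F(1, n - 2)
        "s_b1": s_b1, "s_b0": s_b0,
        "t_b1": b1 / s_b1,            # compare with t(n - 2)
        "t_b0": b0 / s_b0,
    }


def conf_int(estimate, std_err, t_crit):
    """Interval estimate +/- t_crit * std_err, with t_crit = t_{1-alpha/2, n-2}."""
    return (estimate - t_crit * std_err, estimate + t_crit * std_err)


if __name__ == "__main__":
    # Hypothetical data, for illustration only
    x = [1, 2, 3, 4, 5]
    y = [2.1, 3.9, 6.2, 7.8, 10.1]
    res = simple_regression_anova(x, y)
    t_crit = 3.182  # t_{0.975, 3}: alpha = 0.05, n - 2 = 3 df
    print("R^2 =", res["r2"], " s =", res["s"], " F =", res["f_cal"])
    print("beta1 CI:", conf_int(res["b1"], res["s_b1"], t_crit))
    print("beta0 CI:", conf_int(res["b0"], res["s_b0"], t_crit))
```

In simple linear regression, $F_{cal}$ equals the square of the $t_{cal}$ statistic for $b_1$. The F test and the t test of $\beta_1 = 0$ therefore lead to the same conclusion.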
When there are genuine repeated runs in the data, the lack-of-fit F-test, $F_{cal} = \frac{SS_{lackf}/(n-2-g)}{SS_{pureer}/g}$, with $n-2-g$ and $g$ df, can be used to check whether or not the model is correct. The hypotheses are $H_0 \equiv$ the model has no lack of fit (there is no reason to doubt the adequacy of the model) and $H_1 \equiv$ the model appears to be inadequate. The statistic $SS_{pureer} = \sum_j (n_j - 1)s_j^2$ is the pure-error sum of squares of the repeated points, with $g = \sum_j (n_j - 1)$ df. Here $n_j$ and $s_j^2$ are the number of replicates and the variance of the responses at the $j$th repeated $x$ value. $SS_{lackf} = SS_{res} - SS_{pureer}$ is the lack-of-fit sum of squares, with $n-2-g$ df. If $F_{cal} > F_{1-\alpha,\,n-2-g,\,g}$, the model appears to be inadequate. In that case, another model should be used or the interval of $X$ values should be narrowed. One alternative is the quadratic polynomial model $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon$, which is a second-order (in $X$) linear (in the $\beta$s) regression model.
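The lack-of-fit test can be sketched in Python (standard library only). The sketch groups replicate responses by their $x$ value and accumulates $SS_{pureer}$ and $g$. It then forms $SS_{lackf}$ and $F_{cal}$. The function name and the example data are hypothetical. It also assumes that replicates share exactly the same recorded $x$ value. The critical value $F_{1-\alpha,n-2-g,g}$ is taken from tables here (SciPy's `scipy.stats.f.ppf(1 - alpha, df1, df2)` is one alternative).

```python
from collections import defaultdict


def lack_of_fit_test(x, y):
    """Lack-of-fit F test for the straight line y = b0 + b1*x with replicated x.

    Hypothetical helper. Returns (F_cal, df_lack, g), where df_lack = n - 2 - g.
    """
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

    # Group responses by x value (replicates must have identical x)
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)

    ss_pure = 0.0  # sum over groups of (n_j - 1) * s_j^2
    g = 0          # pure-error df: sum over groups of (n_j - 1)
    for ys in groups.values():
        mean_j = sum(ys) / len(ys)
        ss_pure += sum((v - mean_j) ** 2 for v in ys)  # = (n_j - 1) * s_j^2
        g += len(ys) - 1

    df_lack = n - 2 - g  # equals (number of distinct x values) - 2
    if g == 0 or df_lack < 1 or ss_pure == 0:
        raise ValueError("need replicates, >= 3 distinct x values and non-zero pure error")
    ss_lack = ss_res - ss_pure
    f_cal = (ss_lack / df_lack) / (ss_pure / g)
    return f_cal, df_lack, g


if __name__ == "__main__":
    # Hypothetical duplicates at four x levels, following a curved trend
    x = [1, 1, 2, 2, 3, 3, 4, 4]
    y = [1.1, 0.9, 4.1, 3.9, 9.1, 8.9, 16.1, 15.9]
    f_cal, df_lack, g = lack_of_fit_test(x, y)
    f_crit = 6.94  # F_{0.95, 2, 4} from tables (alpha = 0.05)
    print(f"F_cal = {f_cal:.1f} with ({df_lack}, {g}) df; lack of fit: {f_cal > f_crit}")
```

For these curved data, the straight line leaves a large, systematic lack-of-fit component compared with the small scatter between duplicates. This is the situation in which a quadratic model would be considered.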


Diagnostic checking of the residuals can be used to assess the validity of the model assumptions (for example, normality, constant variance and independence of the errors) and to check the practical validity of the predictions.
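Residual checks are often graphical (residuals plotted against $\hat{y}$ or $X$, normal probability plots), but they start from a few computed quantities. This Python sketch computes three of them: the raw residuals $e_i = y_i - \hat{y}_i$, the leverages $h_{ii} = 1/n + (x_i - \bar{x})^2/\sum_j (x_j - \bar{x})^2$, and the internally studentized residuals $r_i = e_i/(s\sqrt{1 - h_{ii}})$. Studentized residuals are one common diagnostic chosen here for illustration, not necessarily the procedure used by the author. The function name, the example data and the $|r_i| > 2$ flagging threshold are assumptions.

```python
import math


def studentized_residuals(x, y):
    """Raw residuals, leverages h_ii and internally studentized residuals
    r_i = e_i / (s * sqrt(1 - h_ii)) for a straight-line least-squares fit.
    Hypothetical helper for illustration."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    if s == 0:
        raise ValueError("perfect fit: all residuals are zero")
    # Leverage: the same term that appears under the square root
    # in the confidence interval for the mean response
    leverages = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    if any(h >= 1 for h in leverages):
        raise ValueError("a point with leverage 1 has an undefined studentized residual")
    studentized = [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverages)]
    return residuals, leverages, studentized


if __name__ == "__main__":
    # Hypothetical data
    x = [1, 2, 3, 4, 5, 6]
    y = [2.0, 4.1, 5.9, 8.2, 9.8, 12.3]
    e, h, r = studentized_residuals(x, y)
    # |r| > 2 is a common rule-of-thumb cutoff (an assumption, not from the text)
    flagged = [i for i, ri in enumerate(r) if abs(ri) > 2]
    print("studentized residuals:", [round(ri, 2) for ri in r], "flagged:", flagged)
```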


If the model is accepted as valid, it can be used to predict the $Y$ value for a given value $x_0$ of $X$ ($\hat{y}_0 = b_0 + b_1 x_0$). It can also be used to calculate the confidence interval for the true mean value of $Y$ at $x_0$, $\mu_{Y|X=x_0} \in \left(\hat{y}_0 \pm t_{1-\alpha/2,\,n-2}\, s\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}\right)$, which defines the confidence bands for any value $x_0$.
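The prediction and the confidence interval for the mean response translate directly into code. In this Python sketch (standard library only), evaluating the interval over a grid of $x_0$ values traces the confidence bands. The band is narrowest at $x_0 = \bar{x}$ and widens as $x_0$ moves away from it. The function name and the example data are hypothetical. The critical value $t_{1-\alpha/2,n-2}$ is passed in by the caller.

```python
import math


def mean_response_interval(x, y, x0, t_crit):
    """Return (y0_hat, (lower, upper)) for the mean of Y at X = x0.

    Hypothetical helper. t_crit is t_{1-alpha/2, n-2}, supplied by the caller
    (from tables or, if SciPy is installed, scipy.stats.t.ppf(1 - alpha / 2, n - 2)).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(ss_res / (n - 2))  # standard error of estimate

    y0_hat = b0 + b1 * x0
    # Half-width grows with the distance of x0 from the mean of the x values
    half_width = t_crit * s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
    return y0_hat, (y0_hat - half_width, y0_hat + half_width)


if __name__ == "__main__":
    # Hypothetical data; a grid of x0 values traces the confidence band
    x = [1, 2, 3, 4, 5]
    y = [2.1, 3.9, 6.2, 7.8, 10.1]
    for x0 in [1.0, 2.0, 3.0, 4.0, 5.0]:
        y0, (lower, upper) = mean_response_interval(x, y, x0, t_crit=3.182)
        print(f"x0={x0}: y0_hat={y0:.3f}, CI=({lower:.3f}, {upper:.3f})")
```

This interval refers to the mean of $Y$ at $x_0$. It is narrower than an interval for a single new observation, which would add 1 inside the square root.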
