Wine Chemistry and Biochemistry

P.J. Martín-Álvarez

To assess the statistical validity of the fitted regression equation, the ANOVA principle can be applied. The variation in the $Y$ values about their mean, $SS_{tot} = \sum_{i=1}^{n} (y_i - \bar{y})^2$, is split into an explainable component due to the fitted regression model, $SS_{reg} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$, and an unexplainable component due to the error, $SS_{res} = \sum_{i=1}^{n} (\hat{y}_i - y_i)^2$, so that $SS_{tot} = SS_{reg} + SS_{res}$. One measure of the precision of fit is the coefficient of determination, $R^2 = SS_{reg}/SS_{tot}$, which measures the proportion of the total variation about the mean $\bar{y}$ explained by the regression ($0 \le R^2 \le 1$). It is often expressed as a percentage ($0 \le R^2(\%) \le 100$) and should not be too far from 1. Another measure of precision is the standard deviation of the residuals, or standard error of estimate, $s = \sqrt{SS_{res}/(n-2)}$, which estimates the standard deviation $\sigma$ and should be as small as possible.

These sums of squares, their corresponding df, $(n-1) = (1) + (n-2)$, and the mean squares, $MSS_{reg} = SS_{reg}/1$ and $MSS_{res} = SS_{res}/(n-2)$, are presented in the ANOVA table for regression. Under $H_0$, the statistic $F_{cal} = MSS_{reg}/MSS_{res}$ follows an F-distribution with 1 and $n-2$ df, and it can be used to test the null hypothesis $H_0 \equiv \beta_1 = 0$ vs $H_1 \equiv \beta_1 \neq 0$ (F test of the linear model). Once the significance level $\alpha$ is fixed, if $P < \alpha$ the null hypothesis is rejected and the fitted equation appears statistically valid; if $P > \alpha$, $H_0$ is not rejected and a constant model, $Y = \beta_0 + \varepsilon$, would be accepted for $Y$.

It is also possible to test the null hypotheses $H_0 \equiv \beta_1 = 0$ and $H_0 \equiv \beta_0 = 0$ using the statistics $t_{cal} = b_1/s_{b_1}$ and $t_{cal} = b_0/s_{b_0}$, which under the respective $H_0$ follow a t-distribution with $n-2$ df, where $s_{b_1}$ and $s_{b_0}$ are the standard errors of $b_1$ and $b_0$, respectively: $s_{b_1} = s/\sqrt{\sum (x_i - \bar{x})^2}$ and $s_{b_0} = s\sqrt{\sum x_i^2 / \left(n \sum (x_i - \bar{x})^2\right)}$. Alternatively, one can calculate the confidence intervals for the parameters, $\beta_1 \in \left(b_1 \pm t_{1-\alpha/2,\,n-2}\, s_{b_1}\right)$ and $\beta_0 \in \left(b_0 \pm t_{1-\alpha/2,\,n-2}\, s_{b_0}\right)$.

When there are genuine repeated runs in the data, the lack-of-fit F-test, $F_{cal} = \dfrac{SS_{lack\,f}/(n-2-g)}{SS_{pure\,err}/g}$, with $n-2-g$ and $g$ df, can be used to check whether or not the model is correct ($H_0 \equiv$ the model has no lack of fit, i.e. there is no reason to doubt the adequacy of the model; $H_1 \equiv$ the model appears to be inadequate). Here $SS_{pure\,err} = \sum_j (n_j - 1)\, s_j^2$ is the pure-error sum of squares of the repeated points, with $g = \sum_j (n_j - 1)$ df, where $n_j$ and $s_j^2$ are the number of replicates and their variance at the $j$-th repeated value of $X$; and $SS_{lack\,f} = SS_{res} - SS_{pure\,err}$ is the lack-of-fit sum of squares, with $n-2-g$ df. If $F_{cal} > F_{1-\alpha,\,n-2-g,\,g}$, the model appears to be inadequate and another model should be used, such as the quadratic polynomial model $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon$, which is a second-order (in $X$), linear (in the $\beta$s) regression model; alternatively, the range of $X$ values can be narrowed.
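The ANOVA quantities, the t statistics, the parameter confidence intervals and the lack-of-fit test described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the function names `fit_line`, `parameter_cis` and `lack_of_fit` are hypothetical, only the standard library is used, replicates are assumed to share exactly equal $x$ values, and the critical values $t_{1-\alpha/2,n-2}$ and $F_{1-\alpha,\cdot,\cdot}$ are assumed to come from statistical tables, so no P values are computed here.

```python
import math
from collections import defaultdict

# Hypothetical helper names; a sketch of the formulas in the text, not the author's code.


def fit_line(x, y):
    """Least-squares fit of y = b0 + b1*x plus the ANOVA quantities of the text."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired data with n >= 3")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)             # sum (x_i - x_bar)^2
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]

    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    ss_reg = sum((yh - y_bar) ** 2 for yh in y_hat)
    ss_res = sum((yh - yi) ** 2 for yh, yi in zip(y_hat, y))
    mss_reg = ss_reg / 1                                 # 1 df
    mss_res = ss_res / (n - 2)                           # n - 2 df
    s = math.sqrt(mss_res)                               # standard error of estimate
    s_b1 = s / math.sqrt(sxx)
    s_b0 = s * math.sqrt(sum(xi ** 2 for xi in x) / (n * sxx))
    return {
        "n": n, "b0": b0, "b1": b1,
        "ss_tot": ss_tot, "ss_reg": ss_reg, "ss_res": ss_res,
        "r2": ss_reg / ss_tot, "s": s,
        "f_cal": mss_reg / mss_res,                      # compare with F(1, n-2)
        "s_b1": s_b1, "s_b0": s_b0,
        "t_b1": b1 / s_b1, "t_b0": b0 / s_b0,            # compare with t(n-2)
    }


def parameter_cis(fit, t_crit):
    """Intervals b +/- t_{1-alpha/2, n-2} * s_b; t_crit is taken from a t table."""
    return {
        "beta1": (fit["b1"] - t_crit * fit["s_b1"], fit["b1"] + t_crit * fit["s_b1"]),
        "beta0": (fit["b0"] - t_crit * fit["s_b0"], fit["b0"] + t_crit * fit["s_b0"]),
    }


def lack_of_fit(x, y, fit):
    """Lack-of-fit F statistic; replicates are grouped by exactly equal x values."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    ss_pure, g = 0.0, 0
    for ys in groups.values():
        mean_j = sum(ys) / len(ys)
        ss_pure += sum((v - mean_j) ** 2 for v in ys)    # equals (n_j - 1) * s_j^2
        g += len(ys) - 1
    df_lack = fit["n"] - 2 - g
    if g == 0 or df_lack <= 0 or ss_pure == 0.0:
        raise ValueError("need replicates, non-zero pure error and >= 3 distinct x")
    ss_lack = fit["ss_res"] - ss_pure
    f_cal = (ss_lack / df_lack) / (ss_pure / g)          # compare with F(n-2-g, g)
    return {"ss_pure": ss_pure, "g": g, "ss_lack": ss_lack,
            "df_lack": df_lack, "f_cal": f_cal}
```

For example, with x = [1, 2, 3, 4] and y = [2, 4, 5, 7], `fit_line` gives $b_1 = 1.6$, $b_0 = 0.5$, $SS_{reg} = 12.8$, $SS_{res} = 0.2$, $R^2 \approx 0.985$ and $F_{cal} = 128$, which equals $t_{cal}^2$ for $b_1$, as expected for a straight-line model with a single slope.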

Diagnostic checking of the residuals can be used to assess the validity of the model assumptions and to check the practical validity of the predictions.
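The text does not prescribe a particular diagnostic. As one common option (an assumption here, not necessarily the author's procedure), each residual $e_i = y_i - \hat{y}_i$ can be scaled by $s\sqrt{1 - h_{ii}}$, where $h_{ii} = 1/n + (x_i - \bar{x})^2/\sum_j (x_j - \bar{x})^2$ is the leverage of point $i$, giving internally studentized residuals; points with $|r_i| > 2$ are then flagged for inspection. The function name `studentized_residuals` and the threshold of 2 are hypothetical, rule-of-thumb choices.

```python
import math

# Hypothetical helper; the |r_i| > 2 threshold is a rule of thumb, not from the text.


def studentized_residuals(x, y, threshold=2.0):
    """Internally studentized residuals r_i = e_i / (s * sqrt(1 - h_ii)) for a
    straight-line fit. Assumes n >= 3 and no point with leverage h_ii = 1.
    Returns (r, leverages, indices with |r_i| > threshold)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]      # raw residuals
    s = math.sqrt(sum(ei ** 2 for ei in e) / (n - 2))        # standard error of estimate
    h = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]        # leverages h_ii
    r = [ei / (s * math.sqrt(1 - hi)) for ei, hi in zip(e, h)]
    flagged = [i for i, ri in enumerate(r) if abs(ri) > threshold]
    return r, h, flagged
```

Plotting `r` against $x_i$ or against $\hat{y}_i$ is a common way to look for curvature or non-constant variance in addition to single outlying points.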

If the model is accepted as valid, it can be used to predict the $Y$ value for a given value $x_0$ of $X$, $\hat{y}_0 = b_0 + b_1 x_0$, and to calculate the confidence interval for the true mean value of $Y$ at $x_0$, $\mu_{Y|X=x_0} \in \left(\hat{y}_0 \pm t_{1-\alpha/2,\,n-2}\, s\sqrt{\dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2}}\right)$, which defines the confidence bands for any value $x_0$.
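The prediction and the confidence interval for $\mu_{Y|X=x_0}$ can be sketched as below, again in plain Python, with hypothetical function names (`mean_response_ci`, `confidence_band`) and with $t_{1-\alpha/2,n-2}$ assumed to be supplied from tables. Evaluating the interval over a grid of $x_0$ values traces the confidence bands, which are narrowest at $x_0 = \bar{x}$ because the term $(x_0 - \bar{x})^2$ vanishes there.

```python
import math

# Hypothetical helpers; t_crit = t_{1-alpha/2, n-2} is assumed to come from a t table.


def mean_response_ci(x, y, x0, t_crit):
    """Return (y_hat0, lower, upper) for the mean of Y at X = x0 under a
    straight-line least-squares fit."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    # Standard error of estimate s = sqrt(SS_res / (n - 2))
    s = math.sqrt(sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    y_hat0 = b0 + b1 * x0
    half = t_crit * s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
    return y_hat0, y_hat0 - half, y_hat0 + half


def confidence_band(x, y, grid, t_crit):
    """Evaluate the interval at each x0 in grid: list of (x0, y_hat0, lower, upper)."""
    return [(x0, *mean_response_ci(x, y, x0, t_crit)) for x0 in grid]
```

With x = [1, 2, 3, 4], y = [2, 4, 5, 7] and $t_{0.975,2} = 4.303$, the interval at $x_0 = \bar{x} = 2.5$ is centred on $\hat{y}_0 = 4.5$ with half-width $4.303 \cdot s \cdot \sqrt{1/4}$, where $s = \sqrt{0.1}$.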

