690 P.J. Martín-Álvarez
To assess the statistical validity of the fitted regression equation, the ANOVA principle can be applied. The variation in the $Y$s about their mean,
$$SS_{\mathrm{tot}} = \sum_{i=1}^{n} (y_i - \bar{y})^2,$$
is split into an explainable component due to the fitted regression model,
$$SS_{\mathrm{reg}} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2,$$
and an unexplainable component due to the error,
$$SS_{\mathrm{res}} = \sum_{i=1}^{n} (\hat{y}_i - y_i)^2,$$
so that $SS_{\mathrm{tot}} = SS_{\mathrm{reg}} + SS_{\mathrm{res}}$. A measure of the precision of fit is the coefficient of determination, $R^2 = SS_{\mathrm{reg}}/SS_{\mathrm{tot}}$, which measures the proportion of the total variation about the mean $\bar{y}$ explained by the regression ($0 \le R^2 \le 1$). It is often expressed as a percentage ($0 \le R^2(\%) \le 100$) and should be close to 1. Another measure of the precision is the standard deviation of residuals, or standard error of estimate, $s = \sqrt{SS_{\mathrm{res}}/(n-2)}$, which is the estimate of the standard deviation $\sigma$ of the error and should be as small as possible. These sums of squares, their corresponding degrees of freedom (df), $(n-1) = (1) + (n-2)$, and the mean squares, $MSS_{\mathrm{reg}} = SS_{\mathrm{reg}}/1$ and $MSS_{\mathrm{res}} = SS_{\mathrm{res}}/(n-2)$, are presented in the ANOVA table for regression. The statistic $F_{\mathrm{cal}} = MSS_{\mathrm{reg}}/MSS_{\mathrm{res}}$
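has an F-distribution with 1 and $n-2$ df, and can be used to test the null hypothesis $H_0 \equiv \beta_1 = 0$ vs $H_1 \equiv \beta_1 \neq 0$ (F test of the linear model). When the significance level $\alpha$ is fixed, if $P < \alpha$ (where $P$ is the p-value of $F_{\mathrm{cal}}$) the null hypothesis is rejected and the fitted equation appears statistically valid; if $P > \alpha$, $H_0$ is not rejected and a constant model would be accepted for $Y$ ($Y = \beta_0 + \varepsilon$). It is also possible to test the null hypotheses $H_0 \equiv \beta_1 = 0$ and $H_0 \equiv \beta_0 = 0$ using the statistics $t_{\mathrm{cal}} = b_1/s_{b_1}$ and $t_{\mathrm{cal}} = b_0/s_{b_0}$, which follow a t-distribution with $n-2$ df, where $s_{b_1}$ and $s_{b_0}$ are the standard errors of $b_1$ and $b_0$, respectively,
$$s_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}}, \qquad s_{b_0} = s\sqrt{\frac{\sum x_i^2}{n \sum (x_i - \bar{x})^2}},$$
or to calculate the confidence intervals for the parameters,
$$\beta_1 \in \left(b_1 \pm t_{1-\alpha/2,\,n-2}\, s_{b_1}\right), \qquad \beta_0 \in \left(b_0 \pm t_{1-\alpha/2,\,n-2}\, s_{b_0}\right).$$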
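The quantities in the preceding paragraph can be computed directly from their definitions. The Python sketch below is only an illustration, not the author's code: the function name `simple_linear_regression`, its dictionary keys, and the data are hypothetical. The critical value $t_{1-\alpha/2,n-2}$ is passed in by the caller (e.g. from tables, or from `scipy.stats.t.ppf(1 - alpha/2, n - 2)` if SciPy is available) because the Python standard library has no t quantile function.

```python
import math


def simple_linear_regression(x, y, t_crit):
    """Least-squares fit of y = b0 + b1*x plus the ANOVA quantities in the text.

    t_crit is the two-sided critical value t_{1-alpha/2, n-2}, supplied by the
    caller (hypothetical interface: the standard library has no t quantiles).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]

    # ANOVA decomposition: SS_tot = SS_reg + SS_res
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    ss_reg = sum((yh - y_bar) ** 2 for yh in y_hat)
    ss_res = sum((yh - yi) ** 2 for yh, yi in zip(y_hat, y))

    mss_reg = ss_reg / 1          # 1 df
    mss_res = ss_res / (n - 2)    # n - 2 df (zero for a perfect fit)
    s = math.sqrt(mss_res)        # standard error of estimate

    # Standard errors of the coefficients
    sb1 = s / math.sqrt(sxx)
    sb0 = s * math.sqrt(sum(xi ** 2 for xi in x) / (n * sxx))

    return {
        "b0": b0, "b1": b1,
        "ss_tot": ss_tot, "ss_reg": ss_reg, "ss_res": ss_res,
        "r2": ss_reg / ss_tot,
        "s": s,
        "f_cal": mss_reg / mss_res,   # compare with F_{1-alpha, 1, n-2}
        "t_b1": b1 / sb1, "t_b0": b0 / sb0,
        "ci_b1": (b1 - t_crit * sb1, b1 + t_crit * sb1),
        "ci_b0": (b0 - t_crit * sb0, b0 + t_crit * sb0),
    }


# Illustrative, made-up data; t_{0.975, 3} is about 3.182
res = simple_linear_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], t_crit=3.182)
print(res["b0"], res["b1"], res["r2"], res["f_cal"])
```

For these data $R^2 \approx 0.6$ and $F_{\mathrm{cal}} \approx 4.5$, which is below $F_{0.95,1,3} \approx 10.13$. At $\alpha = 0.05$ the hypothesis $\beta_1 = 0$ is therefore not rejected, and the interval for $\beta_1$ contains 0. With a single predictor, $t_{\mathrm{cal}}^2$ for $b_1$ equals $F_{\mathrm{cal}}$, so the two tests lead to the same conclusion. A perfect fit ($SS_{\mathrm{res}} = 0$) would make the divisions by $MSS_{\mathrm{res}}$ fail.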
When there are genuine repeated runs in the data, the lack-of-fit F-test,
$$F_{\mathrm{cal}} = \frac{SS_{\mathrm{lackf}}/(n-2-g)}{SS_{\mathrm{pureer}}/g},$$
with $n-2-g$ and $g$ df, can be used to check whether or not the model is correct ($H_0 \equiv$ the model has no lack of fit, i.e. there is no reason to doubt the adequacy of the model; $H_1 \equiv$ the model appears to be inadequate). The statistic $SS_{\mathrm{pureer}} = \sum_j (n_j - 1)s_j^2$ is the pure-error sum of squares of the repeated points, with $g = \sum_j (n_j - 1)$ df, where $n_j$ and $s_j^2$ are the number of replicates and their variance at the $j$th repeated $X$ value. $SS_{\mathrm{lackf}} = SS_{\mathrm{res}} - SS_{\mathrm{pureer}}$ is the lack-of-fit sum of squares, with $n-2-g$ df. If $F_{\mathrm{cal}} > F_{1-\alpha,\,n-2-g,\,g}$, the model appears to be inadequate. In that case another model should be used, such as the quadratic polynomial model $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon$, which is a second-order (in $X$) linear (in the $\beta$s) regression model. Alternatively, the range of $X$ values can be reduced.
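A minimal sketch of the lack-of-fit test follows. It assumes that replicate runs are identified by exactly equal $X$ values. The function name `lack_of_fit_test` and the data are hypothetical. The function returns $F_{\mathrm{cal}}$ and its two df, and the comparison with $F_{1-\alpha,n-2-g,g}$ is left to the caller.

```python
from collections import defaultdict


def lack_of_fit_test(x, y):
    """Lack-of-fit F test for a straight-line fit with replicated X values.

    Returns (F_cal, df_lackf, g); compare F_cal with F_{1-alpha, n-2-g, g}.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

    # Group replicates by exact X value; within a group,
    # sum((y - mean)^2) equals (n_j - 1) * s_j^2
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    ss_pureer, g = 0.0, 0
    for ys in groups.values():
        mean_j = sum(ys) / len(ys)
        ss_pureer += sum((v - mean_j) ** 2 for v in ys)
        g += len(ys) - 1

    df_lackf = n - 2 - g          # = (number of distinct X values) - 2
    if g == 0 or df_lackf <= 0:
        raise ValueError("need replicates and at least 3 distinct X values")
    ss_lackf = ss_res - ss_pureer
    f_cal = (ss_lackf / df_lackf) / (ss_pureer / g)
    return f_cal, df_lackf, g


# Made-up duplicates that follow a curve rather than a straight line
x = [1, 1, 2, 2, 3, 3, 4, 4]
y = [0.9, 1.1, 3.9, 4.1, 8.9, 9.1, 15.9, 16.1]
f_cal, df1, df2 = lack_of_fit_test(x, y)
print(f_cal, df1, df2)
```

Here $F_{\mathrm{cal}} \approx 200$ with 2 and 4 df, far above $F_{0.95,2,4} \approx 6.94$. The straight line is therefore judged inadequate, and a quadratic model would be the natural next candidate.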
Diagnostic checking of the residuals can be used to assess the validity of model
assumptions, and to check the practical validity of the predictions.
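One common way to carry out such checks is to examine standardized residuals. This is an assumption here and not necessarily the author's procedure. The sketch below computes the ordinary residuals $e_i = y_i - \hat{y}_i$ and the standardized residuals $e_i / (s\sqrt{1 - h_{ii}})$, where $h_{ii} = 1/n + (x_i - \bar{x})^2/\sum_j (x_j - \bar{x})^2$ is the leverage of point $i$ in a straight-line fit. The function name `standardized_residuals` and the data are hypothetical. The $|r_i| > 2$ flag is a widely used rule of thumb, not a formal test.

```python
import math


def standardized_residuals(x, y):
    """Residuals e_i and standardized residuals e_i / (s * sqrt(1 - h_ii))."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(ei ** 2 for ei in e) / (n - 2))
    # Leverage of each point in simple linear regression
    h = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    r = [ei / (s * math.sqrt(1 - hi)) for ei, hi in zip(e, h)]
    return e, r


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
e, r = standardized_residuals(x, y)
# Rule of thumb (assumption): points with |r_i| > 2 deserve a closer look
flagged = [i for i, ri in enumerate(r) if abs(ri) > 2]
print(flagged)
```

Plotting `r` against $x_i$ or $\hat{y}_i$ is a simple way to look for curvature or non-constant spread. Such patterns would cast doubt on the straight-line or constant-variance assumptions.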
If the model is accepted as valid, it can be used to predict the $Y$ value for a given value $x_0$ of $X$ ($\hat{y}_0 = b_0 + b_1 x_0$). It can also be used to calculate the confidence interval for the true mean value of $Y$ at $x_0$,
$$\mu_{Y|X=x_0} \in \left(\hat{y}_0 \pm t_{1-\alpha/2,\,n-2}\, s\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2}}\right),$$
which defines the confidence bands for any value $x_0$.
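The interval for $\mu_{Y|X=x_0}$ can be evaluated over a grid of $x_0$ values to trace the confidence bands. In this sketch the function name `mean_response_ci` and the data are hypothetical. The critical value $t_{1-\alpha/2,n-2}$ is supplied by the caller; it is about 3.182 for $n-2 = 3$ df at $\alpha = 0.05$.

```python
import math


def mean_response_ci(x, y, x0, t_crit):
    """Prediction y0_hat = b0 + b1*x0 and the confidence interval for mu_{Y|X=x0}.

    t_crit is t_{1-alpha/2, n-2}, supplied by the caller.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(ss_res / (n - 2))   # standard error of estimate
    y0_hat = b0 + b1 * x0
    half_width = t_crit * s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
    return y0_hat, y0_hat - half_width, y0_hat + half_width


# Trace the confidence bands over a grid of x0 values (made-up data)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
for x0 in [1.0, 2.0, 3.0, 4.0, 5.0]:
    y0, lo, hi = mean_response_ci(x, y, x0, t_crit=3.182)
    print(f"x0={x0:.1f}  y0_hat={y0:.3f}  ({lo:.3f}, {hi:.3f})")
```

The $(x_0-\bar{x})^2$ term makes the half-width smallest at $x_0 = \bar{x}$ (here 3) and larger toward the ends of the $X$ range. The bands are therefore curved rather than parallel to the fitted line.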
