The statistic RMSE is an estimate of the variation of the response variable about the
population regression line. It is sometimes referred to as the residual standard deviation
of Y about the regression line. Dividing the sum of squared residuals, SSres (which is given in SAS
output for PROC REG if the residual option is specified), by n−2 and then taking the
square root gives RMSE. The standard deviation of the residuals is
calculated as:
\[
\text{RMSE} = \sqrt{\frac{SS_{res}}{n-2}} \qquad \text{(Standard deviation of residuals, 8.7)}
\]
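As a minimal sketch of this calculation, the following Python snippet computes the residual standard deviation from a list of residuals. The function name `rmse_from_residuals` and the residual values are hypothetical and made up for illustration; they are not the chapter's data.

```python
import math


def rmse_from_residuals(residuals):
    """Return sqrt(SSres / (n - 2)) for a simple linear regression."""
    n = len(residuals)
    if n <= 2:
        raise ValueError("need more than two observations for n - 2 df")
    # Sum of squared residuals (SSres).
    ss_res = sum(e * e for e in residuals)
    return math.sqrt(ss_res / (n - 2))


# Hypothetical residuals, invented for illustration only.
example_residuals = [1.0, -2.0, 0.5, 1.5, -1.0]
example_rmse = rmse_from_residuals(example_residuals)
print(round(example_rmse, 4))
```

The divisor is n−2 rather than n because two parameters, the intercept and the slope, are estimated from the data.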
If the residuals are approximately normally distributed, we would expect about 95 per cent
of the observed values of Y to lie within ±1.96×RMSE = 1.96×5.0328 = 9.864 of the
least squares predicted values of Y. This provides another indication of the extent of
model fit. Once the overall significance of the least squares regression line has been
established, we can then examine in more detail the parameter estimates.
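The arithmetic behind this band can be checked with a short Python sketch. The RMSE value is the one quoted in the text; the helper `approx_band` and the predicted value of 50.0 are hypothetical, and the band assumes approximately normal residuals.

```python
RMSE = 5.0328  # residual standard deviation quoted in the text
Z_95 = 1.96    # two-sided 95% point of the standard normal distribution

# Half-width of the approximate band around a predicted value.
half_width = Z_95 * RMSE


def approx_band(predicted):
    """Return (lower, upper) limits of the approximate band around a prediction."""
    return predicted - half_width, predicted + half_width


print(round(half_width, 3))  # 9.864

# Hypothetical predicted value, chosen only to show the call.
lower, upper = approx_band(50.0)
print(round(lower, 3), round(upper, 3))
```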
Statistical test for a regression slope
After a suitable regression model has been fitted to the data and a least squares prediction
equation determined, a significance test for the regression slope can be performed. When
there is only one explanatory variable in the model there will be only one slope parameter
to estimate and hence only one significance test for the slope. In a multiple regression,
however, there will be a parameter estimate and a significance test for each explanatory
variable in the regression model.
The test of the significance of the slope provides information on the utility of the
regression model, that is whether the linear regression model explains variation in the
response variable. The null hypothesis tested is, as shown in an earlier section, $H_0: \beta_1 = 0$.
The alternative hypothesis is that the variables X and Y are linearly related, $H_1: \beta_1 \neq 0$.
Rejecting the null hypothesis means that the variable X makes a significant contribution to the
prediction of the variable Y. The null hypothesis is tested by computing the ratio of the parameter
estimate to its standard error, $t = b_1/(\text{standard error of } b_1)$, and comparing this with the
sampling distribution of the t-statistic with n−2 degrees of freedom.
The standard error of $b_1$ is:
\[
\text{standard error of } b_1 = \frac{\text{RMSE}}{\sqrt{\sum (x_i - \bar{x})^2}} \qquad \text{(Standard error of slope, 8.8)}
\]
In this example the significance of the slope is evaluated as:
t = 3.702/0.6476 = 5.716, with n−2 df.
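The test statistic can be reproduced with a brief Python sketch. The slope and standard error are the values quoted in the text. The function names and the x values passed to `slope_standard_error` are hypothetical, and that function assumes the usual simple linear regression formula, RMSE divided by the square root of the sum of squared deviations of X about its mean. The sample size is not stated in this passage, so the comparison with the t-distribution on n−2 degrees of freedom is left out.

```python
import math


def slope_standard_error(rmse, x_values):
    """Standard error of the slope, assuming RMSE / sqrt(sum((x - mean(x))**2))."""
    n = len(x_values)
    x_bar = sum(x_values) / n
    ss_xx = sum((x - x_bar) ** 2 for x in x_values)
    if ss_xx == 0:
        raise ValueError("x values must not all be equal")
    return rmse / math.sqrt(ss_xx)


def slope_t_statistic(b1, se_b1):
    """Ratio of the slope estimate to its standard error."""
    return b1 / se_b1


# Values quoted in the text: b1 = 3.702, standard error of b1 = 0.6476.
t_value = slope_t_statistic(3.702, 0.6476)
print(round(t_value, 3))  # 5.716

# Hypothetical x values and RMSE, only to show the standard error helper.
example_se = slope_standard_error(2.0, [1, 2, 3, 4, 5])
print(round(example_se, 4))
```

The resulting t-value would then be compared with a critical value from t-tables, or from statistical software, for n−2 degrees of freedom.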