
48 The Basics of Financial Econometrics


SSE was defined in the previous chapter, but in the multiple linear
regression case ŷ is given by equation (3.7) and the error terms by
equation (3.11).
The degrees of freedom of the SSR equal the number of independent
variables, d_n = k, while the degrees of freedom of the SSE are d_d = n − k − 1.^5
MSR and MSE are the mean square of the regression and the mean squared
error, respectively. Each is obtained by dividing the corresponding sum of
squares by its degrees of freedom: MSR = SSR/k and MSE = SSE/(n − k − 1).
All results necessary for the ANOVA are shown in Table 3.1.
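
As a rough illustration of how these ANOVA quantities fit together, the following Python sketch fits a regression by least squares on simulated data and forms the sums of squares, their degrees of freedom, the mean squares, and the ratio MSR/MSE. The data, the random seed, and the function name anova_components are assumptions made for this example only; they do not come from the text.

```python
import numpy as np

def anova_components(y, X):
    """Return ANOVA quantities for a least squares fit of y on X (intercept added here)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, k = X.shape                              # n observations, k independent variables
    Z = np.column_stack([np.ones(n), X])        # prepend a column of ones for the intercept
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)   # least squares coefficient estimates
    y_hat = Z @ b                               # fitted values
    sst = np.sum((y - y.mean()) ** 2)           # total sum of squares
    ssr = np.sum((y_hat - y.mean()) ** 2)       # sum of squares explained by the regression
    sse = np.sum((y - y_hat) ** 2)              # sum of squared errors
    d_n, d_d = k, n - k - 1                     # degrees of freedom of SSR and SSE
    msr, mse = ssr / d_n, sse / d_d             # mean squares
    return {"SST": sst, "SSR": ssr, "SSE": sse, "dn": d_n, "dd": d_d,
            "MSR": msr, "MSE": mse, "F": msr / mse}

# Hypothetical simulated data: n = 50 observations, k = 2 independent variables
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=0.5, size=50)

table = anova_components(y, X)
for name, value in table.items():
    print(f"{name}: {value:.4f}")
```

Because the fit includes an intercept, SSR and SSE in this sketch add up to SST, apart from floating-point rounding.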
If the F-statistic, F = MSR/MSE, is found significant at some predetermined
level α (i.e., p_F < α), the model does explain some of the variation of the
dependent variable y.^6
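
The decision rule can be sketched in Python as follows. The example assumes SciPy is available and uses its F distribution (scipy.stats.f) to obtain the p-value and the critical value. The observed F-statistic, the sample size, and the function name f_test are hypothetical values chosen for illustration.

```python
from scipy.stats import f

def f_test(F, k, n, alpha=0.05):
    """Return the p-value of F and the critical value F_alpha."""
    d_n, d_d = k, n - k - 1                 # degrees of freedom of SSR and SSE
    p_F = f.sf(F, d_n, d_d)                 # probability of exceeding the observed F
    F_crit = f.ppf(1.0 - alpha, d_n, d_d)   # critical value F_alpha
    return p_F, F_crit

# Hypothetical inputs: an F-statistic of 8.2 from a model with k = 2 and n = 50
F_obs, alpha = 8.2, 0.05
p_F, F_crit = f_test(F_obs, k=2, n=50, alpha=alpha)
print(f"p_F = {p_F:.5f}, F_alpha = {F_crit:.3f}")
print("reject by p-value:", p_F < alpha)
print("reject by critical value:", F_obs > F_crit)
```

Comparing the p-value with α and comparing F with the critical value are two forms of the same test, so both rules lead to the same decision.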
We ought to be careful not to overdo it; that is, we should not create a
model more complicated than necessary. A good guideline is to use the
simplest suitable model. Complicated and refined models tend to be inflexible
and often fail to work with different samples. In most cases, they are poor
models for forecasting purposes. So, the highest R^2 is not necessarily an
indicator of the most useful model. The reason is that one can artificially
increase R^2 by including additional independent variables in the regression,
but the resulting seemingly better fit may be misleading. One cannot judge
the true quality of a model by applying it to the same data used for the fit.
When the fitted model is applied to a different set of data, however, the
weakness of an overfitted model often becomes obvious.
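
A small simulation can make this point concrete. The sketch below is only an illustration under assumed conditions: the data are simulated, y depends on a single variable, and the added regressors are pure noise. It fits each model on the first half of the sample and then evaluates R^2 both on that half and on the held-out half. All names, sample sizes, and the seed are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Least squares coefficients for y on X, with an intercept added here."""
    Z = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return b

def r_squared(b, X, y):
    """R^2 of the fitted coefficients b on the data (X, y); may be negative on new data."""
    Z = np.column_stack([np.ones(len(y)), X])
    sse = np.sum((y - Z @ b) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - sse / sst

rng = np.random.default_rng(1)
n = 40                                         # size of the fitting sample
x = rng.normal(size=(2 * n, 1))                # the one variable that truly matters
noise_vars = rng.normal(size=(2 * n, 25))      # irrelevant candidate regressors
y = 2.0 + 1.5 * x[:, 0] + rng.normal(size=2 * n)

results = []
for extra in (0, 5, 15, 25):
    X = np.column_stack([x, noise_vars[:, :extra]])
    b = fit_ols(X[:n], y[:n])                  # fit on the first n observations only
    r2_in = r_squared(b, X[:n], y[:n])         # same data as used for the fit
    r2_out = r_squared(b, X[n:], y[n:])        # different data
    results.append((extra, r2_in, r2_out))
    print(f"{extra:2d} irrelevant regressors: "
          f"in-sample R^2 = {r2_in:.3f}, out-of-sample R^2 = {r2_out:.3f}")
```

Because each model nests the previous one, the in-sample R^2 cannot decrease as regressors are added. The out-of-sample figure carries no such guarantee, which is why it is the more informative check on an overfitted model.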


TABLE 3.1 ANOVA Component Pattern

              df           SS     MS                       F          p-Value of F

Regression    k            SSR    MSR = SSR/k              MSR/MSE
Residual      n − k − 1    SSE    MSE = SSE/(n − k − 1)
Total         n − 1        SST


(^5) In total, the SST is chi-square distributed with n − 1 degrees of freedom. See
Appendix B for an explanation of the chi-square test.
(^6) Alternatively, one can check whether the test statistic is greater than the critical
value, that is, F > F_α.
