46 The Basics of Financial Econometrics
the regression and its evaluation. This building process consists of three
steps:
- Specification
- Fitting/estimating
- Diagnosis
In the specification step, we need to determine the dependent and inde-
pendent variables. We have to make sure that we do not include independent
variables that seem to have nothing to do with the dependent variable. More
than likely, in dealing with a dependent variable that is a financial variable,
financial and economic theory will provide a guide to what the relevant
independent variables might be. Then, after the variables have been identi-
fied, we have to gather data for all the variables. Thus, we obtain the vector
y and the matrix X. Although we do not justify it theoretically here, the
larger the sample, the better the quality of the estimation. Theoretically, the
sample size n should be at least one larger than the number of independent
variables k. A rule of thumb is that n should be at least four times k.
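The two sample-size conditions just stated can be checked mechanically. The following is a minimal sketch, not part of the original text; the function name check_sample_size and the example values n = 10 and k = 3 are hypothetical and chosen only for illustration.

```python
def check_sample_size(n: int, k: int, rule_of_thumb_factor: int = 4) -> dict:
    """Compare sample size n with the two thresholds stated in the text.

    n: number of observations
    k: number of independent variables
    rule_of_thumb_factor: multiple of k used by the rule of thumb (4 in the text)
    """
    if n < 0 or k < 1:
        raise ValueError("n must be non-negative and k must be at least 1")
    return {
        # Theoretical minimum: n is at least one larger than k.
        "meets_theoretical_minimum": n >= k + 1,
        # Rule of thumb: n is at least four times k.
        "meets_rule_of_thumb": n >= rule_of_thumb_factor * k,
    }


# Hypothetical example: 10 observations, 3 independent variables.
result = check_sample_size(n=10, k=3)
print(result)
```

With these illustrative values, the sample satisfies the theoretical minimum (10 is at least 4) but falls short of the rule of thumb (10 is less than 12).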
The fitting or estimation step consists of constructing the linear functional
relationship expressed by the model. That is, we need to estimate the
regression coefficients. We also compute the correlation coefficients among the
independent variables to test for possible interaction between them, as explained
in the next chapter. The estimation, then, yields so-called point estimates of the
dependent variable for given values of the independent variables.^4
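The fitting step can be illustrated with a short sketch. It assumes ordinary least squares as the estimation method and a matrix X whose first column is a column of ones for the intercept; neither assumption is spelled out in this passage. All data values and variable names (x1, x2, x_new, and so on) are made up for illustration.

```python
import numpy as np

# Hypothetical data: n = 8 observations, k = 2 independent variables.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0])
y = np.array([2.1, 4.4, 5.2, 7.4, 8.1, 10.6, 10.9, 13.6])

# Matrix X with a leading column of ones for the intercept (an assumption).
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares estimates of the regression coefficients (b0, b1, b2).
b, _, _, _ = np.linalg.lstsq(X, y, rcond=None)

# Point estimate of the dependent variable for given values x1 = 4.5, x2 = 4.0.
x_new = np.array([1.0, 4.5, 4.0])
y_hat_new = float(x_new @ b)

# Correlation coefficient between the two independent variables,
# a first look at possible interaction between them.
corr_x1_x2 = float(np.corrcoef(x1, x2)[0, 1])

print("coefficients:", b)
print("point estimate:", y_hat_new)
print("corr(x1, x2):", corr_x1_x2)
```

The point estimate is a single number for the given values of the independent variables, in contrast to the interval of values that a confidence interval would provide.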
Once we have obtained the estimates for equation (3.10), we can move
on to evaluating the quality of the regression with respect to the given data.
This is the diagnosis step.
Diagnostic Check and Model Significance
As just explained, diagnosing the quality of some model is essential in the
building process. Thus we need to set forth criteria for determining model
quality. If, according to some criteria, the fit is determined to be insufficient, we
might have to redesign the model by including different independent variables.
We know from the previous chapter that the goodness-of-fit measure is the
coefficient of determination (denoted by R^2). We will use that measure here
as well. As with the univariate regression, the coefficient of determination
measures the percentage of variation in the dependent variable explained by
all of the independent variables employed in the regression. The R^2 of the
(^4) This is in contrast to a range or interval of values as given by a confidence interval.
Appendix C explains what a confidence interval is.