Learning theory constrains the dimensionality of models so that they adapt to the size and structure of the sample. A central idea in machine learning theory is to add a penalty term to the objective function, where the penalty term increases with the number of parameters but gets smaller as the number of sample points increases.^6 That is, the penalty function is a function of the size of the sample and of the complexity of the model. One compares models by adding the penalty function to the likelihood function (a definition of the likelihood function is provided in Chapter 13). In this way, one can obtain an ideal trade-off between model complexity and forecasting ability.
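As a rough illustration of how such a penalty-based comparison might be carried out, the sketch below computes the AIC and BIC for two candidate models; the log-likelihood values, parameter counts, and sample size are hypothetical placeholders, not figures from this chapter.

```python
import math

def aic(log_likelihood, n_params):
    # Akaike information criterion: penalty grows with the number of parameters
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    # Bayesian information criterion: penalty depends on both the number of
    # parameters and the sample size via log(n_obs)
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Hypothetical fitted models: (name, maximized log-likelihood, number of parameters)
candidates = [("simple", -1250.0, 3), ("complex", -1235.0, 12)]
n_obs = 500  # assumed sample size

# A lower criterion value indicates a better trade-off between fit and complexity
for name, ll, k in candidates:
    print(f"{name}: AIC={aic(ll, k):.1f}  BIC={bic(ll, k, n_obs):.1f}")
```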
At the other end of the landscape, the theoretical approach is based on
human creativity. In this approach, models are the result of new scientific
insights that have been embodied in theories. The theoretical approach is
typical of the physical sciences. Perhaps the most well-known example of a
theoretical model in financial economics is the capital asset pricing model
(CAPM).
The hybrid approach retains characteristics of both the theoretical and
machine learning approaches. It uses a theoretical foundation to identify
families of models but uses a learning approach to choose the correct model
within the family. For example, the ARCH/GARCH family of models (see
Chapter 11) is suggested by theoretical considerations while, in its prac-
tical application, the right model is selected through a learning approach
that identifies the model parameters. Thus, ultimately, in modern computer-
based financial econometrics, there is no clear-cut distinction between a
learning approach and a theory-based a priori approach.
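As a concrete sketch of this hybrid selection step, the example below fits a few GARCH(p, q) specifications and retains the one with the lowest BIC. It assumes the third-party `arch` package is installed and uses simulated returns purely as placeholder data; in practice the candidate orders and the return series would come from the problem at hand.

```python
import numpy as np
import pandas as pd
from arch import arch_model

rng = np.random.default_rng(0)
returns = pd.Series(rng.standard_normal(1000))  # placeholder return series

best = None
for p in (1, 2):
    for q in (1, 2):
        # Theory suggests the GARCH family; the data select the order (p, q)
        res = arch_model(returns, vol="GARCH", p=p, q=q).fit(disp="off")
        if best is None or res.bic < best[0]:
            best = (res.bic, p, q)

print(f"Selected GARCH({best[1]},{best[2]}) with BIC={best[0]:.1f}")
```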
At this point, the four key conclusions regarding model complexity and
sample size are:
- Financial data are generally scarce for statistical estimation given the complexity of their patterns.
- Financial data are too scarce to allow sure statistical estimates.
- The scarcity of financial data means that the data might be compatible with many different models.
- There is a trade-off between model complexity and the size of the data sample.
The last two conclusions are critical.
(^6) Several proposals have been made regarding the form of the penalty function. Three criteria in general use are (1) the Akaike information criterion (AIC), (2) the Bayesian information criterion (BIC), and (3) the minimum description length (MDL) principle. The first two are described in Appendix E.