The machine learning approach to financial modeling is in principle a
consequence of the diffusion of low-cost, high-performance computers.^4 It
is based on using a family of very flexible models that can approximate
sample data with unlimited precision.^5 Indeed, an element of "machine
learning" appears in most financial econometric endeavors. For example,
determining the number of lags in an autoregressive model is a problem
typically faced in financial econometric modeling (see Chapter 9).
However, practice has shown that if we represent sample data with very
high precision, we typically obtain poor forecasting performance. Here is
why. In general, the main features of the data can be described by a sim-
ple structural model plus unpredictable noise. As the noise is unpredict-
able, the goal of a model is to capture the structural components. A very
precise model of sample data (in-sample) will also try to match the unpre-
dictable noise. This phenomenon, called overfitting (discussed later in this
chapter), leads to poor (out-of-sample) forecasting performance. Obviously,
there is no guarantee that data are truly described by a simple structural
model plus noise. Data might be entirely random or might be described by a
truly complex model.
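This effect can be illustrated with a short simulation. The following sketch is
purely illustrative (the linear data-generating process, sample sizes, noise
level, and polynomial degrees are assumptions made for this example, not taken
from the text): it fits a simple model and a very flexible model to data
generated as a simple structural component plus noise, then compares in-sample
and out-of-sample errors.

import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)
f = lambda x: 1.0 + 2.0 * x                          # simple "structural" model
x_in = rng.uniform(0.0, 1.0, 20)                     # small estimation sample
y_in = f(x_in) + rng.normal(0.0, 0.3, x_in.size)     # structure + noise
x_out = rng.uniform(0.0, 1.0, 200)                   # fresh out-of-sample data
y_out = f(x_out) + rng.normal(0.0, 0.3, x_out.size)

for degree in (1, 12):                 # simple model vs. very flexible model
    model = Polynomial.fit(x_in, y_in, degree)       # least-squares fit
    mse_in = np.mean((model(x_in) - y_in) ** 2)
    mse_out = np.mean((model(x_out) - y_out) ** 2)
    print(f"degree {degree:2d}: in-sample MSE {mse_in:.3f}, "
          f"out-of-sample MSE {mse_out:.3f}")

The flexible model matches the estimation sample almost exactly, but because it
has also matched the noise, its error on fresh data is typically much larger
than that of the simple model.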
To address the problem of overfitting, machine learning theory suggests
criteria to constrain the complexity of the model so that it fits sample
data only partially but, as a trade-off, retains some forecasting power. The
intuitive meaning is the following: The structure of the data and the sample
size dictate the complexity of the laws that can be learned by computer
algorithms. This is typically accomplished by introducing what is called a
penalty function. For example, determining the number of lags in an autore-
gressive model is typically solved with methods of machine learning theory
by selecting the number of lags that minimizes the sum of the model's loss
function and a penalty function.
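As a minimal sketch of this idea (not the book's code), the criterion below
takes the AIC-style form n*log(sigma^2) + 2p, where the first term is the loss
(it falls as lags are added) and 2p is the penalty that grows with model
complexity; the simulated AR(2) process and the maximum lag order considered
are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)
n = 500
y = np.zeros(n)
for t in range(2, n):                    # simulate an AR(2) process
    y[t] = 0.5 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()

def criterion(y, p):
    # Regress y_t on its first p lags by least squares.
    Y = y[p:]
    X = np.column_stack([y[p - k : len(y) - k] for k in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    sigma2 = np.mean((Y - X @ beta) ** 2)
    return len(Y) * np.log(sigma2) + 2 * p        # loss term + penalty term

values = {p: criterion(y, p) for p in range(1, 9)}
best = min(values, key=values.get)
print("selected lag order:", best)

The selected lag order is the one that balances goodness of fit against model
complexity; a larger penalty leads to the choice of simpler models.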
This is a fundamental point. If we have only a small sample data set, we
can learn only simple patterns, provided that these patterns indeed exist.
(^4) In the 1970s, a full-fledged quantitative theory of machine learning was developed
in V. N. Vapnik and A. Ya. Chervonenkis, Theory of Pattern Recognition (Moscow:
Nauka, 1974). While this theory goes well beyond the scope of this chapter, its
practical implication is what is important here: model complexity must be
constrained as a function of the sample size.
(^5) Neural networks are a classical example. With an unrestricted number of layers
and nodes, a neural network can approximate any continuous function with arbitrary
precision. We express this fact by saying that a neural network is a universal
function approximator. The idea of universal function approximators is well known
in calculus. The Taylor series and Fourier series are universal approximators for
broad classes of functions.