introducing biases formed by available but insufficient data sets. Even by trying
too many possible solutions, one risks falling into the trap of data snooping.
Key Points
■ Model selection in financial econometrics requires a blend of theory,
creativity, and machine learning.
■ The machine learning approach starts with a set of empirical data that
we want to explain.
■ There is a trade-off between model complexity and the size of the data
sample. To implement this trade-off and ensure that models have forecasting
power, the fitting of sample data is constrained to avoid fitting noise.
■ Financial data are generally scarce given the complexity of their patterns.
This scarcity introduces uncertainty into statistical estimates: the data
might be compatible with many different models at the same level of
statistical confidence.
■ A serious mistake in model selection is to look for models that fit rare
or unique patterns; such patterns are purely random and lack predictive
power.
■ Another mistake in model selection is data snooping; that is, fitting and
testing models on the same data that we want to explain. A sound model
selection approach calls for a separation of sample data and test data:
models are fitted to sample data and tested on test data.
■ Because data are scarce, techniques have been devised to make optimal
use of data; perhaps the most widely used of such techniques is
bootstrapping.
■ Financial data are also subject to survivorship bias; that is, data are
selected using criteria known only at the end of the period. Survivorship
bias distorts models and results in forecasting errors.
■ Model risk is the risk that models produce forecasting errors when applied
to real data.
■ A simple idea that is widely used in practice to mitigate model risk is to
take the average of different models.
■ A sound model selection methodology includes strong theoretical
considerations, the rigorous separation of sample and testing data, and
discipline to avoid data snooping.
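
The key points on fitting noise and on separating sample data from test data can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and does not come from the text: the synthetic data (a linear signal plus Gaussian noise), the two candidate models, and the helper names fit_line, fit_nearest, and mse are all hypothetical.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (closed-form solution)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x

def fit_nearest(xs, ys):
    """Nearest-neighbour 'model': returns the y of the closest sample point,
    so it reproduces the sample data, noise included."""
    pts = list(zip(xs, ys))
    return lambda x: min(pts, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared error of a model on the given data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)

def draw(n):
    """Synthetic data (assumption): y = 2x + standard Gaussian noise."""
    xs = [rng.random() for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys

x_fit, y_fit = draw(500)     # sample data: used only for fitting
x_test, y_test = draw(500)   # test data: never seen during fitting

models = {"line": fit_line(x_fit, y_fit), "nearest": fit_nearest(x_fit, y_fit)}
results = {
    name: {"sample": mse(m, x_fit, y_fit), "test": mse(m, x_test, y_test)}
    for name, m in models.items()
}

for name, r in results.items():
    print(f"{name:8s} sample MSE = {r['sample']:.3f}   test MSE = {r['test']:.3f}")
```

Under these assumptions the nearest-neighbour model has zero error on the sample data because it memorizes the noise, yet it does worse than the two-parameter line on data it has not seen. Judging both models on the sample data alone would pick the wrong one.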
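
The key point on bootstrapping can be sketched in the same spirit. The example below is a minimal illustration, not a recommended procedure: it resamples a small set of simulated (not real) monthly returns with replacement to gauge the uncertainty of the sample mean. It assumes the observations are independent, which real financial series often are not; the function name bootstrap_means and all parameter values are hypothetical.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical sample: 60 simulated monthly returns (not real market data).
returns = [rng.gauss(0.01, 0.05) for _ in range(60)]

def bootstrap_means(data, n_boot, rng):
    """Resample the data with replacement n_boot times and
    return the mean of each resample."""
    n = len(data)
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot)]

n_boot = 2000
boot = sorted(bootstrap_means(returns, n_boot, rng))

sample_mean = statistics.fmean(returns)
boot_se = statistics.stdev(boot)        # bootstrap standard error of the mean
lower = boot[int(0.025 * n_boot)]       # approximate 2.5th percentile
upper = boot[int(0.975 * n_boot) - 1]   # approximate 97.5th percentile

print(f"sample mean         = {sample_mean:.4f}")
print(f"bootstrap std error = {boot_se:.4f}")
print(f"95% percentile band = [{lower:.4f}, {upper:.4f}]")
```

The width of the percentile band relative to the mean itself gives a feel for how much statistical uncertainty a short sample carries. For serially dependent data, a block-based resampling scheme would be a more appropriate choice than this plain version.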
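
Survivorship bias, as described in the key points, can also be made concrete with a small simulation. The sketch below is built entirely on assumptions: a hypothetical universe of funds whose annual returns are independent draws from one distribution (so no fund has any skill), and a hypothetical selection rule that keeps only funds whose average return over the whole period turned out positive, a criterion known only at the end of the period.

```python
import random
import statistics

rng = random.Random(7)
n_funds, n_years = 1000, 10

# Hypothetical universe: every fund draws returns from the same distribution.
funds = [[rng.gauss(0.05, 0.20) for _ in range(n_years)] for _ in range(n_funds)]

def mean_annual_return(group):
    """Average annual return pooled over all funds in the group."""
    return statistics.fmean([r for fund in group for r in fund])

# Selection using information available only at the end of the period:
# keep funds whose average return over the full period was positive.
survivors = [f for f in funds if statistics.fmean(f) > 0.0]

mean_all = mean_annual_return(funds)
mean_survivors = mean_annual_return(survivors)

print(f"funds in full universe : {len(funds)}")
print(f"funds in survivor set  : {len(survivors)}")
print(f"mean return, all funds : {mean_all:.4f}")
print(f"mean return, survivors : {mean_survivors:.4f}")
```

Because the losers are removed after the fact, the survivor set shows a higher average return than the full universe even though all funds were generated identically. A model fitted only to the survivors would inherit that upward distortion.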
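
The idea of averaging different models to mitigate model risk can be sketched as follows. The forecasts and realized values are invented numbers used purely for illustration, and the model names are hypothetical. The sketch uses an equal-weighted average of forecasts, which is one simple reading of "taking the average of different models" and an assumption here.

```python
# Hypothetical realized values and one-step-ahead forecasts (illustrative only).
realized = [0.010, -0.020, 0.015, 0.005, -0.010]
forecasts = {
    "model_a": [0.020, -0.010, 0.005, 0.000, -0.020],
    "model_b": [0.000, -0.030, 0.025, 0.015, 0.000],
    "model_c": [0.015, -0.005, 0.010, -0.005, -0.015],
}

def mse(pred, actual):
    """Mean squared forecast error."""
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)

# Equal-weighted average of the model forecasts, period by period.
averaged = [sum(col) / len(col) for col in zip(*forecasts.values())]

individual = {name: mse(f, realized) for name, f in forecasts.items()}
mean_individual = sum(individual.values()) / len(individual)
avg_mse = mse(averaged, realized)

for name, err in individual.items():
    print(f"{name}: MSE = {err:.2e}")
print(f"average of individual MSEs = {mean_individual:.2e}")
print(f"MSE of averaged forecast   = {avg_mse:.2e}")
```

Because squared error is convex, the error of the averaged forecast can never exceed the average of the individual errors. It is not guaranteed to beat the best single model, but since the best model is not known in advance, averaging limits the damage from relying on a poor one.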