298 The Basics of Financial Econometrics
A simple model cannot handle processes that start or end in the middle
of the training period. On the other hand, models that take into account
the founding or closing of firms cannot be simple. Consider, for example, a
simple linear autoregressive model (see Chapter 9). Any addition or deletion
of companies introduces a nonlinearity effect into the model and precludes
using standard tools such as the ordinary least squares method.
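One practical side of this difficulty can be sketched in a few lines. The example below is an illustration under stated assumptions, not the book's own argument: it takes a small hypothetical returns panel in which firm B is founded partway through the sample and firm C closes before the end, and counts the dates on which a jointly estimated first-order autoregression would have complete data for every firm. The helper name `usable_var_rows` and all return values are invented for the illustration.

```python
def usable_var_rows(panel, lag=1):
    """Return the time indices t at which every firm has an observation
    at both t and t - lag, i.e. the rows a joint regression could use."""
    n_periods = len(next(iter(panel.values())))
    rows = []
    for t in range(lag, n_periods):
        complete = all(
            series[t] is not None and series[t - lag] is not None
            for series in panel.values()
        )
        if complete:
            rows.append(t)
    return rows


# Hypothetical monthly returns; None marks periods in which the firm
# does not exist (B is founded at t = 3, C closes after t = 5).
panel = {
    "A": [0.01, -0.02, 0.03, 0.00, 0.01, -0.01, 0.02, 0.01],
    "B": [None, None, None, 0.02, -0.01, 0.01, 0.00, 0.03],
    "C": [0.00, 0.01, -0.01, 0.02, 0.01, -0.02, None, None],
}

print(usable_var_rows(panel))  # prints [4, 5]
```

Only two of the seven candidate rows survive, so a balanced least squares fit would discard most of the history. Using the remaining data requires a model that explicitly handles firms entering and leaving the sample.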
There is no ideal solution. Care is required in estimating the performance
biases that can result from sample biases. Suppose that we make a forecast
of return processes based on models trained on the past three or four years
of returns data on the same processes that we want to forecast. Clearly,
there is no data snooping because we use only information available prior
to forecasting. However, it should be understood that we are estimating our
models on data that contain biases. If the selection of companies to forecast
is subject to strong criteria, for example, inclusion in a major stock index
such as the S&P 500, the model is likely to suffer a loss of performance.
The reason is that the model will be trained on spurious past performance:
companies that meet the criteria today tend to be those that performed well
in the past. If the modeler is constrained to work on a specific stock
selection, for example, in order to create an active strategy against
a selected benchmark, the modeler might want to reduce the biases by applying
his or her own judgment.
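The effect of training on survivors only can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not in the text: every firm has the same true mean monthly log return of zero, returns are independent and normally distributed, and a firm stays in the sample only if its value at the end of the window is at least 70% of its starting value. The function name `survivorship_demo` and all parameter values are hypothetical.

```python
import math
import random


def survivorship_demo(n_firms=2000, n_months=36, sigma=0.05,
                      delist_level=0.7, seed=0):
    """Compare the mean monthly log return of all firms with that of
    the firms that survive a (hypothetical) delisting rule."""
    rng = random.Random(seed)
    cutoff = math.log(delist_level)  # survival threshold on the cumulative log return
    all_means, survivor_means = [], []
    for _ in range(n_firms):
        log_returns = [rng.gauss(0.0, sigma) for _ in range(n_months)]
        firm_mean = sum(log_returns) / n_months
        all_means.append(firm_mean)
        # A firm survives only if its end-of-window value is high enough.
        if sum(log_returns) >= cutoff:
            survivor_means.append(firm_mean)
    full_mean = sum(all_means) / len(all_means)
    survivor_mean = sum(survivor_means) / len(survivor_means)
    return full_mean, survivor_mean, len(survivor_means)


full_mean, survivor_mean, n_survivors = survivorship_demo()
print(f"all firms:      {full_mean:+.5f} per month")
print(f"survivors only: {survivor_mean:+.5f} per month ({n_survivors} firms)")
```

Because the survivors are exactly the firms whose realized returns were high enough, their average return exceeds the average over all firms even though no firm has a genuinely positive expected return. A model estimated on the survivors alone would learn that spurious drift.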
Survivorship bias is not the only possible bias in sample data. More
generally, any selection of data contains some bias. Some of these biases
are intentional. For example, selecting only large market capitalization
companies or only small market capitalization companies deliberately
introduces biases tied to the behavior of that segment. Other selection biases
are more difficult to recognize. In general, any selection based on stock
indexes introduces index-specific biases in addition to the survivorship bias.
Consider that thousands of indexes are presently in use. Institutional
investors and their consultants use these indexes to create asset allocation
strategies and then give the indexes to asset managers for active management.
Anyone using financial econometrics to create active management strategies
based on these stock indexes should be aware of the biases inherent in
the indexes when building their strategies. Data snooping applied to a carefully
crafted stock selection can result in poor performance because the asset
selection process inherent in index formation can produce very good results
in sample; out of sample, these results vanish like “snow under the sun.”
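The gap between in-sample and out-of-sample results under a selection rule can be sketched as follows. The setup is an assumption made for illustration only: 1,000 stocks with independent, normally distributed monthly returns and a true mean of zero, from which a hypothetical "index" of the 50 best in-sample performers is formed. The function name `selection_demo` and all parameters are invented here.

```python
import random


def selection_demo(n_stocks=1000, n_selected=50, n_months=36,
                   sigma=0.05, seed=1):
    """Select the best in-sample performers, then measure the same
    stocks over an independent out-of-sample window."""
    rng = random.Random(seed)

    def mean_return():
        # Average of n_months independent zero-mean monthly returns.
        return sum(rng.gauss(0.0, sigma) for _ in range(n_months)) / n_months

    in_sample = [mean_return() for _ in range(n_stocks)]
    out_sample = [mean_return() for _ in range(n_stocks)]

    # Hypothetical index formation: keep the top in-sample performers.
    ranked = sorted(range(n_stocks), key=lambda i: in_sample[i], reverse=True)
    chosen = ranked[:n_selected]

    in_avg = sum(in_sample[i] for i in chosen) / n_selected
    out_avg = sum(out_sample[i] for i in chosen) / n_selected
    return in_avg, out_avg


in_avg, out_avg = selection_demo()
print(f"selected stocks, in sample:     {in_avg:+.4f} per month")
print(f"selected stocks, out of sample: {out_avg:+.4f} per month")
```

The selected stocks show a clearly positive average return in sample purely because they were chosen on that basis, while their out-of-sample average is close to the true mean of zero.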
Moving Training Windows
Thus far we have assumed that the data generating process (DGP) discussed
in Chapter 1 exists as a time-invariant model. Can we also assume that
the DGP varies and that it can be estimated on a moving window? If yes,