from their knowledge, learning, experience, or simply what others have said.
A good example of a classic yet flawed way to model the excess return in
equity analysis is the claim that "market capitalization should be included
in the model because there is evidence of a size effect."
From then on, the problem can only get worse as long as there is more
room for choices. A researcher may choose to design the same statistical
tests that others have applied to similar data. The choices in
these tests include, but are not limited to, the selection of explanatory vari-
ables, how to measure them, the functional form of the model, the length of
the time period, the underlying probability distribution, and test statistics.
Each of these artificial choices may by itself make only a small difference,
but its impact on the resulting investment performance is often significant.
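To see how quickly these seemingly minor choices multiply, the sketch below
enumerates the combinations a researcher might try; the choice sets and their
sizes are purely hypothetical placeholders, used only to illustrate that each
combination is, in effect, another test run on the same data.

```python
from itertools import product

# Hypothetical choice sets -- illustrative assumptions, not from the text.
explanatory_vars = ["market_cap", "book_to_market", "momentum", "beta"]  # which regressors
measurements     = ["raw", "log", "rank"]                                # how to measure them
functional_forms = ["linear", "quadratic"]                               # functional form
sample_lengths   = [36, 60, 120]                                         # length of the time period (months)
distributions    = ["normal", "student_t"]                               # assumed error distribution
test_statistics  = ["t", "F"]                                            # test statistic

# Every combination amounts to another specification tried on the same data set.
specifications = list(product(explanatory_vars, measurements, functional_forms,
                              sample_lengths, distributions, test_statistics))
print(len(specifications))  # 4 * 3 * 2 * 3 * 2 * 2 = 288 implicit "tests"
```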
Ideally, there should be no need to make artificial selections since all of
the tests should have been dictated by the underlying theories. However, even
the best financial concept, being abstract and simplified, does not always fully
specify its application in reality. There are many points at which decision
makers have to find proxies and instruments to complete the process.
A common fallacy, however, is that researchers tend to go back to the
most immediate prior step in searching for solutions when the result is not
what they expect to see. Of course, this attitude reflects the general human
tendency to overweight information from the most recent period when making
decisions. This could easily lead to the mindless trial of numerous
alternatives, which are most likely not justified.
Therefore, a direct way to control for data snooping at all levels is to
reconstruct the entire process from the beginning whenever the output at any
phase fails the quality test. If the estimated model cannot explain the
variation of excess returns to some satisfactory degree, the process needs
to be stopped and abandoned.
We need to go back and develop a new theory. If the predicted model does
not produce acceptable excess returns, go back to the beginning. Finally, if
the level of the actual risk-adjusted excess return found from following the
strategy "does not pass muster," go back to the beginning. This "trial-
and-error” process may correct for most, but not all, of the data snooping
problem. As we throw away the obvious, “bad” models through testing, we
learn from the experience of trial and error. This experience itself inevitably
affects the seemingly independent creation of the next generation of models.
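The quality-gate discipline described above can be sketched roughly as
follows. The stage functions (propose_theory, backtest, paper_trade) and the
numerical thresholds are hypothetical placeholders; the only point is that a
failure at any stage sends the researcher back to the very beginning rather
than to the most recent step.

```python
import statsmodels.api as sm

# Hypothetical quality thresholds -- illustrative assumptions, not from the text.
MIN_R_SQUARED = 0.05          # minimum explanatory power of the estimated model
MIN_PREDICTED_ALPHA = 0.0     # minimum acceptable predicted excess return
MIN_REALIZED_SHARPE = 0.5     # minimum risk-adjusted excess return of the strategy

def develop_strategy(propose_theory, backtest, paper_trade):
    """Restart the entire research process whenever any stage fails its quality gate."""
    while True:
        # Stage 1: a new theory dictates the variables and functional form.
        y, X = propose_theory()
        model = sm.OLS(y, sm.add_constant(X)).fit()
        if model.rsquared < MIN_R_SQUARED:
            continue  # model cannot explain excess-return variation: go back to the beginning

        # Stage 2: do the model's predictions produce acceptable excess returns?
        predicted_alpha = backtest(model)
        if predicted_alpha < MIN_PREDICTED_ALPHA:
            continue  # go back to the beginning, not merely to the previous step

        # Stage 3: does the realized, risk-adjusted excess return pass muster?
        realized_sharpe = paper_trade(model)
        if realized_sharpe < MIN_REALIZED_SHARPE:
            continue  # go back to the beginning

        return model  # all quality gates passed
```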
Of course, most researchers would agree that there is almost no way to
completely eliminate some form of data snooping since even the most rigor-
ous scientific process is no more than a sequence of choices, subjective or
not. As suggested by Lo,^2 like someone suffering from substance addiction,
2. Lo, "Data-Snooping Biases in Financial Analysis."