The goal here is to lock down and control the universe of instruments and data used
for testing. The team then has to determine if the universe is large enough to allow test-
ing. If only 50 stocks make the universe, then your universe creation rules probably have
a problem. The next item to address is the appropriate time lag for the testing scenario.
A time lag is required to handle the point-in-time data issue. If you have perfect data,
then your lag will be zero. Otherwise you need to determine the appropriate lag.
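The lag logic can be sketched in a few lines. This is a hypothetical illustration, not a prescription: the 45-day reporting lag and the function name are assumptions chosen for the example.

```python
import datetime

# Illustrative reporting lag: assume fundamental data for a fiscal period
# becomes usable only 45 days after the period ends (assumed value).
REPORT_LAG = datetime.timedelta(days=45)

def available_on(period_end, as_of):
    """True if data for a period ending `period_end` may be used on `as_of`."""
    return as_of >= period_end + REPORT_LAG

q_end = datetime.date(2010, 3, 31)
print(available_on(q_end, datetime.date(2010, 4, 15)))   # False: too early
print(available_on(q_end, datetime.date(2010, 6, 1)))    # True
```

A backtest that filters its inputs through a check like this avoids look-ahead bias from using figures before they were realistically available.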
16.1.1. Data Proportions
In classical statistics a subset of data is selected for building and optimizing algorithms.
This subset may be selected by taking the data furthest back in time, by random
sampling, or by a manual process that ensures a good distribution of economic
cycles is represented. The normal rule is to use one-third to one-half of the data for the initial algorithm
testing and optimization.
The remaining one-half to two-thirds of the data can then be used to confirm the algorithm
with out-of-sample data. The purpose of this data is to show that the algorithm
works properly on data that was not used to build it. Since researchers normally
like to reoptimize an algorithm if it does not work, we recommend that one-third
of the data be used for in-sample testing, one-third for out of sample, and one-third for a
final test set. The last third is only used after all optimizations and adjustments are done;
this is a double safety to ensure that the researcher does not overfit the data. If the
trading/investment machine fails on the final one-third, then the project should be considered for
cancellation; the researchers may have data-mined the first two-thirds of the data, so all the
research results should be viewed with suspicion.
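The three-way chronological split above can be sketched as follows; this is a minimal illustration that assumes the data is a list of daily observations sorted oldest-first.

```python
# Sketch of the one-third / one-third / one-third split: tune and optimize on
# the first third, confirm out of sample on the second, and hold the last
# third back for the final test after all adjustments are done.
def split_thirds(data):
    """Return (in_sample, out_of_sample, final_test) chronological thirds."""
    n = len(data)
    a, b = n // 3, 2 * n // 3
    return data[:a], data[a:b], data[b:]

days = list(range(900))                    # 900 trading days, oldest first
tune, oos, final = split_thirds(days)
print(len(tune), len(oos), len(final))     # prints: 300 300 300
```

The final third should never be touched until every optimization is frozen; consulting it earlier defeats its purpose as a double safety.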
For more specific situations, we recommend the following allocations:
● High frequency systems. Allocate one-third of the data to each of the three test
sets: tune on the first third, test in sample on the first two-thirds, and test out of
sample on the last third. Alternatively, assign a random number, 1, 2, or 3, to each
day; use the 1 days for tuning, the 1s and 2s for in sample, and the 3s for out of sample.
● Equity portfolio systems. At random, select three portfolios of stocks. These portfolios
should mirror the sector weights and credit score weights of the benchmark. Use
portfolio 1 for tuning, portfolios 1 and 2 for in-sample testing, and portfolio 3 for out-
of-sample testing. An alternative, given, say, 15 years of data, is to tune on the first five
years, test in sample on the first ten years, and test out of sample on the last five years.
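The random-day scheme for high frequency systems can be sketched as below. The seed, day count, and naming are illustrative assumptions.

```python
import random

# Label each trading day 1, 2, or 3 at random; tune on the 1s, test in sample
# on the 1s and 2s, and test out of sample on the 3s.
rng = random.Random(42)                      # fixed seed for reproducibility
days = [f"day_{i:03d}" for i in range(750)]  # assumed 750 trading days
label = {d: rng.randint(1, 3) for d in days}

tuning = [d for d in days if label[d] == 1]
in_sample = [d for d in days if label[d] in (1, 2)]
out_of_sample = [d for d in days if label[d] == 3]
```

Note that the tuning days are a subset of the in-sample days, while the out-of-sample days never overlap either set.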
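For equity portfolio systems, one way to draw random portfolios that mirror the benchmark's sector weights is to shuffle within each sector and deal stocks round-robin. The sector labels, tickers, and universe sizes below are illustrative assumptions.

```python
import random

# Assign benchmark stocks to three portfolios while roughly preserving sector
# weights: shuffle each sector's stocks, then deal them out 1, 2, 3, 1, 2, 3...
def stratified_portfolios(stocks_by_sector, seed=0):
    rng = random.Random(seed)
    portfolios = {1: [], 2: [], 3: []}
    for sector, stocks in stocks_by_sector.items():
        stocks = stocks[:]                 # copy so the input is untouched
        rng.shuffle(stocks)
        for i, stock in enumerate(stocks):
            portfolios[i % 3 + 1].append(stock)
    return portfolios

universe = {                               # hypothetical benchmark universe
    "tech":   [f"T{i}" for i in range(30)],
    "energy": [f"E{i}" for i in range(15)],
    "health": [f"H{i}" for i in range(21)],
}
ports = stratified_portfolios(universe)
```

Each portfolio then holds the same proportion of each sector as the benchmark, so portfolio 1 can be used for tuning, 1 and 2 for in-sample testing, and 3 for out-of-sample testing.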
We also recommend the use of a data set that contains fundamental equity factors pre-
and post-Sarbanes-Oxley. Specifically, let us consider a system that uses insider trading
as a predictor of earnings surprises. Such an indicator would clearly perform differently
before Sarbanes-Oxley than after it. As a result, hold back from the in-sample test the
years of data since the legislation was passed, as this data represents the real world today.
What are we trying to accomplish? If we are trying to create a system that performs
well in market downturns, overweight the selection of data for in-sample with downturn
years. Then, test out of sample on a blend of both down and up years. Conversely,
if we intend to create an algorithm for any type of market, weight the in-sample data
set uniformly, but subsequently perform the out-of-sample test on, say, only the down
years, and then retune it and run it again on, say, only the up years. This will give you an
understanding of how your algorithm performs in both up and down economic cycles.
Sometimes, all that is needed to become successful is to match the benchmark in average
and good years, but substantially outperform the benchmark in down cycles.
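Overweighting downturn years in the in-sample set can be sketched as follows. The year range, regime labels, and sample sizes are illustrative assumptions, not data from any real benchmark.

```python
import random

# Hand-labeled regimes (assumed for illustration): four down years, rest up.
regimes = {y: ("down" if y in (2000, 2001, 2002, 2008) else "up")
           for y in range(1998, 2013)}
down_years = [y for y, r in regimes.items() if r == "down"]
up_years = [y for y, r in regimes.items() if r == "up"]

rng = random.Random(7)
# In-sample: downturn-heavy mix of three down years and three up years.
in_sample = sorted(rng.sample(down_years, 3) + rng.sample(up_years, 3))
# Out-of-sample: the remaining years, a blend of down and up by construction.
out_of_sample = sorted(set(regimes) - set(in_sample))
```

Swapping which regime is overweighted, and retuning between runs, gives the complementary view of how the algorithm performs across up and down cycles.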
16.1. STEP 3, LOOP 1: DEFINE TESTING METHODOLOGY