166 CHAPTER ◆ 17 Check Performance and Shadow Trade
projects set up automated systems to rerun all regression tests at specified intervals and
report any nonconformance. Common strategies are to rerun the system after every suc-
cessful compile, every night, or once a week. (Regression testing is an integral part of the
extreme programming software development methodology. In this methodology, design
documents are replaced by extensive, repeatable, and automated testing of the entire soft-
ware package at every stage in the software development cycle.)
17.1. STEP 4, LOOP 1: Perform Regression Test to
Validate Algorithms and Benchmark
In the first loop, the product team takes the performance metrics from Stage 1, which
were generated from a small set of sample data, and compares them against performance
for the investable universe. The team should also lay out the tools it will use to
measure performance against the benchmark. It will compare raw returns in the IS/OS
tests against the results from Stage 1. It will also look at excess returns of the
Stage 1 sample against the universe and the benchmark to confirm the algorithm still
outperforms its peers. The difference between the benchmark and the universe of
investable instruments needs to be calculated to determine the effect of illiquid
securities on the universe returns. In addition, the team should compare sample
calculations of returns, instrument by instrument, against the returns of the sample
universe. The team should perform all of these comparisons to double-check that the
sample performance of Stage 1 and the performance of the investable universe are highly
correlated. This ensures that the data cleaning algorithm has not introduced errors
into the calculations.
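The comparisons described above can be sketched in a few lines of Python. This is only a minimal illustration, not a prescribed implementation: the return series, the function names, and the 0.9 correlation threshold are all hypothetical, and a real project would load the Stage 1 results and the universe returns from its own data store.

```python
from math import sqrt
from statistics import mean


def excess_returns(portfolio, reference):
    """Period-by-period excess return of a portfolio over a reference series."""
    if len(portfolio) != len(reference):
        raise ValueError("return series must have the same length")
    return [p - r for p, r in zip(portfolio, reference)]


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length return series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical periodic returns, for illustration only.
stage1_sample = [0.012, -0.004, 0.021, 0.007, -0.010, 0.015]
universe = [0.010, -0.006, 0.018, 0.005, -0.012, 0.011]
benchmark = [0.008, -0.005, 0.015, 0.004, -0.011, 0.009]

excess_vs_universe = excess_returns(stage1_sample, universe)
excess_vs_benchmark = excess_returns(stage1_sample, benchmark)
corr = pearson_correlation(stage1_sample, universe)

MIN_CORRELATION = 0.9  # assumed threshold; each team should set its own

print(f"mean excess vs universe:  {mean(excess_vs_universe):.4f}")
print(f"mean excess vs benchmark: {mean(excess_vs_benchmark):.4f}")
print(f"sample/universe correlation: {corr:.3f}")
if corr < MIN_CORRELATION:
    print("WARNING: low correlation; review the data cleaning algorithm")
```

A low correlation does not by itself prove that the cleaning algorithm is wrong, but it is a signal that the instrument-by-instrument comparison deserves a closer look.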
To clarify, suppose a small sample of data, say for 20 stocks, proved the outputs of the
system in Stage 1. Now, after cleaning all the data and calculating all the returns, the
product team should perform a white box regression test on a stock-by-stock basis for
those 20 stocks. The results should show similar performance if the cleaning algorithms
are correct. Before scaling up, the product team should check all the outputs after the
cleaning algorithm against the outputs from the original spiral for the original
instruments.
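One way to picture this stock-by-stock check is the sketch below. The tickers, the return values, and the tolerance are hypothetical; "similar performance" has to be defined by the team, and the tolerance here is only a placeholder for that definition.

```python
def regression_check(baseline, rerun, tolerance=1e-6):
    """Compare rerun outputs against baseline outputs, instrument by instrument.

    Returns a dict {ticker: (baseline_value, rerun_value)} holding every
    instrument whose rerun value is missing or differs by more than tolerance.
    """
    failures = {}
    for ticker, expected in baseline.items():
        actual = rerun.get(ticker)
        if actual is None or abs(actual - expected) > tolerance:
            failures[ticker] = (expected, actual)
    return failures


# Hypothetical Stage 1 outputs and outputs recomputed from cleaned data.
stage1_outputs = {"AAA": 0.0125, "BBB": -0.0040, "CCC": 0.0210}
cleaned_outputs = {"AAA": 0.0125, "BBB": -0.0040, "CCC": 0.0310}

failures = regression_check(stage1_outputs, cleaned_outputs)
for ticker, (expected, actual) in failures.items():
    print(f"{ticker}: Stage 1 = {expected}, after cleaning = {actual}")
if not failures:
    print("all instruments match the Stage 1 outputs")
```

Run against the 20 original stocks, an empty result suggests the cleaning algorithm has left the original outputs intact; any reported instrument should be investigated before scaling up.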
Finally, in this step, the team should benchmark the difference between the investable
universe and the benchmark itself. This can be accomplished using statistics and charts
that show the difference graphically. We also recommend checking the returns of the
original benchmark against the universe of stocks you selected.
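As a rough illustration of the statistics involved, the sketch below summarizes the gap between universe and benchmark returns. The choice of measures is our assumption, not a required set: the mean per-period difference, the standard deviation of the differences (often called tracking error), and the gap in compounded return. The data are hypothetical.

```python
from statistics import mean, stdev


def cumulative_return(returns):
    """Compound a series of periodic returns into one total return."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def benchmark_gap(universe, benchmark):
    """Summary statistics for universe returns minus benchmark returns."""
    if len(universe) != len(benchmark):
        raise ValueError("return series must have the same length")
    diffs = [u - b for u, b in zip(universe, benchmark)]
    return {
        "mean_difference": mean(diffs),
        "tracking_error": stdev(diffs),  # sample standard deviation
        "cumulative_gap": cumulative_return(universe)
        - cumulative_return(benchmark),
    }


# Hypothetical periodic returns, for illustration only.
universe_returns = [0.010, -0.006, 0.018, 0.005, -0.012, 0.011]
benchmark_returns = [0.008, -0.005, 0.015, 0.004, -0.011, 0.009]

for name, value in benchmark_gap(universe_returns, benchmark_returns).items():
    print(f"{name}: {value:.5f}")
```

The list of per-period differences is also the natural series to chart when the gap needs to be explained graphically.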
There are many benchmarks that contain untradable securities due to market
capitalization, illiquidity, or poor credit rating. (For example, the Crédit Suisse First
Boston Convertible Bond Index has in the past contained bonds that are illiquid.) When
you compare yourself against a benchmark that contains untradable securities, you may
decide that your algorithm is performing poorly; however, your algorithm may be
performing well against your competitors, since they have the same restrictions you
have. Without this step of benchmarking the investable universe versus the benchmark,
the firm may kill the project instead of fast-tracking it as a peer-beating system.
Imagine this: if you could predict the future perfectly, what would be the optimal
output of your model? If you knew what the top 10% of returns would be (i.e., if you
knew what the optimum basket would be), and you could pick everything right, what would
your returns be? The outputs of this hypothetical optimum algorithm should also be used
as a benchmark. If you know what the optimum is, you know how much better you can make
the algorithm.
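A perfect-foresight benchmark of this kind can be sketched as follows. The sketch assumes an equally weighted basket of the top 10% of instruments by realized return in each period; the weighting scheme, the instrument names, and all return figures are hypothetical.

```python
from statistics import mean


def perfect_foresight_returns(period_returns, top_fraction=0.10):
    """Return per period of an equally weighted basket of the best performers.

    period_returns is a list with one dict per period, mapping each
    instrument to its realized return in that period.
    """
    results = []
    for returns_by_instrument in period_returns:
        ranked = sorted(returns_by_instrument.values(), reverse=True)
        # Basket size rounded to the nearest whole instrument, at least one.
        basket_size = max(1, round(len(ranked) * top_fraction))
        results.append(mean(ranked[:basket_size]))
    return results


# Hypothetical realized returns for 10 instruments over two periods.
periods = [
    {"S01": 0.040, "S02": 0.012, "S03": -0.020, "S04": 0.005, "S05": 0.031,
     "S06": -0.008, "S07": 0.018, "S08": 0.002, "S09": -0.015, "S10": 0.009},
    {"S01": -0.011, "S02": 0.027, "S03": 0.006, "S04": -0.003, "S05": 0.014,
     "S06": 0.021, "S07": -0.019, "S08": 0.008, "S09": 0.001, "S10": -0.006},
]
algorithm_returns = [0.015, 0.010]  # hypothetical returns of the real algorithm

optimum = perfect_foresight_returns(periods)
headroom = [best - actual for best, actual in zip(optimum, algorithm_returns)]

for period, (best, gap) in enumerate(zip(optimum, headroom), start=1):
    print(f"period {period}: optimum = {best:.3f}, headroom = {gap:.3f}")
```

The headroom series is the gauge: it shows how far the algorithm sits from the best result that was attainable in each period.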