CHAPTER 16 ◆ Perform In-Sample/Out-of-Sample Tests
● Fixed income portfolio systems. For fixed income data, indicators are built from
macro inputs such as economic data and Fed actions, along with the underlying corporate
data for bonds, CDOs, convertibles, and other structured products. Fixed income data
is normally very clean at month end because bond portfolios must be marked using
dealer prices. Since most corporate bond investors hold their positions for several
months, we recommend performing a backtest that uses only month-end prices and
calculations.
We also recommend producing a return for each security for each period using the same
rules. These returns are then used as the return inputs in the backtesting step. We realize
this is not an industry standard practice. However, our method produces a stable return
stream that does not vary with the backtest. It also forces, very early in the testing cycle,
the discussion of how to measure returns consistently.
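As a minimal sketch of the idea above, the following Python computes one simple return per security per period from month-end prices, applying the same rule to every security. The function name `month_end_returns`, the input layout, and the sample identifier `BOND_A` are illustrative assumptions, not part of the original method.

```python
def month_end_returns(prices):
    """Compute simple period-over-period returns for every security.

    prices: dict mapping a security id to a list of (period, month_end_price)
            tuples sorted by period (hypothetical layout for illustration).
    Returns a dict mapping each security id to a list of (period, return).
    """
    returns = {}
    for security, series in prices.items():
        rows = []
        # Pair each month-end price with the previous one.
        for (_, prev_price), (period, price) in zip(series, series[1:]):
            if prev_price <= 0:
                raise ValueError(f"non-positive price for {security}")
            rows.append((period, price / prev_price - 1.0))
        returns[security] = rows
    return returns


if __name__ == "__main__":
    sample = {"BOND_A": [("2020-01", 100.0), ("2020-02", 102.0), ("2020-03", 99.96)]}
    for period, ret in month_end_returns(sample)["BOND_A"]:
        print(period, round(ret, 4))
```

Because the returns are computed once and stored, every later backtest reads the same return stream instead of recomputing it with slightly different rules.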
With respect to database structure, we recommend separating the raw data from the
returns and from the indicators. This will enable reusability of returns and signals. Of
course, the ability to reuse returns assumes a fixed holding period. For a nonfixed holding
period, new returns would have to be generated.
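One possible way to keep raw data, returns, and indicators separate is sketched below using Python's built-in `sqlite3` module. The table and column names (`raw_prices`, `security_returns`, `indicators`, `holding_months`) are assumptions chosen for illustration. Storing the holding period alongside each return makes the fixed-holding-period assumption explicit.

```python
import sqlite3

# Hypothetical schema: three independent tables so that returns and
# indicators can be reused without touching the raw data.
SCHEMA = """
CREATE TABLE raw_prices (
    security_id TEXT NOT NULL,
    price_date  TEXT NOT NULL,
    price       REAL NOT NULL,
    PRIMARY KEY (security_id, price_date)
);
CREATE TABLE security_returns (
    security_id    TEXT NOT NULL,
    period_end     TEXT NOT NULL,
    holding_months INTEGER NOT NULL,  -- the fixed holding period
    ret            REAL NOT NULL,
    PRIMARY KEY (security_id, period_end, holding_months)
);
CREATE TABLE indicators (
    security_id TEXT NOT NULL,
    period_end  TEXT NOT NULL,
    name        TEXT NOT NULL,
    value       REAL NOT NULL,
    PRIMARY KEY (security_id, period_end, name)
);
"""


def create_store(path=":memory:"):
    """Create the three tables and return the open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

Under this layout, a different holding period means adding rows with a different `holding_months` value, and the raw prices and indicators stay untouched.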
16.1.2. Return windows
During a backtest, returns over different holding periods should be consistent. For example,
if you are optimizing an algorithm over three-month holding periods, returns for one-month,
two-month, and six-month periods should be consistent with the three-month returns.
You should not accept results where, say, one-, two-, and six-month returns were negative
while three-month returns were positive. We recommend backtesting over different holding
horizons. If an algorithm does not work or is inconsistent across multiple horizons, you
should reject it. This also applies to high-frequency trading systems.
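The cross-horizon check can be sketched as follows. This is a simplified illustration under stated assumptions: monthly returns are compounded into non-overlapping blocks, and "consistent" is reduced to the mean return having the same sign at every horizon. The function names are hypothetical, and a real test would likely use a richer consistency criterion.

```python
def horizon_returns(monthly_returns, horizon):
    """Compound monthly returns into non-overlapping blocks of `horizon` months."""
    out = []
    for start in range(0, len(monthly_returns) - horizon + 1, horizon):
        growth = 1.0
        for r in monthly_returns[start:start + horizon]:
            growth *= 1.0 + r
        out.append(growth - 1.0)
    return out


def consistent_across_horizons(monthly_returns, horizons=(1, 2, 3, 6)):
    """Return (is_consistent, mean_return_by_horizon).

    Assumption for illustration: results are consistent when the mean
    holding-period return has the same sign at every horizon.
    """
    means = {}
    for h in horizons:
        blocks = horizon_returns(monthly_returns, h)
        if not blocks:
            raise ValueError(f"not enough data for a {h}-month horizon")
        means[h] = sum(blocks) / len(blocks)
    signs = {m > 0 for m in means.values()}
    return len(signs) == 1, means
```

A result of `False` flags the situation described above, where one horizon looks profitable while the others do not. A result of `True` with negative means is consistent but still fails the "does not work" test.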
We also recommend applying this cross-holding period testing strategy to sectors,
countries, market capitalizations, etc. We have found that the best quality control tool to
identify these effects is the fishbone diagram and that the analysis and the structuring of
these tests are done very efficiently using design of experiments.
We recommend that the product team decide up front what results will be gathered from
the backtest: maximum drawdown, number of winners versus losers, portfolio turnover
ratio, average monthly return, standard deviation of returns, average return per trade,
Sharpe ratio, and Sortino ratio, as well as how to produce a time series of returns. Again,
we suggest that this be done using a fishbone diagram that links common causes of variation
in these calculations, such as the economic cycle and the investment cycle. At the end of
this loop, the team will be able to perform cause-and-effect analysis for the variations in
the returns.
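To make a few of these measurements concrete, the sketch below computes a subset of them from a series of periodic returns: maximum drawdown, winners versus losers, average return, standard deviation, Sharpe ratio, and Sortino ratio. Several choices here are assumptions for illustration only: the risk-free rate defaults to zero, nothing is annualized, and the Sortino downside deviation is measured against a zero target over all periods. Turnover and per-trade averages are omitted because they need trade-level data.

```python
import math
import statistics


def max_drawdown(returns):
    """Largest peak-to-trough decline of the compounded equity curve."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


def summarize(returns, risk_free=0.0):
    """Summary statistics for a list of at least two periodic returns."""
    excess = [r - risk_free for r in returns]
    mean_excess = statistics.mean(excess)
    stdev = statistics.stdev(excess)  # sample standard deviation
    # Downside deviation: root mean square of negative excess returns.
    downside = math.sqrt(sum(min(e, 0.0) ** 2 for e in excess) / len(excess))
    return {
        "average_return": statistics.mean(returns),
        "std_dev": statistics.stdev(returns),
        "max_drawdown": max_drawdown(returns),
        "winners": sum(r > 0 for r in returns),
        "losers": sum(r < 0 for r in returns),
        "sharpe": mean_excess / stdev if stdev else float("nan"),
        "sortino": mean_excess / downside if downside else float("nan"),
    }
```

Agreeing on definitions like these before the backtest runs avoids later arguments about, for example, whether standard deviation is sample or population based.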
16.2. STEP 3, LOOP 2: Perform In-Sample Test for Large Sample
In-sample testing is very time intensive, because the team manually checks the calculations
and results. During the in-sample test, algorithms may calculate the averages and
standard deviations for trades on, say, three instruments. Is the math right? Is the trading
logic right? If it does not work properly, the team will still be able to make modifications
to the data or to change the logic steps.
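Part of the "is the math right?" check can be supported by recomputing the statistics independently and comparing them with what the algorithm reported. The sketch below is one hedged way to do that with Python's standard `statistics` module. The function name `check_trade_stats`, the input layout, and the use of the sample standard deviation are assumptions for illustration. It does not replace the manual review of trading logic.

```python
import math
import statistics


def check_trade_stats(trades_by_instrument, reported, tol=1e-9):
    """Compare reported (mean, std) per instrument with an independent recomputation.

    trades_by_instrument: dict of instrument -> list of trade results (>= 2 each)
    reported: dict of instrument -> (mean, sample standard deviation)
    Returns a list of (instrument, statistic, reported_value, expected_value).
    """
    problems = []
    for instrument, results in trades_by_instrument.items():
        expected = {
            "mean": statistics.mean(results),
            "std": statistics.stdev(results),  # sample standard deviation
        }
        got = dict(zip(("mean", "std"), reported[instrument]))
        for name in ("mean", "std"):
            if not math.isclose(got[name], expected[name], rel_tol=tol, abs_tol=tol):
                problems.append((instrument, name, got[name], expected[name]))
    return problems
```

An empty result means the reported figures match the recomputation. A mismatch on `std` alone often points to a sample-versus-population definition difference, not a data error.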
In-sample testing is a time-consuming and very complex task (which deserves its own
book). This is the step where the team converts all the prototype examples into prototype