Running tests on 12 million pairs is definitely not a viable option. Therefore,
the natural inclination is to try to reduce the number.
The most expedient way (not necessarily the best) to accomplish that is
by using heuristics, or rules of thumb. In the heuristics approach, the list of
pairs is explicitly partitioned into two sets: potentially cointegrated and not
potentially cointegrated. This partitioning is accomplished by applying a set
of rules. The rules are designed to exclude pairs with a slim chance of being
cointegrated. This limits the number of pairs in the candidate list and reduces
the number of cointegration tests that need to be performed. Although this
approach seems reasonable, it is best characterized as ad hoc. Different people
have different belief systems about the market. Therefore, reasonable people
can come up with dramatically different rule sets based on their personal
experiences. This makes the rule sets anecdotal in nature. Furthermore, it is
possible for individuals to hold opposing views. An aggregation of rules
representing the beliefs of multiple individuals may end up being inconsistent.
This is then a case where the whole is less than the sum of its parts, and the
result is missed opportunities. It might therefore be useful to put some
thought into this to come up with a more definitive methodology.
The methodology we prescribe here is distinctly different from the rules
of thumb or heuristics approach. Instead of attempting to evaluate explicit
partitions, this approach aims to arrive at a relative ordering of the pairs
based on the degree of comovement. Each pair is associated with a score/distance
measure. The higher the score, the greater the degree of comovement, and vice
versa. Notably, such a structure lends itself to deductive reasoning. If we find
that a pair is unsuitable for pairs trading, then we have good reason to believe
that every pair with a score/distance measure worse than that of the current
pair is also unsuitable. The pair selection process now becomes equivalent to
choosing a suitable threshold value for the distance measure. Notice that this
approach relies solely on the distance measure for ordering the pairs. Therefore,
a proper choice of the distance measure is key to the pair selection process.
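
The ordering-and-threshold idea can be illustrated with a minimal Python sketch.
It is an illustration under stated assumptions, not the distance measure itself:
the score used here, the absolute correlation of two return series, is only a
placeholder, and the function names (score_pairs, rank_pairs, select_candidates)
and the sample data are hypothetical.

```python
from itertools import combinations, takewhile

import numpy as np


def score_pairs(returns):
    """Assign a comovement score to every pair of tickers.

    ASSUMPTION: the score is the absolute correlation of the two return
    series. This is a stand-in chosen for illustration only.
    """
    scores = {}
    for a, b in combinations(sorted(returns), 2):
        rho = np.corrcoef(returns[a], returns[b])[0, 1]
        scores[(a, b)] = abs(rho)
    return scores


def rank_pairs(scores):
    """Order pairs from the highest score (most comovement) to the lowest."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def select_candidates(ranked, threshold):
    """Keep pairs scoring at least `threshold`.

    Because `ranked` is sorted, the scan stops at the first pair that falls
    short: every pair after it has a score that is no better.
    """
    kept = takewhile(lambda item: item[1] >= threshold, ranked)
    return [pair for pair, _ in kept]


# Hypothetical return series for three tickers; B is an exact multiple of A.
base = np.array([0.010, -0.020, 0.015, 0.003, -0.007])
returns = {
    "A": base,
    "B": 2.0 * base,
    "C": np.array([0.010, 0.010, -0.020, 0.000, 0.005]),
}

ranked = rank_pairs(score_pairs(returns))
candidates = select_candidates(ranked, threshold=0.9)
print(candidates)
```

Only the pairs that survive the threshold would then be passed on to the
cointegration tests; the threshold value of 0.9 above is arbitrary.
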
Let us quickly examine the properties that would be desirable of the
score/distance measure. First, if the evaluation of the score/distance measure
took as much effort as cointegration testing, then it would defeat the purpose.
We may as well test for cointegration directly with all the pairs in the
exhaustive list. Therefore, at the very least, the evaluation of the distance
measure must be relatively easy and straightforward. Additionally, it is
desirable that the distance measure not be completely empirical. Empirical
deductions rely solely on historical data. This comes with an underlying
assumption that the fundamentals of the firm are essentially static, that is,
that no changes occur that affect the valuation of the firm. Needless to say,
this need not be true. Ideally, we would prefer to tie the evaluation of the distance