now faced with a situation where there could be uncertainty or error in
both variables. This is vastly different from the standard linear regression
case, which assumes errors only in one variable.
As also mentioned earlier, each stock's price moves constantly, reaching its
own highs and lows within a time period, with the high and low prices
characterizing the range of price movement. Although we allow ourselves to choose
only one representative price for a given time period for each stock within
this range, we also admit that there is a certain amount of uncertainty
associated with our choice. Moreover, the data point in each time period is
chosen from a price range whose magnitude differs from period to period.
Under such circumstances, it would be unreasonable to assume a constant
probability distribution for the uncertainty or errors, which are intricately
linked to the magnitude of the price range in a given period. We can
confidently assert that our situation is one where the uncertainty
associated with each data point is different. This is
also different from the standard regression scenario of constant variance in
the observations.
Therefore, to sum up, we have a situation where there is error associated
with both observations, and the variance of the observation error is also a
varying quantity. Although the differences between our situation and the
standard regression scenario are substantial, they are by no means novel.
Such situations have been encountered in a variety of other applications,
and the solutions developed there may be applied to ours without change.
Going that route, however, adds to the complexity of the process. We will
briefly discuss the solution approach in this
case to highlight the issues and proceed to suggest a much simpler approach.
The situation of nonconstant error distributions coupled with errors in
both variables can be handled by minimizing the chi-squared merit function,
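given as

$$\chi^2(\gamma, \mu) = \sum_{t=1}^{N} \frac{\left[\log(p_t^A) - \gamma \log(p_t^B) - \mu\right]^2}{\operatorname{var}(\varepsilon_t^A) + \gamma^2 \operatorname{var}(\varepsilon_t^B)} \qquad (7.6)$$

In Equation 7.6, $\operatorname{var}(\varepsilon_t^A)$ and
$\operatorname{var}(\varepsilon_t^B)$ are the variances of the error in the
observations $\log(p_t^A)$ and $\log(p_t^B)$. The errors are assumed to have a zero
mean and may be calibrated based on the range of movement of the stock
within each time period. Note that for our purposes it is not important to
have an absolute measure of the variance of the errors, just that the values
be proportional to the actual variances.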
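To make the merit function concrete, the following is a minimal Python sketch of Equation 7.6 under stated assumptions, not a definitive implementation. It takes the merit function to be the sum over t of [log(p_t^A) - gamma log(p_t^B) - mu]^2 divided by var(eps_t^A) + gamma^2 var(eps_t^B). All function names are hypothetical. The squared log high/low range used as a variance proxy is only one possible calibration and is not prescribed by the text, and the grid search over gamma (with an assumed search interval of 0 to 3) stands in for a proper numerical minimizer.

```python
import math


def range_variance_proxy(high, low):
    """Hypothetical variance proxy: squared log range of one period.

    Assumption for illustration only. The text asks merely for values
    proportional to the true error variances.
    """
    return (math.log(high) - math.log(low)) ** 2


def chi_squared(gamma, mu, log_pa, log_pb, var_a, var_b):
    """Evaluate the merit function of Equation 7.6 at (gamma, mu)."""
    total = 0.0
    for ya, xb, va, vb in zip(log_pa, log_pb, var_a, var_b):
        residual = ya - gamma * xb - mu
        total += residual ** 2 / (va + gamma ** 2 * vb)
    return total


def best_mu(gamma, log_pa, log_pb, var_a, var_b):
    """For fixed gamma the denominators do not depend on mu, so the
    minimizing mu is the weighted mean of log_pa - gamma * log_pb."""
    weights = [1.0 / (va + gamma ** 2 * vb) for va, vb in zip(var_a, var_b)]
    weighted_sum = sum(
        w * (ya - gamma * xb) for w, ya, xb in zip(weights, log_pa, log_pb)
    )
    return weighted_sum / sum(weights)


def fit(log_pa, log_pb, var_a, var_b, lo=0.0, hi=3.0, steps=3001):
    """Crude grid search over gamma in [lo, hi] (assumed interval).

    Returns (gamma, mu, chi_squared_value) at the best grid point.
    """
    best = None
    for i in range(steps):
        gamma = lo + (hi - lo) * i / (steps - 1)
        mu = best_mu(gamma, log_pa, log_pb, var_a, var_b)
        value = chi_squared(gamma, mu, log_pa, log_pb, var_a, var_b)
        if best is None or value < best[2]:
            best = (gamma, mu, value)
    return best
```

Multiplying every variance by the same positive constant rescales the value of the merit function but leaves the minimizing gamma and mu unchanged, which is why only proportional variances are needed.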
To further understand how the chi-squared function handles the situa-
tion of nonconstant error distributions, let us examine it in a little bit of de-
tail. The value in the numerator of the merit function is the squared error in
the regression. If the variance as shown in the denominator was a constant,
then the minimization boils down to minimizing the sum of squared errors,