violation. The data and results may have to remain with the vendor and cannot be
moved to a local server.)
● Which data vendor should we choose and why?
The answers to all of these questions should be defined in a data requirements document.
We know of trading system projects that failed because certain data needed for testing, both historical
data and real-time data for simulated execution, did not exist. The only way to test the models was to let the system run
loose and enter live trades. They lost over $2,000 in less than a minute, not to mention over 300 hours
of development time. Because they did not properly design the test plan, which could have had the
system enter opposite orders immediately and pay the spread as the cost of data, they will never know whether
they killed a good system.
Sophisticated trading/investment systems may incorporate several different
types of data. For clarity, we define five types of data.
14.2.1. Price Data
Price data consists of the actual bid/ask, trade, and volume data for securities and
derivatives. Different instruments raise different complications:
● Stock data. Stock price data is complicated by cross-listing and activity on ECNs.
For example, consider the end-of-day price for a listed stock. Quite simply, there is
no single such price anymore. Different trading venues may show different
last-trade prices. Also, most ECNs do not have closing prices per se; trading
may continue until 5 or 6 p.m., or even overnight. To get clean prices, teams must
benchmark methods for dealing with data from multiple exchanges and different
times, and for averaging bids and offers.
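One candidate method for averaging bids and offers across venues can be sketched in Python. This is a minimal illustration, not a benchmarked method. It assumes the quotes from each venue have already been snapshotted at a common benchmark time. The `VenueQuote` type, the venue labels, and the rule of flagging a crossed consolidated market (best bid above best ask) instead of averaging it are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class VenueQuote:
    """One venue's quote at a common benchmark time (hypothetical type)."""
    venue: str   # illustrative label, e.g. an exchange or ECN name
    bid: float
    ask: float


def consolidated_quote(quotes):
    """Return (best_bid, best_ask, midpoint) across venues.

    Assumption: all quotes are snapshots taken at the same time. A crossed
    consolidated market (best bid above best ask), which can arise from stale
    snapshots, is flagged rather than averaged.
    """
    if not quotes:
        raise ValueError("no quotes supplied")
    best_bid = max(q.bid for q in quotes)   # highest bid on any venue
    best_ask = min(q.ask for q in quotes)   # lowest offer on any venue
    if best_bid > best_ask:
        raise ValueError("crossed consolidated market; needs a manual rule")
    return best_bid, best_ask, (best_bid + best_ask) / 2


quotes = [
    VenueQuote("A", 10.00, 10.04),
    VenueQuote("B", 10.01, 10.05),
    VenueQuote("C", 9.99, 10.03),
]
best_bid, best_ask, mid = consolidated_quote(quotes)
print(best_bid, best_ask, round(mid, 4))
```

Taking the highest bid and lowest offer before averaging yields one midpoint per benchmark time. A team might instead weight venues by displayed size or exclude venues with stale quotes. Comparing such alternatives is the benchmarking the text calls for.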
For tick data on stocks (and indeed all listed instruments), getting every tick may not be
feasible or even possible. Data vendors often use standard time intervals, for example, one
tick per second. For seconds, or even minutes, without ticks (although the bid and ask may
be moving), vendors sometimes make assumptions about the price for that interval. Also,
vendors may not include ticks from all exchanges. Some claim to provide every tick, but
whether that means all markets or a single exchange can be difficult to discern. Further, an
exchange may not publish every tick when volume exceeds its limit, so historical
tick data may differ from what you would be able to receive in real time. Whatever
the case, you may not be getting every tick, which means that real-time calculations such as
VWAP may differ from backtested values.
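To show why missing ticks matter, the following Python sketch computes VWAP from a full tick stream and from a thinned stream that keeps only the last trade in each second. The one-tick-per-second rule is a hypothetical stand-in for a vendor's sampling policy, and the tick values are made up for illustration. Real vendors may aggregate the volume of skipped ticks rather than drop it.

```python
import math


def vwap(ticks):
    """Volume-weighted average price of (timestamp_seconds, price, size) ticks."""
    total_volume = sum(size for _, _, size in ticks)
    if total_volume == 0:
        raise ValueError("ticks contain no volume")
    return sum(price * size for _, price, size in ticks) / total_volume


def sample_one_per_second(ticks):
    """Keep only the last tick in each whole second (hypothetical vendor rule).

    Ticks must be in time order; the volume of dropped ticks is lost.
    """
    last_in_second = {}
    for ts, price, size in ticks:
        last_in_second[math.floor(ts)] = (ts, price, size)
    return [last_in_second[sec] for sec in sorted(last_in_second)]


# Illustrative ticks: (seconds since open, price, shares)
ticks = [
    (0.1, 10.00, 100),
    (0.5, 10.02, 500),
    (0.9, 10.01, 100),
    (1.2, 10.03, 200),
    (1.8, 10.00, 300),
]

full = vwap(ticks)
sampled = vwap(sample_one_per_second(ticks))
print(f"full-tick VWAP: {full:.4f}, sampled VWAP: {sampled:.4f}")
```

On these made-up ticks the full-stream VWAP is about 10.014, while the sampled stream gives 10.0025. A backtest run on the sampled history would therefore benchmark against a different VWAP than a live system that sees every trade.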
For high frequency trading systems, flat ticks (i.e., multiple trades going off at the same
price) reflect neither changes to the bid and ask prices nor the likelihood of a new limit
order getting filled, especially for contracts that use a FIFO order queue. When testing a
high frequency trading system, historical volume data and order book queue data, in addition
to price data, are essential for estimating queue position and fills.
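The queue-position idea can be made concrete with a small Python simulator for one resting buy limit order. This is a sketch under stated assumptions, not a fill model from the text. `RestingOrder` and `on_trade` are hypothetical names, and `shares_ahead` is assumed to come from historical order book queue data at the moment the order joins the queue. The sketch also assumes that cancellations ahead of the order are ignored, which makes the fill estimate conservative, and that any trade below the order's price means the order was completely filled.

```python
from dataclasses import dataclass


@dataclass
class RestingOrder:
    """A simulated buy limit order waiting in a FIFO queue (hypothetical type)."""
    price: float
    size: int
    shares_ahead: int   # displayed size queued at our price when we joined
    filled: int = 0


def on_trade(order: RestingOrder, trade_price: float, trade_size: int) -> int:
    """Apply one historical trade to the simulated order; return shares filled.

    Assumptions: strict time priority at each price, no cancellations ahead
    of us (conservative), and a trade below our bid means we were filled.
    """
    remaining = order.size - order.filled
    if remaining == 0 or trade_price > order.price:
        return 0  # already complete, or the trade printed above our bid
    if trade_price < order.price:
        order.filled = order.size  # the price traded through our level
        return remaining
    # Trade at our price: its volume first consumes the queue ahead of us.
    consumed_ahead = min(order.shares_ahead, trade_size)
    order.shares_ahead -= consumed_ahead
    fill = min(remaining, trade_size - consumed_ahead)
    order.filled += fill
    return fill


order = RestingOrder(price=10.00, size=200, shares_ahead=500)
for price, size in [(10.00, 300), (10.00, 300), (10.01, 1000), (9.99, 50)]:
    got = on_trade(order, price, size)
    print(f"trade {size}@{price}: filled {got}, total {order.filled}, ahead {order.shares_ahead}")
```

In this model, flat ticks at the order's price only move it forward in the queue. Without the queue data behind `shares_ahead`, the simulator would have no way to tell whether those trades ever reached the order.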
● Option data. For stock option data, all of the above issues apply again. However,
a more difficult issue arises. Since options are quoted in implied volatility
14.2. STEP 1, LOOP 1: DATA NEEDS AND VENDORS