CHAPTER ◆ 14 Gather Historical Data
14.1. Purchasing Data from Vendors
Quality data for backtesting results from comparing what is required with what data
vendors are able to supply. Vendors will usually promise whatever is asked for, so the
product team must judge data quality independently of the data provider's assurances.
To the extent possible, the product team:
● Establishes vendor selection criteria.
● Maintains records of data vendor evaluation results.
● Ensures that vendor-supplied data conforms to requirements to the extent possible.
● Establishes data vendor controls proportionate to the effect the data can have on
backtesting processes and the finished trading/investment system.
This correlates, in manufacturing, to properly selecting and controlling the raw inputs into
the process and performing machine run-off tests to ensure the purchased materials or
machines meet the agreed-upon quality specifications. When purchasing data, the product
team should:
● Identify the need for data and the specifications of data available for purchase.
● Evaluate the cost of data, taking into account price, delivery, and performance.
● Assess vendors' differing criteria for verifying data (i.e., how forthcoming they are
about assumptions in the data).
● Read and evaluate each vendor's documentation.
● Consider the experience of the vendor.
● Compare each vendor's performance relative to competitors, including:
- Data quality and the vendor's ability to satisfy requirements
- Delivery performance, support, and responsiveness to problems
- References of past customers
- Financial viability
- Compliance
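One way to keep a record of vendor evaluation results and compare vendors on the criteria above is a simple weighted scorecard. The following Python sketch is illustrative only: the criterion weights, the 1-to-5 scoring scale, the vendor names, and the names VendorEvaluation and rank_vendors are all assumptions made for this example and are not prescribed by the methodology.

```python
from dataclasses import dataclass

# Hypothetical weights for the comparison criteria listed above.
# The values are illustrative assumptions and must sum to 1.0.
WEIGHTS = {
    "data_quality": 0.35,
    "delivery_support": 0.25,
    "references": 0.10,
    "financial_viability": 0.15,
    "compliance": 0.15,
}


@dataclass
class VendorEvaluation:
    """One record of a vendor evaluation (assumed scale: 1 = poor, 5 = excellent)."""
    name: str
    scores: dict

    def weighted_score(self, weights=WEIGHTS):
        # Refuse to score a vendor that has not been rated on every criterion.
        missing = set(weights) - set(self.scores)
        if missing:
            raise ValueError(f"{self.name}: missing scores for {sorted(missing)}")
        return sum(weights[c] * self.scores[c] for c in weights)


def rank_vendors(evaluations, weights=WEIGHTS):
    """Return the evaluations sorted from highest to lowest weighted score."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    return sorted(evaluations, key=lambda e: e.weighted_score(weights), reverse=True)


# Made-up scores for two fictional vendors.
evaluations = [
    VendorEvaluation("Vendor A", {
        "data_quality": 4, "delivery_support": 3, "references": 5,
        "financial_viability": 4, "compliance": 5,
    }),
    VendorEvaluation("Vendor B", {
        "data_quality": 5, "delivery_support": 4, "references": 3,
        "financial_viability": 3, "compliance": 4,
    }),
]

ranked = rank_vendors(evaluations)
for e in ranked:
    print(f"{e.name}: {e.weighted_score():.2f}")
```

The weights are where the team would express how strongly each criterion can affect backtesting and the finished trading/investment system. A score like this is a record-keeping aid and does not replace the team's own tests of the data.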
14.2. STEP 1, LOOP 1: Survey Data Needs and Vendors
As with Step 1 in all four stages of K|V, this step involves planning. Where prototypes
completed in Stage 1 used few instruments and limited data, backtesting will require lots
of data. The questions are the following: what data does the team need and where can
they get it? Assessing the need for data and surveying potential providers takes planning.
Specific questions to be answered are:
● What environment will be used to store the data? Excel, a database server, or a
proprietary or third-party system?
● Where will the results be stored?
● Does the firm need to own the data, or can it buy only the results? (Which
raises another question: where will the calculations be run? With some vendors, the
team may not be able or allowed to move the raw data without risking a contract