Python for Finance: Analyze Big Financial Data

Technical Analysis

Technical analysis based on historical price information is a typical task for finance
professionals and interested amateurs alike. Wikipedia gives the following
definition:

In finance, technical analysis is a security analysis methodology for forecasting the direction of prices through the
study of past market data, primarily price and volume.

In what follows, we focus on studying past market data for backtesting purposes rather
than on using the resulting insights to predict future price movements. Our object of study
is the benchmark index Standard & Poor’s 500 (S&P 500), which is generally considered
to be a good proxy for the whole stock market in the United States. This is due to the large
number of names included in the index and the total market capitalization it represents.
The index also has highly liquid futures and options markets.

We will read historical index level information from a web source and implement a
simple backtest of a trading system based on trend signals. But first we need the data.
To this end, we rely mainly on the pandas library, which simplifies a
number of related technical issues. Since NumPy is almost always needed as well, we
import it by default:

In [33]: import numpy as np
         import pandas as pd
         import pandas.io.data as web
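To make the idea of a trading system based on trend signals concrete, here is a minimal sketch of one possible approach: a crossover of two moving averages. It is an illustration under stated assumptions, not the implementation developed in this chapter. The price series is synthetic (a seeded random walk, not market data), the window lengths of 20 and 100 days are arbitrary, and the names short_ma, long_ma, signal, and strategy_ret are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic closing levels (NOT market data): a seeded geometric random walk.
rng = np.random.default_rng(42)
dates = pd.bdate_range("2000-01-03", periods=500)
daily_moves = rng.normal(0.0, 0.01, len(dates))
close = pd.Series(1400.0 * np.exp(np.cumsum(daily_moves)),
                  index=dates, name="Close")

# Two moving averages; the window lengths are arbitrary illustrative choices.
short_ma = close.rolling(window=20).mean()
long_ma = close.rolling(window=100).mean()

# Trend signal: +1 (long) when the short average is above the long one,
# -1 (short) when it is below, 0 while the long window is still filling up.
signal = np.sign(short_ma - long_ma).fillna(0.0)

# Backtest: yesterday's signal earns today's log return, so no position
# ever uses information from the day it is supposed to trade on.
log_ret = np.log(close / close.shift(1))
strategy_ret = (signal.shift(1) * log_ret).fillna(0.0)

print(np.exp(log_ret.sum()))       # gross performance of buy-and-hold
print(np.exp(strategy_ret.sum()))  # gross performance of the trend strategy
```

Shifting the signal by one day is the main design choice here: it keeps the backtest from trading on a signal that is only known after the close.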

SCIENTIFIC AND FINANCIAL PYTHON STACK

In addition to NumPy and SciPy, only a handful of important libraries form the fundamental scientific
and financial Python stack. Among them is pandas. Make sure to always have current (stable) versions of these
libraries installed (but be aware of potential syntax and/or API changes).

The sublibrary pandas.io.data contains the function DataReader, which helps retrieve
financial time series data from different sources, in particular from the popular
Yahoo! Finance site. Let’s retrieve the data we are looking for, starting on January 1, 2000:

In [34]: sp500 = web.DataReader('^GSPC', data_source='yahoo',
                                start='1/1/2000', end='4/14/2014')
         sp500.info()
Out[34]: <class 'pandas.core.frame.DataFrame'>
         DatetimeIndex: 3592 entries, 2000-01-03 00:00:00 to 2014-04-14 00:00:00
         Data columns (total 6 columns):
         Open         3592 non-null float64
         High         3592 non-null float64
         Low          3592 non-null float64
         Close        3592 non-null float64
         Volume       3592 non-null int64
         Adj Close    3592 non-null float64
         dtypes: float64(5), int64(1)

DataReader has connected to the data source over the Internet and returned
the time series data for the S&P 500 index, from the first trading day in 2000 until
the end date. It has also automatically generated a time index with Timestamp objects.
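The download itself needs network access, and in later pandas releases the pandas.io.data module was moved out of pandas into the separate pandas-datareader package, so the call may not run unchanged. The following sketch therefore uses a small hypothetical stand-in frame with made-up levels to show what a time index of Timestamp objects looks like and how it can be sliced. The names frame and first_week are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the downloaded data: the levels are made up.
# bdate_range yields weekdays only (it does not know about exchange holidays).
index = pd.bdate_range(start="2000-01-03", periods=10)
frame = pd.DataFrame({"Close": np.linspace(1450.0, 1459.0, len(index))},
                     index=index)

# Each element of a DatetimeIndex is a pandas Timestamp object.
first = frame.index[0]
print(type(first).__name__)                # Timestamp
print(first.year, first.month, first.day)  # 2000 1 3

# Label-based slicing works directly with date strings (both ends inclusive).
first_week = frame.loc["2000-01-03":"2000-01-07"]
print(len(first_week))                     # 5
```

Because the index carries real dates, selections such as a single week or year can be expressed with date strings instead of integer positions.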

To get a first impression, we can plot the closing quotes over time. This gives an output
like that in Figure 3-5:

In [35]: sp500['Close'].plot(grid=True, figsize=(8, 5))