Python for Finance: Analyze Big Financial Data


Technical Analysis


Technical analysis based on historical price information is a typical task finance
professionals and interested amateurs engage in. On Wikipedia you find the following
definition:


In finance, technical analysis is a security analysis methodology for forecasting the direction of prices through the
study of past market data, primarily price and volume.

In what follows, we focus on the study of past market data for backtesting purposes
rather than on using our insights to predict future price movements. Our object of study
is the benchmark index Standard & Poor’s 500 (S&P 500), which is generally considered
to be a good proxy for the whole stock market in the United States. This is due to the high
number of names included in the index and the total market capitalization it represents.
It also has highly liquid futures and options markets.


We will read historical index level information from a web source and implement a
simple backtest of a trading system based on trend signals. But first we need the data
to get started. To this end, we mainly rely on the pandas library, which simplifies a
number of related technical issues. Since it is almost always used, we also import
NumPy by default:


In [33]: import numpy as np
         import pandas as pd
         import pandas.io.data as web

SCIENTIFIC AND FINANCIAL PYTHON STACK

In addition to NumPy and SciPy, there are only a couple of important libraries that form the fundamental scientific
and financial Python stack. Among them is pandas. Make sure to always have current (stable) versions of these
libraries installed (but be aware of potential syntax and/or API changes).
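
As a small illustration of the advice above, the following sketch reports which versions of NumPy and pandas are installed, which helps when an example behaves differently after an upgrade. The dictionary name versions is chosen here for illustration only. One API change worth knowing: later pandas releases removed the pandas.io.data sublibrary, and its functionality continues in the separately installed pandas-datareader package.

```python
import numpy as np
import pandas as pd

# Collect the installed versions of the two core libraries used in this section.
# The name `versions` is illustrative, not part of either library.
versions = {"numpy": np.__version__, "pandas": pd.__version__}

# Print one library per line so the output can be compared across environments.
for name, version in versions.items():
    print(f"{name:<8}{version}")
```

Comparing this output against the release notes of each library is usually the quickest way to find out whether a changed function or argument explains an unexpected error.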

The sublibrary pandas.io.data contains the function DataReader, which helps with
getting financial time series data from different sources and in particular from the popular
Yahoo! Finance site. Let’s retrieve the data we are looking for, starting on January 1, 2000:


In [34]: sp500 = web.DataReader('^GSPC', data_source='yahoo',
                                start='1/1/2000', end='4/14/2014')
         sp500.info()
Out[34]: <class 'pandas.core.frame.DataFrame'>
         DatetimeIndex: 3592 entries, 2000-01-03 00:00:00 to 2014-04-14 00:00:00
         Data columns (total 6 columns):
         Open         3592 non-null float64
         High         3592 non-null float64
         Low          3592 non-null float64
         Close        3592 non-null float64
         Volume       3592 non-null int64
         Adj Close    3592 non-null float64
         dtypes: float64(5), int64(1)

DataReader has connected to the data source via an Internet connection and returned
the time series data for the S&P 500 index, from the first trading day in 2000 until
the end date. It has also automatically generated a time index with Timestamp objects.
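
To make the idea of a time index with Timestamp objects concrete without a network connection, here is a minimal sketch that builds a comparable DataFrame from synthetic data. The random-walk values and the names rng, index, close, demo, first_label, and first_quarter are assumptions made for illustration; they are not real S&P 500 quotes and not part of the original example.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for downloaded data: a random walk on business days.
rng = np.random.default_rng(seed=0)
index = pd.date_range(start="2000-01-03", periods=250, freq="B")
close = 1450 + rng.standard_normal(len(index)).cumsum()

# A single-column DataFrame whose index is a DatetimeIndex.
demo = pd.DataFrame({"Close": close}, index=index)

# Each index label is a pandas Timestamp object.
first_label = demo.index[0]

# A DatetimeIndex allows selecting date ranges with partial date strings.
first_quarter = demo.loc["2000-01":"2000-03"]
```

Because the index carries dates rather than plain integers, selections such as the first quarter above can be written directly in terms of calendar periods.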


To get a first impression, we can plot the closing quotes over time. This gives an output
like that in Figure 3-5:


In [35]: sp500['Close'].plot(grid=True, figsize=(8, 5))