Python for Finance: Analyze Big Financial Data

distributed or not.

Real-World Data


We are now pretty well equipped to attack real-world data and see how the normality assumption holds up beyond the financial laboratory. We are going to analyze four historical time series: two stock indices (the German DAX index and the American S&P 500 index) and two stocks (Yahoo! Inc. and Microsoft Corp.). The data management tool of choice is pandas (cf. Chapter 6), so we begin with a few imports:


In [21]: import pandas as pd
         import pandas.io.data as web

Here are the symbols for the time series we are interested in. The curious reader might of course replace these with any other symbol of interest:


In [22]: symbols = ['^GDAXI', '^GSPC', 'YHOO', 'MSFT']

The following reads only the Adj Close time series data into a single DataFrame object for all symbols:


In [23]: data = pd.DataFrame()
         for sym in symbols:
             data[sym] = web.DataReader(sym, data_source='yahoo',
                                        start='1/1/2006')['Adj Close']
         data = data.dropna()

In [24]: data.info()
Out[24]: <class 'pandas.core.frame.DataFrame'>
         DatetimeIndex: 2179 entries, 2006-01-03 00:00:00 to 2014-09-26 00:00:00
         Data columns (total 4 columns):
         ^GDAXI    2179 non-null float64
         ^GSPC     2179 non-null float64
         YHOO      2179 non-null float64
         MSFT      2179 non-null float64
         dtypes: float64(4)
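
All four columns report the same number of non-null entries because dropna() discards every date on which at least one series has no quote (presumably days on which only some of the exchanges were open). The following is a minimal sketch of that alignment step, not the download itself: in current pandas releases the pandas.io.data module no longer exists (its functionality moved to the separate pandas-datareader package), so the sketch uses made-up prices for two hypothetical tickers, AAA and BBB, instead of real market data.

```python
import pandas as pd

# Made-up prices for two hypothetical tickers (not real market data).
# BBB has no quote on 2006-01-04, mimicking a holiday on one exchange.
aaa = pd.Series([100.0, 101.0, 102.0],
                index=pd.to_datetime(['2006-01-03', '2006-01-04',
                                      '2006-01-05']))
bbb = pd.Series([50.0, 51.0],
                index=pd.to_datetime(['2006-01-03', '2006-01-05']))

# Building the frame from a dict aligns the series on the union of
# their dates; a missing quote shows up as NaN.
raw = pd.DataFrame({'AAA': aaa, 'BBB': bbb})

# dropna() keeps only the dates on which every series has a value.
clean = raw.dropna()
print(clean)
```

After this step every column has the same length, which is what the identical non-null counts in the info() output reflect.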

The four time series start at rather different absolute values:


In [25]: data.head()
Out[25]:              ^GDAXI    ^GSPC   YHOO   MSFT
         Date
         2006-01-03  5460.68  1268.80  40.91  22.09
         2006-01-04  5523.62  1273.46  40.97  22.20
         2006-01-05  5516.53  1273.48  41.53  22.22
         2006-01-06  5536.32  1285.45  43.21  22.15
         2006-01-09  5537.11  1290.15  43.42  22.11

Figure 11-7 therefore shows the four time series in direct comparison, normalized to a starting value of 100:


In [26]: (data / data.ix[0] * 100).plot(figsize=(8, 6))
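
The normalization divides every row by the first row and multiplies by 100, so each series starts at 100 and later values read as a percentage of the starting level. As a small sketch, the snippet below reproduces the calculation on the first three rows of the data.head() output shown earlier. It assumes a recent pandas version, in which the .ix indexer has been removed, and therefore uses the positional .iloc[0] in its place; the plotting call is left out so that the example needs only pandas.

```python
import pandas as pd

# First three rows of the data.head() output printed earlier.
data = pd.DataFrame(
    {'^GDAXI': [5460.68, 5523.62, 5516.53],
     '^GSPC': [1268.80, 1273.46, 1273.48],
     'YHOO': [40.91, 40.97, 41.53],
     'MSFT': [22.09, 22.20, 22.22]},
    index=pd.to_datetime(['2006-01-03', '2006-01-04', '2006-01-05']))

# data.iloc[0] is the first row as a Series indexed by column name.
# The division is matched column by column, so each series is rebased
# against its own starting value.
normalized = data / data.iloc[0] * 100
print(normalized.round(2))
```

Chaining .plot(figsize=(8, 6)) onto the normalized frame, as in the original cell, would then draw all four series on a common scale.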