Python for Finance: Analyze Big Financial Data

distributed or not.

Real-World Data

We are now pretty well equipped to attack real-world data and see how the normality assumption does beyond the financial laboratory. We are going to analyze four historical time series: two stock indices (the German DAX index and the American S&P 500 index) and two stocks (Yahoo! Inc. and Microsoft Inc.). The data management tool of choice is pandas (cf. Chapter 6), so we begin with a few imports:

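In [21]: import pandas as pd
         import pandas.io.data as web
         # pandas.io.data was removed from later pandas releases; the separate
         # pandas-datareader package offers it as pandas_datareader.data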

Here are the symbols for the time series we are interested in. The curious reader might of course replace these with any other symbol of interest:

In [22]: symbols = ['^GDAXI', '^GSPC', 'YHOO', 'MSFT']

The following reads only the Adj Close time series data into a single DataFrame object for all symbols:

In [23]: data = pd.DataFrame()
         for sym in symbols:
             data[sym] = web.DataReader(sym, data_source='yahoo',
                                        start='1/1/2006')['Adj Close']
         data = data.dropna()
In [24]: data.info()
Out[24]: <class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2179 entries, 2006-01-03 00:00:00 to 2014-09-26 00:00:00
Data columns (total 4 columns):
^GDAXI 2179 non-null float64
^GSPC 2179 non-null float64
YHOO 2179 non-null float64
MSFT 2179 non-null float64
dtypes: float64(4)
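
The identical non-null count of 2179 for all four columns is a consequence of the dropna() call: only dates on which every symbol has a price survive. The following is a minimal offline sketch of that assembly step, assuming made-up prices (the tickers AAA and BBB and all values are hypothetical and stand in for the downloaded series):

```python
import numpy as np
import pandas as pd

# Hypothetical prices on five business days; not real market data.
idx = pd.date_range("2006-01-02", periods=5, freq="B")
raw = {
    "AAA": pd.Series([100.0, 101.0, np.nan, 103.0, 104.0], index=idx),
    "BBB": pd.Series([50.0, np.nan, 52.0, 53.0, 54.0], index=idx),
}

demo = pd.DataFrame()
for sym, series in raw.items():
    demo[sym] = series  # one column per symbol, aligned on the date index

demo = demo.dropna()  # keep only dates on which all symbols have a price
print(demo)
```

Two of the five dates are dropped here, because each of them is missing a price for one of the two symbols.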

The four time series start at rather different absolute values:

In [25]: data.head()
Out[25]:              ^GDAXI    ^GSPC   YHOO   MSFT
         2006-01-03  5460.68  1268.80  40.91  22.09
         2006-01-04  5523.62  1273.46  40.97  22.20
         2006-01-05  5516.53  1273.48  41.53  22.22
         2006-01-06  5536.32  1285.45  43.21  22.15
         2006-01-09  5537.11  1290.15  43.42  22.11

Figure 11-7 therefore shows the four time series in direct comparison, but normalized to a starting value of 100:

In [26]: (data / data.iloc[0] * 100).plot(figsize=(8, 6))
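
To see what the normalization does numerically, here is a small sketch with made-up prices (the tickers AAA and BBB and their values are hypothetical). Dividing the DataFrame by its first row is aligned on the column labels, so each column is divided by its own starting value:

```python
import pandas as pd

# Hypothetical prices for two made-up tickers; not real market data.
prices = pd.DataFrame(
    {"AAA": [50.0, 55.0, 60.0], "BBB": [200.0, 190.0, 210.0]},
    index=pd.to_datetime(["2006-01-03", "2006-01-04", "2006-01-05"]),
)

# Divide every row by the first row, then scale so each series starts at 100.
normalized = prices / prices.iloc[0] * 100
print(normalized)
```

Both columns now start at 100, so a series that trades around 50 and one that trades around 200 can share a single axis and be compared by relative performance.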