Python for Finance: Analyze Big Financial Data

distributed or not.

Real-World Data

We are now well equipped to attack real-world data and see how the normality assumption holds up beyond the financial laboratory. We are going to analyze four historical time series: two stock indices (the German DAX index and the American S&P 500 index) and two stocks (Yahoo! Inc. and Microsoft Corp.). The data management tool of choice is pandas (cf. Chapter 6), so we begin with a few imports:

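In [21]: import pandas as pd
         # pandas.io.data has been removed from pandas; DataReader now
         # lives in the separate pandas-datareader package
         import pandas_datareader.data as web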

Here are the symbols for the time series we are interested in. The curious reader might, of course, replace these with any other symbols of interest:

In [22]: symbols = ['^GDAXI', '^GSPC', 'YHOO', 'MSFT']

The following code reads only the Adj Close (adjusted closing price) time series for all symbols into a single DataFrame object:

In [23]: data = pd.DataFrame()
         for sym in symbols:
             # keep only the adjusted closing prices of each symbol
             data[sym] = web.DataReader(sym, data_source='yahoo',
                                        start='1/1/2006')['Adj Close']
         # drop dates on which at least one series has no quote
         data = data.dropna()

In [24]: data.info()

Out[24]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2179 entries, 2006-01-03 00:00:00 to 2014-09-26 00:00:00
Data columns (total 4 columns):
^GDAXI    2179 non-null float64
^GSPC     2179 non-null float64
YHOO      2179 non-null float64
MSFT      2179 non-null float64
dtypes: float64(4)
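The loop relies on pandas index alignment: the first assignment fixes the DataFrame's index to the DAX trading days, each later series is aligned to that index (dates missing from a series become NaN, dates not in the index are discarded), and dropna() then keeps only the days on which every series has a quote. Because the Yahoo! data source may no longer be reachable, the following minimal sketch reproduces that mechanism offline. The two series, their dates, and their values are made up for illustration and are not market data.

```python
import pandas as pd

# Hypothetical price series with slightly different trading calendars
# (illustrative numbers only, not real quotes)
dax = pd.Series([100.0, 101.0, 102.0, 103.0],
                index=pd.to_datetime(['2006-01-02', '2006-01-03',
                                      '2006-01-04', '2006-01-05']))
spx = pd.Series([50.0, 51.0, 52.0],
                index=pd.to_datetime(['2006-01-03', '2006-01-04',
                                      '2006-01-06']))

data = pd.DataFrame()
data['DAX'] = dax     # first column fixes the index (the "DAX" dates)
data['SPX'] = spx     # aligned to that index: 01-02 and 01-05 become NaN,
                      # 01-06 is not in the index and is discarded
data = data.dropna()  # keep only dates on which both series have a value

print(data)
```

Only 2006-01-03 and 2006-01-04 survive, which is why the real DataFrame ends up with fewer rows than the longest individual series.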

The four time series start at rather different absolute values:

In [25]: data.head()

Out[25]:
             ^GDAXI    ^GSPC   YHOO   MSFT
Date
2006-01-03  5460.68  1268.80  40.91  22.09
2006-01-04  5523.62  1273.46  40.97  22.20
2006-01-05  5516.53  1273.48  41.53  22.22
2006-01-06  5536.32  1285.45  43.21  22.15
2006-01-09  5537.11  1290.15  43.42  22.11

Figure 11-7 therefore shows the four time series in direct comparison, normalized to a starting value of 100:

In [26]: # .ix was removed in pandas 1.0; .iloc[0] selects the first row by position
         (data / data.iloc[0] * 100).plot(figsize=(8, 6))
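Dividing the DataFrame by its first row works because pandas broadcasts a row Series across all rows, matching on column labels, so each column is divided by its own starting value. A minimal offline sketch with hypothetical prices on very different scales (made-up values, not the Yahoo! data) shows that every normalized series starts at exactly 100 while its relative moves are preserved.

```python
import pandas as pd

# Hypothetical prices on very different scales (illustrative only)
data = pd.DataFrame(
    {'INDEX': [5000.0, 5100.0, 4950.0],
     'STOCK': [20.0, 21.0, 19.0]},
    index=pd.to_datetime(['2006-01-03', '2006-01-04', '2006-01-05']))

# data.iloc[0] is the first row as a Series indexed by column label;
# division aligns on those labels, so each column is scaled by its own start
normalized = data / data.iloc[0] * 100

print(normalized)
```

Here INDEX moves to roughly 102 and 99, and STOCK to roughly 105 and 95, i.e., the percentage changes become directly comparable. Calling `normalized.plot(figsize=(8, 6))` (which needs matplotlib) would draw a chart of the same kind as Figure 11-7.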