distributed or not.
Real-World Data
We are now well equipped to attack real-world data and see how well the normality
assumption holds up beyond the financial laboratory. We are going to analyze four historical
time series: two stock indices (the German DAX index and the American S&P 500 index)
and two stocks (Yahoo! Inc. and Microsoft Corp.). The data management tool of choice is
pandas (cf. Chapter 6), so we begin with a few imports:
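In [21]: import pandas as pd
         # pandas.io.data was removed in pandas 0.19; with newer versions,
         # use `import pandas_datareader.data as web` instead
         import pandas.io.data as web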
Here are the symbols for the time series we are interested in. The curious reader might of
course replace these with any other symbol of interest:
In [22]: symbols = ['^GDAXI', '^GSPC', 'YHOO', 'MSFT']
The following reads only the Adj Close time series data into a single DataFrame object
for all symbols:
In [23]: data = pd.DataFrame()
         for sym in symbols:
             data[sym] = web.DataReader(sym, data_source='yahoo',
                                        start='1/1/2006')['Adj Close']
         data = data.dropna()
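The loop above relies on pandas aligning each new column to the index that the DataFrame already has, and on dropna discarding every date on which at least one series has no value. The following is a minimal sketch of that behavior; the dates, prices, and the column names A and B are made up for illustration and are not market data.

```python
import pandas as pd

# Hypothetical prices on slightly different trading calendars
# (illustrative values only, not real market data).
idx_a = pd.to_datetime(['2006-01-02', '2006-01-03', '2006-01-04'])
idx_b = pd.to_datetime(['2006-01-03', '2006-01-04', '2006-01-05'])
series = {
    'A': pd.Series([100.0, 101.0, 102.0], index=idx_a),
    'B': pd.Series([50.0, 51.0, 52.0], index=idx_b),
}

toy = pd.DataFrame()
for name, s in series.items():
    # The first column defines the index; later columns are aligned to it,
    # with NaN wherever the new series has no value for an existing date.
    toy[name] = s

# Keep only the dates on which every column has a value.
aligned = toy.dropna()
print(aligned)
```

In this sketch, the date that only B has never enters the frame, and the date that only A has is removed by dropna. Applied to the real data, this suggests that the resulting index follows the trading days of the first symbol in the list.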
In [24]: data.info()
Out[24]: <class 'pandas.core.frame.DataFrame'>
         DatetimeIndex: 2179 entries, 2006-01-03 00:00:00 to 2014-09-26 00:00:00
         Data columns (total 4 columns):
         ^GDAXI    2179 non-null float64
         ^GSPC     2179 non-null float64
         YHOO      2179 non-null float64
         MSFT      2179 non-null float64
         dtypes: float64(4)
The four time series start at rather different absolute values:
In [25]: data.head()
Out[25]:              ^GDAXI    ^GSPC   YHOO   MSFT
         Date
         2006-01-03  5460.68  1268.80  40.91  22.09
         2006-01-04  5523.62  1273.46  40.97  22.20
         2006-01-05  5516.53  1273.48  41.53  22.22
         2006-01-06  5536.32  1285.45  43.21  22.15
         2006-01-09  5537.11  1290.15  43.42  22.11
Figure 11-7 therefore shows the four time series in direct comparison, normalized to a
starting value of 100:
In [26]: (data / data.iloc[0] * 100).plot(figsize=(8, 6))
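To make the normalization step concrete, here is a small sketch with made-up numbers; the column names index_like and stock_like and all values are assumptions for illustration only. Dividing a DataFrame by its first row matches the row's labels to the column names, so every column is scaled by its own starting value.

```python
import pandas as pd

# Made-up prices for two hypothetical instruments on very different scales.
prices = pd.DataFrame(
    {'index_like': [5000.0, 5100.0, 4950.0],
     'stock_like': [40.0, 42.0, 39.0]},
    index=pd.to_datetime(['2006-01-03', '2006-01-04', '2006-01-05']))

# Divide every row by the first row (matched column by column),
# then rescale so that each series starts at 100.
normalized = prices / prices.iloc[0] * 100
print(normalized.round(2))
```

After this step, both columns start at 100 and later values can be read as percentages of the starting level, which is what makes series of very different magnitudes comparable in a single plot.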