Python for Finance: Analyze Big Financial Data

tupleize_cols    Boolean, default False    Leave a list of tuples on columns as is

To implement the regression analysis, we only need one column from each data set. We therefore generate a new DataFrame object that combines the two columns of interest, namely those for the major indexes. Since VSTOXX data is only available from the beginning of January 1999, we only take data from that date on:

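In [75]: import datetime as dt
         # EURO STOXX 50 closes from 1999 on (start of the VSTOXX history)
         data = pd.DataFrame({'EUROSTOXX':
                              es['SX5E'][es.index > dt.datetime(1999, 1, 1)]})
         # add the VSTOXX closes; join aligns both series on their dates
         data = data.join(pd.DataFrame({'VSTOXX':
                              vs['V2TX'][vs.index > dt.datetime(1999, 1, 1)]}))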
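The join step relies on index alignment. The following minimal sketch uses made-up data: the small `es` and `vs` frames, their values, and the missing 6 January date are assumptions for illustration, not the real index data. It shows that `join` performs a left join on the dates, so every EURO STOXX 50 date is kept and a date without a VSTOXX quote ends up as NaN, which is one way missing values can enter the combined data set.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Hypothetical stand-ins for the es and vs DataFrames: two business-day
# series whose calendars differ by one date.
idx_es = pd.bdate_range("1998-12-28", "1999-01-08")
idx_vs = idx_es.drop(pd.Timestamp("1999-01-06"))  # assumed missing day
es = pd.DataFrame({"SX5E": np.linspace(3300.0, 3400.0, len(idx_es))}, index=idx_es)
vs = pd.DataFrame({"V2TX": np.linspace(20.0, 25.0, len(idx_vs))}, index=idx_vs)

start = dt.datetime(1999, 1, 1)
# Keep only the dates strictly after 1 January 1999.
data = pd.DataFrame({"EUROSTOXX": es["SX5E"][es.index > start]})
# join() is a left join on the index: all EUROSTOXX dates survive and
# dates without a VSTOXX value are filled with NaN.
data = data.join(pd.DataFrame({"VSTOXX": vs["V2TX"][vs.index > start]}))
print(data)
```

Note that the strict comparison `>` drops 1 January 1999 itself; `>=` would keep that date if it were a trading day.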

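We also fill missing values with the last available values from the time series. We call the fillna method, providing ffill (for forward fill) as the method parameter. Another option would be bfill (for backward fill), which would, however, lead to a “foresight” issue: each gap would be filled with a value that was only observed later, so the data set would contain information that was not yet available on that date. (In pandas 2.1 and later, the method parameter of fillna is deprecated; data.ffill() is the equivalent call.)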

In [76]: data = data.fillna(method='ffill')
         data.info()

Out[76]: <class 'pandas.core.frame.DataFrame'>
         DatetimeIndex: 4034 entries, 1999-01-04 00:00:00 to 2014-09-26 00:00:00
         Data columns (total 2 columns):
         EUROSTOXX    4034 non-null float64
         VSTOXX       4034 non-null float64
         dtypes: float64(2)

In [77]: data.tail()

Out[77]:             EUROSTOXX   VSTOXX
         2014-09-22    3257.48  15.8303
         2014-09-23    3205.93  17.7684
         2014-09-24    3244.01  15.9504
         2014-09-25    3202.31  17.5658
         2014-09-26    3219.58  17.6012
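To make the foresight issue concrete, here is a toy series with a single gap, filled both ways with the Series.ffill() and Series.bfill() methods. The values are invented and only loosely modeled on the VSTOXX levels above.

```python
import numpy as np
import pandas as pd

# A toy volatility series with a gap on the third day (made-up values).
idx = pd.bdate_range("2014-09-22", periods=5)
s = pd.Series([15.8, 17.8, np.nan, 17.6, 17.6], index=idx)

# Forward fill: the gap takes the last value known before it (17.8),
# i.e. information an observer actually had on that day.
filled_fwd = s.ffill()

# Backward fill: the gap takes the next value (17.6), a value that was
# only observed later. This is the "foresight" problem.
filled_bwd = s.bfill()

print(pd.DataFrame({"raw": s, "ffill": filled_fwd, "bfill": filled_bwd}))
```

Only the forward-filled column could have been computed in real time, which is why forward filling is the safer choice when the data feeds an analysis of historical relationships.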

Again, a graphical representation of the new data set might provide some insights. Indeed, as Figure 6-7 shows, there seems to be a negative correlation between the two indexes (the VSTOXX tends to rise when the EURO STOXX 50 falls):

In [78]: data.plot(subplots=True, grid=True, style='b', figsize=(8, 6))

Figure 6-7. The EURO STOXX 50 index and the VSTOXX volatility index
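The plot only suggests a negative relationship. A common way to put a number on it is the correlation of daily log returns. The sketch below uses synthetic data that is built to be negatively correlated (the random seed, the -3 loading, and the noise levels are assumptions), so its numbers say nothing about the real indexes; with the actual data object, only the last two statements would be needed.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for `data`: an index level and a volatility level
# constructed to move in opposite directions (illustrative assumption).
rng = np.random.default_rng(0)
idx = pd.bdate_range("2014-01-01", periods=250)
shocks = rng.normal(0.0, 0.01, len(idx))
eurostoxx = 3000.0 * np.exp(np.cumsum(shocks))
vstoxx = 20.0 * np.exp(np.cumsum(-3.0 * shocks + rng.normal(0.0, 0.01, len(idx))))
data = pd.DataFrame({"EUROSTOXX": eurostoxx, "VSTOXX": vstoxx}, index=idx)

# Daily log returns; the first row is NaN (no previous value) and is dropped.
rets = np.log(data / data.shift(1)).dropna()

# Pearson correlation matrix of the returns; the off-diagonal entry
# measures how the two series co-move from day to day.
print(rets.corr())
```

Working with returns rather than levels avoids the spurious correlations that two trending price series can show.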