tupleize_cols    Boolean, default False    Leave a list of tuples on columns as is
To implement the regression analysis, we only need one column from each data set. We
therefore generate a new DataFrame object within which we combine the two columns of
interest, namely those for the major indexes. Since VSTOXX data is only available from
the beginning of January 1999, we only take data from that date on:
In [75]: import datetime as dt
         data = pd.DataFrame({'EUROSTOXX' :
                     es['SX5E'][es.index > dt.datetime(1999, 1, 1)]})
         data = data.join(pd.DataFrame({'VSTOXX' :
                     vs['V2TX'][vs.index > dt.datetime(1999, 1, 1)]}))
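To see what the filter and the join do, the following minimal sketch repeats both steps on two tiny stand-in DataFrame objects. The values and dates are invented for illustration only, and es and vs here are hypothetical replacements for the real data sets loaded earlier:

```python
import datetime as dt

import pandas as pd

# Hypothetical stand-ins for the es and vs DataFrames (made-up values).
es = pd.DataFrame(
    {"SX5E": [3500.0, 3510.0, 3520.0, 3530.0]},
    index=pd.to_datetime(
        ["1998-12-30", "1999-01-04", "1999-01-05", "1999-01-06"]
    ),
)
vs = pd.DataFrame(
    {"V2TX": [18.0, 29.5]},
    index=pd.to_datetime(["1999-01-04", "1999-01-06"]),
)

start = dt.datetime(1999, 1, 1)

# Keep only rows after the start date, one column per data set.
data = pd.DataFrame({"EUROSTOXX": es["SX5E"][es.index > start]})

# join aligns on the index of the left-hand object (a left join by default),
# so dates missing from the VSTOXX column show up as NaN.
data = data.join(pd.DataFrame({"VSTOXX": vs["V2TX"][vs.index > start]}))

print(data)
```

Because the join keeps every date of the left-hand object, any date for which the second series has no value yields a missing entry, which is why a fill step is needed afterward.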
We also fill missing values with the last available value from each time series. To do
so, we call the fillna method, providing ffill (for forward fill) as the method parameter.
Another option would be bfill (for backward fill), which would fill each gap with a value
from a later date and therefore lead to a “foresight” issue:
In [76]: data = data.fillna(method='ffill')
         data.info()
Out[76]: <class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 4034 entries, 1999-01-04 00:00:00 to 2014-09-26 00:00:00
Data columns (total 2 columns):
EUROSTOXX 4034 non-null float64
VSTOXX 4034 non-null float64
dtypes: float64(2)
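The difference between forward and backward filling is easiest to see on a small example. The sketch below uses an invented three-day series with one gap. It calls the ffill and bfill methods directly, which is an assumption about the installed pandas version: recent releases prefer these over passing the method parameter to fillna:

```python
import numpy as np
import pandas as pd

# Made-up series with a missing value on the middle date.
idx = pd.to_datetime(["1999-01-04", "1999-01-05", "1999-01-06"])
vstoxx = pd.Series([18.0, np.nan, 29.5], index=idx)

# Forward fill: the gap takes the last value known at that time (1999-01-04).
forward = vstoxx.ffill()

# Backward fill: the gap takes the value of the next day (1999-01-06),
# which was not yet known on 1999-01-05.
backward = vstoxx.bfill()

print(pd.DataFrame({"raw": vstoxx, "ffill": forward, "bfill": backward}))
```

With backward filling, the value for the middle date comes from the following day, so any analysis based on it would implicitly use information from the future.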
In [77]: data.tail()
Out[77]:             EUROSTOXX   VSTOXX
         2014-09-22    3257.48  15.8303
         2014-09-23    3205.93  17.7684
         2014-09-24    3244.01  15.9504
         2014-09-25    3202.31  17.5658
         2014-09-26    3219.58  17.6012
Again, a graphical representation of the new data set might provide some insights. Indeed,
as Figure 6-7 shows, there seems to be a negative correlation between the two indexes:
In [78]: data.plot(subplots=True, grid=True, style='b', figsize=(8, 6))
Figure 6-7. The EURO STOXX 50 index and the VSTOXX volatility index
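A visual impression of negative correlation can be checked numerically. The following sketch shows one way to do so, using the correlation of daily log returns. It runs on simulated data in which the negative relationship is built in by assumption, so the resulting number says nothing about the real indexes. The column names match those used above; everything else is hypothetical:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
idx = pd.bdate_range("1999-01-04", periods=n)

# Simulated daily log returns of the index (illustrative values only).
index_returns = rng.normal(0.0, 0.01, n)
# Assumption for illustration: volatility returns move against index returns.
vol_returns = -3.0 * index_returns + rng.normal(0.0, 0.02, n)

data = pd.DataFrame(
    {
        "EUROSTOXX": 3500.0 * np.exp(np.cumsum(index_returns)),
        "VSTOXX": 20.0 * np.exp(np.cumsum(vol_returns)),
    },
    index=idx,
)

# Daily log returns; the first row is NaN after shifting and is dropped.
rets = np.log(data / data.shift(1)).dropna()

# Pearson correlation between the two return series.
corr = rets["EUROSTOXX"].corr(rets["VSTOXX"])
print(round(corr, 2))
```

Working with returns rather than index levels is a deliberate choice in this sketch, since levels of trending series can show spurious correlation.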