Python for Finance: Analyze Big Financial Data

tupleize_cols    Boolean, default False    Leave a list of tuples on columns as is

To implement the regression analysis, we only need one column from each data set. We
therefore generate a new DataFrame object within which we combine the two columns of
interest, namely those for the major indexes. Since VSTOXX data is only available from
the beginning of January 1999, we only take data from that date on:


In [75]: import datetime as dt
         data = pd.DataFrame({'EUROSTOXX':
                     es['SX5E'][es.index > dt.datetime(1999, 1, 1)]})
         data = data.join(pd.DataFrame({'VSTOXX':
                     vs['V2TX'][vs.index > dt.datetime(1999, 1, 1)]}))
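The following is a minimal sketch of the same filter-and-join pattern on small synthetic data. The frames es_demo and vs_demo and all values in them are made up for illustration and only stand in for the es and vs objects used above. One date is deliberately left out of the second frame to show that join, which by default performs a left join on the index, leaves NaN where the right-hand frame has no matching row.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Synthetic stand-ins for the es and vs DataFrames (all values are made up).
es_idx = pd.date_range("1998-12-28", periods=10, freq="B")
# Drop one business day (1999-01-06) to simulate a missing quote.
vs_idx = pd.date_range("1999-01-04", periods=5, freq="B").delete(2)

es_demo = pd.DataFrame({"SX5E": np.linspace(3300.0, 3390.0, 10)}, index=es_idx)
vs_demo = pd.DataFrame({"V2TX": np.linspace(18.0, 22.0, 4)}, index=vs_idx)

start = dt.datetime(1999, 1, 1)

# A Boolean mask on the index keeps only the rows after the start date.
demo = pd.DataFrame({"EUROSTOXX": es_demo["SX5E"][es_demo.index > start]})

# join() aligns on the index (left join by default); dates missing from
# the right-hand frame end up as NaN in the new column.
demo = demo.join(pd.DataFrame({"VSTOXX": vs_demo["V2TX"][vs_demo.index > start]}))

print(demo)
```

Because the left frame determines the resulting index, the order of the two frames matters: here the dates of the EUROSTOXX column are kept in full.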

We also fill missing values with the last available values from the time series. We call the
fillna method, providing ffill (for forward fill) as the method parameter. Another
option would be bfill (for backward fill), which would, however, lead to a “foresight”
issue, because a gap would then be filled with a value that only became known at a later date:


In [76]: data = data.fillna(method='ffill')
         data.info()
Out[76]: <class 'pandas.core.frame.DataFrame'>
         DatetimeIndex: 4034 entries, 1999-01-04 00:00:00 to 2014-09-26 00:00:00
         Data columns (total 2 columns):
         EUROSTOXX    4034 non-null float64
         VSTOXX       4034 non-null float64
         dtypes: float64(2)

In [77]: data.tail()
Out[77]:             EUROSTOXX   VSTOXX
         2014-09-22    3257.48  15.8303
         2014-09-23    3205.93  17.7684
         2014-09-24    3244.01  15.9504
         2014-09-25    3202.31  17.5658
         2014-09-26    3219.58  17.6012
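To make the difference between forward and backward filling concrete, here is a small sketch on a made-up series with a two-day gap (the values are purely illustrative). It uses the ffill and bfill shortcut methods, which do the same job as fillna with the corresponding method argument; newer pandas releases deprecate that keyword in favor of the shortcuts.

```python
import numpy as np
import pandas as pd

# Hypothetical price series with two missing observations.
idx = pd.date_range("2014-09-22", periods=5, freq="B")
prices = pd.Series([100.0, np.nan, np.nan, 103.0, 104.0], index=idx)

# Forward fill: each gap takes the last value observed before it.
forward = prices.ffill()

# Backward fill: each gap takes the next value observed after it,
# i.e., information that was not yet available on the gap date.
backward = prices.bfill()

comparison = pd.DataFrame({"raw": prices, "ffill": forward, "bfill": backward})
print(comparison)
```

In the ffill column the gap carries the earlier value forward, whereas in the bfill column the gap already shows the later value, which is the foresight problem in miniature.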

Again, a graphical representation of the new data set might provide some insights. Indeed,
as Figure 6-7 shows, there seems to be a negative correlation between the two indexes:


In [78]: data.plot(subplots=True, grid=True, style='b', figsize=(8, 6))

Figure 6-7. The EURO STOXX 50 index and the VSTOXX volatility index
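A visual impression of negative correlation can be backed up with a number. The sketch below is only an illustration on synthetic data: the columns INDEX and VOL, the seed, and all parameters are assumptions chosen so that the two series tend to move in opposite directions. It computes daily log returns and their correlation coefficient; the same two lines could be applied to a DataFrame like data above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
idx = pd.date_range("1999-01-04", periods=n, freq="B")

# Synthetic daily log returns: the volatility proxy is constructed to
# move against the index, plus independent noise (made-up parameters).
index_rets = rng.normal(0.0, 0.01, n)
vol_rets = -3.0 * index_rets + rng.normal(0.0, 0.02, n)

demo = pd.DataFrame(
    {
        "INDEX": 3000.0 * np.exp(np.cumsum(index_rets)),
        "VOL": 20.0 * np.exp(np.cumsum(vol_rets)),
    },
    index=idx,
)

# Log returns: log of today's level over yesterday's level.
log_rets = np.log(demo / demo.shift(1)).dropna()

# Pearson correlation between the two return series.
corr = log_rets["INDEX"].corr(log_rets["VOL"])
print(round(corr, 2))
```

Working with returns rather than levels is a deliberate choice here: index levels trend over time, which can distort a correlation computed on the raw series.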