
Regression Analysis


The previous section introduces the leverage effect as a stylized fact of equity market returns. So far, the support we have provided for it rests on the inspection of financial data plots only. Using pandas, we can also put such an analysis on a more formal, statistical footing. The simplest approach is to use (linear) ordinary least-squares regression (OLS).
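As a preview of what such a regression can look like in code, here is a minimal sketch using NumPy's polyfit; the arrays rets_es and rets_vs are hypothetical placeholders for the daily log returns of the two indices that are constructed later in the analysis.

import numpy as np

# hypothetical placeholder arrays: daily log returns of the EURO STOXX 50
# and the VSTOXX (the real series are built from the downloaded data later)
rets_es = np.array([0.010, -0.020, 0.005, -0.010, 0.015])
rets_vs = np.array([-0.030, 0.060, -0.010, 0.040, -0.050])

# linear OLS fit of VSTOXX returns on EURO STOXX 50 returns; polyfit
# returns the coefficients highest degree first, i.e. [slope, intercept]
slope, intercept = np.polyfit(rets_es, rets_vs, deg=1)

# a (significantly) negative slope is what the leverage effect implies
print(slope, intercept)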


In what follows, the analysis uses two different data sets available on the Web:

EURO STOXX 50
    Historical daily closing values of the EURO STOXX 50 index, composed of European blue-chip stocks

VSTOXX
    Historical daily closing data for the VSTOXX volatility index, calculated on the basis of volatilities implied by options on the EURO STOXX 50 index


It is noteworthy that we now (indirectly) work with implied volatilities, which reflect expectations regarding future volatility, whereas the previous DAX analysis used historical volatility measures. For details, see the “VSTOXX Advanced Services” tutorial pages provided by Eurex.
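To recall the distinction, historical volatility is estimated from realized returns, whereas implied volatility is backed out from option prices. A minimal sketch of an annualized, rolling historical volatility estimate might look as follows; the price series is a made-up placeholder, and a recent pandas version with the .rolling() syntax is assumed.

import numpy as np
import pandas as pd

# made-up placeholder price series; in the earlier DAX analysis this role
# is played by the index level column of the respective DataFrame
prices = pd.Series([100.0, 101.5, 99.8, 102.3, 103.0, 101.2, 104.1, 103.5])

# daily log returns
log_rets = np.log(prices / prices.shift(1))

# rolling, annualized historical volatility; the short window of 5 is only
# for this toy example, and 252 trading days per year are assumed
hist_vol = log_rets.rolling(window=5).std() * np.sqrt(252)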


We begin with a few imports:


In [62]: import pandas as pd
         from urllib import urlretrieve
         # Python 3: from urllib.request import urlretrieve

For the analysis, we retrieve files from the Web and save them in a folder called data. If there is no such folder already, you might want to create one first via mkdir data. We proceed by retrieving the most current available information with regard to both indices:
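If you prefer to create the folder from within Python instead of the shell, a small sketch like the following does the job:

import os

# create the target folder for the downloaded files if it is missing
if not os.path.exists('./data'):
    os.makedirs('./data')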


In [63]: es_url = 'http://www.stoxx.com/download/historical_values/hbrbcpe.txt'
         vs_url = 'http://www.stoxx.com/download/historical_values/h_vstoxx.txt'
         urlretrieve(es_url, './data/es.txt')
         urlretrieve(vs_url, './data/vs.txt')
         !ls -o ./data/*.txt
         # Windows: use dir
Out[63]: -rw------- 1 yhilpisch 0 Sep 28 11:14 ./data/es50.txt
         -rw------- 1 yhilpisch 641180 Sep 28 11:14 ./data/es.txt
         -rw------- 1 yhilpisch 330564 Sep 28 11:14 ./data/vs.txt

Reading the EURO STOXX 50 data directly with pandas is not the best route in this case. A little data cleaning beforehand will give a better data structure for the import. Two issues have to be addressed, relating to the header and the structure:


- There are a couple of additional header lines that we do not need for the import.
- From December 27, 2001 onwards, the data set “suddenly” has an additional semicolon at the end of each data row.

The following code reads the whole data set and removes all blanks:



In [64]: lines = open('./data/es.txt', 'r').readlines()
         lines = [line.replace(' ', '') for line in lines]
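The additional trailing semicolon mentioned above still has to be dealt with before the import. One possible way to do so, shown here only as a sketch and not necessarily the route taken in the remainder of the original code, is to strip it off line by line into a new list cleaned:

# strip the additional trailing semicolon (plus newline) from each data row;
# rows without the extra semicolon keep their content unchanged
cleaned = [line.rstrip(';\n') + '\n' for line in lines]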

With regard to the header, we can inspect it easily by printing the first couple of lines of the downloaded data set:
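For example, a simple loop over a slice of the list of lines already reveals the header structure; showing the first five lines here is an arbitrary choice:

# show the first few raw lines to inspect the header structure
for line in lines[:5]:
    print(line.rstrip('\n'))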
