
Regression Analysis


The previous section introduces the leverage effect as a stylized fact of equity market returns. So far, the support we have provided for it rests on the inspection of financial data plots only. Using pandas, we can also put such an analysis on a more formal, statistical footing. The simplest approach is to use (linear) ordinary least-squares regression (OLS).
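As a preview of what such a regression can look like in code, here is a minimal sketch using NumPy's polyfit; the arrays rets_es and rets_vs are hypothetical placeholders for the daily log returns of the two indices that are constructed later in the analysis.

import numpy as np

# hypothetical placeholder arrays: daily log returns of the EURO STOXX 50
# and the VSTOXX (the real series are built from the downloaded data later)
rets_es = np.array([0.010, -0.020, 0.005, -0.010, 0.015])
rets_vs = np.array([-0.030, 0.060, -0.010, 0.040, -0.050])

# linear OLS fit of VSTOXX returns on EURO STOXX 50 returns; polyfit
# returns the coefficients highest degree first, i.e. [slope, intercept]
slope, intercept = np.polyfit(rets_es, rets_vs, deg=1)

# a (significantly) negative slope is what the leverage effect implies
print(slope, intercept)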


In what follows, the analysis uses two different data sets available on the Web:

EURO STOXX 50
    Historical daily closing values of the EURO STOXX 50 index, composed of European blue-chip stocks

VSTOXX
    Historical daily closing data for the VSTOXX volatility index, calculated on the basis of volatilities implied by options on the EURO STOXX 50 index


It is noteworthy that we now (indirectly) work with implied volatilities, which reflect expectations regarding future volatility, whereas the previous DAX analysis used historical volatility measures. For details, see the “VSTOXX Advanced Services” tutorial pages provided by Eurex.
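To recall the distinction, historical volatility is estimated from realized returns, whereas implied volatility is backed out from option prices. A minimal sketch of an annualized, rolling historical volatility estimate might look as follows; the price series is a made-up placeholder, and a recent pandas version with the .rolling() syntax is assumed.

import numpy as np
import pandas as pd

# made-up placeholder price series; in the earlier DAX analysis this role
# is played by the index level column of the respective DataFrame
prices = pd.Series([100.0, 101.5, 99.8, 102.3, 103.0, 101.2, 104.1, 103.5])

# daily log returns
log_rets = np.log(prices / prices.shift(1))

# rolling, annualized historical volatility; the short window of 5 is only
# for this toy example, and 252 trading days per year are assumed
hist_vol = log_rets.rolling(window=5).std() * np.sqrt(252)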


We begin with a few imports:


In [62]: import pandas as pd
         from urllib import urlretrieve
         # Python 3: from urllib.request import urlretrieve

For the analysis, we retrieve files from the Web and save them in a folder called data. If there is no such folder already, you might want to create one first via mkdir data. We proceed by retrieving the most current available information with regard to both indices:
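If you prefer to create the folder from within Python instead of the shell, a small sketch like the following does the job:

import os

# create the target folder for the downloaded files if it is missing
if not os.path.exists('./data'):
    os.makedirs('./data')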


In [63]: es_url = 'http://www.stoxx.com/download/historical_values/hbrbcpe.txt'
         vs_url = 'http://www.stoxx.com/download/historical_values/h_vstoxx.txt'
         urlretrieve(es_url, './data/es.txt')
         urlretrieve(vs_url, './data/vs.txt')
         !ls -o ./data/*.txt
         # Windows: use dir
Out[63]: -rw------- 1 yhilpisch 0 Sep 28 11:14 ./data/es50.txt
         -rw------- 1 yhilpisch 641180 Sep 28 11:14 ./data/es.txt
         -rw------- 1 yhilpisch 330564 Sep 28 11:14 ./data/vs.txt

Reading the EURO STOXX 50 data directly with pandas is not the best route in this case. A little data cleaning beforehand will give a better data structure for the import. Two issues have to be addressed, relating to the header and the structure:


- There are a couple of additional header lines that we do not need for the import.
- From December 27, 2001 onwards, the data set “suddenly” has an additional semicolon at the end of each data row.

The following code reads the whole data set and removes all blanks:



In [64]: lines = open('./data/es.txt', 'r').readlines()
         lines = [line.replace(' ', '') for line in lines]
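The additional trailing semicolon mentioned above still has to be dealt with before the import. One possible way to do so, shown here only as a sketch and not necessarily the route taken in the remainder of the original code, is to strip it off line by line into a new list cleaned:

# strip the additional trailing semicolon (plus newline) from each data row;
# rows without the extra semicolon keep their content unchanged
cleaned = [line.rstrip(';\n') + '\n' for line in lines]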

With regard to the header, we can inspect it easily by printing the first couple of lines of the downloaded data set:
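For example, a simple loop over a slice of the list of lines already reveals the header structure; showing the first five lines here is an arbitrary choice:

# show the first few raw lines to inspect the header structure
for line in lines[:5]:
    print(line.rstrip('\n'))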
