Python for Finance: Analyze Big Financial Data

In [30]: plt.figure(figsize=(8, 4))
         plt.scatter(x, y, c=y, marker='v')
         plt.colorbar()
         plt.grid(True)
         plt.xlabel('x')
         plt.ylabel('y')
         for i in range(len(trace)):
             plt.plot(x, trace['alpha'][i] + trace['beta'][i] * x)

Figure 11-21. Sample data and regression lines from Bayesian regression
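
To make the plotting loop concrete: each posterior sample contributes one pair (alpha, beta) and therefore one regression line. The following is a minimal sketch in plain Python, assuming a hypothetical dict of simulated samples as a stand-in for the PyMC3 trace object. The centers 4.0 and 2.0 and their spreads are illustrative assumptions only, not results from the model above.

```python
import random
import statistics

random.seed(0)

# Hypothetical stand-in for a PyMC3 trace: simulated parameter samples.
# (Assumption: a real trace comes from pm.sample(); these numbers are made up.)
n_samples = 100
trace = {
    'alpha': [random.gauss(4.0, 0.1) for _ in range(n_samples)],
    'beta': [random.gauss(2.0, 0.05) for _ in range(n_samples)],
}

x = [i / 10 for i in range(101)]  # grid of x values from 0 to 10

# One regression line per sample: y = alpha_i + beta_i * x
lines = [
    [a + b * xi for xi in x]
    for a, b in zip(trace['alpha'], trace['beta'])
]

# A single summary line built from the sample means of both parameters
alpha_mean = statistics.mean(trace['alpha'])
beta_mean = statistics.mean(trace['beta'])
mean_line = [alpha_mean + beta_mean * xi for xi in x]
```

Plotting every element of lines against x gives the kind of bundle of lines shown in the figure; the spread of the bundle reflects the uncertainty in the two parameters.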

Real Data

Having seen Bayesian regression with PyMC3 in action with dummy data, we now move on to real market data. In this context, we introduce yet another Python library: zipline (cf. https://github.com/quantopian/zipline and https://pypi.python.org/pypi/zipline). zipline is a Pythonic, open source algorithmic trading library that powers the community backtesting platform Quantopian.

It also has to be installed separately, e.g., using pip:

$ pip install zipline

After installation, import zipline as well as pytz and datetime as follows:

In [31]: import warnings
         warnings.simplefilter('ignore')
         import zipline
         import pytz
         import datetime as dt

Similar to pandas, zipline provides a convenience function to load financial data from different sources. Under the hood, zipline also uses pandas.

The example we use is a “classical” pair trading strategy, namely with gold and stocks of gold mining companies. These are represented by ETFs with the following symbols, respectively:

GLD

GDX

We can load the data using zipline as follows:

In [32]: data = zipline.data.load_from_yahoo(stocks=['GLD', 'GDX'],
                  end=dt.datetime(2014, 3, 15, 0, 0, 0, 0, pytz.utc)).dropna()
         data.info()
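
The end argument above is a timezone-aware datetime object built purely from positional arguments, which makes it hard to read. As a minimal sketch of what the eight values mean, the following builds the same point in time in two ways. It assumes the standard library's dt.timezone.utc in place of pytz.utc so that it runs without third-party packages; for UTC, the two should describe the same instant.

```python
import datetime as dt

# Positional order: year, month, day, hour, minute, second, microsecond, tzinfo
end_positional = dt.datetime(2014, 3, 15, 0, 0, 0, 0, dt.timezone.utc)

# Equivalent keyword form; the time fields default to zero
end_keyword = dt.datetime(2014, 3, 15, tzinfo=dt.timezone.utc)

print(end_positional == end_keyword)  # True
print(end_keyword.isoformat())        # 2014-03-15T00:00:00+00:00
```

The keyword form states the intent (midnight UTC on March 15, 2014) more directly and could be passed as end in the same way.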