Python for Finance: Analyze Big Financial Data

mpl_dates
Out[17]: array([ 733776., 733777., 733778., ..., 735500., 735501., 735502.])

This new date list can be used for a scatter plot in which the color of each data point indicates its date. Figure 11-17 shows the data in this fashion:


In [18]: plt.figure(figsize=(8, 4))
         plt.scatter(dax['PCA_5'], dax['^GDAXI'], c=mpl_dates)
         lin_reg = np.polyval(np.polyfit(dax['PCA_5'],
                                         dax['^GDAXI'], 1),
                              dax['PCA_5'])
         plt.plot(dax['PCA_5'], lin_reg, 'r', lw=3)
         plt.grid(True)
         plt.xlabel('PCA_5')
         plt.ylabel('^GDAXI')
         plt.colorbar(ticks=mpl.dates.DayLocator(interval=250),
                      format=mpl.dates.DateFormatter('%d %b %y'))

Figure 11-17. DAX return values against PCA return values with linear regression

Figure 11-17 reveals an apparent structural break around the middle of 2011. If the PCA index were to replicate the DAX index perfectly, we would expect all the points to lie on a straight line, with the regression line going through them. Perfection is hard to achieve, but we can probably do better.


To this end, let us divide the total time frame into two subintervals. We can then implement an early and a late regression:


In [19]: cut_date = '2011/7/1'
         # observations before the cutoff date
         early_pca = dax[dax.index < cut_date]['PCA_5']
         early_reg = np.polyval(np.polyfit(early_pca,
                         dax['^GDAXI'][dax.index < cut_date], 1),
                         early_pca)
In [20]: # observations on or after the cutoff date
         late_pca = dax[dax.index >= cut_date]['PCA_5']
         late_reg = np.polyval(np.polyfit(late_pca,
                         dax['^GDAXI'][dax.index >= cut_date], 1),
                         late_pca)

Figure 11-18 shows the new regression lines, which indeed display high explanatory power both before our cutoff date and thereafter. This heuristic approach will be made a bit more formal in the next section on Bayesian statistics:


In [21]: plt.figure(figsize=(8, 4))
         plt.scatter(dax['PCA_5'], dax['^GDAXI'], c=mpl_dates)
         plt.plot(early_pca, early_reg, 'r', lw=3)
         plt.plot(late_pca, late_reg, 'r', lw=3)
         plt.grid(True)
         plt.xlabel('PCA_5')
         plt.ylabel('^GDAXI')
         plt.colorbar(ticks=mpl.dates.DayLocator(interval=250),
                      format=mpl.dates.DateFormatter('%d %b %y'))
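
The visual impression of a better fit can also be checked numerically. The following is only a minimal sketch under stated assumptions: it uses synthetic data rather than the dax DataFrame, the helper r_squared is a hypothetical name introduced here, and the break is modeled as a simple shift in the intercept. It compares the coefficient of determination of one pooled regression with that of two separate regressions obtained from np.polyfit, the same function used above:

```python
import numpy as np


def r_squared(x, y, coeffs):
    """Coefficient of determination for a polynomial fit (hypothetical helper)."""
    residuals = y - np.polyval(coeffs, x)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


rng = np.random.default_rng(0)

# Synthetic stand-ins for dax['PCA_5'] and dax['^GDAXI'] (assumption, not real data)
n = 500
x = rng.uniform(0.0, 1.0, n)
is_early = np.arange(n) < n // 2  # plays the role of dax.index < cut_date

# Same slope in both regimes, but the intercept shifts after the break
intercept = np.where(is_early, 1.0, 2.0)
y = 2.0 * x + intercept + rng.normal(0.0, 0.05, n)

# One regression over the full sample
r2_pooled = r_squared(x, y, np.polyfit(x, y, 1))

# Separate regressions before and after the break
x_early, y_early = x[is_early], y[is_early]
x_late, y_late = x[~is_early], y[~is_early]
r2_early = r_squared(x_early, y_early, np.polyfit(x_early, y_early, 1))
r2_late = r_squared(x_late, y_late, np.polyfit(x_late, y_late, 1))

print('pooled R^2: %.3f' % r2_pooled)
print('early  R^2: %.3f' % r2_early)
print('late   R^2: %.3f' % r2_late)
```

With the real data, x and y would be replaced by dax['PCA_5'] and dax['^GDAXI'], and is_early by dax.index < cut_date. A noticeably higher value for both subsamples than for the pooled fit would support the split at the chosen date, although it does not by itself show that this date is the best possible cutoff.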