mpl_dates
Out[17]: array([ 733776., 733777., 733778., ..., 735500., 735501., 735502.])
These date numbers can be passed as the color argument of a scatter plot, so that the
color of each data point indicates its date. Figure 11-17 shows the data in this fashion:
In [18]: plt.figure(figsize=(8, 4))
         plt.scatter(dax['PCA_5'], dax['^GDAXI'], c=mpl_dates)
         lin_reg = np.polyval(np.polyfit(dax['PCA_5'],
                                         dax['^GDAXI'], 1),
                              dax['PCA_5'])
         plt.plot(dax['PCA_5'], lin_reg, 'r', lw=3)
         plt.grid(True)
         plt.xlabel('PCA_5')
         plt.ylabel('^GDAXI')
         plt.colorbar(ticks=mpl.dates.DayLocator(interval=250),
                      format=mpl.dates.DateFormatter('%d %b %y'))
Figure 11-17. DAX return values against PCA return values with linear regression
Figure 11-17 reveals a structural break around the middle of 2011. If the PCA index
replicated the DAX index perfectly, all points would lie on a straight line, and the
regression line would pass through them. Perfection is hard to achieve, but we can
probably do better.
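How far a scatter is from the "all points on a straight line" ideal can be expressed as a single number, for example the coefficient of determination of the linear fit. The following is only a minimal sketch on synthetic data, not on the DAX data set; the helper r_squared and the arrays perfect and shifted are hypothetical names introduced for illustration:

```python
import numpy as np


def r_squared(x, y):
    """Coefficient of determination of a degree-1 least-squares fit of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    fitted = np.polyval(np.polyfit(x, y, 1), x)
    ss_res = np.sum((y - fitted) ** 2)      # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)    # total variation around the mean
    return 1.0 - ss_res / ss_tot


# Synthetic data (an assumption for illustration only)
x = np.linspace(0.0, 1.0, 200)
perfect = 2.0 * x + 1.0          # points lie exactly on one line
shifted = perfect.copy()
shifted[100:] -= 1.5             # level shift halfway through the sample

r2_perfect = r_squared(x, perfect)   # essentially 1 for an exact line
r2_shifted = r_squared(x, shifted)   # clearly lower once the level shifts
```

A value close to 1 means that a single line explains nearly all of the variation; a break in the relationship pulls the value down even though each regime on its own is exactly linear.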
To this end, let us divide the total time frame into two subintervals. We can then
implement an early and a late regression:
In [19]: cut_date = '2011/7/1'
         early_pca = dax[dax.index < cut_date]['PCA_5']
         early_reg = np.polyval(np.polyfit(early_pca,
                         dax['^GDAXI'][dax.index < cut_date], 1),
                         early_pca)
In [20]: late_pca = dax[dax.index >= cut_date]['PCA_5']
         late_reg = np.polyval(np.polyfit(late_pca,
                         dax['^GDAXI'][dax.index >= cut_date], 1),
                         late_pca)
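Whether splitting the sample is worthwhile can be checked by comparing the residual sum of squares of one overall fit with the combined residual sum of squares of the two sub-sample fits. The sketch below uses synthetic data with an artificial break; the names fit_line, t, and cut, as well as all parameter values, are assumptions for illustration and do not come from the DAX example:

```python
import numpy as np


def fit_line(x, y):
    """Fitted values of a degree-1 least-squares fit of y on x."""
    return np.polyval(np.polyfit(x, y, 1), x)


rng = np.random.default_rng(0)
n = 400
t = np.arange(n)                 # stand-in for the date index
cut = 250                        # hypothetical break position
x = np.linspace(0.5, 1.5, n)

# Two linear regimes plus a little noise (synthetic, for illustration only)
y = np.where(t < cut, 1.2 * x + 0.1, 0.8 * x + 0.9)
y = y + rng.normal(0.0, 0.02, n)

early = t < cut                  # Boolean mask, analogous to dax.index < cut_date
late = ~early

# One regression over the whole sample
rss_single = np.sum((y - fit_line(x, y)) ** 2)

# Separate regressions before and after the cut
rss_split = (np.sum((y[early] - fit_line(x[early], y[early])) ** 2)
             + np.sum((y[late] - fit_line(x[late], y[late])) ** 2))
```

Because the two-line model contains the one-line model as a special case, the split fit cannot have a larger residual sum of squares; the informative question is how much smaller it is.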
Figure 11-18 shows the new regression lines, which indeed display high explanatory
power both before and after our cutoff date. This heuristic approach will be made a
bit more formal in the next section on Bayesian statistics:
In [21]: plt.figure(figsize=(8, 4))
         plt.scatter(dax['PCA_5'], dax['^GDAXI'], c=mpl_dates)
         plt.plot(early_pca, early_reg, 'r', lw=3)
         plt.plot(late_pca, late_reg, 'r', lw=3)
         plt.grid(True)
         plt.xlabel('PCA_5')
         plt.ylabel('^GDAXI')
         plt.colorbar(ticks=mpl.dates.DayLocator(interval=250),
                      format=mpl.dates.DateFormatter('%d %b %y'))