Python for Finance: Analyze Big Financial Data

    print('%14s %15.5f' % ('skew', sta[4]))
    print('%14s %15.5f' % ('kurtosis', sta[5]))

For example, the following shows the function in action, using a flattened version of the ndarray object containing the log returns. The method flatten returns a 1D array with all the data given in a multidimensional array:


In [10]: print_statistics(log_returns.flatten())
Out[10]:      statistic           value
         ------------------------------
                   size  12500000.00000
                    min        -0.15664
                    max         0.15371
                   mean         0.00060
                    std         0.02828
                   skew         0.00055
               kurtosis         0.00085
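To make the role of flatten concrete, here is a minimal sketch on a tiny, made-up array. The array `sample` and its values are purely illustrative stand-ins for log_returns, not data from the text.

```python
import numpy as np

# Hypothetical stand-in for log_returns: 3 time steps x 2 simulated paths.
sample = np.array([[0.01, -0.02],
                   [0.03, 0.00],
                   [-0.01, 0.02]])

# flatten returns a new 1D array (a copy) with the rows laid out one after another.
flat = sample.flatten()

print(sample.shape)  # (3, 2)
print(flat.shape)    # (6,)
print(flat.min(), flat.max(), flat.mean())
```

Because flatten returns a copy, statistics computed on the flattened array pool all paths and time steps into one sample without altering the original two-dimensional array.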

The data set in this case consists of 12,500,000 data points with values ranging from about –0.157 to 0.154. We would expect annualized values of 0.05 for the mean return and 0.2 for the standard deviation (volatility). The annualized values of the data set come close to these values, if not matching them perfectly (multiply the mean value by 50 and the standard deviation by √50).
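The annualization can be checked with a few lines of arithmetic. This sketch takes the rounded mean and standard deviation from the printed statistics and the factor 50 from the text; the variable names are illustrative only.

```python
import math

M = 50                   # number of time intervals per year, as used in the text
mean_per_step = 0.00060  # rounded mean from the printed statistics
std_per_step = 0.02828   # rounded standard deviation from the printed statistics

annual_mean = mean_per_step * M            # the mean scales linearly with time
annual_std = std_per_step * math.sqrt(M)   # volatility scales with the square root of time

print(round(annual_mean, 5))
print(round(annual_std, 5))
```

The annualized volatility comes out very close to 0.2, while the annualized mean is about 0.03 rather than 0.05. One possible explanation, assuming the log returns were simulated from a geometric Brownian motion, is the usual drift correction for log returns: 0.05 – 0.2² / 2 = 0.03.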


Figure 11-2 compares the frequency distribution of the simulated log returns with the probability density function (pdf) of the normal distribution given the parameterizations for r and sigma. The function used is norm.pdf from the scipy.stats sublibrary. There is obviously quite a good fit:


In [11]: plt.hist(log_returns.flatten(), bins=70, density=True, label='frequency')
         # density=True replaces the normed=True argument of older Matplotlib versions
         plt.grid(True)
         plt.xlabel('log-return')
         plt.ylabel('frequency')
         # evaluate the normal pdf over the current x-axis range
         x = np.linspace(plt.axis()[0], plt.axis()[1])
         plt.plot(x, scs.norm.pdf(x, loc=r / M, scale=sigma / np.sqrt(M)),
                  'r', lw=2.0, label='pdf')
         plt.legend()

Figure 11-2. Histogram of log returns and normal density function
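The visual comparison can also be expressed numerically. The following is a minimal sketch, not the code behind the figure: it assumes r = 0.05, sigma = 0.2, and M = 50 (taken from the annualized values and scaling factor mentioned in the text), draws normal random numbers as a stand-in for the log returns, and writes the normal pdf out by hand instead of calling scs.norm.pdf.

```python
import numpy as np

r, sigma, M = 0.05, 0.2, 50  # assumed parameter values, see the lead-in

# Stand-in sample: normal draws with the same loc and scale as the plotted pdf.
rng = np.random.default_rng(0)
sample = rng.normal(loc=r / M, scale=sigma / np.sqrt(M), size=500_000)

# density=True rescales the bar heights so that the histogram integrates to one.
heights, edges = np.histogram(sample, bins=70, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])


def normal_pdf(x, loc, scale):
    """Density of the normal distribution with mean loc and standard deviation scale."""
    z = (x - loc) / scale
    return np.exp(-0.5 * z ** 2) / (scale * np.sqrt(2.0 * np.pi))


pdf = normal_pdf(centers, r / M, sigma / np.sqrt(M))

# Largest gap between histogram and pdf, relative to the peak of the pdf.
max_gap = np.max(np.abs(heights - pdf))
print(max_gap / pdf.max())
```

A small relative gap corresponds to the close visual fit in the figure. Without density scaling, the histogram would show raw counts and could not be compared with a pdf on the same axis.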

Comparing a frequency distribution (histogram) with a theoretical pdf is not the only way to graphically "test" for normality. So-called quantile-quantile plots (qq plots) are also well suited for this task. Here, sample quantile values are compared to theoretical quantile values. For normally distributed sample data sets, such a plot might look like Figure 11-3, with the vast majority of the quantile values (dots) lying on a straight line:


In [12]: sm.qqplot(log_returns.flatten()[::500], line='s')
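To illustrate what a qq plot compares, the sketch below computes the two sets of quantiles by hand for a hypothetical normal sample. The plotting positions (i – 0.5) / n are one common convention and not necessarily the one sm.qqplot uses; the sample parameters are made up for illustration.

```python
import numpy as np
from statistics import NormalDist

# Hypothetical stand-in data: normally distributed "returns".
rng = np.random.default_rng(1)
sample = rng.normal(loc=0.001, scale=0.028, size=5_000)

# Sample quantiles are simply the sorted observations.
ordered = np.sort(sample)
n = ordered.size

# Theoretical quantiles: standard normal inverse cdf at evenly spread probabilities.
probs = (np.arange(1, n + 1) - 0.5) / n
std_normal = NormalDist()
theoretical = np.array([std_normal.inv_cdf(p) for p in probs])

# For normal data the points (theoretical, ordered) lie close to a straight line
# with slope near the sample standard deviation and intercept near the sample mean.
slope, intercept = np.polyfit(theoretical, ordered, 1)
corr = np.corrcoef(theoretical, ordered)[0, 1]

print(slope, sample.std())
print(intercept, sample.mean())
print(corr)
```

Heavy tails or skewness in the data would show up as systematic departures from the line at the ends of the plot, and as a fitted slope that no longer matches the sample standard deviation.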