Python for Finance: Analyze Big Financial Data


As a next step, consider the different statistics for the time series data sets. The kurtosis values seem to be especially far from normal for all four data sets:


In [29]: for sym in symbols:
             # print a header, then the statistics of the log returns
             print("\nResults for symbol %s" % sym)
             print(30 * "-")
             log_data = np.array(log_returns[sym].dropna())
             print_statistics(log_data)
Out[29]: Results for symbol ^GDAXI
         ------------------------------
              statistic           value
         ------------------------------
                   size      2178.00000
                    min        -0.07739
                    max         0.10797
                   mean         0.00025
                    std         0.01462
                   skew         0.02573
               kurtosis         6.52461

         Results for symbol ^GSPC
         ------------------------------
              statistic           value
         ------------------------------
                   size      2178.00000
                    min        -0.09470
                    max         0.10957
                   mean         0.00020
                    std         0.01360
                   skew        -0.32017
               kurtosis        10.05425

         Results for symbol YHOO
         ------------------------------
              statistic           value
         ------------------------------
                   size      2178.00000
                    min        -0.24636
                    max         0.39182
                   mean        -0.00000
                    std         0.02620
                   skew         0.56530
               kurtosis        31.98659

         Results for symbol MSFT
         ------------------------------
              statistic           value
         ------------------------------
                   size      2178.00000
                    min        -0.12476
                    max         0.17039
                   mean         0.00034
                    std         0.01792
                   skew         0.04262
               kurtosis        10.18038
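To put the kurtosis values above in context, the following is a minimal sketch that compares a normally distributed sample with a fat-tailed one. It is not the book's `print_statistics` function: the helper name `describe_returns`, the synthetic Student-t sample, and the seed are assumptions chosen for illustration. It relies on `scipy.stats.describe`, which by default reports excess (Fisher) kurtosis, so a normal sample should yield a value close to zero.

```python
import numpy as np
import scipy.stats as scs


def describe_returns(array):
    """Return summary statistics of a 1-D array as a dict (hypothetical helper)."""
    sta = scs.describe(array)
    return {
        "size": sta.nobs,
        "min": sta.minmax[0],
        "max": sta.minmax[1],
        "mean": sta.mean,
        "std": np.sqrt(sta.variance),
        "skew": sta.skewness,
        # Fisher definition: approximately 0 for normally distributed data
        "kurtosis": sta.kurtosis,
    }


# Synthetic data (an assumption; the real log returns are not used here).
rng = np.random.default_rng(seed=1000)
normal_sample = rng.standard_normal(100000)
fat_tailed_sample = rng.standard_t(df=5, size=100000)  # heavier tails than normal

normal_stats = describe_returns(normal_sample)
fat_stats = describe_returns(fat_tailed_sample)

print("%14s %15s %15s" % ("statistic", "normal", "student-t"))
print(46 * "-")
for name in ("skew", "kurtosis"):
    print("%14s %15.5f %15.5f" % (name, normal_stats[name], fat_stats[name]))
```

Under these assumptions, the normal sample's kurtosis stays near zero while the Student-t sample's is clearly positive, which is the same pattern the four real data sets show.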

We will inspect the data of two symbols via a qq plot. Figure 11-9 shows the qq plot for the S&P 500. Obviously, the sample quantile values do not lie on a straight line, indicating “nonnormality.” On the left and right sides there are many values that lie well below the line and well above the line, respectively. In other words, the time series data exhibits fat tails. This term refers to a (frequency) distribution where negative and positive outliers are observed far more often than a normal distribution would imply. The code to generate this plot is as follows:


In [30]: sm.qqplot(log_returns['^GSPC'].dropna(), line='s')
         plt.grid(True)
         plt.xlabel('theoretical quantiles')
         plt.ylabel('sample quantiles')
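The notion of fat tails can also be checked numerically instead of visually. The sketch below counts how often observations fall more than three standard deviations from the mean and compares that share with the probability a normal distribution implies. The function name `tail_share` and the synthetic Student-t sample are assumptions made for illustration; with real data one would pass something like `log_returns['^GSPC'].dropna()` instead.

```python
import numpy as np
import scipy.stats as scs


def tail_share(returns, k=3.0):
    """Share of observations more than k sample standard deviations from the mean."""
    returns = np.asarray(returns, dtype=float)
    z = (returns - returns.mean()) / returns.std()
    return np.mean(np.abs(z) > k)


# Two-sided probability of a normal variable lying beyond 3 standard deviations.
normal_share = 2 * scs.norm.sf(3.0)

# Synthetic fat-tailed sample (an assumption standing in for real log returns).
rng = np.random.default_rng(seed=1000)
sample = rng.standard_t(df=5, size=100000)
observed_share = tail_share(sample, k=3.0)

print("share beyond 3 std (normal benchmark): %8.5f" % normal_share)
print("share beyond 3 std (sample):           %8.5f" % observed_share)
```

A sample share well above the normal benchmark corresponds to the points that drift away from the straight line at both ends of a qq plot.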