As a next step, consider the descriptive statistics for the time series data sets. For all four
data sets, the kurtosis values in particular are far from those of a normal distribution:
In [29]: for sym in symbols:
             print("\nResults for symbol %s" % sym)
             print(30 * "-")
             # drop missing values (e.g., the first log return) before computing statistics
             log_data = np.array(log_returns[sym].dropna())
             print_statistics(log_data)
Out[29]: Results for symbol ^GDAXI
------------------------------
     statistic           value
------------------------------
          size      2178.00000
           min        -0.07739
           max         0.10797
          mean         0.00025
           std         0.01462
          skew         0.02573
      kurtosis         6.52461

Results for symbol ^GSPC
------------------------------
     statistic           value
------------------------------
          size      2178.00000
           min        -0.09470
           max         0.10957
          mean         0.00020
           std         0.01360
          skew        -0.32017
      kurtosis        10.05425

Results for symbol YHOO
------------------------------
     statistic           value
------------------------------
          size      2178.00000
           min        -0.24636
           max         0.39182
          mean        -0.00000
           std         0.02620
          skew         0.56530
      kurtosis        31.98659

Results for symbol MSFT
------------------------------
     statistic           value
------------------------------
          size      2178.00000
           min        -0.12476
           max         0.17039
          mean         0.00034
           std         0.01792
          skew         0.04262
      kurtosis        10.18038
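To see why kurtosis values of 6 to 32 are so striking, it helps to compute the statistic directly. The sketch below assumes that `print_statistics` wraps `scipy.stats.describe`, whose kurtosis is the Fisher (excess) kurtosis: 0 for a normal distribution. The helpers `describe_returns` and `excess_kurtosis` are hypothetical names, and the data are synthetic (a normal sample and a fat-tailed Laplace sample), not the index or stock returns above.

```python
import numpy as np
import scipy.stats as scs


def describe_returns(data):
    """Hypothetical helper returning the same rows as the table above.

    Uses scipy.stats.describe, which reports Fisher (excess) kurtosis,
    i.e. 0 for a normal distribution (default bias=True).
    """
    sta = scs.describe(data)
    return {
        "size": sta.nobs,
        "min": sta.minmax[0],
        "max": sta.minmax[1],
        "mean": sta.mean,
        "std": np.sqrt(sta.variance),  # variance uses ddof=1
        "skew": sta.skewness,
        "kurtosis": sta.kurtosis,
    }


def excess_kurtosis(data):
    """Sample excess kurtosis m4 / m2**2 - 3 with biased central moments."""
    x = np.asarray(data, dtype=float)
    dev = x - x.mean()
    m2 = np.mean(dev ** 2)
    m4 = np.mean(dev ** 4)
    return m4 / m2 ** 2 - 3.0


rng = np.random.default_rng(1000)
normal_sample = rng.standard_normal(100_000)
# Laplace distribution: theoretical excess kurtosis of 3, i.e. fat tails
fat_sample = rng.laplace(0.0, 1.0, size=100_000)

for name, sample in [("normal", normal_sample), ("laplace", fat_sample)]:
    stats = describe_returns(sample)
    print("%-8s kurtosis %8.5f (manual %8.5f)"
          % (name, stats["kurtosis"], excess_kurtosis(sample)))
```

The normal sample yields a kurtosis near 0 and the Laplace sample one of roughly 3. The empirical values in the table above are considerably larger still, which points to much heavier tails than a normal distribution allows.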
We will inspect the data of two symbols via a quantile-quantile (qq) plot. Figure 11-9 shows the
qq plot for the S&P 500. The sample quantile values clearly do not lie on a straight line,
indicating “nonnormality.” On the left side many values lie well below the line, and on the
right side many lie well above it. In other words, the time series data exhibits fat tails: a
(frequency) distribution in which negative and positive outliers are observed far more often
than a normal distribution would imply. This is consistent with the high kurtosis values
reported above. The code to generate this plot is as follows:
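In [30]: # line='s': standardized reference line (theoretical quantiles scaled by
         # the sample standard deviation, plus the sample mean)
         sm.qqplot(log_returns['^GSPC'].dropna(), line='s')
         plt.grid(True)
         plt.xlabel('theoretical quantiles')
         plt.ylabel('sample quantiles')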
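To make the reading of Figure 11-9 more concrete, here is a minimal sketch of the points a normal qq plot compares: the sorted sample (sample quantiles) against normal quantiles at evenly spaced probabilities (theoretical quantiles), together with a standardized reference line in the spirit of `line='s'`. Assumptions: the plotting positions `(i - 0.5) / n` are one common convention and may differ slightly from what `statsmodels` uses internally; `qq_points` and `standardized_line` are hypothetical helpers; and the data are synthetic Laplace-distributed "returns," not the S&P 500 series.

```python
import numpy as np
import scipy.stats as scs


def qq_points(data):
    """Return (theoretical, sample) quantile pairs for a normal qq plot.

    Assumption: plotting positions (i - 0.5) / n; statsmodels may use a
    slightly different convention.
    """
    sample_q = np.sort(np.asarray(data, dtype=float))
    n = sample_q.size
    probs = (np.arange(1, n + 1) - 0.5) / n
    theo_q = scs.norm.ppf(probs)  # standard normal quantiles
    return theo_q, sample_q


def standardized_line(theo_q, data):
    """Reference line y = mean + std * x, scaled to the sample."""
    x = np.asarray(data, dtype=float)
    return x.mean() + x.std(ddof=1) * theo_q


rng = np.random.default_rng(42)
fat = rng.laplace(0.0, 0.01, size=5_000)  # synthetic fat-tailed "returns"
theo, samp = qq_points(fat)
ref_line = standardized_line(theo, fat)

# Fat tails: the lowest sample quantile lies below the line, the highest above it
print("left tail:  sample %.4f vs line %.4f" % (samp[0], ref_line[0]))
print("right tail: sample %.4f vs line %.4f" % (samp[-1], ref_line[-1]))
```

For normally distributed data the extreme points would scatter around the reference line. For the fat-tailed sample they fall below it on the left and above it on the right, which is the same pattern visible in Figure 11-9.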