Python for Finance: Analyze Big Financial Data

Chapter 11. Statistics

I can prove anything by statistics except the truth.

— George Canning

Statistics is a vast field. The tools and results the field provides have become indispensable

for finance. This also explains the popularity of domain-specific languages like R in the

finance industry. The more elaborate and complex statistical models become, the more

important it is to have available easy-to-use and high-performing computational solutions.

A single chapter in a book like this one cannot do justice to the richness and breadth

of the field of statistics. Therefore, the approach, as in many other chapters, is to

focus on selected topics that seem of paramount importance or that provide a good starting

point when it comes to the use of Python for the particular tasks at hand. The chapter has

four focal points:

Normality tests

A large number of important financial models, like the mean-variance portfolio

theory and the capital asset pricing model (CAPM), rest on the assumption that

returns of securities are normally distributed; therefore, this chapter presents some

approaches to test a given time series for normality of returns.

Portfolio theory

Modern portfolio theory (MPT) can be considered one of the biggest successes of

statistics in finance; starting in the early 1950s with the work of pioneer Harry

Markowitz, this theory began to replace people’s reliance on judgment and

experience with rigorous mathematical and statistical methods when it comes to the

investment of money in financial markets. In that sense, it is arguably the first real

quantitative approach in finance.

Principal component analysis

Principal component analysis (PCA) is quite a popular tool in finance, for example,

when it comes to implementing equity investment strategies or analyzing the

principal components that explain the movement in interest rates. Its major benefit is

“complexity reduction,” achieved by deriving a small set of linearly independent

(uncorrelated, orthogonal) components from a potentially large set of possibly highly

correlated time series; we illustrate the application based on the German

DAX index and the 30 stocks contained in that index.

Bayesian regression

On a fundamental level, Bayesian statistics introduces the notion of beliefs of agents

and the updating of beliefs to statistics; when it comes to linear regression, for

example, this might take on the form of having a statistical distribution for regression

parameters instead of single point estimates (e.g., for the intercept and slope of the

regression line). Nowadays, Bayesian methods are rather popular and important in

finance, which is why we illustrate some (advanced) applications in this chapter.
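To give a first flavor of the normality tests mentioned above, the following is a minimal sketch of the Jarque-Bera test, written with the standard library only. It is not the approach developed later in this chapter: the function name jarque_bera, the simulated return series, and all parameter values are assumptions made for illustration, and the p-value relies on the asymptotic chi-squared distribution of the statistic, so it is only approximate for small samples. In practice one would more likely use the ready-made tests in scipy.stats (e.g., normaltest or jarque_bera).

```python
import math
import random


def jarque_bera(returns):
    """Return (JB statistic, p-value) for the null hypothesis of normality.

    Illustrative helper, not a library function.
    """
    n = len(returns)
    mean = sum(returns) / n
    # Central moments with divisor n, as used in the Jarque-Bera statistic.
    m2 = sum((r - mean) ** 2 for r in returns) / n
    m3 = sum((r - mean) ** 3 for r in returns) / n
    m4 = sum((r - mean) ** 4 for r in returns) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    # Under normality, JB is asymptotically chi-squared with 2 degrees of
    # freedom, whose survival function is exactly exp(-x / 2).
    p_value = math.exp(-jb / 2.0)
    return jb, p_value


random.seed(1000)

# Hypothetical data: simulated daily log returns, not real market data.
normal_returns = [random.gauss(0.0, 0.01) for _ in range(5000)]

# Fat-tailed alternative: a mixture of a calm regime (90% of days) and a
# turbulent regime with three times the volatility (10% of days).
fat_tailed_returns = [
    random.gauss(0.0, 0.03 if random.random() < 0.1 else 0.01)
    for _ in range(5000)
]

jb_norm, p_norm = jarque_bera(normal_returns)
jb_fat, p_fat = jarque_bera(fat_tailed_returns)

print("normal sample:     JB = %10.3f, p-value = %.4f" % (jb_norm, p_norm))
print("fat-tailed sample: JB = %10.3f, p-value = %.4f" % (jb_fat, p_fat))
```

A small p-value speaks against the normality hypothesis. The fat-tailed mixture should be rejected decisively, while the outcome for the truly normal sample depends on the random draw.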

Many aspects in this chapter relate to date and/or time information. Refer to Appendix C