Python for Finance: Analyze Big Financial Data


Chapter 11. Statistics


I can prove anything by statistics except the truth.

— George Canning

Statistics is a vast field. The tools and results the field provides have become indispensable for finance. This also explains the popularity of domain-specific languages like R in the finance industry. The more elaborate and complex statistical models become, the more important it is to have easy-to-use and high-performing computational solutions available. A single chapter in a book like this one cannot do justice to the richness and breadth of the field of statistics. Therefore, the approach, as in many other chapters, is to focus on selected topics that seem of paramount importance or that provide a good starting point when it comes to the use of Python for the particular tasks at hand. The chapter has four focal points:


Normality tests

A large number of important financial models, like mean-variance portfolio theory and the capital asset pricing model (CAPM), rest on the assumption that returns of securities are normally distributed; therefore, this chapter presents some approaches to test a given time series for normality of returns.


Portfolio theory

Modern portfolio theory (MPT) can be considered one of the biggest successes of statistics in finance. Starting in the early 1950s with the work of pioneer Harry Markowitz, this theory began to replace people’s reliance on judgment and experience with rigorous mathematical and statistical methods when it comes to the investment of money in financial markets. In that sense, it is arguably the first genuinely quantitative approach in finance.


Principal component analysis

Principal component analysis (PCA) is a popular tool in finance, for example, when it comes to implementing equity investment strategies or analyzing the principal components that explain the movement in interest rates. Its major benefit is “complexity reduction,” achieved by deriving a small set of linearly independent (noncorrelated, orthogonal) components from a potentially large set of possibly highly correlated time series components. We illustrate the application based on the German DAX index and the 30 stocks contained in that index.


Bayesian regression

On a fundamental level, Bayesian statistics introduces the notion of beliefs of agents, and the updating of those beliefs, to statistics. When it comes to linear regression, for example, this might take the form of having a statistical distribution for the regression parameters instead of single point estimates (e.g., for the intercept and slope of the regression line). Bayesian methods are now popular and important in finance, which is why we illustrate some (advanced) applications in this chapter.
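
To make the idea of "a distribution for regression parameters instead of single point estimates" concrete, the following is a minimal sketch and not necessarily the approach the chapter itself takes. It uses the closed-form conjugate update for a linear regression under simplifying assumptions: the noise variance is known, and the intercept and slope have independent zero-mean Gaussian priors. The function name `posterior_line`, the parameter names, and the synthetic data are all hypothetical and chosen for illustration only.

```python
import math
import random


def posterior_line(x, y, noise_var=1.0, prior_var=100.0):
    """Posterior mean and covariance of (intercept, slope).

    Assumptions (illustrative only): y = a + b*x + noise with
    noise ~ N(0, noise_var) and known noise_var, plus independent
    N(0, prior_var) priors on a and b.
    """
    n = len(x)
    sx = sum(x)
    sy = sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))

    # Posterior precision = prior precision + X'X / noise_var
    p00 = 1.0 / prior_var + n / noise_var
    p01 = sx / noise_var
    p11 = 1.0 / prior_var + sxx / noise_var

    # Invert the 2x2 precision matrix to obtain the posterior covariance
    det = p00 * p11 - p01 * p01
    c00, c01, c11 = p11 / det, -p01 / det, p00 / det

    # Posterior mean = covariance times X'y / noise_var (prior mean is zero)
    r0, r1 = sy / noise_var, sxy / noise_var
    mean = (c00 * r0 + c01 * r1, c01 * r0 + c11 * r1)
    cov = ((c00, c01), (c01, c11))
    return mean, cov


# Synthetic data from a known line; the values are made up for the demo
random.seed(0)
true_a, true_b = 2.0, 0.5
x = [random.uniform(0.0, 10.0) for _ in range(200)]
y = [true_a + true_b * xi + random.gauss(0.0, 1.0) for xi in x]

# Update beliefs with 10 observations, then with all 200
mean_few, cov_few = posterior_line(x[:10], y[:10])
mean_all, cov_all = posterior_line(x, y)

print("10 obs : slope mean", mean_few[1], "std", math.sqrt(cov_few[1][1]))
print("200 obs: slope mean", mean_all[1], "std", math.sqrt(cov_all[1][1]))
```

The result for each parameter is a mean together with a variance rather than a single number, and the posterior standard deviation of the slope narrows as more observations are used. This is the "updating of beliefs" described above, in its simplest form.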


Many aspects in this chapter relate to date and/or time information. Refer to Appendix C
