Python for Finance: Analyze Big Financial Data

Portfolio Optimization

Modern or mean-variance portfolio theory (MPT) is a major cornerstone of financial

theory. Based on this theoretical breakthrough the Nobel Prize in Economics was awarded

to its inventor, Harry Markowitz, in 1990. Although formulated in the 1950s,

it is still a

theory taught to finance students and applied in practice today (often with some minor or

major modifications). This section illustrates the fundamental principles of the theory.

Chapter 5 in the book by Copeland, Weston, and Shastri (2005) provides a good

introduction to the formal topics associated with MPT. As pointed out previously, the

assumption of normally distributed returns is fundamental to the theory:

By looking only at mean and variance, we are necessarily assuming that no other statistics are necessary to

describe the distribution of end-of-period wealth. Unless investors have a special type of utility function

(quadratic utility function), it is necessary to assume that returns have a normal distribution, which can be

completely described by mean and variance.
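The point of the quotation can be checked numerically. The following is a minimal sketch on simulated data, not taken from the book: the helper name `skew_and_excess_kurtosis`, the distribution parameters, and the seed are all illustrative assumptions. For normally distributed returns, skewness and excess kurtosis are close to zero, so mean and variance carry all the information. For a fat-tailed alternative (here a Student-t distribution), the higher moments are no longer negligible.

```python
import numpy as np


def skew_and_excess_kurtosis(x):
    """Return sample skewness and excess kurtosis of a 1-D array."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()  # standardize the sample
    return (z ** 3).mean(), (z ** 4).mean() - 3.0


# Simulated (hypothetical) daily returns; parameters are illustrative only
rng = np.random.default_rng(42)
normal_returns = rng.normal(loc=0.0005, scale=0.01, size=100_000)
fat_tailed_returns = 0.01 * rng.standard_t(df=5, size=100_000)

normal_stats = skew_and_excess_kurtosis(normal_returns)
fat_tailed_stats = skew_and_excess_kurtosis(fat_tailed_returns)

print("normal:     skew=%.3f, excess kurtosis=%.3f" % normal_stats)
print("fat-tailed: skew=%.3f, excess kurtosis=%.3f" % fat_tailed_stats)
```

With real return data, a clearly nonzero excess kurtosis would signal that the mean-variance description is incomplete.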

The Data

Let us begin our Python session by importing a couple of by now well-known libraries:

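In [33]: import numpy as np
         import pandas as pd
         # in newer pandas releases this module lives in the separate
         # pandas-datareader package (import pandas_datareader.data as web)
         import pandas.io.data as web
         import matplotlib.pyplot as plt
         %matplotlib inline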

We pick five different assets for the analysis: American tech stocks Apple Inc., Yahoo!

Inc., and Microsoft Corp., as well as the German Deutsche Bank AG and gold as a commodity

via an exchange-traded fund (ETF). The basic idea of MPT is diversification to achieve a

minimal portfolio risk or maximal portfolio returns given a certain level of risk. One

would expect such results for the right combination of a large enough number of assets

and a certain diversity in the assets. However, to convey the basic ideas and to show

typical effects, these five assets shall suffice:

In [34]: symbols = ['AAPL', 'MSFT', 'YHOO', 'DB', 'GLD']
         noa = len(symbols)
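The diversification effect described above can be illustrated with two hypothetical assets before any real data is loaded. This is a minimal sketch: the volatilities, the correlation, the weights, and the helper `portfolio_volatility` are assumptions chosen for illustration and do not come from the five assets used here. Portfolio volatility is computed as the square root of the quadratic form of the weights and the covariance matrix.

```python
import numpy as np

# Hypothetical annualized volatilities and correlation (illustrative only)
vol_a, vol_b = 0.20, 0.30
corr = 0.2

# Covariance matrix of the two assets
cov = np.array([[vol_a ** 2, corr * vol_a * vol_b],
                [corr * vol_a * vol_b, vol_b ** 2]])


def portfolio_volatility(weights, cov):
    """Return sqrt(w' * Cov * w) for a weight vector w."""
    weights = np.asarray(weights, dtype=float)
    return float(np.sqrt(weights @ cov @ weights))


only_a = portfolio_volatility([1.0, 0.0], cov)  # everything in asset A
mixed = portfolio_volatility([0.7, 0.3], cov)   # 70% A, 30% B

print("volatility, asset A only: %.4f" % only_a)
print("volatility, 70/30 mix:    %.4f" % mixed)
```

Although asset B is riskier than asset A on its own, adding some of it lowers the overall volatility in this example because the two assets are only weakly correlated.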

Using the DataReader function of pandas (cf. Chapter 6) makes getting the time series

data rather efficient. We are only interested, as in the previous example, in the adjusted

closing prices (Adj Close) of each asset:

In [35]: data = pd.DataFrame()
         for sym in symbols:
             data[sym] = web.DataReader(sym, data_source='yahoo',
                                        end='2014-09-12')['Adj Close']
         data.columns = symbols

Figure 11-11 shows the time series data in normalized fashion graphically:

In [36]: (data / data.iloc[0] * 100).plot(figsize=(8, 5))
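The normalization used for the plot divides every column by its first observation and scales the result to 100, so that all series start at the same level and become directly comparable. The following sketch shows the arithmetic on a small made-up price table; the column names `A` and `B`, the dates, and the prices are purely illustrative.

```python
import pandas as pd

# Made-up prices for two hypothetical assets (illustrative only)
prices = pd.DataFrame(
    {'A': [50.0, 55.0, 60.0], 'B': [200.0, 190.0, 210.0]},
    index=pd.to_datetime(['2014-01-02', '2014-01-03', '2014-01-06']),
)

# Divide each column by its first value; the Series of first values
# is aligned with the DataFrame on the column labels
normalized = prices / prices.iloc[0] * 100

print(normalized.round(2))
```

Every column of `normalized` starts at 100, and later values read as the price level relative to the first date.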