plt.ylabel('y')
Figure 11-19. Sample data points and regression line
The result of the “standard” regression approach is fixed values for the parameters of the
regression line:
In [26]: reg
Out[26]: array([ 2.03384161, 3.77649234])
Note that the highest-order monomial factor (in this case, the slope of the regression line)
is at index level 0 and that the intercept is at index level 1. The original parameters 2 and 4
are not perfectly recovered; the deviation is, of course, due to the noise included in the data.
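The data-generating and fitting code is not shown in this excerpt; a minimal sketch of how such a result might be produced with NumPy, assuming noisy sample data around a true intercept of 4 and slope of 2 (the "original parameters" mentioned above), could look like this:

import numpy as np

# hypothetical data generation: true intercept 4, true slope 2,
# plus Gaussian noise
np.random.seed(1000)
x = np.linspace(0, 10, 500)
y = 4 + 2 * x + np.random.standard_normal(len(x)) * 2

# ordinary least-squares fit of a degree-1 polynomial; polyfit
# returns the coefficients highest order first, so reg[0] is the
# slope and reg[1] the intercept
reg = np.polyfit(x, y, deg=1)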
Next, the Bayesian regression. Here, we assume that the parameters are distributed in a
certain way. For example, consider the equation describing the regression line
ŷ(x) = α + β · x. We now assume the following priors:
α is normally distributed with mean 0 and a standard deviation of 20.
β is normally distributed with mean 0 and a standard deviation of 20.
For the likelihood, we assume a normal distribution with mean ŷ(x) and a uniformly
distributed standard deviation between 0 and 10.
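Written compactly, these assumptions amount to the following model (a restatement of the priors and likelihood just described, with σ denoting the standard deviation of the likelihood):

$$\alpha \sim \mathcal{N}(0, 20^2), \qquad \beta \sim \mathcal{N}(0, 20^2), \qquad \sigma \sim \mathcal{U}(0, 10), \qquad y \sim \mathcal{N}(\hat{y}(x), \sigma^2), \qquad \hat{y}(x) = \alpha + \beta \cdot x$$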
A major element of Bayesian regression is (Markov Chain) Monte Carlo (MCMC)
sampling.
In principle, this is the same as drawing balls multiple times from boxes, as
in the previous simple example — just in a more systematic, automated way.
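To make the idea concrete, here is a toy random-walk Metropolis sampler for the two line parameters. This is a hand-rolled illustration of the MCMC principle only, not the algorithm PyMC3 uses (PyMC3 relies on the more efficient NUTS sampler described next), and for simplicity it fixes the likelihood standard deviation at an assumed value of 2 instead of sampling it:

import numpy as np

def log_posterior(alpha, beta, x, y, sigma=2.0):
    # log-priors: alpha, beta ~ Normal(0, 20), up to additive constants
    log_prior = -(alpha ** 2 + beta ** 2) / (2 * 20 ** 2)
    # log-likelihood: y ~ Normal(alpha + beta * x, sigma)
    resid = y - (alpha + beta * x)
    log_like = -np.sum(resid ** 2) / (2 * sigma ** 2)
    return log_prior + log_like

def metropolis(x, y, n_samples=5000, scale=0.1):
    samples = np.zeros((n_samples, 2))
    theta = np.array([0.0, 0.0])  # start at alpha = beta = 0
    current = log_posterior(theta[0], theta[1], x, y)
    for i in range(n_samples):
        # propose a small random move and accept it with
        # probability min(1, posterior ratio)
        proposal = theta + scale * np.random.standard_normal(2)
        candidate = log_posterior(proposal[0], proposal[1], x, y)
        if np.log(np.random.rand()) < candidate - current:
            theta, current = proposal, candidate
        samples[i] = theta
    return samples

After discarding an initial burn-in phase, the collected (alpha, beta) pairs approximate draws from the joint posterior distribution of the two parameters.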
For the technical sampling, there are three different functions to call:
find_MAP finds the starting point for the sampling algorithm by deriving the local
maximum a posteriori point.
NUTS implements the so-called “efficient No-U-Turn Sampler with dual averaging”
(NUTS) algorithm for MCMC sampling given the assumed priors.
sample draws a number of samples given the starting value from find_MAP and the
optimal step size from the NUTS algorithm.
All this is to be wrapped into a PyMC3 Model object and executed within a with statement:
In [27]: with pm.Model() as model:
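             # the continuation of this cell is not shown in the excerpt;
             # what follows is a minimal sketch assuming PyMC3's classic
             # API and the x, y sample data from above
             # define the priors
             alpha = pm.Normal('alpha', mu=0, sd=20)
             beta = pm.Normal('beta', mu=0, sd=20)
             sigma = pm.Uniform('sigma', lower=0, upper=10)
             # specify the regression line
             y_est = alpha + beta * x
             # define the likelihood of the observed data
             likelihood = pm.Normal('y', mu=y_est, sd=sigma, observed=y)
             # find the starting point by optimization
             start = pm.find_MAP()
             # instantiate the NUTS MCMC sampling algorithm
             step = pm.NUTS()
             # draw posterior samples given starting value and step method
             trace = pm.sample(100, step, start=start)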