Figure 11-19. Sample data points and regression line

The result of the “standard” regression approach is fixed values for the parameters of the regression line:

In [26]: reg
Out[26]: array([ 2.03384161, 3.77649234])

Note that the highest-order monomial factor (in this case, the slope of the regression line) is at index level 0 and that the intercept is at index level 1. The original parameters 2 and 4 are not perfectly recovered, but this of course is due to the noise included in the data.
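To make the ordering concrete, the following is a minimal sketch, not the book’s original code: it regenerates comparable noisy data from a line with slope 2 and intercept 4 (the values mentioned above) and fits it with np.polyfit. The random seed and the noise level are assumptions chosen purely for illustration.

import numpy as np

np.random.seed(1000)                        # illustrative seed
x = np.linspace(0, 10, 500)
# noisy samples of the line y = 4 + 2 * x (true slope 2, true intercept 4)
y = 4 + 2 * x + np.random.standard_normal(len(x)) * 2

reg = np.polyfit(x, y, 1)   # degree-1 (linear) least-squares fit
print(reg)                  # reg[0]: slope (highest-order term), reg[1]: intercept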


Next, the Bayesian regression. Here, we assume that the parameters are distributed in a certain way. For example, consider the equation describing the regression line ŷ(x) = α + β · x. We now assume the following priors:

- α is normally distributed with mean 0 and a standard deviation of 20.
- β is normally distributed with mean 0 and a standard deviation of 20.

For the likelihood, we assume a normal distribution with a mean of ŷ(x) and a uniformly distributed standard deviation between 0 and 10.
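As a minimal sketch of how such priors and such a likelihood can be declared with PyMC3 (this is not the book’s In [27] code, which follows below; x and y are assumed to be the data arrays from the example above, and all variable names are illustrative):

import pymc3 as pm

with pm.Model() as model:
    # priors for the regression parameters, as described above
    alpha = pm.Normal('alpha', mu=0, sd=20)          # intercept
    beta = pm.Normal('beta', mu=0, sd=20)            # slope
    # uniformly distributed standard deviation of the likelihood
    sigma = pm.Uniform('sigma', lower=0, upper=10)
    # likelihood: y normally distributed around the regression line ŷ(x)
    y_est = alpha + beta * x
    likelihood = pm.Normal('y', mu=y_est, sd=sigma, observed=y)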


A major element of Bayesian regression is (Markov Chain) Monte Carlo (MCMC) sampling.



In principle, this is the same as drawing balls multiple times from boxes, as in the previous simple example, just in a more systematic, automated way.


For the technical sampling, there are three different functions to call:

- find_MAP finds the starting point for the sampling algorithm by deriving the local maximum a posteriori point.
- NUTS implements the so-called “efficient No-U-Turn Sampler with dual averaging” (NUTS) algorithm for MCMC sampling given the assumed priors.
- sample draws a number of samples given the starting value from find_MAP and the optimal step size from the NUTS algorithm.
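Continuing the hypothetical model context from the sketch above, the three calls might be combined as follows; the number of samples (100) is an arbitrary illustrative choice, and exact call signatures can differ slightly between PyMC3 versions:

with model:
    # starting point for the sampler: the local maximum a posteriori point
    start = pm.find_MAP()
    # NUTS sampler for MCMC sampling given the assumed priors
    step = pm.NUTS()
    # draw 100 posterior samples from the starting value with the NUTS step method
    trace = pm.sample(100, step=step, start=start)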


All this is to be wrapped into a PyMC3 Model object and executed within a with statement:


In [27]: with pm.Model() as model: