plt.ylabel('y')
Figure 11-19. Sample data points and regression line
The result of the “standard” regression approach is fixed values for the parameters of the
regression line:
In [26]: reg
Out[26]: array([ 2.03384161, 3.77649234])
Note that the highest-order monomial factor (in this case, the slope of the regression line)
is at index level 0 and that the intercept is at index level 1. The original parameters 2 and 4
are not perfectly recovered; the deviation is, of course, due to the noise included in the data.
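The data-generating and fitting code is not shown in this excerpt; a minimal sketch of how such a result might be produced with NumPy, assuming noisy sample data around a true intercept of 4 and slope of 2 (the "original parameters" mentioned above), could look like this:

import numpy as np

# hypothetical data generation: true intercept 4, true slope 2,
# plus Gaussian noise
np.random.seed(1000)
x = np.linspace(0, 10, 500)
y = 4 + 2 * x + np.random.standard_normal(len(x)) * 2

# ordinary least-squares fit of a degree-1 polynomial; polyfit
# returns the coefficients highest order first, so reg[0] is the
# slope and reg[1] the intercept
reg = np.polyfit(x, y, deg=1)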
Next, the Bayesian regression. Here, we assume that the parameters are distributed in a
certain way. For example, consider the equation describing the regression line
ŷ(x) = α + β · x. We now assume the following priors:
α is normally distributed with mean 0 and a standard deviation of 20.
β is normally distributed with mean 0 and a standard deviation of 20.
For the likelihood, we assume a normal distribution with mean ŷ(x) and a uniformly
distributed standard deviation between 0 and 10.
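Written compactly, these assumptions amount to the following model (a restatement of the priors and likelihood just described, with σ denoting the standard deviation of the likelihood):

$$\alpha \sim \mathcal{N}(0, 20^2), \qquad \beta \sim \mathcal{N}(0, 20^2), \qquad \sigma \sim \mathcal{U}(0, 10), \qquad y \sim \mathcal{N}(\hat{y}(x), \sigma^2), \qquad \hat{y}(x) = \alpha + \beta \cdot x$$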
A major element of Bayesian regression is (Markov Chain) Monte Carlo (MCMC)
sampling.
In principle, this is the same as drawing balls multiple times from boxes, as
in the previous simple example — just in a more systematic, automated way.
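To make the idea concrete, here is a toy random-walk Metropolis sampler for the two line parameters. This is a hand-rolled illustration of the MCMC principle only, not the algorithm PyMC3 uses (PyMC3 relies on the more efficient NUTS sampler described next), and for simplicity it fixes the likelihood standard deviation at an assumed value of 2 instead of sampling it:

import numpy as np

def log_posterior(alpha, beta, x, y, sigma=2.0):
    # log-priors: alpha, beta ~ Normal(0, 20), up to additive constants
    log_prior = -(alpha ** 2 + beta ** 2) / (2 * 20 ** 2)
    # log-likelihood: y ~ Normal(alpha + beta * x, sigma)
    resid = y - (alpha + beta * x)
    log_like = -np.sum(resid ** 2) / (2 * sigma ** 2)
    return log_prior + log_like

def metropolis(x, y, n_samples=5000, scale=0.1):
    samples = np.zeros((n_samples, 2))
    theta = np.array([0.0, 0.0])  # start at alpha = beta = 0
    current = log_posterior(theta[0], theta[1], x, y)
    for i in range(n_samples):
        # propose a small random move and accept it with
        # probability min(1, posterior ratio)
        proposal = theta + scale * np.random.standard_normal(2)
        candidate = log_posterior(proposal[0], proposal[1], x, y)
        if np.log(np.random.rand()) < candidate - current:
            theta, current = proposal, candidate
        samples[i] = theta
    return samples

After discarding an initial burn-in phase, the collected (alpha, beta) pairs approximate draws from the joint posterior distribution of the two parameters.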
For the technical sampling, there are three different functions to call:
find_MAP finds the starting point for the sampling algorithm by deriving the local
maximum a posteriori point.
NUTS implements the so-called “efficient No-U-Turn Sampler with dual averaging”
(NUTS) algorithm for MCMC sampling given the assumed priors.
sample draws a number of samples given the starting value from find_MAP and the
optimal step size from the NUTS algorithm.
All this is to be wrapped into a PyMC3 Model object and executed within a with statement:
In [27]: with pm.Model() as model:
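             # the continuation of this cell is not shown in the excerpt;
             # what follows is a minimal sketch assuming PyMC3's classic
             # API and the x, y sample data from above
             # define the priors
             alpha = pm.Normal('alpha', mu=0, sd=20)
             beta = pm.Normal('beta', mu=0, sd=20)
             sigma = pm.Uniform('sigma', lower=0, upper=10)
             # specify the regression line
             y_est = alpha + beta * x
             # define the likelihood of the observed data
             likelihood = pm.Normal('y', mu=y_est, sd=sigma, observed=y)
             # find the starting point by optimization
             start = pm.find_MAP()
             # instantiate the NUTS MCMC sampling algorithm
             step = pm.NUTS()
             # draw posterior samples given starting value and step method
             trace = pm.sample(100, step, start=start)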