generally most useful when seen as diachronic — i.e., in the sense that new data revealed
over time allows for better regressions and estimates.
To incorporate this concept in the current example, we assume that the regression
parameters are not only random and distributed in some fashion, but that they follow some
kind of random walk over time. It is the same generalization used when making the
transition in finance theory from random variables to stochastic processes (which are
essentially ordered sequences of random variables):
To this end, we define a new PyMC3 model, this time specifying parameter values as
random walks with the variance parameter values transformed to log space (for better
sampling characteristics).
In [ 42 ]: model_randomwalk = pm.Model()
with model_randomwalk:
# std of random walk best sampled in log space
sigma_alpha, log_sigma_alpha = \
model_randomwalk.TransformedVar(‘sigma_alpha’,
pm.Exponential.dist(1. / . 02 , testval=. 1 ),
pm.logtransform)
sigma_beta, log_sigma_beta = \
model_randomwalk.TransformedVar(‘sigma_beta’,
pm.Exponential.dist(1. / . 02 , testval=. 1 ),
pm.logtransform)
After having specified the distributions of the random walk parameters, we can proceed
with specifying the random walks for alpha and beta. To make the whole procedure more
efficient, 50 data points at a time share common coefficients:
In [ 43 ]: from pymc.distributions.timeseries import GaussianRandomWalk
# to make the model simpler, we will apply the same coefficients
# to 50 data points at a time
subsample_alpha = 50
subsample_beta = 50
with model_randomwalk:
alpha = GaussianRandomWalk(‘alpha’, sigma_alpha**- 2 ,
shape=len(data) / subsample_alpha)
beta = GaussianRandomWalk(‘beta’, sigma_beta**- 2 ,
shape=len(data) / subsample_beta)
# make coefficients have the same length as prices
alpha_r = np.repeat(alpha, subsample_alpha)
beta_r = np.repeat(beta, subsample_beta)
The time series data sets have a length of 1,967 data points:
In [ 44 ]: len(data.dropna().GDX.values) # a bit longer than 1,950
Out[44]: 1967
For the sampling to follow, the number of data points must be divisible by 50. Therefore,
only the first 1,950 data points are taken for the regression:
In [ 45 ]: with model_randomwalk:
# define regression
regression = alpha_r + beta_r * data.GDX.values[: 1950 ]
# assume prices are normally distributed
# the mean comes from the regression
sd = pm.Uniform(‘sd’, 0 , 20 )
likelihood = pm.Normal(‘GLD’,
mu=regression,
sd=sd,
observed=data.GLD.values[: 1950 ])
All these definitions are a bit more involved than before due to the use of random walks
instead of a single random variable. However, the inference steps with the MCMC remain