Python for Finance: Analyze Big Financial Data

(Elle) #1

generally most useful when seen as diachronic — i.e., in the sense that new data revealed


over time allows for better regressions and estimates.


To incorporate this concept in the current example, we assume that the regression


parameters are not only random and distributed in some fashion, but that they follow some


kind of random walk over time. It is the same generalization used when making the


transition in finance theory from random variables to stochastic processes (which are


essentially ordered sequences of random variables):


To this end, we define a new PyMC3 model, this time specifying parameter values as


random walks with the variance parameter values transformed to log space (for better


sampling characteristics).


In  [ 42 ]: model_randomwalk    =   pm.Model()
with model_randomwalk:
# std of random walk best sampled in log space
sigma_alpha, log_sigma_alpha = \
model_randomwalk.TransformedVar(‘sigma_alpha’,
pm.Exponential.dist(1. / . 02 , testval=. 1 ),
pm.logtransform)
sigma_beta, log_sigma_beta = \
model_randomwalk.TransformedVar(‘sigma_beta’,
pm.Exponential.dist(1. / . 02 , testval=. 1 ),
pm.logtransform)

After having specified the distributions of the random walk parameters, we can proceed


with specifying the random walks for alpha and beta. To make the whole procedure more


efficient, 50 data points at a time share common coefficients:


In  [ 43 ]: from pymc.distributions.timeseries import GaussianRandomWalk
# to make the model simpler, we will apply the same coefficients
# to 50 data points at a time
subsample_alpha = 50
subsample_beta = 50

with model_randomwalk:
alpha = GaussianRandomWalk(‘alpha’, sigma_alpha**- 2 ,
shape=len(data) / subsample_alpha)
beta = GaussianRandomWalk(‘beta’, sigma_beta**- 2 ,
shape=len(data) / subsample_beta)

#   make    coefficients    have    the same    length  as  prices
alpha_r = np.repeat(alpha, subsample_alpha)
beta_r = np.repeat(beta, subsample_beta)

The time series data sets have a length of 1,967 data points:


In  [ 44 ]: len(data.dropna().GDX.values)       #   a   bit longer  than    1,950
Out[44]: 1967

For the sampling to follow, the number of data points must be divisible by 50. Therefore,


only the first 1,950 data points are taken for the regression:


In  [ 45 ]: with model_randomwalk:
# define regression
regression = alpha_r + beta_r * data.GDX.values[: 1950 ]

#   assume  prices  are normally    distributed
# the mean comes from the regression
sd = pm.Uniform(‘sd’, 0 , 20 )
likelihood = pm.Normal(‘GLD’,
mu=regression,
sd=sd,
observed=data.GLD.values[: 1950 ])

All these definitions are a bit more involved than before due to the use of random walks


instead of a single random variable. However, the inference steps with the MCMC remain

Free download pdf