Python for Finance: Analyze Big Financial Data


multiprocessing


The advantage of IPython.parallel is that it scales over small- and medium-sized clusters (e.g., with 256 nodes). Sometimes it is, however, helpful to parallelize code execution locally. This is where the “standard” multiprocessing module of Python might prove beneficial:


In [48]: import multiprocessing as mp

Consider the following function to simulate a geometric Brownian motion:


In [49]: import math
         import numpy as np

         def simulate_geometric_brownian_motion(p):
             M, I = p  # time steps, paths
             S0 = 100; r = 0.05; sigma = 0.2; T = 1.0  # model parameters
             dt = T / M
             paths = np.zeros((M + 1, I))
             paths[0] = S0
             for t in range(1, M + 1):
                 paths[t] = paths[t - 1] * np.exp((r - 0.5 * sigma ** 2) * dt +
                             sigma * math.sqrt(dt) * np.random.standard_normal(I))
             return paths

This function returns simulated paths given the parameterization for M and I:


In [50]: paths = simulate_geometric_brownian_motion((5, 2))
         paths
Out[50]: array([[ 100.        ,  100.        ],
                [  93.65851581,   98.93916652],
                [  94.70157252,   93.44208625],
                [  96.73499004,   97.88294562],
                [ 110.64677908,   96.04515015],
                [ 124.09826521,  101.86087283]])
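A quick sanity check can make the shape and scale of these paths more concrete. The following is only a sketch: the function name simulate_gbm, its keyword arguments, and the seed value are illustrative assumptions rather than part of the original code, and it uses NumPy's Generator interface instead of the global np.random functions so that the run is reproducible. It relies on the fact that, for this exact log-normal scheme, the expected terminal value is S0 * exp(r * T).

```python
import math
import numpy as np

def simulate_gbm(M, I, S0=100.0, r=0.05, sigma=0.2, T=1.0, seed=None):
    """Simulate I geometric Brownian motion paths over M time steps.

    Hypothetical variant of the function above; `seed` is an added
    assumption to make the random draws reproducible.
    """
    rng = np.random.default_rng(seed)
    dt = T / M
    paths = np.zeros((M + 1, I))
    paths[0] = S0  # every path starts at the initial level
    for t in range(1, M + 1):
        z = rng.standard_normal(I)  # one standard normal draw per path
        paths[t] = paths[t - 1] * np.exp((r - 0.5 * sigma ** 2) * dt
                                         + sigma * math.sqrt(dt) * z)
    return paths

paths = simulate_gbm(M=10, I=50000, seed=42)
terminal_mean = paths[-1].mean()            # Monte Carlo estimate of E[S_T]
expected_mean = 100.0 * math.exp(0.05 * 1.0)  # theoretical value S0 * exp(r * T)
print(paths.shape)  # one row per point in time (M + 1), one column per path
print(terminal_mean, expected_mean)
```

With 50,000 paths the two printed values should agree closely; the rows of the array index time and the columns index paths, as in the small example above.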

Let us implement a test series on a server with eight cores and the following parameter values. In particular, we want to do 100 simulations:


In [51]: I = 10000  # number of paths
         M = 100  # number of time steps
         t = 100  # number of tasks/simulations

In [52]: # running on server with 8 cores/16 threads
         from time import time
         times = []
         for w in range(1, 17):
             t0 = time()
             pool = mp.Pool(processes=w)
               # the pool of workers
             result = pool.map(simulate_geometric_brownian_motion,
                               t * [(M, I), ])
               # the mapping of the function to the list of parameter tuples
             times.append(time() - t0)
             pool.close()
               # release the worker processes before the next run
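One caveat with the loop above: on platforms where worker processes are created by forking, each worker starts with a copy of the parent's global NumPy random state, so tasks that call np.random.standard_normal without reseeding can draw identical numbers in different workers. A possible remedy, sketched here under stated assumptions, is to hand every task its own independent seed. The names simulate_gbm_seeded and make_tasks and the root seed 12345 are hypothetical; np.random.SeedSequence.spawn and np.random.default_rng are the NumPy facilities used. The sketch applies the function with the built-in map so that it runs anywhere; passing the same task list to pool.map should work the same way.

```python
import math
import numpy as np

def simulate_gbm_seeded(p):
    """Hypothetical variant of the simulation taking (M, I, seed_seq)."""
    M, I, seed_seq = p
    S0, r, sigma, T = 100.0, 0.05, 0.2, 1.0  # model parameters as above
    rng = np.random.default_rng(seed_seq)    # task-local random generator
    dt = T / M
    paths = np.zeros((M + 1, I))
    paths[0] = S0
    for t in range(1, M + 1):
        paths[t] = paths[t - 1] * np.exp((r - 0.5 * sigma ** 2) * dt
                                         + sigma * math.sqrt(dt)
                                         * rng.standard_normal(I))
    return paths

def make_tasks(n_tasks, M, I, root_seed=12345):
    """Build one parameter tuple per task, each with its own child seed."""
    children = np.random.SeedSequence(root_seed).spawn(n_tasks)
    return [(M, I, child) for child in children]

tasks = make_tasks(n_tasks=4, M=5, I=3)
# serial stand-in for pool.map(simulate_gbm_seeded, tasks)
results = list(map(simulate_gbm_seeded, tasks))
print(len(results), results[0].shape)
```

Because the child seeds are derived from a single root seed, the whole batch is reproducible, while the individual tasks still use statistically independent streams.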

We again come to the conclusion that performance scales with the number of cores available. Hyperthreading, however, does not add much (or is even worse) in this case, as Figure 8-4 illustrates:


In [53]: import matplotlib.pyplot as plt
         plt.plot(range(1, 17), times)
         plt.plot(range(1, 17), times, 'ro')
         plt.grid(True)
         plt.xlabel('number of processes')
         plt.ylabel('time in seconds')
         plt.title('%d Monte Carlo simulations' % t)
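To put a number on how well the timings scale, one option is to convert them into speedup and parallel efficiency relative to the single-process run. The helper below is a sketch; the name speedup_and_efficiency is hypothetical, and the timings in the usage lines are made-up placeholders, not measurements from the test series above.

```python
def speedup_and_efficiency(times):
    """Return (workers, speedup, efficiency) for each entry of `times`.

    times[0] is assumed to be the run with one process, times[1] the run
    with two processes, and so on, as in the list built by the timing loop.
    """
    base = times[0]
    rows = []
    for workers, elapsed in enumerate(times, start=1):
        speedup = base / elapsed        # how many times faster than one process
        efficiency = speedup / workers  # 1.0 means perfect linear scaling
        rows.append((workers, speedup, efficiency))
    return rows

# made-up placeholder timings in seconds, for illustration only
example_times = [8.0, 4.0, 2.0, 2.0]
for workers, speedup, efficiency in speedup_and_efficiency(example_times):
    print(workers, round(speedup, 2), round(efficiency, 2))
```

An efficiency that stays near 1.0 up to the number of physical cores and drops beyond it would match the pattern described in the text.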