Python for Finance: Analyze Big Financial Data

Generation of Random Numbers on GPUs

The last topic in this chapter is the use of devices for massively parallel operations, i.e., General Purpose Graphical Processing Units (GPGPUs, or simply GPUs). To use an Nvidia GPU, we need to have CUDA (Compute Unified Device Architecture, cf. https://developer.nvidia.com) installed. An easy way to harness the power of Nvidia GPUs is to use NumbaPro, a performance library by Continuum Analytics that dynamically compiles Python code for the GPU (or a multicore CPU).
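
To give a concrete flavor of this dynamic compilation, here is a minimal sketch using NumbaPro's vectorize decorator; the decorator signature and the 'gpu' target follow the NumbaPro documentation of the time and should be treated as assumptions (the same functionality was later folded into Numba itself, under target='cuda'):

import numpy as np
from numbapro import vectorize  # assumption: NumbaPro is installed

# compile an element-wise function for the GPU; the type signature
# tells the compiler which specialization to generate
@vectorize(['float64(float64, float64)'], target='gpu')
def gpu_add(a, b):
    return a + b

x = np.random.standard_normal(1000000)
y = np.random.standard_normal(1000000)
z = gpu_add(x, y)  # evaluated element-wise on the GPU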


This chapter does not allow us to go into the details of GPU usage for Python programming. However, there is one financial field that can benefit strongly from the use of a GPU: Monte Carlo simulation, and (pseudo)random number generation in particular.

In what follows, we use the native CUDA library curand to generate random numbers on the GPU:

In [97]: from numbapro.cudalib import curand

As the benchmark case, we define a function, using NumPy, that delivers a two-dimensional array of standard normally distributed pseudorandom numbers:

In [98]: import numpy as np  # np is assumed imported earlier in the book
         def get_randoms(x, y):
             rand = np.random.standard_normal((x, y))
             return rand

First, let’s check if it works:

In [99]: get_randoms(2, 2)
Out[99]: array([[-0.30561007,  1.33124048],
                [-0.04382143,  2.31276888]])

Now the function for the Nvidia GPU:

In [100]: def get_cuda_randoms(x, y):
              rand = np.empty((x * y), np.float64)
              # rand serves as a container for the randoms
              # CUDA only fills 1-dimensional arrays
              prng = curand.PRNG(rndtype=curand.PRNG.XORWOW)
              # the argument sets the random number algorithm
              prng.normal(rand, 0, 1)  # filling the container
              rand = rand.reshape((x, y))
              # to be "fair", we reshape rand to 2 dimensions
              return rand

Again, a brief check of the functionality:

In [101]: get_cuda_randoms(2, 2)
Out[101]: array([[ 1.07102161,  0.70846868],
                 [ 0.89437398, -0.86693007]])

And a first comparison of the performance:

In [102]: %timeit a = get_randoms(1000, 1000)
Out[102]: 10 loops, best of 3: 72 ms per loop
In [103]: %timeit a = get_cuda_randoms(1000, 1000)
Out[103]: 100 loops, best of 3: 14.8 ms per loop

Now, a more systematic routine to compare the performance:

In [104]: import time as t
          step = 1000
          def time_comparison(factor):
              cuda_times = list()
              cpu_times = list()
              for j in range(1, 10002, step):
                  i = j * factor
                  t0 = t.time()  # time the CPU version
                  a = get_randoms(i, 1)
                  cpu_times.append(t.time() - t0)
                  t0 = t.time()  # time the GPU version
                  a = get_cuda_randoms(i, 1)
                  cuda_times.append(t.time() - t0)
              return cuda_times, cpu_times
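
The routine could then be used, for example, as follows; a minimal sketch, assuming matplotlib is available and imported as plt (not shown in this excerpt), and with factor=100 chosen arbitrarily for illustration:

import matplotlib.pyplot as plt  # assumption: matplotlib is installed

# run the comparison and plot both timing series
cuda_times, cpu_times = time_comparison(100)
plt.plot(cpu_times, 'b', label='NumPy (CPU)')
plt.plot(cuda_times, 'r--', label='curand (GPU)')
plt.xlabel('test iteration')
plt.ylabel('time in seconds')
plt.legend(loc=0)
plt.show()

For larger array sizes, the GPU timings should grow much more slowly than the CPU timings, which is the effect this comparison is meant to make visible.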