Generation of Random Numbers on GPUs
The last topic in this chapter is the use of devices for massively parallel operations, namely
General-Purpose Graphics Processing Units (GPGPUs, or simply GPUs). To use an
Nvidia GPU, we need to have CUDA (Compute Unified Device Architecture, cf.
https://developer.nvidia.com) installed. An easy way to harness the power of Nvidia GPUs
is to use NumbaPro, a performance library by Continuum Analytics that dynamically
compiles Python code for the GPU (or a multicore CPU).
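As a brief, illustrative sketch of that idea (not a transcript from this chapter), the following shows how NumbaPro's vectorize decorator can compile a simple element-wise function for the GPU. The function name gpu_add and the array sizes are made up for this example, and the 'gpu' target name follows NumbaPro-era usage (the later open-source Numba package calls the corresponding target 'cuda'):
import numpy as np
from numbapro import vectorize

@vectorize(['float64(float64, float64)'], target='gpu')
def gpu_add(a, b):
    # compiled into a CUDA kernel and applied element-wise on the GPU
    return a + b

x = np.random.standard_normal(1000000)
y = np.random.standard_normal(1000000)
z = gpu_add(x, y)  # inputs are copied to the GPU, added, results copied back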
The scope of this chapter does not allow us to go into the details of GPU programming with
Python. However, there is one financial field that can benefit strongly from the use
of a GPU: Monte Carlo simulation and, in particular, (pseudo)random number generation.
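To see why this matters, consider a minimal, NumPy-only sketch of a Monte Carlo estimator for a European call option under geometric Brownian motion; all parameter values are purely illustrative:
import numpy as np

S0, K, T, r, sigma = 100., 105., 1.0, 0.05, 0.2  # illustrative option parameters
I = 1000000  # number of simulated paths

z = np.random.standard_normal(I)  # drawing the variates is typically the costly step
ST = S0 * np.exp((r - 0.5 * sigma ** 2) * T + sigma * np.sqrt(T) * z)
C0 = np.exp(-r * T) * np.mean(np.maximum(ST - K, 0))  # Monte Carlo estimate
Every simulated path requires at least one standard normal draw, so speeding up the generation of the variates speeds up the whole simulation.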
In what follows, we use the native CUDA library curand to generate random numbers on
the GPU:
In [97]: import numpy as np
         from numbapro.cudalib import curand
As the benchmark case, we define a function, using NumPy, that delivers a two-dimensional
array of standard normally distributed pseudorandom numbers:
In [98]: def get_randoms(x, y):
             rand = np.random.standard_normal((x, y))
             return rand
First, let’s check if it works:
In [99]: get_randoms(2, 2)
Out[99]: array([[-0.30561007,  1.33124048],
                [-0.04382143,  2.31276888]])
Now the function for the Nvidia GPU:
In [100]: def get_cuda_randoms(x, y):
              rand = np.empty(x * y, dtype=np.float64)
              # rand serves as a container for the random numbers
              # the curand routines fill only 1-dimensional arrays
              prng = curand.PRNG(rndtype=curand.PRNG.XORWOW)
              # the rndtype argument selects the random number algorithm
              prng.normal(rand, 0, 1)  # fill the container with standard normals
              rand = rand.reshape((x, y))
              # to be "fair", reshape rand to 2 dimensions
              return rand
Again, a brief check of the functionality:
In [101]: get_cuda_randoms(2, 2)
Out[101]: array([[ 1.07102161,  0.70846868],
                 [ 0.89437398, -0.86693007]])
And a first comparison of the performance:
In [102]: %timeit a = get_randoms(1000, 1000)
          10 loops, best of 3: 72 ms per loop
In [103]: %timeit a = get_cuda_randoms(1000, 1000)
          100 loops, best of 3: 14.8 ms per loop
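For this problem size, the GPU-based version is roughly five times faster in this run. Note that get_cuda_randoms still allocates the result array on the host and lets the library copy the numbers back from the GPU. If the random numbers are consumed by further computations on the device, this transfer can be avoided. The following sketch assumes NumbaPro's cuda.device_array helper and that PRNG.normal accepts device arrays; the function name get_cuda_randoms_device is made up here, and the code is meant as an illustration rather than measured output:
import numpy as np
from numbapro import cuda
from numbapro.cudalib import curand

def get_cuda_randoms_device(x, y):
    d_rand = cuda.device_array(x * y, dtype=np.float64)  # array lives on the GPU
    prng = curand.PRNG(rndtype=curand.PRNG.XORWOW)
    prng.normal(d_rand, 0, 1)  # fill the device array in place
    return d_rand  # call copy_to_host() only when the host actually needs the data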
Now, a more systematic routine to compare the performance:
In [104]: import time as t
          step = 1000
          def time_comparsion(factor):
              cuda_times = list()
              cpu_times = list()
              for j in range(1, 10002, step):
                  i = j * factor