
Dynamic Compiling


Numba is an open source, NumPy-aware optimizing compiler for Python code. It uses the LLVM compiler infrastructure to compile Python byte code to machine code, especially for use in the NumPy runtime and in SciPy modules.


Introductory Example


Let us start with a problem that typically leads to performance issues in Python: algorithms with nested loops. A sandbox variant can illustrate the problem:


In [54]: from math import cos, log
         def f_py(I, J):
             res = 0
             for i in range(I):
                 for j in range(J):
                     res += int(cos(log(1)))
             return res

In a somewhat compute-intensive way, this function returns the total number of loops given the input parameters I and J. Setting both equal to 5,000 leads to 25,000,000 loops:


In [55]: I, J = 5000, 5000
         %time f_py(I, J)
Out[55]: CPU times: user 17.4 s, sys: 2.3 s, total: 19.7 s
         Wall time: 15.2 s

         25000000

In principle, this can be vectorized with the help of NumPy ndarray objects:


In [56]: def f_np(I, J):
             a = np.ones((I, J), dtype=np.float64)
             return int(np.sum(np.cos(np.log(a)))), a

In [57]: %time res, a = f_np(I, J)
Out[57]: CPU times: user 1.41 s, sys: 285 ms, total: 1.69 s
         Wall time: 1.65 s

This is much faster, roughly by a factor of 8 to 10, but not really memory-efficient. The ndarray object consumes 200 MB of memory:


In [58]: a.nbytes
Out[58]: 200000000
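
This figure follows directly from the array's shape and data type: there are I × J = 25,000,000 float64 elements of 8 bytes each. A quick sanity check (a sketch; this expression is not part of the original session):

I * J * 8  # 5,000 * 5,000 elements times 8 bytes per float64 element -> 200,000,000 bytes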

I and J can easily be chosen to make the NumPy approach infeasible given a certain size of RAM. Numba provides an attractive alternative to tackle the performance issue of such loop structures while preserving the memory efficiency of the pure Python approach:


In [59]: import numba as nb

With Numba, you only need to apply the jit function to the pure Python function to generate a Python-callable, compiled version of the function:


In [60]: f_nb = nb.jit(f_py)
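
Equivalently, jit can be applied as a decorator at function definition time; the following sketch only illustrates this alternative spelling of the same call (the name f_nb_dec is illustrative and not part of the original session):

@nb.jit
def f_nb_dec(I, J):
    # same pure Python nested-loop body as f_py, compiled by Numba
    res = 0
    for i in range(I):
        for j in range(J):
            res += int(cos(log(1)))
    return res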

As promised, this new function can be called directly from within the Python interpreter, realizing a significant speedup compared to the NumPy vectorized version:


In [61]: %time f_nb(I, J)
Out[61]: CPU times: user 143 ms, sys: 12 ms, total: 155 ms
         Wall time: 139 ms

         25000000L
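
Note that nb.jit compiles lazily: the machine code for a given type signature is generated at the first call with those argument types. If that one-off compilation cost should be excluded from a measurement, a common pattern (a sketch, not part of the original session) is a warm-up call before timing:

f_nb(1, 1)         # warm-up call triggers compilation for integer arguments
%time f_nb(I, J)   # subsequent calls execute the compiled machine code only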

Again, let us compare the performance of the different alternatives a bit more systematically:
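
A minimal sketch of such a comparison, assuming IPython's %timeit magic and the functions f_py, f_np, and f_nb defined above, might look as follows:

%timeit f_py(I, J)   # pure Python nested loops
%timeit f_np(I, J)   # NumPy vectorized version
%timeit f_nb(I, J)   # Numba-compiled version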
