Dynamic Compiling
Numba is an open source, NumPy-aware optimizing compiler for Python code. It uses the
LLVM compiler infrastructure [31] to compile Python byte code to machine code, especially
for use in the NumPy runtime and SciPy modules.
Introductory Example
Let us start with a problem that typically leads to performance issues in Python:
algorithms with nested loops. A sandbox variant can illustrate the problem:
In [54]: from math import cos, log
         def f_py(I, J):
             res = 0
             for i in range(I):
                 for j in range(J):
                     res += int(cos(log(1)))
             return res
In a somewhat compute-intensive way, this function returns the total number of loop
iterations given the input parameters I and J. Setting both equal to 5,000 leads to
25,000,000 iterations:
In [55]: I, J = 5000, 5000
         %time f_py(I, J)
Out[55]: CPU times: user 17.4 s, sys: 2.3 s, total: 19.7 s
         Wall time: 15.2 s
         25000000
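The reason the function counts iterations is that log(1) is 0 and cos(0) is 1, so every pass through the inner loop adds exactly 1 to the result. The following is a minimal sketch that checks this on small inputs; the names small_I, small_J, and count are chosen here for illustration only.

```python
from math import cos, log


def f_py(I, J):
    # same nested-loop function as in the text
    res = 0
    for i in range(I):
        for j in range(J):
            res += int(cos(log(1)))
    return res


# log(1) == 0.0 and cos(0.0) == 1.0, so each iteration contributes 1
assert int(cos(log(1))) == 1

# illustrative small inputs so the check runs instantly
small_I, small_J = 30, 40
count = f_py(small_I, small_J)

# the result equals the number of inner-loop iterations, I * J
print(count == small_I * small_J)
```

The same reasoning gives 5,000 * 5,000 = 25,000,000 for the inputs used in the text.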
In principle, this can be vectorized with the help of NumPy ndarray objects:
In [56]: import numpy as np
         def f_np(I, J):
             a = np.ones((I, J), dtype=np.float64)
             return int(np.sum(np.cos(np.log(a)))), a
In [57]: %time res, a = f_np(I, J)
Out[57]: CPU times: user 1.41 s, sys: 285 ms, total: 1.69 s
Wall time: 1.65 s
This is much faster, by a factor of roughly 9 in wall time (15.2 s versus 1.65 s), but not really memory-efficient.
The ndarray object consumes 200 MB of memory:
In [58]: a.nbytes
Out[58]: 200000000
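The figure follows directly from the array shape and element size: I * J elements of 8 bytes each. As a small sketch of that arithmetic, the footprint can be estimated before allocating anything; the variable names below are illustrative, and the cross-check deliberately uses a small array.

```python
import numpy as np

I, J = 5000, 5000

# a float64 element occupies 8 bytes
itemsize = np.dtype(np.float64).itemsize

# estimated size of the (I, J) array, computed without allocating it
expected_nbytes = I * J * itemsize
print(expected_nbytes)  # 200000000

# cross-check the formula against nbytes on a small array
small = np.ones((50, 40), dtype=np.float64)
assert small.nbytes == 50 * 40 * itemsize
```

Because the footprint grows with the product I * J, doubling both parameters quadruples the memory needed.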
For any given amount of RAM, I and J can easily be chosen large enough to make the NumPy
approach infeasible. Numba provides an attractive alternative to tackle the performance
issue of such loop structures while preserving the memory efficiency of the pure Python
approach:
In [59]: import numba as nb
With Numba you only need to apply the jit function to the pure Python function to
generate a Python-callable, compiled version of the function:
In [60]: f_nb = nb.jit(f_py)
As promised, this new function can be called directly from within the Python interpreter,
realizing a significant speedup compared to the NumPy vectorized version:
In [61]: %time f_nb(I, J)
Out[61]: CPU times: user 143 ms, sys: 12 ms, total: 155 ms
Wall time: 139 ms
25000000L
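When timing a function compiled this way, note that Numba compiles lazily: the first call with a given argument type triggers compilation, so a single measurement can include compile time. The following is a minimal sketch, not the procedure used in the text, that times a first and a second call separately. The names f_count and timed are hypothetical, the decorator form nb.njit is used in place of nb.jit(f_py), and the fallback decorator is an assumption added only so that the sketch also runs where Numba is not installed (in which case nothing is compiled).

```python
import time
from math import cos, log

try:
    import numba as nb
    jit = nb.njit  # compile to machine code in nopython mode
except ImportError:
    # Assumption for illustration: without Numba, fall back to the
    # plain Python function (no compilation, no speedup).
    def jit(func):
        return func


@jit
def f_count(I, J):
    # same loop structure as f_py in the text
    res = 0
    for i in range(I):
        for j in range(J):
            res += int(cos(log(1.0)))
    return res


def timed(func, *args):
    # hypothetical helper: return the result and the elapsed wall time
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# with Numba, the first call includes the one-off compilation
first_result, first_seconds = timed(f_count, 500, 500)
# the second call reuses the already compiled version
second_result, second_seconds = timed(f_count, 500, 500)

print("first call:  %.6f s" % first_seconds)
print("second call: %.6f s" % second_seconds)
```

Both calls return the same count; only the elapsed times differ, and the second measurement is the better indicator of steady-state performance.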
Again, let us compare the performance of the different alternatives a bit more