In [11]: def f5(a):
             ex = 'abs(cos(a)) ** 0.5 + sin(2 + 3 * a)'
             ne.set_num_threads(1)
             return ne.evaluate(ex)
In [12]: def f6(a):
             ex = 'abs(cos(a)) ** 0.5 + sin(2 + 3 * a)'
             ne.set_num_threads(16)
             return ne.evaluate(ex)
In total, the same task (i.e., the evaluation of the numerical expression in Equation 8-1
on an array with 500,000 elements) is implemented in six different ways:
• f1: standard Python function with explicit looping
• f2: iterator approach with implicit looping
• f3: iterator approach with implicit looping and using eval
• f4: NumPy vectorized implementation
• f5: single-threaded implementation using numexpr
• f6: multithreaded implementation using numexpr
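The three pure Python approaches in the list might look roughly like the following sketch. The function names here are hypothetical stand-ins, not the definitions used in the text, and the code relies only on the standard library math module.

```python
from math import cos, sin


def expr(x):
    # The numerical expression from the text, applied to a single number.
    return abs(cos(x)) ** 0.5 + sin(2 + 3 * x)


def loop_version(a):
    # Explicit looping: build the result list one element at a time.
    res = []
    for x in a:
        res.append(expr(x))
    return res


def comprehension_version(a):
    # Implicit looping via a list comprehension.
    return [expr(x) for x in a]


def eval_version(a):
    # Same comprehension, but the expression string is parsed and
    # evaluated again by eval for every single element.
    ex = 'abs(cos(x)) ** 0.5 + sin(2 + 3 * x)'
    namespace = {'cos': cos, 'sin': sin}  # names visible to eval
    return [eval(ex, namespace, {'x': x}) for x in a]


sample = [0.0, 0.5, 1.0, 2.5]
print(loop_version(sample))
```

Re-parsing the string on every element is what makes an eval-based variant so much more expensive than the other two, even though all three compute the same values.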
First, let us check whether the implementations deliver the same results. We use the
IPython cell magic command %%time to record the total execution time:
In [13]: %%time
         r1 = f1(a_py)
         r2 = f2(a_py)
         r3 = f3(a_py)
         r4 = f4(a_np)
         r5 = f5(a_np)
         r6 = f6(a_np)
Out[13]: CPU times: user 16 s, sys: 125 ms, total: 16.1 s
         Wall time: 16 s
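Outside IPython, a comparable measurement can be approximated with the standard library. The helper below is a hypothetical sketch, not part of IPython: it reports wall-clock time via time.perf_counter and the CPU time of the current process (user plus system) via time.process_time.

```python
import time


def timed(func, *args):
    # Hypothetical helper: returns (result, wall seconds, CPU seconds).
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu


result, wall, cpu = timed(sum, range(1_000_000))
print(f'CPU time: {cpu:.3f} s, wall time: {wall:.3f} s')
```

For multithreaded code, the CPU time can exceed the wall time, since it adds up the work done on all threads of the process.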
The NumPy function allclose allows for easy checking of whether two ndarray(-like)
objects contain the same data:
In [14]: np.allclose(r1, r2)
Out[14]: True
In [15]: np.allclose(r1, r3)
Out[15]: True
In [16]: np.allclose(r1, r4)
Out[16]: True
In [17]: np.allclose(r1, r5)
Out[17]: True
In [18]: np.allclose(r1, r6)
Out[18]: True
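Note that allclose compares within a tolerance rather than exactly: two values a and b count as equal if |a - b| <= atol + rtol * |b|, with defaults rtol=1e-05 and atol=1e-08. The following small sketch, using made-up arrays, illustrates the difference from an exact comparison.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = x + 1e-9   # differs from x only around the ninth decimal place
z = x + 1e-3   # differs from x at the third decimal place

close_xy = np.allclose(x, y)      # difference is within the default tolerances
close_xz = np.allclose(x, z)      # difference exceeds the default tolerances
exact_xy = np.array_equal(x, y)   # exact element-wise comparison

# Plain lists are accepted as well, since allclose takes array-like input.
close_list = np.allclose([1.0, 2.0], np.array([1.0, 2.0]))

print(close_xy, close_xz, exact_xy, close_list)
```

This tolerance is why results computed by different libraries can be reported as equal even when they differ in the last few bits.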
All five comparisons return True, so the six implementations agree up to floating-point
tolerance. The more interesting question is how the different implementations compare
with respect to execution speed. To this end, we use the perf_comp_data function and
pass it the names of all functions and data sets as strings:
In [19]: func_list = ['f1', 'f2', 'f3', 'f4', 'f5', 'f6']
         data_list = ['a_py', 'a_py', 'a_py', 'a_np', 'a_np', 'a_np']
We now have everything together to initiate the competition:
In [20]: perf_comp_data(func_list, data_list)
Out[20]: function: f6, av. time sec:   0.00583, relative:    1.0
         function: f5, av. time sec:   0.02711, relative:    4.6
         function: f4, av. time sec:   0.06331, relative:   10.9
         function: f2, av. time sec:   0.46864, relative:   80.3
         function: f1, av. time sec:   0.59660, relative:  102.3
         function: f3, av. time sec:  15.15156, relative: 2597.2
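A ranking of this kind can be produced with the standard library timeit module. The sketch below is not the perf_comp_data function used above. It is a hypothetical variant, named compare_runtimes here, that takes the callables and data objects directly instead of their names as strings. It averages several timed runs per function and expresses each average relative to the fastest one.

```python
from timeit import repeat


def compare_runtimes(funcs, data, rep=3, number=1):
    """Time each callable on its data set.

    Returns a list of (name, average seconds, relative factor),
    sorted from fastest to slowest.
    """
    averages = {}
    for func, arg in zip(funcs, data):
        # repeat returns one total time per repetition of `number` calls.
        timings = repeat(lambda: func(arg), repeat=rep, number=number)
        averages[func.__name__] = sum(timings) / (rep * number)
    fastest = min(averages.values())
    ranked = sorted(averages.items(), key=lambda item: item[1])
    return [(name, avg, avg / fastest) for name, avg in ranked]


# Two toy functions standing in for the implementations being compared.
def slow_sum(values):
    total = 0
    for v in values:
        total += v
    return total


def fast_sum(values):
    return sum(values)


numbers = list(range(100_000))
for name, avg, rel in compare_runtimes([slow_sum, fast_sum], [numbers, numbers]):
    print(f'function: {name}, av. time sec: {avg:9.5f}, relative: {rel:6.1f}')
```

Passing callables avoids having to import names from the main module by string, at the cost of a small per-call overhead from the wrapping lambda.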