Python Paradigms and Performance
In finance, as in other scientific and data-intensive disciplines, numerical computations
on large data sets can be quite time-consuming. As an example, we want to evaluate a
somewhat complex mathematical expression on an array with 500,000 numbers. We
choose the expression in Equation 8-1, which leads to some computational burden per
calculation. Apart from that, it does not have any specific meaning.
Equation 8-1. Example mathematical expression
$$ y = \sqrt{|\cos(x)|} + \sin(2 + 3x) $$
Equation 8-1 is easily translated into a Python function:
In [2]: from math import *
        def f(x):
            return abs(cos(x)) ** 0.5 + sin(2 + 3 * x)
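As a quick sanity check, the function can be evaluated at a point where the result is easy to work out by hand. The following is a minimal sketch; it uses explicit imports instead of the wildcard import, which is a stylistic choice and not part of the original listing, and the name value_at_zero is introduced only for illustration. At x = 0 the expression reduces to 1 + sin(2).

```python
from math import cos, sin, isclose

def f(x):
    # same expression as in Equation 8-1
    return abs(cos(x)) ** 0.5 + sin(2 + 3 * x)

# at x = 0: abs(cos(0)) ** 0.5 == 1.0, so f(0) should equal 1 + sin(2)
value_at_zero = f(0)
print(isclose(value_at_zero, 1 + sin(2)))
```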
Using the range function, we can efficiently generate a sequence of 500,000 numbers to work
with (a list object in Python 2; in Python 3, a lazy range object that is iterated over in the
same way):
In [3]: I = 500000
        a_py = range(I)
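What range returns depends on the Python version. The sketch below assumes Python 3, where range yields a lazy sequence object rather than a list; the names n_items, last_item, and a_list are introduced here purely for illustration.

```python
I = 500000
a_py = range(I)  # Python 3: lazy range object, no list is built in memory

# a range object supports len(), indexing, and iteration much like a list
n_items = len(a_py)
last_item = a_py[-1]

# an explicit list can still be materialized if one is really needed
a_list = list(a_py)
```

For the loop-based functions in this section, either form works, since they only iterate over their argument.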
As the first implementation, consider function f1, which loops over the whole data set and
appends the result of each function evaluation to a results list object:
In [4]: def f1(a):
            res = []
            for x in a:
                res.append(f(x))
            return res
This is not the only way to implement the computation. One can also use different Python
paradigms, like list comprehensions or the eval function, to get functions of the form f2 and f3:
In [5]: def f2(a):
            return [f(x) for x in a]
In [6]: def f3(a):
            ex = 'abs(cos(x)) ** 0.5 + sin(2 + 3 * x)'
            return [eval(ex) for x in a]
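To confirm that the three implementations are interchangeable, and to get a rough feel for their relative speed, one could run a check like the following. This is a minimal sketch, not the timing procedure of the original text: it relies on the standard library timeit module, a reduced input size n (an assumption chosen to keep the run short), and the illustrative names same_results and timings. Absolute timings depend on the machine, so none are shown.

```python
from math import cos, sin
from timeit import timeit

def f(x):
    return abs(cos(x)) ** 0.5 + sin(2 + 3 * x)

def f1(a):
    # explicit loop with append
    res = []
    for x in a:
        res.append(f(x))
    return res

def f2(a):
    # list comprehension
    return [f(x) for x in a]

def f3(a):
    # eval re-evaluates the expression string for every element
    ex = 'abs(cos(x)) ** 0.5 + sin(2 + 3 * x)'
    return [eval(ex) for x in a]

n = 10000  # assumed size, much smaller than the 500,000 used in the text
a = range(n)

# all three paradigms should return identical lists
same_results = f1(a) == f2(a) == f3(a)

# time one full pass of each implementation
timings = {func.__name__: timeit(lambda: func(a), number=1)
           for func in (f1, f2, f3)}
for name, seconds in timings.items():
    print(f'{name}: {seconds:.4f} s')
```

One would typically expect f3 to come out slowest, because eval has to parse and compile the expression string on every iteration.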
Of course, the same algorithm can be implemented using NumPy vectorization
techniques. In this case, the array of data is an ndarray object instead of a list object.
The function implementation f4 shows no loops whatsoever; all looping takes place on the
NumPy level and not on the Python level:
In [7]: import numpy as np
In [8]: a_np = np.arange(I)
In [9]: def f4(a):
            return (np.abs(np.cos(a)) ** 0.5 +
                    np.sin(2 + 3 * a))
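A quick way to check that the vectorized version computes the same values as the pure Python loop is to compare both on a small input. The sketch below is illustrative only; the names small_n, loop_result, vec_result, and results_match are assumptions, and np.allclose is used because results from the math module and from NumPy may differ in the last floating-point digits.

```python
from math import cos, sin
import numpy as np

def f(x):
    return abs(cos(x)) ** 0.5 + sin(2 + 3 * x)

def f4(a):
    return (np.abs(np.cos(a)) ** 0.5 +
            np.sin(2 + 3 * a))

small_n = 1000  # assumed size, kept small for a fast check

# element by element, looping on the Python level
loop_result = [f(x) for x in range(small_n)]

# one call on the whole array, looping inside NumPy
vec_result = f4(np.arange(small_n))

results_match = np.allclose(loop_result, vec_result)
print(results_match)
```

Note that f4 returns an ndarray, while the loop-based functions return a list, so downstream code may need to account for the different container type.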
Next, we can use a specialized library called numexpr to evaluate the numerical
expression. This library has built-in support for multithreaded execution. Therefore, to
compare the performance of the single-threaded with the multithreaded approach, we define
two different functions, f5 (single thread) and f6 (multiple threads):
In [10]: import numexpr as ne