Python for Finance: Analyze Big Financial Data

Python Paradigms and Performance


In finance, as in other scientific and data-intensive disciplines, numerical computations
on large data sets can be quite time-consuming. As an example, we want to evaluate a
somewhat complex mathematical expression on an array with 500,000 numbers. We
choose the expression in Equation 8-1, which leads to some computational burden per
calculation. Apart from that, it does not have any specific meaning.


Equation 8-1. Example mathematical expression

$$ y = \sqrt{|\cos(x)|} + \sin(2 + 3x) $$


Equation 8-1 is easily translated into a Python function:


In [2]: from math import *
        def f(x):
            return abs(cos(x)) ** 0.5 + sin(2 + 3 * x)

Using the range function, we can efficiently generate a sequence of 500,000 numbers
to work with (a list object in Python 2; in Python 3, range returns a lazy range
object instead):


In [3]: I = 500000
        a_py = range(I)
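
Note that the behavior of range depends on the Python version. The following is a minimal sketch, assuming Python 3, where range yields a lazy range object and an explicit list() call is needed to materialize a list. The name a_list is introduced here only for illustration.

```python
I = 500000

# In Python 3, range() returns a lazy sequence object, not a list.
a_py = range(I)
print(type(a_py).__name__)

# Assumption: a_list is an illustrative name, not part of the original text.
# list() materializes all 500,000 integers in memory.
a_list = list(a_py)
print(len(a_list), a_list[:3], a_list[-1])
```

Both objects can be iterated over in the same way, so either form can serve as input to a function that loops over the data.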

As the first implementation, consider function f1, which loops over the whole data set and
appends the individual results of the function evaluations to a results list object:


In [4]: def f1(a):
            res = []
            for x in a:
                res.append(f(x))
            return res

This is not the only way to implement this. One can also use different Python paradigms,
like list comprehensions or the eval function, to get functions of the form f2 and f3:


In [5]: def f2(a):
            return [f(x) for x in a]

In [6]: def f3(a):
            ex = 'abs(cos(x)) ** 0.5 + sin(2 + 3 * x)'
            return [eval(ex) for x in a]
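
To compare the three pure-Python variants, one could check that they agree and then time them with the standard library's timeit module. The following is a rough sketch under stated assumptions: it assumes Python 3, uses a smaller data set of 50,000 numbers (the variable n) to keep the run short, and the dictionary name timings is hypothetical. Actual timings depend on the machine, so none are shown.

```python
from math import cos, sin
import timeit


def f(x):
    return abs(cos(x)) ** 0.5 + sin(2 + 3 * x)


def f1(a):
    # Explicit loop with append.
    res = []
    for x in a:
        res.append(f(x))
    return res


def f2(a):
    # List comprehension calling f.
    return [f(x) for x in a]


def f3(a):
    # The expression string is re-evaluated by eval for every element.
    ex = 'abs(cos(x)) ** 0.5 + sin(2 + 3 * x)'
    return [eval(ex) for x in a]


# Assumption: a reduced size so the sketch runs quickly.
n = 50000
a_py = range(n)

# All three variants should produce the same values.
r1, r2, r3 = f1(a_py), f2(a_py), f3(a_py)
print(r1 == r2 == r3)

# Time a single call of each variant.
timings = {}
for func in (f1, f2, f3):
    timings[func.__name__] = timeit.timeit(lambda: func(a_py), number=1)

# Report from fastest to slowest.
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name}: {seconds:.4f} s")
```

Since f3 has to parse and compile the expression string on every iteration, it is reasonable to expect it to be the slowest of the three.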

Of course, the same algorithm can be implemented using NumPy vectorization
techniques. In this case, the array of data is an ndarray object instead of a list object.
The function implementation f4 shows no loops whatsoever; all looping takes place on the
NumPy level and not on the Python level:


In [7]: import numpy as np

In [8]: a_np = np.arange(I)

In [9]: def f4(a):
            return (np.abs(np.cos(a)) ** 0.5 +
                    np.sin(2 + 3 * a))
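
As a sanity check, the vectorized version can be compared against the element-wise Python function. The following is a minimal sketch, assuming NumPy is installed; it uses a small array of 1,000 numbers, and the names looped and vectorized are introduced only for illustration.

```python
from math import cos, sin

import numpy as np


def f(x):
    # Scalar version, evaluated one number at a time.
    return abs(cos(x)) ** 0.5 + sin(2 + 3 * x)


def f4(a):
    # Vectorized version: each NumPy call operates on the whole array.
    return np.abs(np.cos(a)) ** 0.5 + np.sin(2 + 3 * a)


# Assumption: a small size is enough to compare the two approaches.
n = 1000
a_np = np.arange(n)

looped = np.array([f(x) for x in range(n)])
vectorized = f4(a_np)

# Results should agree up to floating-point rounding.
print(np.allclose(looped, vectorized))
print(type(vectorized).__name__, vectorized.shape)
```

The result of f4 is itself an ndarray object, whereas the loop-based functions return list objects.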

Then, we can use a specialized library called numexpr to evaluate the numerical
expression. This library has built-in support for multithreaded execution. Therefore, to
compare the performance of the single with the multithreaded approach, we define two
different functions, f5 (single thread) and f6 (multiple threads):


In [10]: import numexpr as ne
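
The definitions of f5 and f6 are not shown in this excerpt. The following is a hedged sketch of how such a pair might look, assuming the numexpr package is installed and using its evaluate and set_num_threads functions. The num_threads parameter, the cap of four threads, and the reduced array size are assumptions made for illustration, not values taken from the text.

```python
import numpy as np
import numexpr as ne

# The expression as a string; numexpr parses and compiles it itself.
EX = 'abs(cos(a)) ** 0.5 + sin(2 + 3 * a)'


def f5(a):
    # Single-threaded evaluation.
    ne.set_num_threads(1)
    return ne.evaluate(EX, local_dict={'a': a})


def f6(a, num_threads=None):
    # Multithreaded evaluation.
    # Assumption: cap at 4 threads or the number of detected cores.
    if num_threads is None:
        num_threads = min(4, ne.detect_number_of_cores())
    ne.set_num_threads(num_threads)
    return ne.evaluate(EX, local_dict={'a': a})


# Assumption: a reduced size to keep the sketch quick.
a_np = np.arange(50000)

r5 = f5(a_np)
r6 = f6(a_np)
reference = np.abs(np.cos(a_np)) ** 0.5 + np.sin(2 + 3 * a_np)

# Both variants should match the plain NumPy result up to rounding.
print(np.allclose(r5, reference), np.allclose(r6, reference))
```

The only difference between the two functions is the thread count set before evaluation, which makes them suitable for a like-for-like timing comparison.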