
In [118]: %%time
          print len(tab[:]['No3'])
Out[118]: 2000000
          CPU times: user 396 ms, sys: 89 ms, total: 485 ms
          Wall time: 485 ms

Figure 7-5. Histogram of data
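The histogram in Figure 7-5 is simply a plot of one table column read into memory. The following minimal sketch shows how such a plot could be reproduced; the HDF5 file name 'tab.h5', the node name 'ints_floats', and the bin count are illustrative assumptions, not taken from the text:

import tables as tb
import matplotlib.pyplot as plt

# open the HDF5 file and retrieve the table node (names are assumed)
h5 = tb.open_file('tab.h5', 'r')
tab = h5.get_node('/', 'ints_floats')

# read the complete No3 column into a NumPy array and plot its histogram
no3 = tab[:]['No3']
plt.hist(no3, bins=30)
plt.grid(True)
plt.show()

h5.close()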

And, of course, we have rather flexible tools to query data via typical SQL-like statements, as in the following example (the result of which is neatly illustrated in Figure 7-6; compare it with Figure 7-2, based on a pandas query):


In [119]: %%time
          res = np.array([(row['No3'], row['No4']) for row in
                          tab.where('((No3 < -0.5) | (No3 > 0.5)) \
                                     & ((No4 < -1) | (No4 > 1))')])[::100]
Out[119]: CPU times: user 530 ms, sys: 52 ms, total: 582 ms
          Wall time: 469 ms
In [120]: plt.plot(res.T[0], res.T[1], 'ro')
          plt.grid(True)

Figure 7-6. Scatter plot of query result
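PyTables also provides Table.read_where(), which evaluates the same kind of condition string and returns the matching rows as a structured NumPy array in a single call. A compact sketch of the selection above using this method (purely illustrative, with np referring to NumPy as imported earlier in the session):

# same compound condition, evaluated by PyTables/numexpr in one call
cond = '((No3 < -0.5) | (No3 > 0.5)) & ((No4 < -1) | (No4 > 1))'
rows = tab.read_where(cond)                           # structured array, all columns
res = np.array([rows['No3'], rows['No4']]).T[::100]   # keep every 100th pair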

FAST COMPLEX QUERIES

Both pandas and PyTables are able to process complex, SQL-like queries and selections. They are both optimized for speed when it comes to such operations.
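For comparison, the same selection might look as follows on a pandas DataFrame holding the table's data; the DataFrame name df and the in-memory copy are illustrative assumptions, not part of the original session:

import pandas as pd

# load the PyTables table into an in-memory DataFrame (illustrative only)
df = pd.DataFrame(tab[:])

# the same compound condition expressed via boolean indexing
sel = df[((df['No3'] < -0.5) | (df['No3'] > 0.5)) &
         ((df['No4'] < -1) | (df['No4'] > 1))]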

As the following examples show, working with data stored in PyTables as a Table object makes you feel like you are working with NumPy and in-memory data, both from a syntax and a performance point of view:


In [121]: %%time
          values = tab.cols.No3[:]
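Once the column has been read, values is an ordinary NumPy ndarray, so the usual vectorized, in-memory operations apply; a brief, purely illustrative sketch (the threshold 0.5 is an arbitrary choice):

# vectorized, in-memory operations on the retrieved column
print(values.mean())
print(values.std())
print(np.sum(values > 0.5))  # number of entries above 0.5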