In [118]: %time print(len(tab[:]['No3']))
Out[118]: 2000000
CPU times: user 396 ms, sys: 89 ms, total: 485 ms
Wall time: 485 ms
Figure 7-5. Histogram of data
And, of course, PyTables provides rather flexible tools to query data via typical SQL-like statements,
as in the following example (the result is neatly illustrated in Figure 7-6;
compare it with Figure 7-2, which is based on a pandas query):
In [119]: %%time
res = np.array([(row['No3'], row['No4']) for row in
tab.where('((No3 < -0.5) | (No3 > 0.5)) \
& ((No4 < -1) | (No4 > 1))')])[::100]
Out[119]: CPU times: user 530 ms, sys: 52 ms, total: 582 ms
Wall time: 469 ms
In [120]: plt.plot(res.T[0], res.T[1], 'ro')
plt.grid(True)
Figure 7-6. Scatter plot of query result
FAST COMPLEX QUERIES
Both pandas and PyTables are able to process complex, SQL-like queries and selections. They are both optimized
for speed when it comes to such operations.
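To make the equivalence concrete, the pandas side of such a selection can be sketched as follows. The DataFrame `df` and the column names `No3`/`No4` are assumptions for illustration, chosen to mirror the columns of the PyTables `Table` used above; this is a minimal sketch, not the book's original code:

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame mirroring the Table's No3/No4 columns.
np.random.seed(0)
df = pd.DataFrame(np.random.standard_normal((100000, 2)),
                  columns=['No3', 'No4'])

# pandas counterpart of the PyTables condition string
# '((No3 < -0.5) | (No3 > 0.5)) & ((No4 < -1) | (No4 > 1))'
res = df[((df['No3'] < -0.5) | (df['No3'] > 0.5))
         & ((df['No4'] < -1) | (df['No4'] > 1))]
print(len(res))
```

The boolean-indexing expression in pandas plays the same role as the condition string passed to `Table.where()`; in both cases the comparison and logical operators are evaluated column-wise over the whole data set.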
As the following examples show, working with data stored in PyTables as a Table object
makes you feel like you are working with NumPy arrays in memory, from both a syntax and a
performance point of view:
In [121]: %%time
values = tab.cols.No3[:]