In [118]: %time print(len(tab[:]['No3']))
Out[118]: 2000000
CPU times: user 396 ms, sys: 89 ms, total: 485 ms
Wall time: 485 ms
Figure 7-5. Histogram of data
And, of course, PyTables provides rather flexible tools to query data via typical SQL-like statements,
as in the following example (the result is neatly illustrated in Figure 7-6;
compare it with Figure 7-2, which is based on a pandas query):
In [119]: %%time
res = np.array([(row['No3'], row['No4']) for row in
tab.where('((No3 < -0.5) | (No3 > 0.5)) \
& ((No4 < -1) | (No4 > 1))')])[::100]
Out[119]: CPU times: user 530 ms, sys: 52 ms, total: 582 ms
Wall time: 469 ms
In [120]: plt.plot(res.T[0], res.T[1], 'ro')
plt.grid(True)
Figure 7-6. Scatter plot of query result
FAST COMPLEX QUERIES
Both pandas and PyTables are able to process complex, SQL-like queries and selections. They are both optimized
for speed when it comes to such operations.
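To make the equivalence concrete, the pandas side of such a selection can be sketched as follows. The DataFrame `df` and the column names `No3`/`No4` are assumptions for illustration, chosen to mirror the columns of the PyTables `Table` used above; this is a minimal sketch, not the book's original code:

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame mirroring the Table's No3/No4 columns.
np.random.seed(0)
df = pd.DataFrame(np.random.standard_normal((100000, 2)),
                  columns=['No3', 'No4'])

# pandas counterpart of the PyTables condition string
# '((No3 < -0.5) | (No3 > 0.5)) & ((No4 < -1) | (No4 > 1))'
res = df[((df['No3'] < -0.5) | (df['No3'] > 0.5))
         & ((df['No4'] < -1) | (df['No4'] > 1))]
print(len(res))
```

The boolean-indexing expression in pandas plays the same role as the condition string passed to `Table.where()`; in both cases the comparison and logical operators are evaluated column-wise over the whole data set.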
As the following examples show, working with data stored in PyTables as a Table object
makes you feel like you are working with NumPy arrays in memory, from both a syntax and a
performance point of view:
In [121]: %%time
values = tab.cols.No3[:]