Python for Finance: Analyze Big Financial Data

            "No3": Float64Col(shape=(), dflt=0.0, pos=3),
            "No4": Float64Col(shape=(), dflt=0.0, pos=4)}
            byteorder := 'little'
            chunkshape := (2621,)

This approach achieves the same result with less code and is an order of magnitude faster than the previous one:

In [112]: h5
Out[112]: File(filename=/flash/data/tab.h5, title=u'', mode='w', root_uep='/', filters=Filters(complevel=0, shuffle=False, fletcher32=False, least_significant_digit=None))
          / (RootGroup) u''
          /ints_floats (Table(2000000,)) 'Integers and Floats'
            description := {
            "Date": StringCol(itemsize=26, shape=(), dflt='', pos=0),
            "No1": Int32Col(shape=(), dflt=0, pos=1),
            "No2": Int32Col(shape=(), dflt=0, pos=2),
            "No3": Float64Col(shape=(), dflt=0.0, pos=3),
            "No4": Float64Col(shape=(), dflt=0.0, pos=4)}
            byteorder := 'little'
            chunkshape := (2621,)
          /ints_floats_from_array (Table(2000000,)) 'Integers and Floats'
            description := {
            "Date": StringCol(itemsize=26, shape=(), dflt='', pos=0),
            "No1": Int32Col(shape=(), dflt=0, pos=1),
            "No2": Int32Col(shape=(), dflt=0, pos=2),
            "No3": Float64Col(shape=(), dflt=0.0, pos=3),
            "No4": Float64Col(shape=(), dflt=0.0, pos=4)}
            byteorder := 'little'
            chunkshape := (2621,)

We can now delete the duplicate table, since it is no longer needed:

In [113]: h5.remove_node('/', 'ints_floats_from_array')
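The removal step can be tried in isolation. The following is a minimal sketch, assuming PyTables 3.x is installed as tables; the temporary file, the two-column layout, and the row count are made up for illustration and are not the book's data.

```python
import os
import tempfile

import numpy as np
import tables as tb

# Hypothetical two-column layout, much smaller than the table in the text.
dty = np.dtype([('No1', '<i4'), ('No3', '<f8')])
sarray = np.zeros(10, dtype=dty)

# Throwaway file in a temporary directory (an assumption for this sketch).
path = os.path.join(tempfile.mkdtemp(), 'demo.h5')
h5 = tb.open_file(path, 'w')

# Two tables with identical content, mirroring the duplicate in the text.
h5.create_table('/', 'ints_floats', obj=sarray)
h5.create_table('/', 'ints_floats_from_array', obj=sarray)

# Remove the duplicate node from the root group.
h5.remove_node('/', 'ints_floats_from_array')

# Collect the names of the nodes that are left under the root group.
remaining = [node._v_name for node in h5.list_nodes('/')]
print(remaining)
h5.close()
```

Listing the nodes afterwards is a simple way to confirm that only the original table is still in the file.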

The Table object behaves like typical Python and NumPy objects when it comes to slicing, for example:

In [114]: tab[:3]
Out[114]: array([('2014-09-28 15:17:57.631234', 4342, 1672, -0.9293, 0.06343),
                 ('2014-09-28 15:17:57.631368', 3839, 1563, -2.02808, 0.3964),
                 ('2014-09-28 15:17:57.631383', 5100, 1326, 0.03401, 0.46742)],
                dtype=[('Date', 'S26'), ('No1', '<i4'), ('No2', '<i4'), ('No3', '<f8'), ('No4', '<f8')])

Similarly, we can select single columns only:

In [115]: tab[:4]['No4']
Out[115]: array([ 0.06343,  0.3964 ,  0.46742, -0.56959])
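Slicing a Table returns a NumPy structured array, as the dtype in the output above shows, so the same indexing pattern can be explored without an HDF5 file. The sketch below reuses the field layout from that dtype; the three rows are invented for illustration.

```python
import numpy as np

# Same field layout as the dtype shown in the slicing output above.
dty = np.dtype([('Date', 'S26'), ('No1', '<i4'), ('No2', '<i4'),
                ('No3', '<f8'), ('No4', '<f8')])

# Three made-up rows; the values are illustrative only.
rows = np.array([(b'2014-09-28 15:17:57.000001', 1, 2, 0.5, 1.5),
                 (b'2014-09-28 15:17:57.000002', 3, 4, -0.5, 2.5),
                 (b'2014-09-28 15:17:57.000003', 5, 6, 0.25, 3.5)],
                dtype=dty)

first_two = rows[:2]       # row slice: still a structured array
no4 = rows[:2]['No4']      # field access: a plain float64 array

print(first_two.dtype.names)
print(no4)
```

Row slicing keeps all fields, while indexing with a field name yields an ordinary one-dimensional array of that column's type.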

Even more convenient and important: we can apply NumPy universal functions to tables or subsets of the table:

In [116]: %time np.sum(tab[:]['No3'])
Out[116]: CPU times: user 31 ms, sys: 58 ms, total: 89 ms
          Wall time: 88.3 ms

          -115.34513999999896

In [117]: %time np.sum(np.sqrt(tab[:]['No1']))
Out[117]: CPU times: user 53 ms, sys: 48 ms, total: 101 ms
          Wall time: 101 ms

          133360523.08794475
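The expression tab[:]['No1'] first reads the whole table into memory and then picks one field. As a hedged alternative, PyTables also offers Table.col, which reads just the named column. The sketch below compares the two on a small stand-in table; the file, layout, and values are assumptions for illustration, not the book's data.

```python
import os
import tempfile

import numpy as np
import tables as tb

# Small stand-in table; the table in the text has 2,000,000 rows.
dty = np.dtype([('No1', '<i4'), ('No3', '<f8')])
sarray = np.zeros(1000, dtype=dty)
sarray['No1'] = np.arange(1000)
sarray['No3'] = np.linspace(-1.0, 1.0, 1000)

# Throwaway file in a temporary directory (an assumption for this sketch).
path = os.path.join(tempfile.mkdtemp(), 'demo.h5')
h5 = tb.open_file(path, 'w')
tab = h5.create_table('/', 'ints_floats', obj=sarray)
tab.flush()  # make sure all rows are written before reading

# Read all columns into a structured array, then select one field.
full_read = np.sum(np.sqrt(tab[:]['No1']))

# Read only the No1 column from the table.
column_read = np.sum(np.sqrt(tab.col('No1')))

print(np.isclose(full_read, column_read))
h5.close()
```

Both routes hand NumPy an ordinary array, so the universal functions apply unchanged; which one is faster on a given table is something to measure rather than assume.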

When it comes to plotting, the Table object also behaves very similarly to an ndarray object (cf. Figure 7-5):

In [118]: %%time
          plt.hist(tab[:]['No3'], bins=30)
          plt.grid(True)