"No3": Float64Col(shape=(), dflt=0.0, pos=3),
"No4": Float64Col(shape=(), dflt=0.0, pos=4)}
byteorder := 'little'
chunkshape := (2621,)
This approach achieves the same result with less code and is an order of magnitude faster
than the previous one:
In [112]: h5
Out[112]: File(filename=/flash/data/tab.h5, title=u'', mode='w', root_uep='/',
filters=Filters(complevel=0, shuffle=False, fletcher32=False,
least_significant_digit=None))
/ (RootGroup) u''
/ints_floats (Table(2000000,)) 'Integers and Floats'
description := {
"Date": StringCol(itemsize=26, shape=(), dflt='', pos=0),
"No1": Int32Col(shape=(), dflt=0, pos=1),
"No2": Int32Col(shape=(), dflt=0, pos=2),
"No3": Float64Col(shape=(), dflt=0.0, pos=3),
"No4": Float64Col(shape=(), dflt=0.0, pos=4)}
byteorder := 'little'
chunkshape := (2621,)
/ints_floats_from_array (Table(2000000,)) 'Integers and Floats'
description := {
"Date": StringCol(itemsize=26, shape=(), dflt='', pos=0),
"No1": Int32Col(shape=(), dflt=0, pos=1),
"No2": Int32Col(shape=(), dflt=0, pos=2),
"No3": Float64Col(shape=(), dflt=0.0, pos=3),
"No4": Float64Col(shape=(), dflt=0.0, pos=4)}
byteorder := 'little'
chunkshape := (2621,)
We can now delete the duplicate table, since it is no longer needed:
In [113]: h5.remove_node('/', 'ints_floats_from_array')
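The removal step can be tried in isolation. The following is a minimal sketch, assuming a throwaway file in a temporary directory and a small stand-in array in place of the two-million-row data set; the file name and the array contents are hypothetical, while the node names follow the text:

```python
import os
import tempfile

import numpy as np
import tables as tb

# Hypothetical stand-in data; only the column names follow the text.
data = np.zeros(5, dtype=[("No1", "<i4"), ("No3", "<f8")])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sketch.h5")  # hypothetical file name
    with tb.open_file(path, "w") as h5:
        # Two tables with identical content, as in the session above.
        h5.create_table("/", "ints_floats", obj=data)
        h5.create_table("/", "ints_floats_from_array", obj=data)

        # Remove the duplicate by parent group and node name.
        h5.remove_node("/", "ints_floats_from_array")

        # Collect the names of the nodes left under the root group.
        remaining = [node._v_name for node in h5.list_nodes("/")]
```

After the call, only the original table should remain under the root group.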
The Table object behaves like typical Python and NumPy objects when it comes to slicing,
for example:
In [114]: tab[:3]
Out[114]: array([('2014-09-28 15:17:57.631234', 4342, 1672, -0.9293, 0.06343),
('2014-09-28 15:17:57.631368', 3839, 1563, -2.02808, 0.3964),
('2014-09-28 15:17:57.631383', 5100, 1326, 0.03401, 0.46742)],
dtype=[('Date', 'S26'), ('No1', '<i4'), ('No2', '<i4'), ('No3',
'<f8'), ('No4', '<f8')])
Similarly, we can select single columns only:
In [115]: tab[:4]['No4']
Out[115]: array([ 0.06343, 0.3964 , 0.46742, -0.56959])
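Slicing the table returns a NumPy structured array, which is why row slices and column selection combine so naturally. The following sketch reproduces that behavior with NumPy alone; it rebuilds the three rows shown above by hand (the variable names are hypothetical) rather than reading them from the HDF5 file:

```python
import numpy as np

# Same field layout as the dtype printed in the session above.
dtype = [("Date", "S26"), ("No1", "<i4"), ("No2", "<i4"),
         ("No3", "<f8"), ("No4", "<f8")]

# The three rows shown in the output above, entered by hand.
rows = np.array([
    (b"2014-09-28 15:17:57.631234", 4342, 1672, -0.9293, 0.06343),
    (b"2014-09-28 15:17:57.631368", 3839, 1563, -2.02808, 0.3964),
    (b"2014-09-28 15:17:57.631383", 5100, 1326, 0.03401, 0.46742),
], dtype=dtype)

first_two = rows[:2]         # row slice: still a structured array
no4 = rows["No4"]            # field selection: a plain float64 array
no1_head = rows[:2]["No1"]   # slice first, then select a single column
```

Selecting a field yields an ordinary ndarray, so everything NumPy offers for regular arrays applies to the selected column.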
Even more convenient and important: we can apply NumPy functions, including universal
functions, to the data of the table or of subsets of it:
In [116]: %time np.sum(tab[:]['No3'])
Out[116]: CPU times: user 31 ms, sys: 58 ms, total: 89 ms
Wall time: 88.3 ms
-115.34513999999896
In [117]: %time np.sum(np.sqrt(tab[:]['No1']))
Out[117]: CPU times: user 53 ms, sys: 48 ms, total: 101 ms
Wall time: 101 ms
133360523.08794475
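Note that tab[:] first reads all rows with all fields into memory before the column is selected. When only one column is needed, or when the table is too large for memory, the column can be read directly or in blocks. The following is a minimal sketch under stated assumptions: a small random stand-in table, a hypothetical file name, and an arbitrarily chosen block size of 2,500 rows:

```python
import os
import tempfile

import numpy as np
import tables as tb

# Hypothetical stand-in data with two of the columns used in the text.
rng = np.random.default_rng(0)
data = np.zeros(10_000, dtype=[("No1", "<i4"), ("No3", "<f8")])
data["No1"] = rng.integers(0, 10_000, size=len(data))
data["No3"] = rng.standard_normal(len(data))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sketch.h5")  # hypothetical file name
    with tb.open_file(path, "w") as h5:
        tab = h5.create_table("/", "ints_floats", obj=data)

        # Read a single column instead of every field of every row.
        full_sum = np.sum(tab.read(field="No3"))

        # Accumulate block by block so that at most `step` rows
        # of the column are held in memory at any time.
        step = 2_500  # assumed block size
        nrows = int(tab.nrows)
        chunked_sum = 0.0
        for start in range(0, nrows, step):
            stop = min(start + step, nrows)
            block = tab.read(start=start, stop=stop, field="No1")
            chunked_sum += np.sqrt(block).sum()

# Reference value computed directly on the in-memory array.
expected_sqrt_sum = np.sqrt(data["No1"]).sum()
```

Both results should agree with the in-memory computation up to floating-point rounding; the block-wise variant trades a Python-level loop for a bounded memory footprint.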
When it comes to plotting, the Table object also behaves very similarly to an ndarray
object (cf. Figure 7-5):
In [118]: %%time
plt.hist(tab[:]['No3'], bins=30)
plt.grid(True)