In [105]: %%time
          for i in range(rows):
              pointer['Date'] = dt.datetime.now()
              pointer['No1'] = ran_int[i, 0]
              pointer['No2'] = ran_int[i, 1]
              pointer['No3'] = ran_flo[i, 0]
              pointer['No4'] = ran_flo[i, 1]
              pointer.append()
              # this appends the data and
              # moves the pointer one row forward
          tab.flush()
Out[105]: CPU times: user 15.7 s, sys: 3.53 s, total: 19.2 s
Wall time: 19.4 s
Always remember to commit your changes. The flush method does for PyTables what the
commit method does for the SQLite3 database: it writes pending changes to disk. We can now
inspect the data on disk, first logically via our Table object and second physically via the
file information:
In [106]: tab
Out[106]: /ints_floats (Table(2000000,)) 'Integers and Floats'
            description := {
            "Date": StringCol(itemsize=26, shape=(), dflt='', pos=0),
            "No1": Int32Col(shape=(), dflt=0, pos=1),
            "No2": Int32Col(shape=(), dflt=0, pos=2),
            "No3": Float64Col(shape=(), dflt=0.0, pos=3),
            "No4": Float64Col(shape=(), dflt=0.0, pos=4)}
            byteorder := 'little'
            chunkshape := (2621,)
In [107]: ll $path*
Out[107]: -rw-r--r-- 1 root 100156256 28. Sep 15:18 /flash/data/tab.h5
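The row-by-row pattern used so far (fill the row pointer, append, flush at the end) can be tried in isolation on a much smaller table. The following is a minimal sketch, not the session from the text: the scratch file location, the names ending in `_demo`, the dictionary-based row description, and the random input data are all assumptions made for illustration.

```python
import datetime as dt
import os
import tempfile

import numpy as np
import tables as tb

# Hypothetical scratch file; the path used in the text is not reproduced here.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_tab.h5")

# Row description mirroring the columns shown in the table output above.
row_des = {
    "Date": tb.StringCol(26, pos=0),
    "No1": tb.Int32Col(pos=1),
    "No2": tb.Int32Col(pos=2),
    "No3": tb.Float64Col(pos=3),
    "No4": tb.Float64Col(pos=4),
}

n_rows = 1000  # deliberately small; the text uses 2,000,000 rows
rng = np.random.default_rng(0)
ints = rng.integers(0, 10000, size=(n_rows, 2))
flos = rng.standard_normal((n_rows, 2))

with tb.open_file(demo_path, "w") as h5_demo:
    tab_demo = h5_demo.create_table("/", "ints_floats", row_des,
                                    title="Integers and Floats",
                                    expectedrows=n_rows)
    pointer = tab_demo.row
    for i in range(n_rows):
        # Store the timestamp as ASCII bytes so it fits the string column.
        pointer["Date"] = dt.datetime.now().isoformat().encode("ascii")
        pointer["No1"] = ints[i, 0]
        pointer["No2"] = ints[i, 1]
        pointer["No3"] = flos[i, 0]
        pointer["No4"] = flos[i, 1]
        pointer.append()  # buffer the row and move the pointer forward
    tab_demo.flush()  # write the buffered rows to disk
    rows_written = tab_demo.nrows
    first_no1 = int(tab_demo[0]["No1"])

file_size = os.path.getsize(demo_path)
print(rows_written, file_size)
```

Each `append()` call is executed in the Python loop, so the run time grows with the number of rows.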
A more performant and Pythonic way to accomplish the same result is to use
NumPy structured arrays:
In [108]: dty = np.dtype([('Date', 'S26'), ('No1', '<i4'), ('No2', '<i4'),
                          ('No3', '<f8'), ('No4', '<f8')])
          sarray = np.zeros(len(ran_int), dtype=dty)
In [109]: sarray
Out[109]: array([('', 0, 0, 0.0, 0.0), ('', 0, 0, 0.0, 0.0),
                 ('', 0, 0, 0.0, 0.0),
                 ..., ('', 0, 0, 0.0, 0.0), ('', 0, 0, 0.0, 0.0),
                 ('', 0, 0, 0.0, 0.0)],
                dtype=[('Date', 'S26'), ('No1', '<i4'), ('No2', '<i4'), ('No3',
                '<f8'), ('No4', '<f8')])
In [110]: %%time
          sarray['Date'] = dt.datetime.now()
          sarray['No1'] = ran_int[:, 0]
          sarray['No2'] = ran_int[:, 1]
          sarray['No3'] = ran_flo[:, 0]
          sarray['No4'] = ran_flo[:, 1]
Out[110]: CPU times: user 113 ms, sys: 18 ms, total: 131 ms
Wall time: 131 ms
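The speed-up comes from assigning whole columns at once instead of looping over rows in Python. A minimal, self-contained sketch of the same idea on five rows follows; the random input arrays and the name `stamp` are assumptions for illustration, and the timestamp is encoded to bytes explicitly so that it fits the `S26` field.

```python
import datetime as dt

import numpy as np

n_rows = 5  # tiny on purpose; the text works with 2,000,000 rows
rng = np.random.default_rng(0)
ints = rng.integers(0, 10000, size=(n_rows, 2))
flos = rng.standard_normal((n_rows, 2))

# Same field layout as the dtype defined in the text.
dty = np.dtype([("Date", "S26"), ("No1", "<i4"), ("No2", "<i4"),
                ("No3", "<f8"), ("No4", "<f8")])
sarray = np.zeros(n_rows, dtype=dty)

stamp = dt.datetime.now().isoformat().encode("ascii")
sarray["Date"] = stamp      # one scalar, broadcast to every row
sarray["No1"] = ints[:, 0]  # whole-column assignments, no Python loop
sarray["No2"] = ints[:, 1]
sarray["No3"] = flos[:, 0]
sarray["No4"] = flos[:, 1]

print(sarray.dtype.itemsize)  # bytes per record: 26 + 4 + 4 + 8 + 8
print(sarray[:2])
```

Each field access such as `sarray["No1"]` returns a view on the underlying record buffer, so the assignment writes directly into the structured array.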
Equipped with the complete data set now stored in the structured array, the creation of the
table boils down to the following line of code. Note that the row description is not needed
anymore; PyTables uses the NumPy dtype instead:
In [111]: %%time
          h5.create_table('/', 'ints_floats_from_array', sarray,
                          title='Integers and Floats',
                          expectedrows=rows, filters=filters)
Out[111]: CPU times: user 38 ms, sys: 117 ms, total: 155 ms
Wall time: 154 ms
          /ints_floats_from_array (Table(2000000,)) 'Integers and Floats'
            description := {
            "Date": StringCol(itemsize=26, shape=(), dflt='', pos=0),
            "No1": Int32Col(shape=(), dflt=0, pos=1),
            "No2": Int32Col(shape=(), dflt=0, pos=2),