Python for Finance: Analyze Big Financial Data

In [105]: %%time
for i in range(rows):
    pointer['Date'] = dt.datetime.now()
    pointer['No1'] = ran_int[i, 0]
    pointer['No2'] = ran_int[i, 1]
    pointer['No3'] = ran_flo[i, 0]
    pointer['No4'] = ran_flo[i, 1]
    pointer.append()
    # this appends the data and
    # moves the pointer one row forward
tab.flush()  # write any rows still buffered in memory to disk
Out[105]: CPU times: user 15.7 s, sys: 3.53 s, total: 19.2 s
Wall time: 19.4 s

Always remember to commit your changes. What the commit method is for the SQLite3 database, the flush method is for PyTables. We can now inspect the data on disk, first logically via our Table object and second physically via the file information:


In [106]: tab
Out[106]: /ints_floats (Table(2000000,)) 'Integers and Floats'
description := {
"Date": StringCol(itemsize=26, shape=(), dflt='', pos=0),
"No1": Int32Col(shape=(), dflt=0, pos=1),
"No2": Int32Col(shape=(), dflt=0, pos=2),
"No3": Float64Col(shape=(), dflt=0.0, pos=3),
"No4": Float64Col(shape=(), dflt=0.0, pos=4)}
byteorder := 'little'
chunkshape := (2621,)
In [107]: ll $path*
Out[107]: -rw-r--r-- 1 root 100156256 28. Sep 15:18 /flash/data/tab.h5
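The reported file size is close to what the row layout alone predicts. The following back-of-the-envelope sketch assumes the table is stored uncompressed (the size suggests so) and uses the standard-library `struct` module only as a stand-in to mirror the column types: a 26-byte string, two little-endian 32-bit integers and two 64-bit floats. Each row then takes 50 bytes, so the 2,000,000 rows account for 100,000,000 of the 100,156,256 bytes; the small remainder is presumably HDF5 metadata and chunking overhead.

```python
import struct

# Mirror the table's row description with a packed little-endian format:
# 26-byte string, two 4-byte ints, two 8-byte floats ('<' means no padding).
ROW_FORMAT = "<26siidd"
row_bytes = struct.calcsize(ROW_FORMAT)

rows = 2_000_000          # number of rows shown in the Table repr
file_bytes = 100_156_256  # file size reported by `ll`

payload = rows * row_bytes        # bytes needed for the raw row data
overhead = file_bytes - payload   # everything else in the file

print(f"bytes per row : {row_bytes}")
print(f"raw payload   : {payload:,} bytes")
print(f"other content : {overhead:,} bytes ({overhead / file_bytes:.2%})")
```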

A more performant and more Pythonic way to accomplish the same result is to use NumPy structured arrays:


In [108]: dty = np.dtype([('Date', 'S26'), ('No1', '<i4'), ('No2', '<i4'),
('No3', '<f8'), ('No4', '<f8')])
sarray = np.zeros(len(ran_int), dtype=dty)
In [109]: sarray
Out[109]: array([('', 0, 0, 0.0, 0.0), ('', 0, 0, 0.0, 0.0),
('', 0, 0, 0.0, 0.0),
..., ('', 0, 0, 0.0, 0.0), ('', 0, 0, 0.0, 0.0),
('', 0, 0, 0.0, 0.0)],
dtype=[('Date', 'S26'), ('No1', '<i4'), ('No2', '<i4'), ('No3', '<f8'), ('No4', '<f8')])
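A structured array stores each record as one packed block of bytes, with every field at a fixed offset. The sketch below rebuilds the same dtype for just three rows (an illustrative size, not the session's two million) to show the field offsets and the 50-byte record size. It also hints at why 26 bytes suit the Date column: an ISO-format timestamp with microseconds, such as `str()` of a datetime with a nonzero microsecond part, is exactly 26 characters long. Encoding the timestamp to ASCII bytes before storing it is an assumption of this sketch.

```python
import datetime as dt
import numpy as np

# Same record layout as In [108]: a 26-byte string and four numeric fields.
dty = np.dtype([('Date', 'S26'), ('No1', '<i4'), ('No2', '<i4'),
                ('No3', '<f8'), ('No4', '<f8')])

# Field name -> byte offset inside one packed record.
offsets = {name: dty.fields[name][1] for name in dty.names}
print(offsets)        # Date at 0, No1 at 26, No2 at 30, No3 at 34, No4 at 42
print(dty.itemsize)   # 50 bytes per record, no padding

# 'YYYY-MM-DD HH:MM:SS.ffffff' is 26 characters, matching the 'S26' field.
stamp = dt.datetime.now().isoformat(sep=' ', timespec='microseconds')
print(len(stamp))     # 26

small = np.zeros(3, dtype=dty)         # three zero-filled records
small['Date'] = stamp.encode('ascii')  # broadcast one bytes value to all rows
small['No1'] = [1, 2, 3]               # fill a whole column at once
print(small[0])       # one record (row view)
print(small['No1'])   # one field (column view)
```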
In [110]: %%time
sarray['Date'] = dt.datetime.now()
sarray['No1'] = ran_int[:, 0]
sarray['No2'] = ran_int[:, 1]
sarray['No3'] = ran_flo[:, 0]
sarray['No4'] = ran_flo[:, 1]
Out[110]: CPU times: user 113 ms, sys: 18 ms, total: 131 ms
Wall time: 131 ms
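Most of the speed-up comes from replacing one Python-level assignment per row with one bulk copy per column. The sketch below makes that comparison in NumPy alone; the random stand-ins for `ran_int` and `ran_flo`, the row count of 100,000, the fixed timestamp and the use of `time.perf_counter` instead of `%%time` are all assumptions for illustration. Both approaches produce byte-identical arrays; the timings will vary by machine.

```python
import time
import numpy as np

# Same record layout as the session's structured array.
dty = np.dtype([('Date', 'S26'), ('No1', '<i4'), ('No2', '<i4'),
                ('No3', '<f8'), ('No4', '<f8')])

rows = 100_000
rng = np.random.default_rng(0)
# Synthetic stand-ins for the session's ran_int and ran_flo arrays.
ran_int = rng.integers(0, 10_000, size=(rows, 2))
ran_flo = rng.standard_normal((rows, 2))
stamp = b'2000-01-01 00:00:00.000000'  # fixed 26-byte timestamp

# Row by row: one Python-level tuple assignment per record.
t0 = time.perf_counter()
looped = np.zeros(rows, dtype=dty)
for i in range(rows):
    looped[i] = (stamp, ran_int[i, 0], ran_int[i, 1],
                 ran_flo[i, 0], ran_flo[i, 1])
t_loop = time.perf_counter() - t0

# Column by column: each field is filled by one vectorized copy.
t0 = time.perf_counter()
vectorized = np.zeros(rows, dtype=dty)
vectorized['Date'] = stamp
vectorized['No1'] = ran_int[:, 0]
vectorized['No2'] = ran_int[:, 1]
vectorized['No3'] = ran_flo[:, 0]
vectorized['No4'] = ran_flo[:, 1]
t_vec = time.perf_counter() - t0

print(f"row loop: {t_loop:.3f} s, column fill: {t_vec:.4f} s")
```

The column fill should come out far faster than the loop. The gap between In [105] and In [110] is likely larger still, because the loop there also goes through PyTables' per-row handling.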

With the complete data set now stored in the structured array, creating the table boils down to the following line of code. Note that the row description is no longer needed; PyTables uses the NumPy dtype instead:


In [111]: %%time
h5.create_table('/', 'ints_floats_from_array', sarray,
title='Integers and Floats',
expectedrows=rows, filters=filters)
Out[111]: CPU times: user 38 ms, sys: 117 ms, total: 155 ms
Wall time: 154 ms

/ints_floats_from_array (Table(2000000,)) 'Integers and Floats'
description := {
"Date": StringCol(itemsize=26, shape=(), dflt='', pos=0),
"No1": Int32Col(shape=(), dflt=0, pos=1),
"No2": Int32Col(shape=(), dflt=0, pos=2),