Python for Finance: Analyze Big Financial Data

In [105]: %%time
          for i in range(rows):
              pointer['Date'] = dt.datetime.now()
              pointer['No1'] = ran_int[i, 0]
              pointer['No2'] = ran_int[i, 1]
              pointer['No3'] = ran_flo[i, 0]
              pointer['No4'] = ran_flo[i, 1]
              pointer.append()
                # this appends the data and
                # moves the pointer one row forward
          tab.flush()
Out[105]: CPU times: user 15.7 s, sys: 3.53 s, total: 19.2 s
          Wall time: 19.4 s

Always remember to commit your changes. What the commit method is for the SQLite3 database, the flush method is for PyTables. We can now inspect the data on disk, first logically via our Table object and second physically via the file information:

In [106]: tab
Out[106]: /ints_floats (Table(2000000,)) 'Integers and Floats'
            description := {
            "Date": StringCol(itemsize=26, shape=(), dflt='', pos=0),
            "No1": Int32Col(shape=(), dflt=0, pos=1),
            "No2": Int32Col(shape=(), dflt=0, pos=2),
            "No3": Float64Col(shape=(), dflt=0.0, pos=3),
            "No4": Float64Col(shape=(), dflt=0.0, pos=4)}
            byteorder := 'little'
            chunkshape := (2621,)

In [107]: ll $path*
Out[107]: -rw-r--r-- 1 root 100156256 28. Sep 15:18 /flash/data/tab.h5
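The commit analogy mentioned above can be made concrete with a minimal sketch that uses only the standard library's sqlite3 module. The table name `numbers` and the helper `count_rows_after_reopen` are hypothetical names chosen for illustration; the sketch assumes the default transaction handling of Python's sqlite3 module, under which inserts that are still pending when the connection is closed are rolled back.

```python
import os
import sqlite3
import tempfile


def count_rows_after_reopen(commit: bool) -> int:
    """Insert three rows, optionally commit, then reopen the file and count."""
    fd, path = tempfile.mkstemp(suffix=".db")  # throwaway database file
    os.close(fd)
    try:
        con = sqlite3.connect(path)
        con.execute("CREATE TABLE numbers (No1 INTEGER, No2 INTEGER)")
        con.executemany("INSERT INTO numbers VALUES (?, ?)",
                        [(1, 2), (3, 4), (5, 6)])
        if commit:
            con.commit()  # makes the pending inserts permanent
        con.close()  # uncommitted inserts are discarded here

        # Reopen the file to see what actually reached the disk.
        con = sqlite3.connect(path)
        (n_rows,) = con.execute("SELECT COUNT(*) FROM numbers").fetchone()
        con.close()
        return n_rows
    finally:
        os.remove(path)


rows_without_commit = count_rows_after_reopen(commit=False)
rows_with_commit = count_rows_after_reopen(commit=True)
print(rows_without_commit, rows_with_commit)
```

Only the committed run should still find its rows after reopening the file. The flush method plays the corresponding role for a PyTables table, writing buffered rows to the file on disk.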

There is a faster and more Pythonic way to accomplish the same result: using NumPy structured arrays.

In [108]: dty = np.dtype([('Date', 'S26'), ('No1', '<i4'), ('No2', '<i4'),
                          ('No3', '<f8'), ('No4', '<f8')])
          sarray = np.zeros(len(ran_int), dtype=dty)

In [109]: sarray
Out[109]: array([('', 0, 0, 0.0, 0.0), ('', 0, 0, 0.0, 0.0), ('', 0, 0, 0.0, 0.0),
                 ..., ('', 0, 0, 0.0, 0.0), ('', 0, 0, 0.0, 0.0),
                 ('', 0, 0, 0.0, 0.0)],
                dtype=[('Date', 'S26'), ('No1', '<i4'), ('No2', '<i4'),
                       ('No3', '<f8'), ('No4', '<f8')])

In [110]: %%time
          sarray['Date'] = dt.datetime.now()
          sarray['No1'] = ran_int[:, 0]
          sarray['No2'] = ran_int[:, 1]
          sarray['No3'] = ran_flo[:, 0]
          sarray['No4'] = ran_flo[:, 1]
Out[110]: CPU times: user 113 ms, sys: 18 ms, total: 131 ms
          Wall time: 131 ms
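For readers who want to try the column-wise assignment in isolation, the following is a minimal, self-contained sketch. It assumes a deliberately small row count and generates its own `ran_int` and `ran_flo` stand-ins with NumPy's random generator; these sample arrays and the explicit ASCII encoding of the timestamp (used here so that the assignment to the `S26` field is unambiguous under Python 3) are assumptions of the sketch, not part of the session above.

```python
import datetime as dt

import numpy as np

rows = 1000  # small illustrative size; the session above uses 2,000,000

# Stand-in sample data (assumption): two integer and two float columns.
rng = np.random.default_rng(0)
ran_int = rng.integers(0, 10000, size=(rows, 2))
ran_flo = rng.standard_normal((rows, 2)).round(5)

# Same record layout as in the text: a 26-byte string, two 32-bit
# integers, and two 64-bit floats per row.
dty = np.dtype([('Date', 'S26'), ('No1', '<i4'), ('No2', '<i4'),
                ('No3', '<f8'), ('No4', '<f8')])
sarray = np.zeros(rows, dtype=dty)

# Each assignment fills a whole column at once; no Python-level loop.
stamp = str(dt.datetime.now()).encode('ascii')
sarray['Date'] = stamp          # one value broadcast to every row
sarray['No1'] = ran_int[:, 0]
sarray['No2'] = ran_int[:, 1]
sarray['No3'] = ran_flo[:, 0]
sarray['No4'] = ran_flo[:, 1]

print(sarray.dtype.itemsize)    # bytes per record
print(sarray[:3])
```

Each record in this layout occupies 26 + 4 + 4 + 8 + 8 = 50 bytes, so 2,000,000 rows amount to about 100 MB of raw data, which is in line with the file size reported by the directory listing earlier.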

With the complete data set now stored in the structured array, creating the table boils down to a single line of code. Note that the row description is no longer needed; PyTables uses the NumPy dtype instead:

In [111]: %%time
          h5.create_table('/', 'ints_floats_from_array', sarray,
                          title='Integers and Floats',
                          expectedrows=rows, filters=filters)
Out[111]: CPU times: user 38 ms, sys: 117 ms, total: 155 ms
          Wall time: 154 ms

          /ints_floats_from_array (Table(2000000,)) 'Integers and Floats'
            description := {
            "Date": StringCol(itemsize=26, shape=(), dflt='', pos=0),
            "No1": Int32Col(shape=(), dflt=0, pos=1),
            "No2": Int32Col(shape=(), dflt=0, pos=2),
