Python for Finance: Analyze Big Financial Data

"No3": Float64Col(shape=(), dflt=0.0, pos=3),
"No4": Float64Col(shape=(), dflt=0.0, pos=4)}
byteorder := 'little'
chunkshape := (2621,)

This approach is an order of magnitude faster than the previous one, achieves the same result, and needs less code:


In [112]: h5
Out[112]: File(filename=/flash/data/tab.h5, title=u'', mode='w', root_uep='/',
filters=Filters(complevel=0, shuffle=False, fletcher32=False,
least_significant_digit=None))
/ (RootGroup) u''
/ints_floats (Table(2000000,)) 'Integers and Floats'
description := {
"Date": StringCol(itemsize=26, shape=(), dflt='', pos=0),
"No1": Int32Col(shape=(), dflt=0, pos=1),
"No2": Int32Col(shape=(), dflt=0, pos=2),
"No3": Float64Col(shape=(), dflt=0.0, pos=3),
"No4": Float64Col(shape=(), dflt=0.0, pos=4)}
byteorder := 'little'
chunkshape := (2621,)
/ints_floats_from_array (Table(2000000,)) 'Integers and Floats'
description := {
"Date": StringCol(itemsize=26, shape=(), dflt='', pos=0),
"No1": Int32Col(shape=(), dflt=0, pos=1),
"No2": Int32Col(shape=(), dflt=0, pos=2),
"No3": Float64Col(shape=(), dflt=0.0, pos=3),
"No4": Float64Col(shape=(), dflt=0.0, pos=4)}
byteorder := 'little'
chunkshape := (2621,)

We can now delete the duplicate table, since it is no longer needed:


In [113]: h5.remove_node('/', 'ints_floats_from_array')
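The effect of removing a node can be checked by listing the children of the root group before and after the call. The following is a minimal sketch, assuming PyTables is installed and importable as `tables`; the file name `demo.h5`, the temporary directory, and the reduced two-column row layout are hypothetical stand-ins, not the 2,000,000-row file used in the text.

```python
import os
import tempfile

import numpy as np
import tables as tb

# Hypothetical row layout, reduced to two columns for brevity.
row_dtype = np.dtype([("No1", "<i4"), ("No3", "<f8")])
rows = np.zeros(5, dtype=row_dtype)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.h5")  # hypothetical file name
    with tb.open_file(path, "w") as h5:
        # Two tables with identical content: the original and its duplicate.
        h5.create_table("/", "ints_floats", rows)
        h5.create_table("/", "ints_floats_from_array", rows)
        nodes_before = sorted(node._v_name for node in h5.list_nodes("/"))

        # Remove the duplicate; the other table is left untouched.
        h5.remove_node("/", "ints_floats_from_array")
        nodes_after = sorted(node._v_name for node in h5.list_nodes("/"))

print(nodes_before)
print(nodes_after)
```

Note that removing a node does not necessarily shrink the HDF5 file on disk; the freed space may only be reclaimed when the file is repacked.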

When it comes to slicing, the Table object behaves like typical Python and NumPy objects:


In [114]: tab[:3]
Out[114]: array([('2014-09-28 15:17:57.631234', 4342, 1672, -0.9293, 0.06343),
('2014-09-28 15:17:57.631368', 3839, 1563, -2.02808, 0.3964),
('2014-09-28 15:17:57.631383', 5100, 1326, 0.03401, 0.46742)],
dtype=[('Date', 'S26'), ('No1', '<i4'), ('No2', '<i4'), ('No3',
'<f8'), ('No4', '<f8')])

Similarly, we can select single columns only:


In [115]: tab[:4]['No4']
Out[115]: array([ 0.06343, 0.3964 , 0.46742, -0.56959])
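Slicing the table returns a NumPy structured array, so the row and column selections above can be reproduced without an HDF5 file. The following sketch uses NumPy only; the array `rows` is typed in by hand from the first three rows of the output above and stands in for the result of a table slice.

```python
import numpy as np

# Same field layout as the dtype reported for the table slice.
row_dtype = np.dtype([("Date", "S26"), ("No1", "<i4"), ("No2", "<i4"),
                      ("No3", "<f8"), ("No4", "<f8")])

# First three rows of the table, copied by hand from the output above.
rows = np.array([
    (b"2014-09-28 15:17:57.631234", 4342, 1672, -0.9293, 0.06343),
    (b"2014-09-28 15:17:57.631368", 3839, 1563, -2.02808, 0.3964),
    (b"2014-09-28 15:17:57.631383", 5100, 1326, 0.03401, 0.46742),
], dtype=row_dtype)

first_two = rows[:2]    # row slice: still a structured array with all five fields
no4 = rows[:2]["No4"]   # field selection: a plain float64 array

print(first_two.dtype.names)
print(no4)
```

Selecting a field after slicing, as in `tab[:4]['No4']`, first reads the complete rows and then picks out one column from the in-memory result.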

Even more convenient and important, we can apply NumPy universal functions to whole tables or to subsets of a table:


In [116]: %time np.sum(tab[:]['No3'])
Out[116]: CPU times: user 31 ms, sys: 58 ms, total: 89 ms
Wall time: 88.3 ms

-115.34513999999896

In [117]: %time np.sum(np.sqrt(tab[:]['No1']))
Out[117]: CPU times: user 53 ms, sys: 48 ms, total: 101 ms
Wall time: 101 ms

133360523.08794475
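The two calls above work because `tab[:]` reads the whole table into memory as a structured array, and each field of that array is an ordinary ndarray. The following is a small sketch of the same pattern with NumPy only; the three rows are made-up values chosen so the sums are easy to check by hand.

```python
import numpy as np

# Hypothetical two-column layout with made-up values.
row_dtype = np.dtype([("No1", "<i4"), ("No3", "<f8")])
rows = np.array([(4, 0.5), (9, -1.5), (16, 2.0)], dtype=row_dtype)

# Sum of a float column: 0.5 - 1.5 + 2.0
total_no3 = np.sum(rows["No3"])

# Sum of square roots of an integer column: 2 + 3 + 4.
# np.sqrt promotes the int32 input to float64.
roots = np.sqrt(rows["No1"])
sqrt_sum = np.sum(roots)

print(total_no3, sqrt_sum, roots.dtype)
```

Because the full slice materializes every row, this pattern is only practical when the table fits into memory.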

When it comes to plotting, the Table object also behaves much like an ndarray object (cf. Figure 7-5):


In [118]: %%time
plt.hist(tab[:]['No3'], bins=30)
plt.grid(True)