Python for Finance: Analyze Big Financial Data


PyTables supports out-of-memory operations, which makes it possible to implement array-based computations on data that does not fit into memory:


In [137]: filename = path + 'array.h5'
          h5 = tb.open_file(filename, 'w')

We create an EArray object that is extendable in the first dimension and has a fixed width of 1,000 in the second dimension:


In [138]: n = 1000
          ear = h5.create_earray(h5.root, 'ear',
                                 atom=tb.Float64Atom(),
                                 shape=(0, n))

Since it is extendable, such an object can be populated chunk-wise:


In [139]: %%time
          rand = np.random.standard_normal((n, n))
          for i in range(750):
              ear.append(rand)
          ear.flush()
Out[139]: CPU times: user 2.42 s, sys: 7.29 s, total: 9.71 s
          Wall time: 20.6 s
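
The chunk-wise population can also be tried at a much smaller scale. The following is a minimal sketch, not the session above: the temporary file, the names `demo_file`, `h5_demo`, and `ear_demo`, and the small sizes are assumptions chosen so that it runs quickly.

```python
import os
import tempfile

import numpy as np
import tables as tb

# Assumption: a throwaway temporary file stands in for path + 'array.h5'.
tmp_dir = tempfile.mkdtemp()
demo_file = os.path.join(tmp_dir, 'demo_array.h5')

n = 10         # fixed width of the second dimension (1,000 in the text)
n_chunks = 5   # number of appended blocks (750 in the text)

with tb.open_file(demo_file, 'w') as h5_demo:
    # shape=(0, n): the zero marks the extendable (first) dimension
    ear_demo = h5_demo.create_earray(h5_demo.root, 'ear',
                                     atom=tb.Float64Atom(),
                                     shape=(0, n))
    rows_before = ear_demo.nrows
    chunk = np.random.standard_normal((n, n))
    for _ in range(n_chunks):
        ear_demo.append(chunk)   # each call adds n rows along the first dimension
    ear_demo.flush()
    final_shape = ear_demo.shape

# Clean up the temporary file and directory.
os.remove(demo_file)
os.rmdir(tmp_dir)

print(rows_before, final_shape)
```

Each `append` call extends the array by the number of rows in the appended block, so the number of rows after the loop is `n_chunks * n`, while the second dimension stays fixed.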

To check how much data we have generated logically and physically, we can inspect the meta-information provided for the object as well as the disk space consumption:


In [140]: ear
Out[140]: /ear (EArray(750000, 1000)) ''
            atom := Float64Atom(shape=(), dflt=0.0)
            maindim := 0
            flavor := 'numpy'
            byteorder := 'little'
            chunkshape := (8, 1000)

In [141]: ear.size_on_disk
Out[141]: 6000000000L
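
The reported number of bytes can be checked by hand from the shape and the atom type. A short sketch of the arithmetic, assuming uncompressed storage (no compression filter is set in the session above) and 8 bytes per Float64 value:

```python
rows, cols = 750000, 1000   # logical shape reported for /ear
bytes_per_value = 8         # a Float64 value occupies 8 bytes

expected_bytes = rows * cols * bytes_per_value
expected_gb = expected_bytes / 1e9   # decimal gigabytes

print(expected_bytes)   # 6000000000
print(expected_gb)      # 6.0
```

The result agrees with the value of `ear.size_on_disk` shown above.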

The EArray object is 6 GB in size. For an out-of-memory computation, we need a target EArray object in the database:


In [142]: out = h5.create_earray(h5.root, 'out',
                                 atom=tb.Float64Atom(),
                                 shape=(0, n))

PyTables provides a special class, Expr, for evaluating numerical expressions efficiently; it is based on the numerical expression library numexpr. We use it to evaluate the mathematical expression in Equation 7-1 over the whole EArray object that we generated before.


Equation 7-1. Example mathematical expression

y = 3 \sin(x) + \sqrt{|x|}


The following code shows the capabilities for out-of-memory calculations in action:


In [143]: expr = tb.Expr('3 * sin(ear) + sqrt(abs(ear))')
            # the numerical expression as a string object
          expr.set_output(out, append_mode=True)
            # target to store results is disk-based array

In [144]: %time expr.eval()
            # evaluation of the numerical expression
            # and storage of results in disk-based array
Out[144]: CPU times: user 34.4 s, sys: 11.6 s, total: 45.9 s
          Wall time: 1min 41s
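
To see that the disk-based evaluation yields the same numbers as an in-memory calculation, the approach can be repeated on a small array and compared against NumPy. This is a sketch under stated assumptions: the temporary file, the names `demo_file`, `h5_demo`, and `data`, and the small sizes are hypothetical, and the `ear` variable is passed explicitly via the `uservars` argument instead of being looked up in the calling namespace as in the session above.

```python
import os
import tempfile

import numpy as np
import tables as tb

# Assumption: a throwaway temporary file, small enough to verify in memory.
tmp_dir = tempfile.mkdtemp()
demo_file = os.path.join(tmp_dir, 'demo_expr.h5')
n = 10

with tb.open_file(demo_file, 'w') as h5_demo:
    ear = h5_demo.create_earray(h5_demo.root, 'ear',
                                atom=tb.Float64Atom(), shape=(0, n))
    out = h5_demo.create_earray(h5_demo.root, 'out',
                                atom=tb.Float64Atom(), shape=(0, n))
    data = np.random.standard_normal((50, n))
    ear.append(data)
    ear.flush()

    # Map the name used in the expression string to the disk-based array.
    expr = tb.Expr('3 * sin(ear) + sqrt(abs(ear))', uservars={'ear': ear})
    expr.set_output(out, append_mode=True)   # results are appended to 'out'
    expr.eval()
    out.flush()
    result = out.read()                      # small enough to load for checking

# Reference calculation, done completely in memory with NumPy.
expected = 3 * np.sin(data) + np.sqrt(np.abs(data))
matches = bool(np.allclose(result, expected))

os.remove(demo_file)
os.rmdir(tmp_dir)

print(result.shape, matches)
```

For the 6 GB array in the text, reading the whole result back with `out.read()` would defeat the purpose; there, results would be inspected slice by slice instead.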