Python for Finance: Analyze Big Financial Data

 2.5483218463404067,
 0.9822259685028746,
 3.594915396586916]

To ensure that objects a and b are indeed the same, NumPy provides the function allclose:


In [14]: np.allclose(np.array(a), np.array(b))
Out[14]: True

In principle, this is similar to calculating the difference of two ndarray objects and checking whether it is 0 (summing the differences, as done here, could let deviations of opposite sign cancel out):


In [15]: np.sum(np.array(a) - np.array(b))
Out[15]: 0.0

However, allclose takes tolerance levels as parameters: by default, the relative tolerance rtol is set to 1e-5 and the absolute tolerance atol to 1e-8.
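The effect of the tolerance can be illustrated with a minimal sketch. The small arrays and the perturbation of 1e-7 are made up for illustration and are not the data used above; the example also shows why a zero sum of differences is a weaker check than allclose.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = a + 1e-7  # tiny perturbation (illustrative value)

# Default tolerances: elementwise |a - b| <= atol + rtol * |b|
close_default = np.allclose(a, b)

# Much stricter tolerances make the same comparison fail
close_strict = np.allclose(a, b, rtol=1e-12, atol=1e-12)

# Differences of opposite sign cancel in a sum
c = np.array([1.0, 2.0, 3.0])
d = np.array([2.0, 1.0, 3.0])
diff_sum = np.sum(c - d)       # 0.0 although c and d differ
close_cd = np.allclose(c, d)   # elementwise check detects the mismatch

print(close_default, close_strict, diff_sum, close_cd)
```

Passing rtol and atol explicitly is one way to make the intended precision of such a check visible in the code.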


Storing and retrieving a single object with pickle is quite simple. What about two objects?


In [16]: pkl_file = open(path + 'data.pkl', 'wb')  # open file for writing (binary mode)
In [17]: %time pickle.dump(np.array(a), pkl_file)
Out[17]: CPU times: user 799 ms, sys: 47 ms, total: 846 ms
         Wall time: 846 ms
In [18]: %time pickle.dump(np.array(a) ** 2, pkl_file)
Out[18]: CPU times: user 742 ms, sys: 41 ms, total: 783 ms
         Wall time: 784 ms
In [19]: pkl_file.close()
In [20]: ll $path*
Out[20]: -rw-r--r-- 1 root 44098737 28. Sep 15:16 /flash/data/data.pkl

What has happened? Mainly the following:


We have written an ndarray version of the original object to disk.


We have also written a squared ndarray version to disk, into the same file.


Both operations were faster than the original operation (due to the use of ndarray objects).


The file is approximately double its previous size, since we have stored double the amount of data.
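The size effect of dumping two objects into one file can be reproduced with a small sketch. It compares a file holding one ndarray with a file holding two (not the list-based file from the text); the temporary directory, the file names and the array length are assumptions chosen for illustration.

```python
import os
import pickle
import tempfile

import numpy as np

rng = np.random.default_rng(0)      # seeded generator for reproducibility
a = rng.standard_normal(100_000)    # stand-in for the data in the text

tmp_dir = tempfile.mkdtemp()        # hypothetical scratch directory
one_path = os.path.join(tmp_dir, 'one.pkl')
two_path = os.path.join(tmp_dir, 'two.pkl')

# One object in the file
with open(one_path, 'wb') as f:
    pickle.dump(a, f)

# Two objects dumped one after the other into the same file
with open(two_path, 'wb') as f:
    pickle.dump(a, f)
    pickle.dump(a ** 2, f)

size_one = os.path.getsize(one_path)
size_two = os.path.getsize(two_path)
ratio = size_two / size_one         # expected to be close to 2

print(size_one, size_two, ratio)
```

Using a with block closes each file automatically, which replaces the explicit pkl_file.close() call.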


Let us read the two ndarray objects back into memory:


In [21]: pkl_file = open(path + 'data.pkl', 'rb')  # open file for reading (binary mode)

pickle.load does the job. However, notice that it only returns a single ndarray object:


In [22]: x = pickle.load(pkl_file)
         x
Out[22]: array([-3.64592304,  1.46375109,  2.54832185, ...,  2.87048515,
                 0.66186994, -1.38532837])

Calling pickle.load for the second time returns the second object:


In [23]: y = pickle.load(pkl_file)
         y
Out[23]: array([ 13.29275485,   2.14256725,   6.49394423, ...,   8.23968501,
                  0.43807181,   1.9191347 ])
In [24]: pkl_file.close()
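When the number of stored objects is not known in advance, the repeated load calls can be wrapped in a loop. The following is a minimal sketch, not the approach of the text: the helper name load_all is hypothetical, and an in-memory io.BytesIO buffer stands in for the file on disk.

```python
import io
import pickle

import numpy as np


def load_all(file_obj):
    """Hypothetical helper: load pickled objects until the stream ends."""
    objects = []
    while True:
        try:
            objects.append(pickle.load(file_obj))
        except EOFError:  # raised by pickle.load when no data is left
            break
    return objects


a = np.arange(5, dtype=float)

buffer = io.BytesIO()        # in-memory stand-in for a file on disk
pickle.dump(a, buffer)       # first object written
pickle.dump(a ** 2, buffer)  # second object written
buffer.seek(0)               # rewind before reading

objects = load_all(buffer)   # objects come back in the order written
print(len(objects))
```

The objects are returned in the order in which they were dumped, so objects[0] corresponds to the first dump call and objects[1] to the second.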

Obviously, pickle stores objects according to the first in, first out (FIFO) principle. There is one major problem with this: there is no meta-information available to the user to know
