Python for Finance: Analyze Big Financial Data


Basic I/O with Python


Python itself comes with a multitude of I/O capabilities, some optimized for performance, others more for flexibility. In general, however, they are easily used in interactive as well as in large-scale deployment settings.


Writing Objects to Disk


For later use, for documentation, or for sharing with others, one might want to store Python objects on disk. One option is to use the pickle module. This module can serialize the majority of Python objects. Serialization refers to the conversion of an object (hierarchy) to a byte stream; deserialization is the opposite operation. In the example that follows, we work again with (pseudo)random data, this time stored in a list object:


In [1]: path = '/flash/data/'
In [2]: import numpy as np
from random import gauss
In [3]: a = [gauss(1.5, 2) for i in range(1000000)]
# generation of normally distributed randoms
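The serialization step described above can also be observed without touching the disk. The following is a minimal sketch, assuming Python 3, where `pickle.dumps` returns the byte stream as a `bytes` object and `pickle.loads` turns it back into an equivalent object; the short list `data` is an illustrative stand-in for the one-million-element list `a`.

```python
import pickle
from random import gauss

# A short list of normally distributed pseudorandom floats (mean 1.5,
# standard deviation 2), standing in for the much larger list above
data = [gauss(1.5, 2) for _ in range(5)]

# Serialization: object (hierarchy) -> byte stream held in memory
byte_stream = pickle.dumps(data)
print(type(byte_stream))

# Deserialization: byte stream -> new, equal object
restored = pickle.loads(byte_stream)
print(restored == data)
print(restored is data)
```

The restored list compares equal to the original but is a distinct object, which is exactly what reading a pickle file back from disk also produces.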

The task now is to write this list object to disk for later retrieval. The pickle module accomplishes this task:


In [4]: import pickle
In [5]: pkl_file = open(path + 'data.pkl', 'wb')
# open file for writing in binary mode (pickle writes bytes)
# Note: existing file might be overwritten

The two major functions we need are dump, for writing objects, and load, for loading them into memory:

In [6]: %time pickle.dump(a, pkl_file)
Out[6]: CPU times: user 4.3 s, sys: 43 ms, total: 4.35 s
Wall time: 4.36 s
In [7]: pkl_file
Out[7]: <open file '/flash/data/data.pkl', mode 'wb' at 0x3df0540>
In [8]: pkl_file.close()
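Since %time is an IPython magic, a plain Python script needs another way to time dump and load. Below is a minimal sketch, assuming Python 3: `time.perf_counter` takes the place of %time, `with` blocks close the file automatically (replacing the explicit close() call), and a temporary directory plus a smaller list of 100,000 floats are hypothetical substitutions for the /flash/data/ path and the original data.

```python
import os
import pickle
import tempfile
import time
from random import gauss

# Hypothetical location: a fresh temporary directory instead of '/flash/data/'
path = tempfile.mkdtemp()
file_name = os.path.join(path, 'data.pkl')

# Smaller sample than in the text to keep the example fast
a = [gauss(1.5, 2) for _ in range(100000)]

# 'wb': pickle writes bytes, so the file is opened in binary mode;
# the with-block closes the file even if dump raises an exception
start = time.perf_counter()
with open(file_name, 'wb') as pkl_file:
    pickle.dump(a, pkl_file)
dump_seconds = time.perf_counter() - start

# 'rb': read the same bytes back and rebuild the list
start = time.perf_counter()
with open(file_name, 'rb') as pkl_file:
    b = pickle.load(pkl_file)
load_seconds = time.perf_counter() - start

print(f'dump: {dump_seconds:.3f} s, load: {load_seconds:.3f} s')
print(b[:5] == a[:5])
```

Absolute timings depend on hardware and Python version, so they will not match the figures in the session above.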

We can now inspect the size of the file on disk. The list object with 1,000,000 floats takes about 20 megabytes (MB) of disk space:


In [9]: ll $path*
Out[9]: -rw-r--r-- 1 root 20970325 28. Sep 15:16 /flash/data/data.pkl
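The file size works out to roughly 21 bytes per float (20,970,325 bytes for 1,000,000 floats), which is consistent with pickle's text-based protocol 0, the default in Python 2, where each float is written as its decimal representation. The sketch below, assuming Python 3 and a smaller illustrative list, compares that format with the highest binary protocol available, which stores each float as 8 raw bytes plus a one-byte opcode.

```python
import pickle
from random import gauss

# Illustrative sample; the per-float ratios do not depend much on its size
a = [gauss(1.5, 2) for _ in range(100000)]
n = len(a)

# Protocol 0: original text-based format (the Python 2 default)
text_size = len(pickle.dumps(a, protocol=0))

# Highest protocol: binary format, 8 bytes per float plus opcodes
binary_size = len(pickle.dumps(a, protocol=pickle.HIGHEST_PROTOCOL))

print(f'protocol 0: {text_size / n:.1f} bytes per float')
print(f'protocol {pickle.HIGHEST_PROTOCOL}: {binary_size / n:.1f} bytes per float')
```

Python 3's default protocol (`pickle.DEFAULT_PROTOCOL`) is itself a binary one, so a list like `a` pickled with default settings there should take noticeably less disk space than the file shown above.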

Now that we have data on disk, we can read it into memory via pickle.load:


In [10]: pkl_file = open(path + 'data.pkl', 'rb')  # open file for reading (binary mode)
In [ 11 ]: %time b = pickle.load(pkl_file)
Out[11]: CPU times: user 3.37 s, sys: 18 ms, total: 3.38 s
Wall time: 3.39 s
In [12]: b[:5]
Out[12]: [-3.6459230447943165,
1.4637510875573307,
2.5483218463404067,
0.9822259685028746,
3.594915396586916]

Let us compare this with the first five floats of the original object:


In [13]: a[:5]
Out[13]: [-3.6459230447943165,
1.4637510875573307,