Basic I/O with Python
Python itself comes with a multitude of I/O capabilities, some optimized for performance,
others more for flexibility. In general, however, they are easily used in interactive as well
as in large-scale deployment settings.
Writing Objects to Disk
For later use, for documentation, or for sharing with others, one might want to store
Python objects on disk. One option is to use the pickle module. This module can serialize
the majority of Python objects. Serialization refers to the conversion of an object
(hierarchy) to a byte stream; deserialization is the opposite operation. In the example that
follows, we work again with (pseudo)random data, this time stored in a list object:
In [1]: path = '/flash/data/'
In [2]: import numpy as np
        from random import gauss
In [3]: a = [gauss(1.5, 2) for i in range(1000000)]
        # generation of normally distributed randoms
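To make the notion of serialization concrete, the following minimal sketch converts a small list to a byte stream in memory and back again with pickle.dumps and pickle.loads. It assumes Python 3; the names sample, stream, and restored are illustrative and not part of the session above.

```python
import pickle
from random import gauss, seed

seed(42)  # fixed seed, only so that the example is repeatable
sample = [gauss(1.5, 2) for _ in range(5)]  # small illustrative list

stream = pickle.dumps(sample)    # serialization: object -> byte stream
restored = pickle.loads(stream)  # deserialization: byte stream -> object

print(type(stream))        # a bytes object under Python 3
print(restored == sample)  # checks that the round trip preserved the floats
```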
The task now is to write this list object to disk for later retrieval. pickle accomplishes
this task:
In [4]: import pickle
In [5]: pkl_file = open(path + 'data.pkl', 'w')
        # open file for writing
        # Note: existing file might be overwritten
The two major functions we need are dump, for writing objects to a file, and load, for
reading them back into memory:
In [6]: %time pickle.dump(a, pkl_file)
Out[6]: CPU times: user 4.3 s, sys: 43 ms, total: 4.35 s
        Wall time: 4.36 s
In [7]: pkl_file
Out[7]: <open file '/flash/data/data.pkl', mode 'w' at 0x3df0540>
In [8]: pkl_file.close()
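The session above reflects Python 2, where text mode 'w' works for pickle on Unix-like systems. Under Python 3, pickle expects files opened in binary mode ('wb' and 'rb'), and a with block closes the file automatically. The following is a sketch under these assumptions; the temporary directory stands in for the path used in the text, and the list is kept much smaller.

```python
import os
import pickle
import tempfile
from random import gauss

data = [gauss(1.5, 2) for _ in range(1000)]  # smaller than the list in the text

# hypothetical target directory; stands in for the path used in the text
tmp_dir = tempfile.mkdtemp()
file_name = os.path.join(tmp_dir, 'data.pkl')

with open(file_name, 'wb') as pkl_file:  # binary mode for writing
    pickle.dump(data, pkl_file)
# the file is closed once the with block ends

with open(file_name, 'rb') as pkl_file:  # binary mode for reading
    loaded = pickle.load(pkl_file)

print(loaded == data)  # compares the reloaded list with the original
```

Using with avoids a forgotten close() call, which matters because buffered data is only guaranteed to reach the disk once the file is flushed or closed.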
We can now inspect the size of the file on disk. The list object with 1,000,000 floats
takes about 20 megabytes (MB) of disk space:
In [9]: ll $path*
Out[9]: -rw-r--r-- 1 root 20970325 28. Sep 15:16 /flash/data/data.pkl
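The file size can also be checked from within Python instead of via the shell. A minimal sketch, assuming Python 3 and a small pickled list in a temporary file rather than the file from the session above (values and tmp_name are illustrative names):

```python
import os
import pickle
import tempfile
from random import gauss

values = [gauss(1.5, 2) for _ in range(1000)]

# NamedTemporaryFile opens in binary mode by default
with tempfile.NamedTemporaryFile(suffix='.pkl', delete=False) as tmp:
    pickle.dump(values, tmp)
    tmp_name = tmp.name

size_bytes = os.path.getsize(tmp_name)  # size on disk in bytes
print(size_bytes)

os.remove(tmp_name)  # clean up the temporary file
```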
Now that we have data on disk, we can read it into memory via pickle.load:
In [10]: pkl_file = open(path + 'data.pkl', 'r')  # open file for reading
In [11]: %time b = pickle.load(pkl_file)
Out[11]: CPU times: user 3.37 s, sys: 18 ms, total: 3.38 s
         Wall time: 3.39 s
In [12]: b[:5]
Out[12]: [-3.6459230447943165,
1.4637510875573307,
2.5483218463404067,
0.9822259685028746,
3.594915396586916]
Let us compare this with the first five floats of the original object:
In [13]: a[:5]
Out[13]: [-3.6459230447943165,
1.4637510875573307,