2.5483218463404067,
0.9822259685028746,
3.594915396586916]
To check whether objects a and b are indeed the same, we can use NumPy's allclose function:
In [14]: np.allclose(np.array(a), np.array(b))
Out[14]: True
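In principle, this is similar to calculating the difference of the two ndarray objects and
checking whether it is 0 (a summed difference of 0 is only a rough check, since positive and
negative element-wise differences can cancel each other out):
In [15]: np.sum(np.array(a) - np.array(b))
Out[15]: 0.0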
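However, allclose also accepts tolerance parameters: a relative tolerance rtol (default 1e-5)
and an absolute tolerance atol (default 1e-8). Two elements count as equal if
|a - b| <= atol + rtol * |b|.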
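The difference between the two checks can matter in practice. The following minimal sketch uses small hypothetical arrays (not the a and b from above) to show a case where the summed difference is 0 although the arrays differ, and how the rtol and atol parameters of np.allclose change the outcome:

```python
import numpy as np

# Hypothetical example arrays that differ element-wise
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 1.0, 3.0])

# The differences -1 and +1 cancel, so the sum is 0 despite the mismatch
print(np.sum(a - b))          # 0.0
print(np.allclose(a, b))      # False

# allclose checks |a - b| <= atol + rtol * |b| element by element
c = a + 1e-9                  # tiny numerical noise
print(np.allclose(a, c))      # True with the defaults rtol=1e-5, atol=1e-8
print(np.allclose(a, c, rtol=0, atol=1e-12))  # False with a stricter tolerance
```

For floating-point data, a tolerance-based comparison like this is usually preferable to testing for exact equality.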
Storing and retrieving a single object with pickle is obviously quite simple. What about
two objects?
In [16]: pkl_file = open(path + 'data.pkl', 'wb')  # open file for writing (binary mode)
In [17]: %time pickle.dump(np.array(a), pkl_file)
Out[17]: CPU times: user 799 ms, sys: 47 ms, total: 846 ms
Wall time: 846 ms
In [18]: %time pickle.dump(np.array(a) ** 2, pkl_file)
Out[18]: CPU times: user 742 ms, sys: 41 ms, total: 783 ms
Wall time: 784 ms
In [19]: pkl_file.close()
In [20]: ll $path*
Out[20]: -rw-r--r-- 1 root 44098737 28. Sep 15:16 /flash/data/data.pkl
What has happened? Mainly the following:
We have written an ndarray version of the original object to disk.
We have also written a squared ndarray version to disk, into the same file.
Both operations were faster than the original operation (due to the use of ndarray
objects).
The file is approximately double the size it was before, since we have stored twice the
amount of data.
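The size claim can be checked directly. This is a minimal Python 3 sketch, assuming a temporary directory in place of the book's path and one million hypothetical random draws in place of a; it pickles one array into one file and the array plus its square into another, then compares the file sizes:

```python
import os
import pickle
import tempfile

import numpy as np

# Hypothetical stand-in for the original data: one million standard normal draws
rng = np.random.default_rng(0)
arr = rng.standard_normal(1_000_000)

tmpdir = tempfile.mkdtemp()
one_path = os.path.join(tmpdir, 'one.pkl')
two_path = os.path.join(tmpdir, 'two.pkl')

# Pickle a single array (binary mode is required for pickle files in Python 3)
with open(one_path, 'wb') as f:
    pickle.dump(arr, f)

# Pickle the array and its square, one after the other, into the same file
with open(two_path, 'wb') as f:
    pickle.dump(arr, f)
    pickle.dump(arr ** 2, f)

size_one = os.path.getsize(one_path)
size_two = os.path.getsize(two_path)
print(size_one, size_two, size_two / size_one)  # ratio close to 2
```

Each float64 value occupies 8 bytes, so the raw data dominates the file size and the small per-object pickle overhead barely affects the ratio.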
Let us read the two ndarray objects back into memory:
In [21]: pkl_file = open(path + 'data.pkl', 'rb')  # open file for reading (binary mode)
pickle.load does the job. However, notice that it only returns a single ndarray object:
In [22]: x = pickle.load(pkl_file)
x
Out[22]: array([-3.64592304, 1.46375109, 2.54832185, ..., 2.87048515,
0.66186994, -1.38532837])
Calling pickle.load for the second time returns the second object:
In [23]: y = pickle.load(pkl_file)
y
Out[23]: array([ 13.29275485, 2.14256725, 6.49394423, ..., 8.23968501,
0.43807181, 1.9191347 ])
In [24]: pkl_file.close()
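Because each call to pickle.load returns the next object in the order the objects were written, a loop that stops when pickle.load raises EOFError reads every object from such a file. A minimal sketch, assuming a small hypothetical array a and a temporary file path in place of the book's data:

```python
import os
import pickle
import tempfile

import numpy as np

# Hypothetical file location and data
path = os.path.join(tempfile.mkdtemp(), 'data.pkl')
a = np.arange(5.0)

with open(path, 'wb') as f:
    pickle.dump(a, f)         # first object in
    pickle.dump(a ** 2, f)    # second object in

# Read objects back one by one until no pickled data is left
loaded = []
with open(path, 'rb') as f:
    while True:
        try:
            loaded.append(pickle.load(f))
        except EOFError:      # raised once the end of the file is reached
            break

print(len(loaded))  # 2
```

The objects come back in the same order they went in: loaded[0] holds a and loaded[1] holds its square.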
Obviously, pickle stores objects according to the first in, first out (FIFO) principle. There
is one major problem with this: there is no meta-information available to the user to know