beforehand what is stored in a pickle file. A workaround that is sometimes helpful is to
store not single objects but a dict object containing all of them:
In [25]: pkl_file = open(path + 'data.pkl', 'w')  # open file for writing
         pickle.dump({'x' : x, 'y' : y}, pkl_file)
         pkl_file.close()
This approach allows us to read the whole set of objects at once and, for example, to
iterate over the dict object’s keys:
In [26]: pkl_file = open(path + 'data.pkl', 'r')  # open file for reading
         data = pickle.load(pkl_file)
         pkl_file.close()
         for key in data.keys():
             print key, data[key][:4]
Out[26]: y [ 13.29275485 2.14256725 6.49394423 0.96476785]
x [-3.64592304 1.46375109 2.54832185 0.98222597]
In [27]: !rm -f $path*
This approach, however, requires us to write and read all objects at once. In many
circumstances, this is a compromise one can live with, given how much more convenient
the approach is.
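The dict-based round trip described above can be sketched in a self-contained form. This is only an illustration under a few assumptions: it uses Python 3 syntax (the print function, plus the binary file modes 'wb' and 'rb' that pickle requires there), plain lists stand in for the arrays x and y, and the temporary directory and the name file_name are hypothetical replacements for path + 'data.pkl'.

```python
import os
import pickle
import random
import tempfile

# Hypothetical stand-ins for the x and y objects used in the text.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(10)]
y = [v ** 2 for v in x]

# Hypothetical temporary location; the text uses path + 'data.pkl'.
tmp_dir = tempfile.mkdtemp()
file_name = os.path.join(tmp_dir, 'data.pkl')

# Write one dict that bundles all objects (binary mode for pickle).
with open(file_name, 'wb') as pkl_file:
    pickle.dump({'x': x, 'y': y}, pkl_file)

# Read the whole dict back with a single call.
with open(file_name, 'rb') as pkl_file:
    data = pickle.load(pkl_file)

# Iterate over the keys and show the first four values of each object.
for key in sorted(data.keys()):
    print(key, data[key][:4])

# Clean up the temporary file and directory.
os.remove(file_name)
os.rmdir(tmp_dir)
```

Using with blocks closes the files automatically, so the explicit close calls are not needed in this variant.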
Reading and Writing Text Files
Text processing can be considered a strength of Python. In fact, many corporate and
scientific users use Python for exactly this task. With Python you have a multitude of
options to work with string objects, as well as with text files in general.
Suppose we have generated quite a large set of data that we want to save and share as a
comma-separated value (CSV) file. Although they have a special structure, such files are
basically plain text files:
In [28]: rows = 5000
         a = np.random.standard_normal((rows, 5))  # dummy data
In [29]: a.round(4)
Out[29]: array([[ 1.381 , -1.1236, 1.0622, -1.3997, -0.7374],
[ 0.15 , 0.967 , 1.8391, 0.5633, 0.0569],
[-0.9504, 0.4779, 1.8636, -1.9152, -0.3005],
...,
[ 0.8843, -1.3932, -0.0506, 0.2717, -1.4921],
[-1.0352, 1.0368, 0.4562, -0.0667, -1.3391],
[ 0.9952, -0.6398, 0.8467, -1.6951, 1.122 ]])
To make the case a bit more realistic, we add date-time information to the mix and use the
pandas date_range function to generate a series of hourly date-time points (for details,
see Chapter 6 and Appendix C):
In [30]: import pandas as pd
         t = pd.date_range(start='2014/1/1', periods=rows, freq='H')
           # set of hourly datetime objects
In [31]: t
Out[31]: <class 'pandas.tseries.index.DatetimeIndex'>
         [2014-01-01 00:00:00, ..., 2014-07-28 07:00:00]
         Length: 5000, Freq: H, Timezone: None
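To make concrete what such an hourly index contains, the following sketch builds the same sequence of time points with the standard library only. It is a stand-in for pd.date_range, not the pandas implementation, and the name t_std is hypothetical.

```python
from datetime import datetime, timedelta

rows = 5000  # same number of points as in the text
start = datetime(2014, 1, 1)

# One timestamp per hour, starting at midnight on 1 January 2014.
t_std = [start + timedelta(hours=i) for i in range(rows)]

print(t_std[0])    # first time point
print(t_std[-1])   # last time point
print(len(t_std))  # number of time points
```

The last element is 4,999 hours after the start, that is, 2014-07-28 07:00:00, which agrees with the end point shown in the pandas output above.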
To write the data, we need to open a new file object on disk:
In [32]: csv_file = open(path + 'data.csv', 'w')  # open file for writing
The first line of a CSV file generally contains the names for each data column stored in the
file, so we write this first:
In [33]: header = 'date,no1,no2,no3,no4,no5\n'
         csv_file.write(header)