Python for Finance: Analyze Big Financial Data

beforehand what is stored in a pickle file. A sometimes helpful workaround is to store not single objects but a dict object containing all the other objects:

In [25]: pkl_file = open(path + 'data.pkl', 'wb')  # open file for writing (binary mode)
         pickle.dump({'x': x, 'y': y}, pkl_file)
         pkl_file.close()

This approach allows us to read the whole set of objects at once and, for example, to iterate over the keys of the dict object:

In [26]: pkl_file = open(path + 'data.pkl', 'rb')  # open file for reading (binary mode)
         data = pickle.load(pkl_file)
         pkl_file.close()
         for key in data.keys():
             print(key, data[key][:4])

Out[26]: y [ 13.29275485 2.14256725 6.49394423 0.96476785]
         x [-3.64592304 1.46375109 2.54832185 0.98222597]

In [27]: !rm -f $path*
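The same dict-based round trip can also be written with context managers, which close the files automatically even if an error occurs. This is a minimal sketch under stated assumptions, not the book's code: it uses plain Python lists as hypothetical stand-ins for the NumPy arrays x and y, and a temporary directory in place of the path variable, so that it runs on its own.

```python
import os
import pickle
import random
import tempfile

# Hypothetical stand-ins for the NumPy arrays x and y used in the text
x = [random.gauss(0, 1) for _ in range(10)]
y = [v ** 2 for v in x]

# A temporary directory replaces the book's `path` variable (assumption)
tmp_dir = tempfile.mkdtemp()
pkl_path = os.path.join(tmp_dir, 'data.pkl')

# Store all objects inside one dict; pickle needs binary write mode ('wb')
with open(pkl_path, 'wb') as pkl_file:
    pickle.dump({'x': x, 'y': y}, pkl_file)

# A single load call returns the whole dict ('rb' = binary read mode)
with open(pkl_path, 'rb') as pkl_file:
    data = pickle.load(pkl_file)

# The keys tell us what was stored; show the first four values of each object
for key in data:
    print(key, data[key][:4])

# Clean up the temporary file and directory
os.remove(pkl_path)
os.rmdir(tmp_dir)
```

Because the keys travel with the objects, there is no need to remember the order in which the objects were written.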

This approach, however, requires us to write and read all objects at once. This is a compromise one can probably live with in many circumstances, given the much greater convenience it brings.

Reading and Writing Text Files

Text processing can be considered a strength of Python. In fact, many corporate and scientific users use Python for exactly this task. Python offers a multitude of options for working with string objects, as well as with text files in general.

Suppose we have generated quite a large set of data that we want to save and share as a comma-separated values (CSV) file. Although they have a special structure, such files are basically plain text files:

In [28]: rows = 5000
         a = np.random.standard_normal((rows, 5))  # dummy data

In [29]: a.round(4)

Out[29]: array([[ 1.381 , -1.1236,  1.0622, -1.3997, -0.7374],
                [ 0.15  ,  0.967 ,  1.8391,  0.5633,  0.0569],
                [-0.9504,  0.4779,  1.8636, -1.9152, -0.3005],
                ...,
                [ 0.8843, -1.3932, -0.0506,  0.2717, -1.4921],
                [-1.0352,  1.0368,  0.4562, -0.0667, -1.3391],
                [ 0.9952, -0.6398,  0.8467, -1.6951,  1.122 ]])

To make the case a bit more realistic, we add date-time information to the mix and use the pandas date_range function to generate a series of hourly date-time points (for details, see Chapter 6 and Appendix C):

In [30]: import pandas as pd
         t = pd.date_range(start='2014/1/1', periods=rows, freq='H')  # set of hourly datetime objects

In [31]: t

Out[31]: <class 'pandas.tseries.index.DatetimeIndex'>
         [2014-01-01 00:00:00, ..., 2014-07-28 07:00:00]
         Length: 5000, Freq: H, Timezone: None
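The end point in Out[31] follows from simple arithmetic: 5,000 hourly points starting at 2014-01-01 00:00 span 4,999 one-hour steps, that is, 208 days and 7 hours. The sketch below checks this with the standard library datetime module instead of pandas (a substitution made only so the example has no third-party dependency); it builds the same hourly grid as a plain list.

```python
from datetime import datetime, timedelta

rows = 5000
start = datetime(2014, 1, 1)

# Same hourly grid as pd.date_range(start='2014/1/1', periods=rows, freq='H')
t = [start + timedelta(hours=i) for i in range(rows)]

# 5000 points means 4999 one-hour steps from the first to the last point
print(t[0], t[-1])  # prints: 2014-01-01 00:00:00 2014-07-28 07:00:00
```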

To write the data, we need to open a new file object on disk:

In [32]: csv_file = open(path + 'data.csv', 'w')  # open file for writing

The first line of a CSV file generally contains the names for each data column stored in the file, so we write this first:

In [33]: header = 'date,no1,no2,no3,no4,no5\n'
         csv_file.write(header)
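After the header, each data row still has to become one line of comma-separated text. As one possible way to do this (an alternative to assembling each line by hand), the sketch below uses the standard library csv module. It rests on several assumptions: random.gauss stands in for the NumPy array a, a plain list of datetime objects stands in for the pandas index t, and a temporary file replaces path + 'data.csv'. Reading the file back line by line shows that the result is ordinary plain text.

```python
import csv
import os
import random
import tempfile
from datetime import datetime, timedelta

rows = 5000
# Stand-ins for the book's objects (assumptions): t holds hourly timestamps,
# a holds rows of five standard normal draws
t = [datetime(2014, 1, 1) + timedelta(hours=i) for i in range(rows)]
a = [[random.gauss(0, 1) for _ in range(5)] for _ in range(rows)]

# Temporary file in place of path + 'data.csv' (assumption)
fd, csv_path = tempfile.mkstemp(suffix='.csv')
os.close(fd)

# The csv module documentation recommends newline='' for files it writes
with open(csv_path, 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(['date', 'no1', 'no2', 'no3', 'no4', 'no5'])  # header line
    for t_, row in zip(t, a):
        # datetime objects are converted with str(); values rounded to 4 digits
        writer.writerow([t_] + [round(v, 4) for v in row])

# A CSV file is just text: read it back as a list of lines
with open(csv_path) as csv_file:
    lines = csv_file.read().splitlines()

print(lines[0])  # header line
print(lines[1])  # first data row
os.remove(csv_path)
```

The csv writer takes care of separators and line endings, which matters once fields may themselves contain commas or quotes.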