Python for Finance: Analyze Big Financial Data


beforehand what is stored in a pickle file. A sometimes helpful workaround is to store not single objects but one dict object containing all the other objects:


In [25]: pkl_file = open(path + 'data.pkl', 'wb')  # open file for writing (binary mode)
         pickle.dump({'x': x, 'y': y}, pkl_file)
         pkl_file.close()

This approach allows us to read the whole set of objects at once and, for example, to iterate over the keys of the dict object:


In [26]: pkl_file = open(path + 'data.pkl', 'rb')  # open file for reading (binary mode)
         data = pickle.load(pkl_file)
         pkl_file.close()
         for key in data.keys():
             print(key, data[key][:4])
Out[26]: x [-3.64592304 1.46375109 2.54832185 0.98222597]
         y [ 13.29275485 2.14256725 6.49394423 0.96476785]

In [27]: !rm -f $path*
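The same round trip can be sketched as self-contained Python 3, assuming only the standard library: `with` blocks close the file automatically, the binary modes `'wb'`/`'rb'` are what `pickle` expects, and short lists stand in for the NumPy arrays `x` and `y`. The temporary directory and the file name `data.pkl` are illustrative choices, not part of the original session.

```python
import os
import pickle
import tempfile

# Illustrative stand-ins for the arrays x and y used in the text
x = [-3.64592304, 1.46375109, 2.54832185, 0.98222597]
y = [13.29275485, 2.14256725, 6.49394423, 0.96476785]

with tempfile.TemporaryDirectory() as tmp:
    fname = os.path.join(tmp, 'data.pkl')

    # Store all objects in a single dict, pickled with one call
    with open(fname, 'wb') as f:
        pickle.dump({'x': x, 'y': y}, f)

    # Read the whole set of objects back at once
    with open(fname, 'rb') as f:
        data = pickle.load(f)

# Iterate over key/value pairs instead of looking each key up again
for key, value in data.items():
    print(key, value[:4])
```

Since Python 3.7, dicts preserve insertion order, so the keys come back in the order they were written.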

This approach, however, requires us to write and read all objects at once. This is a compromise one can probably live with in many circumstances, given the much greater convenience it brings.
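For comparison, here is the other side of the trade-off: several objects dumped one after another into the same file and loaded back in exactly that order. This lets a reader stop after the first object, but the reader must know the order and meaning of the objects. A minimal sketch, assuming a hypothetical helper `load_all` that keeps calling `pickle.load` until `EOFError` signals the end of the data, and an in-memory `io.BytesIO` buffer in place of a file on disk.

```python
import io
import pickle


def load_all(f):
    """Hypothetical helper: yield every object pickled sequentially in f."""
    while True:
        try:
            yield pickle.load(f)
        except EOFError:  # raised once no pickled data is left
            return


buf = io.BytesIO()  # in-memory stand-in for a file opened with 'wb'
pickle.dump([1.0, 2.0], buf)
pickle.dump([3.0, 4.0], buf)

buf.seek(0)  # rewind, as reopening the file with 'rb' would
objects = list(load_all(buf))
x, y = objects  # order matters: first dumped, first loaded
print(x, y)
```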


Reading and Writing Text Files


Text processing can be considered a strength of Python. In fact, many corporate and scientific users rely on Python for exactly this task. Python offers a multitude of options for working with string objects, as well as with text files in general.


Suppose we have generated a rather large set of data that we want to save and share as a comma-separated values (CSV) file. Although they have a special structure, such files are basically plain text files:
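To illustrate that a CSV file is structured plain text, the following sketch parses a small, hypothetical CSV string (the numbers are borrowed from the array printed later in this section) once with plain string methods and once with the standard library's `csv` module, which additionally handles quoted fields containing commas.

```python
import csv
import io

# Hypothetical CSV content as a plain string, header line first
text = ('date,no1,no2\n'
        '2014-01-01 00:00:00,1.381,-1.1236\n'
        '2014-01-01 01:00:00,0.15,0.967\n')

# Plain text: splitting lines and commas is enough for simple rows ...
lines = text.splitlines()
header = lines[0].split(',')

# ... while csv.reader also understands the format's quoting rules
rows = list(csv.reader(io.StringIO(text)))
quoted = next(csv.reader(['"1,5",2']))  # the comma inside quotes is kept

print(header)
print(rows[1])
print(quoted)
```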


In [28]: rows = 5000
         a = np.random.standard_normal((rows, 5))  # dummy data

In [29]: a.round(4)
Out[29]: array([[ 1.381 , -1.1236,  1.0622, -1.3997, -0.7374],
                [ 0.15  ,  0.967 ,  1.8391,  0.5633,  0.0569],
                [-0.9504,  0.4779,  1.8636, -1.9152, -0.3005],
                ...,
                [ 0.8843, -1.3932, -0.0506,  0.2717, -1.4921],
                [-1.0352,  1.0368,  0.4562, -0.0667, -1.3391],
                [ 0.9952, -0.6398,  0.8467, -1.6951,  1.122 ]])

To make the case a bit more realistic, we add date-time information to the mix and use the pandas date_range function to generate a series of hourly date-time points (for details, see Chapter 6 and Appendix C):


In [30]: import pandas as pd
         t = pd.date_range(start='2014/1/1', periods=rows, freq='H')
         # set of hourly datetime objects

In [31]: t
Out[31]: <class 'pandas.tseries.index.DatetimeIndex'>
         [2014-01-01 00:00:00, ..., 2014-07-28 07:00:00]
         Length: 5000, Freq: H, Timezone: None
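With an hourly frequency, consecutive points are exactly one hour apart, so the last of `rows` points lies `rows - 1` hours after the start. A stdlib-only sketch (using `datetime`/`timedelta` rather than pandas) reproduces the end point shown in the output: 4,999 hours are 208 days and 7 hours, which leads from 2014-01-01 00:00 to 2014-07-28 07:00.

```python
from datetime import datetime, timedelta

rows = 5000
start = datetime(2014, 1, 1)
step = timedelta(hours=1)  # plays the role of freq='H'

# Evenly spaced points: start, start + 1h, ..., start + (rows - 1)h
points = [start + i * step for i in range(rows)]
print(points[0], points[-1], len(points))
```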

To write the data, we need to open a new file object on disk:


In [32]: csv_file = open(path + 'data.csv', 'w')  # open file for writing

The first line of a CSV file generally contains the names of the data columns stored in the file, so we write this first:


In [33]: header = 'date,no1,no2,no3,no4,no5\n'
         csv_file.write(header)
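As an alternative to assembling every line by hand and passing it to `csv_file.write`, the standard library's `csv` module can write both the header and the data rows. The sketch below is an assumption-laden stand-in, not the procedure used in the text: it writes only 10 hourly rows, draws the numbers with `random.gauss` instead of NumPy, and uses a temporary file rather than `path + 'data.csv'`.

```python
import csv
import os
import random
import tempfile
from datetime import datetime, timedelta

rows = 10  # small stand-in for the 5,000 rows in the text
start = datetime(2014, 1, 1)

with tempfile.TemporaryDirectory() as tmp:
    fname = os.path.join(tmp, 'data.csv')

    # newline='' lets the csv module control the line endings itself
    with open(fname, 'w', newline='') as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(['date', 'no1', 'no2', 'no3', 'no4', 'no5'])
        for i in range(rows):
            stamp = start + timedelta(hours=i)
            values = [round(random.gauss(0, 1), 4) for _ in range(5)]
            writer.writerow([stamp.isoformat(' ')] + values)

    # Read the file back to check header and row count
    with open(fname, newline='') as csv_file:
        content = list(csv.reader(csv_file))

print(content[0])
print(len(content) - 1, 'data rows')
```

The `with` blocks also remove the need for an explicit `close()` call once writing is done.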