Python for Finance: Analyze Big Financial Data

(Elle) #1
In  [ 65 ]: lines[: 6 ]
Out[65]: [‘PriceIndices-EUROCurrency\n’,
‘Date;Blue-Chip;Blue-Chip;Broad;Broad;ExUK;ExEuroZone;Blue-Chip;Broad\
n’,
‘;Europe;Euro-Zone;Europe;Euro-Zone;;;Nordic;Nordic\n’,
‘;SX5P;SX5E;SXXP;SXXE;SXXF;SXXA;DK5F;DKXF\n’,
‘31.12.1986;775.00;900.82;82.76;98.58;98.06;69.06;645.26;65.56\n’,
‘01.01.1987;775.00;900.82;82.76;98.58;98.06;69.06;645.26;65.56\n’]

The above-mentioned format change can be seen between lines 3,883 and 3,990 of the


file. From December 27, there suddenly appears an additional semicolon at the end of


each data row:


In  [ 66 ]: for line in lines[ 3883 : 3890 ]:
print line[ 41 :],
Out[66]: 317.10;267.23;5268.36;363.19
322.55;272.18;5360.52;370.94
322.69;272.95;5360.52;370.94
327.57;277.68;5479.59;378.69;
329.94;278.87;5585.35;386.99;
326.77;272.38;5522.25;380.09;
332.62;277.08;5722.57;396.12;

To make the data set easier to import, we do the following:


1 . Generate a new text file.


2 . Delete unneeded header lines.


3 . Write an appropriate new header line to the new file.


4 . Add a helper column, DEL (to catch the trailing semicolons).


5 . Write all data rows to the new file.


With these adjustments, the data set can be imported and the helper column deleted after


the import. But first, the cleaning code:


In  [ 67 ]: new_file    =   open(‘./data/es50.txt’, ‘w’)
# opens a new file
new_file.writelines(‘date’ + lines[ 3 ][:- 1 ]
+ ‘;DEL’ + lines[ 3 ][- 1 ])
# writes the corrected third line of the original file
# as first line of new file
new_file.writelines(lines[ 4 :])
# writes the remaining lines of the orignial file
new_file.close()

Let us see how the new header looks:


In  [ 68 ]: new_lines   =   open(‘./data/es50.txt’, ‘r’).readlines()
new_lines[: 5 ]
Out[68]: [‘date;SX5P;SX5E;SXXP;SXXE;SXXF;SXXA;DK5F;DKXF;DEL\n’,
‘31.12.1986;775.00;900.82;82.76;98.58;98.06;69.06;645.26;65.56\n’,
‘01.01.1987;775.00;900.82;82.76;98.58;98.06;69.06;645.26;65.56\n’,
‘02.01.1987;770.89;891.78;82.57;97.80;97.43;69.37;647.62;65.81\n’,
‘05.01.1987;771.89;898.33;82.82;98.60;98.19;69.16;649.94;65.82\n’]

It looks appropriate for the import with the read_csv function of pandas, so we continue:


In  [ 69 ]: es  =   pd.read_csv(‘./data/es50.txt’,  index_col= 0 ,
parse_dates=True, sep=‘;’, dayfirst=True)
In [ 70 ]: np.round(es.tail())
Out[70]: SX5P SX5E SXXP SXXE SXXF SXXA DK5F DKXF DEL
date
2014-09-22 3096 3257 347 326 403 357 9703 565 NaN
2014-09-23 3058 3206 342 321 398 353 9602 558 NaN
2014-09-24 3086 3244 344 323 401 355 9629 560 NaN
2014-09-25 3059 3202 341 320 397 353 9538 556 NaN
2014-09-26 3064 3220 342 321 398 353 9559 557 NaN
Free download pdf