Python for Finance: Analyze Big Financial Data

This illustrates what kind of overhead the spreadsheet structure brings along with it.

Reading (and plotting) the data is a faster procedure (cf. Figure 7-4):

In [94]: %time pd.read_excel(filename + '.xlsx', 'Sheet1').cumsum().plot()
Out[94]: CPU times: user 12.9 s, sys: 6 ms, total: 12.9 s
         Wall time: 12.9 s

Figure 7-4. Paths of random data from Excel file
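The two figures reported by %time (CPU time and wall time) can also be measured outside IPython. The following is a minimal sketch using only the standard library. The helper names `timed` and `running_total` are hypothetical and chosen for illustration, and the pure-Python running total merely stands in for the `cumsum()` call. Note that `time.process_time()` returns user and system CPU time combined, whereas %time lists them separately.

```python
import time


def timed(func, *args, **kwargs):
    """Run func once and return (result, cpu_seconds, wall_seconds)."""
    cpu_start = time.process_time()   # user + system CPU time of this process
    wall_start = time.perf_counter()  # high-resolution wall clock
    result = func(*args, **kwargs)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return result, cpu, wall


def running_total(values):
    """Cumulative sum of a sequence (illustrative stand-in for cumsum)."""
    total, out = 0.0, []
    for v in values:
        total += v
        out.append(total)
    return out


result, cpu, wall = timed(running_total, [1.0, 2.0, 3.0])
print(f"CPU time: {cpu:.6f} s, wall time: {wall:.6f} s")
```

A wall time well above the CPU time usually means the process was waiting, for example on disk I/O, rather than computing.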

Inspection of the generated files reveals that the combination of DataFrame and HDFStore is the most compact alternative (using compression, as described later in this chapter, further increases the benefits). The same amount of data stored as a CSV file (i.e., as a text file) is somewhat larger in size. This is one reason for the slower performance when working with CSV files, the other being the very fact that they are "only" general text files:

In [95]: ll $path*
Out[95]: -rw-r--r-- 1 root 48831681 28. Sep 15:17 /flash/data/numbs.csv
         -rw-r--r-- 1 root 54446080 28. Sep 15:16 /flash/data/numbs.db
         -rw-r--r-- 1 root 48007368 28. Sep 15:16 /flash/data/numbs.h5s
         -rw-r--r-- 1 root  4311424 28. Sep 15:17 /flash/data/numbs.xlsx

In [96]: rm -f $path*
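The size difference between a text representation and a binary one can be reproduced in miniature with the standard library alone. The sketch below is an illustration under stated assumptions, not the setup used above: it writes the same random numbers once as CSV and once as raw 8-byte doubles, which stands in for a binary store but is not the HDF5 format used by HDFStore. The function names, the file name `numbs.bin`, and the data dimensions are hypothetical.

```python
import array
import csv
import os
import random
import tempfile


def write_csv(path, rows):
    """Write rows of floats as text, one CSV line per row."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)


def write_binary(path, rows):
    """Write the same floats as raw 8-byte doubles (not HDF5)."""
    flat = array.array("d", (x for row in rows for x in row))
    with open(path, "wb") as f:
        flat.tofile(f)


def compare_sizes(n_rows=10_000, n_cols=5, seed=0):
    """Return the on-disk size in bytes of both representations."""
    rng = random.Random(seed)
    rows = [[rng.gauss(0.0, 1.0) for _ in range(n_cols)]
            for _ in range(n_rows)]
    with tempfile.TemporaryDirectory() as tmp:
        csv_path = os.path.join(tmp, "numbs.csv")
        bin_path = os.path.join(tmp, "numbs.bin")
        write_csv(csv_path, rows)
        write_binary(bin_path, rows)
        return {"csv": os.path.getsize(csv_path),
                "bin": os.path.getsize(bin_path)}


sizes = compare_sizes()
print(sizes)
```

The binary file takes exactly 8 bytes per value, while the CSV file needs one character per printed digit plus separators. How much larger the text file turns out therefore depends on how many digits are written per number. A text file also has to be parsed back into numbers on every read, which is the second reason for the slower performance mentioned above.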

