Python for Finance: Analyze Big Financial Data

As a summary, we can state the following with regard to our dummy data set, which is roughly 50 MB in size:

Writing the data with SQLite3 takes several seconds, whereas pandas takes well under a second.

Reading the data from the SQL database takes a few seconds, whereas pandas takes less than 0.1 second.

Data as CSV File

One of the most widely used formats to exchange data is the CSV format. Although it is not really standardized, it can be processed by any platform and the vast majority of applications concerned with data and financial analytics. The previous section shows how to write and read data to and from CSV files step by step with standard Python functionality (cf. Reading and Writing Text Files). pandas makes this whole procedure a bit more convenient, the code more concise, and the execution in general faster:

In [91]: %time data.to_csv(filename + '.csv')
Out[91]: CPU times: user 5.55 s, sys: 137 ms, total: 5.69 s
         Wall time: 5.87 s
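The %time magic used above is available only in IPython. As a minimal sketch of the same measurement in a plain Python script, the following uses time.perf_counter instead. The data set here is a hypothetical stand-in (random numbers, far fewer rows than in the chapter), and the file name and temporary directory are assumptions made for illustration only.

```python
import os
import tempfile
import time

import numpy as np
import pandas as pd

# Hypothetical stand-in for the chapter's data set: random numbers in
# four columns named No1..No4 (much smaller than the original).
rng = np.random.default_rng(seed=0)
data = pd.DataFrame(rng.standard_normal((10_000, 4)),
                    columns=['No1', 'No2', 'No3', 'No4'])

# Hypothetical file location in a temporary directory.
csv_path = os.path.join(tempfile.mkdtemp(), 'numbs.csv')

start = time.perf_counter()
data.to_csv(csv_path)  # the index is written as the first, unnamed column
elapsed = time.perf_counter() - start

print(f'Wrote {len(data)} rows in {elapsed:.3f} s')
```

The measured time depends on the machine and on the size of the data, so it is not comparable to the figures reported in the text.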

Reading the data now stored in the CSV file is accomplished with the read_csv function; the hist method then plots the selected columns (cf. Figure 7-3 for the result):

In [92]: %%time
         pd.read_csv(filename + '.csv')[['No1', 'No2', 'No3', 'No4']].hist(bins=20)
Out[92]: CPU times: user 1.72 s, sys: 54 ms, total: 1.77 s
         Wall time: 1.78 s

Figure 7-3. Histogram of four data sets
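Because to_csv writes the index as an unnamed first column, a plain read_csv call brings it back as an extra column; selecting the No columns explicitly, as above, sidesteps this. The following sketch shows an alternative, passing index_col=0 so that the saved index is restored. It works on a small hypothetical data set held in an in-memory buffer rather than on the chapter's file, and it computes the 20-bin counts with NumPy instead of drawing the histogram.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in data: 1,000 rows of random numbers.
rng = np.random.default_rng(seed=1)
data = pd.DataFrame(rng.standard_normal((1_000, 4)),
                    columns=['No1', 'No2', 'No3', 'No4'])

# Write to an in-memory text buffer instead of a file on disk.
buffer = io.StringIO()
data.to_csv(buffer)
buffer.seek(0)

# index_col=0 uses the first CSV column as the index again, so no
# additional 'Unnamed: 0' column appears in the result.
restored = pd.read_csv(buffer, index_col=0)

# Counts per bin for each column, i.e. the numbers that a histogram
# with 20 bins would display.
counts = {col: np.histogram(restored[col], bins=20)[0]
          for col in restored.columns}
```

Calling restored.hist(bins=20) on the result would draw the plots themselves, provided matplotlib is installed.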

Data as Excel File

Although working with Excel spreadsheets is the topic of a later chapter, we want to briefly demonstrate how pandas can write data in Excel format and read data from Excel spreadsheets. We restrict the data set to 100,000 rows in this case:

In [93]: %time data[:100000].to_excel(filename + '.xlsx')
Out[93]: CPU times: user 27.5 s, sys: 131 ms, total: 27.6 s
         Wall time: 27.7 s

Generating the Excel spreadsheet with this small subset of the data takes quite a while.
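A complete round trip, writing a slice of the data to an Excel file and reading it back, might look like the following sketch. It assumes that an Excel engine such as openpyxl is installed, uses a small hypothetical data set with only 100 rows written (so that it runs quickly), and stores the file under an assumed name in a temporary directory.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical stand-in data: 1,000 rows of random numbers.
rng = np.random.default_rng(seed=2)
data = pd.DataFrame(rng.standard_normal((1_000, 4)),
                    columns=['No1', 'No2', 'No3', 'No4'])

# Hypothetical file location in a temporary directory.
xlsx_path = os.path.join(tempfile.mkdtemp(), 'numbs.xlsx')

# Restrict the data to the first 100 rows before writing, analogous
# to the 100,000-row slice used in the text.
subset = data[:100]
subset.to_excel(xlsx_path)  # requires an engine such as openpyxl

# index_col=0 restores the index that to_excel wrote as first column.
restored = pd.read_excel(xlsx_path, index_col=0)
```

Slicing before the call to to_excel keeps the amount of data written small, which matters given how slow spreadsheet generation is compared with the CSV export.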