Python for Finance: Analyze Big Financial Data


In summary, we can state the following with regard to our dummy data set, which is roughly 50 MB in size:


Writing the data with SQLite3 takes multiple seconds, with pandas taking much less than a second.


Reading the data from the SQL database takes a bit more than a few seconds, with pandas taking less than 0.1 second.


Data as CSV File


One of the most widely used formats to exchange data is the CSV format. Although it is not really standardized, it can be processed by any platform and the vast majority of applications concerned with data and financial analytics. The previous section shows how to write and read data to and from CSV files step by step with standard Python functionality (cf. Reading and Writing Text Files). pandas makes this whole procedure a bit more convenient, the code more concise, and the execution in general faster:


In [91]: %time data.to_csv(filename + '.csv')
Out[91]: CPU times: user 5.55 s, sys: 137 ms, total: 5.69 s
         Wall time: 5.87 s
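
The write-and-read cycle can also be tried on a much smaller scale. The following is a minimal sketch, not the session from the text: the DataFrame is a hypothetical stand-in for the dummy data set, and the file name and temporary directory are assumptions chosen for illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical stand-in for the dummy data set (far smaller than 50 MB).
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.standard_normal((1000, 5)),
                    columns=['No1', 'No2', 'No3', 'No4', 'No5'])

# Assumed file location; any writable path would do.
csv_path = os.path.join(tempfile.mkdtemp(), 'numbs.csv')

# to_csv writes the index as the first column by default.
data.to_csv(csv_path)

# index_col=0 turns that first column back into the index.
restored = pd.read_csv(csv_path, index_col=0)

# Compare with a tolerance, since the values pass through a text format.
same = np.allclose(data.values, restored.values)
print(restored.shape, same)
```

Without index_col=0, the written index would come back as an additional, unnamed data column.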

Reading the data now stored in the CSV file and plotting it is accomplished with the read_csv function (cf. Figure 7-3 for the result):


In [92]: %%time
         pd.read_csv(filename + '.csv')[['No1', 'No2',
                                         'No3', 'No4']].hist(bins=20)
Out[92]: CPU times: user 1.72 s, sys: 54 ms, total: 1.77 s
         Wall time: 1.78 s

Figure 7-3. Histogram of four data sets
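
When only some columns are needed, as in the histogram example, read_csv can be asked to parse just those columns through its usecols parameter instead of selecting them after the full read. The sketch below is one possible variation, not the approach used in the text; it assumes a small hypothetical data set held in an in-memory buffer and computes the bin counts with NumPy so that it runs without a plotting backend.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical small data set standing in for the one in the text.
rng = np.random.default_rng(1)
data = pd.DataFrame(rng.standard_normal((500, 5)),
                    columns=['No1', 'No2', 'No3', 'No4', 'No5'])

# Write the CSV data to an in-memory text buffer instead of a file.
buffer = io.StringIO()
data.to_csv(buffer, index=False)
buffer.seek(0)

# Parse only the four columns of interest.
cols = ['No1', 'No2', 'No3', 'No4']
subset = pd.read_csv(buffer, usecols=cols)

# subset.hist(bins=20) would draw the histograms (requires matplotlib);
# np.histogram yields the same kind of bin counts without plotting.
counts, edges = np.histogram(subset['No1'], bins=20)
print(subset.columns.tolist(), counts.sum())
```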

Data as Excel File


Although working with Excel spreadsheets is the topic of a later chapter, we want to briefly demonstrate how pandas can write data in Excel format and read data from Excel spreadsheets. We restrict the data set to 100,000 rows in this case:


In [93]: %time data[:100000].to_excel(filename + '.xlsx')
Out[93]: CPU times: user 27.5 s, sys: 131 ms, total: 27.6 s
         Wall time: 27.7 s

Generating the Excel spreadsheet with this small subset of the data takes quite a while.
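
A scaled-down version of the Excel round trip might look as follows. This is a sketch under stated assumptions: the data set, the file name, and the sheet name are hypothetical. It also assumes that an Excel engine such as openpyxl may or may not be installed, so the ImportError that pandas raises for a missing engine is handled explicitly.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical small data set; the text uses 100,000 rows instead.
rng = np.random.default_rng(2)
data = pd.DataFrame(rng.standard_normal((100, 5)),
                    columns=['No1', 'No2', 'No3', 'No4', 'No5'])

# Assumed file location and sheet name.
xlsx_path = os.path.join(tempfile.mkdtemp(), 'numbs.xlsx')

try:
    # Writing and reading .xlsx files relies on an optional engine.
    data.to_excel(xlsx_path, sheet_name='Sheet1')
    restored = pd.read_excel(xlsx_path, sheet_name='Sheet1', index_col=0)
except ImportError:
    # No Excel engine available in this environment.
    restored = None

print(None if restored is None else restored.shape)
```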
