Python for Finance: Analyze Big Financial Data

(Elle) #1

Our test case will again be a large set of floating-point numbers:


In  [ 66 ]: import numpy as np
import pandas as pd
data = np.random.standard_normal(( 1000000 , 5 )).round( 5 )
# sample data set
In [ 67 ]: filename = path + ‘numbs’

To this end, we will also revisit SQLite3 and will compare the performance with


alternative approaches using pandas.


SQL Database


All that follows with regard to SQLite3 should be known by now:


In  [ 68 ]: import sqlite3 as sq3
In [ 69 ]: query = ‘CREATE TABLE numbers (No1 real, No2 real,\
No3 real, No4 real, No5 real)’
In [ 70 ]: con = sq3.Connection(filename + ‘.db’)
In [ 71 ]: con.execute(query)
Out[71]: <sqlite3.Cursor at 0x9d59c00>

This time, executemany can be applied since we write from a single ndarray object:


In  [ 72 ]: %%time
con.executemany(‘INSERT INTO numbers VALUES (?, ?, ?, ?, ?)’, data)
con.commit()
Out[72]: CPU times: user 13.9 s, sys: 229 ms, total: 14.2 s
Wall time: 14.9 s
In [ 73 ]: ll $path*
Out[73]: -rw-r—r— 1 root 54446080 28. Sep 15:16 /flash/data/numbs.db

Writing the whole data set of 1,000,000 rows takes quite a while. The reading of the whole


table into a list object is much faster:


In  [ 74 ]: %%time
temp = con.execute(‘SELECT * FROM numbers’).fetchall()
print temp[: 2 ]
temp = 0.0
Out[74]: [(-1.67378, -0.58292, -1.10616, 1.14929, -0.0393), (1.38006, 0.82665, 0
.34168, -1.1676, -0.53274)]
CPU times: user 1.54 s, sys: 138 ms, total: 1.68 s
Wall time: 1.68 s

Reading SQL query results directly into a NumPy ndarray object is easily accomplished.


Accordingly, you can also easily plot the results of such a query, as shown by the


following code and the output in Figure 7-1:


In  [ 75 ]: %%time
query = ‘SELECT * FROM numbers WHERE No1 > 0 AND No2 < 0’
res = np.array(con.execute(query).fetchall()).round( 3 )
Out[75]: CPU times: user 766 ms, sys: 34 ms, total: 800 ms
Wall time: 799 ms
In [ 76 ]: res = res[:: 100 ] # every 100th result
import matplotlib.pyplot as plt
%matplotlib inline
plt.plot(res[:, 0 ], res[:, 1 ], ‘ro’)
plt.grid(True); plt.xlim(-0.5, 4.5); plt.ylim(-4.5, 0.5)
Free download pdf