Python for Finance: Analyze Big Financial Data

(Elle) #1
Figure 7-1. Plot of the query result

From SQL to pandas


A generally more efficient approach, however, is the reading of either whole tables or


query results with pandas. When you are able to read a whole table into memory,


analytical queries can generally be executed much faster than when using the SQL disk-


based approach. The sublibrary pandas.io.sql contains functions to handle data stored in


SQL databases:


In  [ 77 ]: import pandas.io.sql as pds

Reading the whole table with pandas takes roughly the same amount of time as reading it


into a NumPy ndarray object. There as here, the bottleneck is the SQL database:


In  [ 78 ]: %time data  =   pds.read_sql(‘SELECT    *   FROM    numbers’,   con)
Out[78]: CPU times: user 2.16 s, sys: 60 ms, total: 2.22 s
Wall time: 2.23 s
In [ 79 ]: data.head()
Out[79]: No1 No2 No3 No4 No5
0 -1.67378 -0.58292 -1.10616 1.14929 -0.03930
1 1.38006 0.82665 0.34168 -1.16760 -0.53274
2 0.79329 0.11947 2.06403 -0.36208 1.77442
3 -0.33507 -0.00715 -1.01193 0.23157 1.30225
4 -0.35292 0.67483 1.59507 -1.21263 0.14745

                                    [5  rows    x   5   columns]

The data is now in-memory. This allows for much faster analytics. The SQL query that


takes a few seconds with SQLite3 finishes in less than 0.1 seconds with pandas in-


memory:


In  [ 80 ]: %time data[(data[‘No1’] >    0 )    &   (data[‘No2’]    <    0 )].head()
Out[80]: CPU times: user 50 ms, sys: 0 ns, total: 50 ms
Wall time: 49.9 ms

                                                                    No1                     No2                     No3                     No4                     No5
6 1.17749 -1.13017 -0.24176 -0.64047 1.58002
8 0.18625 -0.99949 2.29854 0.91816 -0.92661
9 1.09481 -0.26301 1.11341 0.68716 -0.71524
18 0.31836 -0.33039 -1.50109 0.52961 0.96595
20 0.40261 -0.45917 0.37339 -1.09515 0.23972

                                    [5  rows    x   5   columns]

pandas can master even more complex queries, although it is neither meant nor able to


replace SQL databases when it comes to complex, relational data structures. The result of


the next query is shown in Figure 7-2:

Free download pdf