Python for Finance: Analyze Big Financial Data

In [54]: dtimes = np.arange('2015-01-01 10:00:00', '2021-12-31 22:00:00',
                            dtype='datetime64[m]')  # minute intervals
         len(dtimes)
Out[54]: 3681360

What a table is to a SQL database, a structured array is to NumPy. We use a special dtype object mirroring the SQL table from before:


In [55]: dty = np.dtype([('Date', 'datetime64[m]'), ('No1', 'f'), ('No2', 'f')])
         data = np.zeros(len(dtimes), dtype=dty)

With the dtimes object, we populate the Date column:


In [56]: data['Date'] = dtimes

The other two columns are populated as before with pseudorandom numbers:


In [57]: a = np.random.standard_normal((len(dtimes), 2)).round(5)
         data['No1'] = a[:, 0]
         data['No2'] = a[:, 1]
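As a side note, such a structured array also supports SQL-like operations directly: accessing a field works like a SELECT on one column, and boolean indexing works like a WHERE clause. A minimal, self-contained sketch with a small hypothetical array (same dtype layout as above, but hand-picked values instead of random ones):

```python
import numpy as np

# Small hypothetical structured array with the same field layout as in the text.
dty = np.dtype([('Date', 'datetime64[m]'), ('No1', 'f'), ('No2', 'f')])
rec = np.zeros(5, dtype=dty)
rec['Date'] = np.arange('2015-01-01 10:00', '2015-01-01 10:05',
                        dtype='datetime64[m]')
rec['No1'] = [0.5, -1.2, 0.3, -0.7, 2.1]
rec['No2'] = [1.0, 0.4, -0.9, 0.1, -0.2]

# Field access works like a SQL SELECT on one column ...
print(rec['No1'].mean())
# ... and boolean indexing like a WHERE clause.
print(rec[rec['No1'] > 0]['Date'])
```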

Saving of ndarray objects is highly optimized and therefore quite fast. Almost 60 MB of data takes less than 0.1 seconds to save on disk (here using an SSD):


In [58]: %time np.save(path + 'array', data)  # suffix .npy is added
Out[58]: CPU times: user 0 ns, sys: 77 ms, total: 77 ms
         Wall time: 77.1 ms
In [59]: ll $path*
Out[59]: -rw-r--r-- 1 root 58901888 28. Sep 15:16 /flash/data/array.npy

Reading is even faster:


In [60]: %time np.load(path + 'array.npy')
Out[60]: CPU times: user 10 ms, sys: 29 ms, total: 39 ms
         Wall time: 37.8 ms
         array([ (datetime.datetime(2015, 1, 1, 9, 0), -1.4985100030899048, 0.9664400219917297),
                 (datetime.datetime(2015, 1, 1, 9, 1), -0.2501699924468994, -0.9184499979019165),
                 (datetime.datetime(2015, 1, 1, 9, 2), 1.2026900053024292, 0.49570000171661377),
                 ...,
                 (datetime.datetime(2021, 12, 31, 20, 57), 0.8927800059318542, -1.0334899425506592),
                 (datetime.datetime(2021, 12, 31, 20, 58), 1.0062999725341797, -1.3476499915122986),
                 (datetime.datetime(2021, 12, 31, 20, 59), -0.08011999726295471, 0.4992400109767914)],
               dtype=[('Date', '<M8[m]'), ('No1', '<f4'), ('No2', '<f4')])
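The output above shows that the binary .npy round trip preserves both the values and the structured dtype exactly. A small self-contained check of this behavior (writing to a temporary directory instead of the path variable used in the text):

```python
import os
import tempfile

import numpy as np

# Hypothetical round-trip check in a temporary directory,
# with the same field layout as in the text.
dty = np.dtype([('Date', 'datetime64[m]'), ('No1', 'f'), ('No2', 'f')])
rec = np.zeros(1000, dtype=dty)
rec['No1'] = np.random.standard_normal(1000).round(5)

target = os.path.join(tempfile.mkdtemp(), 'array')
np.save(target, rec)            # the .npy suffix is appended automatically
back = np.load(target + '.npy')

assert back.dtype == rec.dtype                  # dtype survives intact
assert np.array_equal(back['No1'], rec['No1'])  # values are bit-exact
```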

A data set of 60 MB is not that large. Therefore, let us try a somewhat larger ndarray object:


In [61]: data = np.random.standard_normal((10000, 6000))
In [62]: %time np.save(path + 'array', data)
Out[62]: CPU times: user 0 ns, sys: 631 ms, total: 631 ms
         Wall time: 633 ms
In [63]: ll $path*
Out[63]: -rw-r--r-- 1 root 480000080 28. Sep 15:16 /flash/data/array.npy

In this case, the file on disk is about 480 MB large, and it is written in less than a second. This illustrates that writing to disk in this case is mainly hardware-bound, since 480 MB/s represents roughly the advertised writing speed of better SSDs at the time of this writing (512 MB/s). Reading the file/object from disk is even faster (note that caching techniques might also play a role here):


In [64]: %time np.load(path + 'array.npy')
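For arrays too large to hold in memory at once, np.load also accepts a mmap_mode argument, which memory-maps the file instead of reading it in full; only the slices actually accessed are pulled from disk. A brief sketch (with a smaller illustrative array and a temporary path, not the sizes or path from the text):

```python
import os
import tempfile

import numpy as np

# Memory-mapped loading (illustrative size and hypothetical temp path).
path = os.path.join(tempfile.mkdtemp(), 'array')
np.save(path, np.random.standard_normal((10000, 600)))

mm = np.load(path + '.npy', mmap_mode='r')  # maps the file; no full read
print(mm.shape)          # (10000, 600)
row_mean = mm[0].mean()  # only the pages backing this row are touched
```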