Python for Finance: Analyze Big Financial Data

In [54]: dtimes = np.arange('2015-01-01 10:00:00', '2021-12-31 22:00:00',
                            dtype='datetime64[m]')  # minute intervals
         len(dtimes)
Out[54]: 3681360

What a table is to a SQL database, a structured array is to NumPy. We use a special dtype object mirroring the SQL table from before:


In [55]: dty = np.dtype([('Date', 'datetime64[m]'), ('No1', 'f'), ('No2', 'f')])
         data = np.zeros(len(dtimes), dtype=dty)

With the dtimes object, we populate the Date column:


In [56]: data['Date'] = dtimes

The other two columns are populated as before with pseudorandom numbers:


In [57]: a = np.random.standard_normal((len(dtimes), 2)).round(5)
         data['No1'] = a[:, 0]
         data['No2'] = a[:, 1]
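As a side note, such a structured array also supports SQL-like operations directly: accessing a field works like a SELECT on one column, and boolean indexing works like a WHERE clause. A minimal, self-contained sketch with a small hypothetical array (same dtype layout as above, but hand-picked values instead of random ones):

```python
import numpy as np

# Small hypothetical structured array with the same field layout as in the text.
dty = np.dtype([('Date', 'datetime64[m]'), ('No1', 'f'), ('No2', 'f')])
rec = np.zeros(5, dtype=dty)
rec['Date'] = np.arange('2015-01-01 10:00', '2015-01-01 10:05',
                        dtype='datetime64[m]')
rec['No1'] = [0.5, -1.2, 0.3, -0.7, 2.1]
rec['No2'] = [1.0, 0.4, -0.9, 0.1, -0.2]

# Field access works like a SQL SELECT on one column ...
print(rec['No1'].mean())
# ... and boolean indexing like a WHERE clause.
print(rec[rec['No1'] > 0]['Date'])
```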

Saving of ndarray objects is highly optimized and therefore quite fast. Almost 60 MB of data takes less than 0.1 seconds to save on disk (here using an SSD):


In [58]: %time np.save(path + 'array', data)  # suffix .npy is added
Out[58]: CPU times: user 0 ns, sys: 77 ms, total: 77 ms
         Wall time: 77.1 ms
In [59]: ll $path*
Out[59]: -rw-r--r-- 1 root 58901888 28. Sep 15:16 /flash/data/array.npy

Reading is even faster:


In [60]: %time np.load(path + 'array.npy')
Out[60]: CPU times: user 10 ms, sys: 29 ms, total: 39 ms
         Wall time: 37.8 ms
         array([ (datetime.datetime(2015, 1, 1, 9, 0), -1.4985100030899048, 0.9664400219917297),
                 (datetime.datetime(2015, 1, 1, 9, 1), -0.2501699924468994, -0.9184499979019165),
                 (datetime.datetime(2015, 1, 1, 9, 2), 1.2026900053024292, 0.49570000171661377),
                 ...,
                 (datetime.datetime(2021, 12, 31, 20, 57), 0.8927800059318542, -1.0334899425506592),
                 (datetime.datetime(2021, 12, 31, 20, 58), 1.0062999725341797, -1.3476499915122986),
                 (datetime.datetime(2021, 12, 31, 20, 59), -0.08011999726295471, 0.4992400109767914)],
               dtype=[('Date', '<M8[m]'), ('No1', '<f4'), ('No2', '<f4')])
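The output above shows that the binary .npy round trip preserves both the values and the structured dtype exactly. A small self-contained check of this behavior (writing to a temporary directory instead of the path variable used in the text):

```python
import os
import tempfile

import numpy as np

# Hypothetical round-trip check in a temporary directory,
# with the same field layout as in the text.
dty = np.dtype([('Date', 'datetime64[m]'), ('No1', 'f'), ('No2', 'f')])
rec = np.zeros(1000, dtype=dty)
rec['No1'] = np.random.standard_normal(1000).round(5)

target = os.path.join(tempfile.mkdtemp(), 'array')
np.save(target, rec)            # the .npy suffix is appended automatically
back = np.load(target + '.npy')

assert back.dtype == rec.dtype                  # dtype survives intact
assert np.array_equal(back['No1'], rec['No1'])  # values are bit-exact
```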

A data set of 60 MB is not that large. Therefore, let us try a somewhat larger ndarray object:


In [61]: data = np.random.standard_normal((10000, 6000))
In [62]: %time np.save(path + 'array', data)
Out[62]: CPU times: user 0 ns, sys: 631 ms, total: 631 ms
         Wall time: 633 ms
In [63]: ll $path*
Out[63]: -rw-r--r-- 1 root 480000080 28. Sep 15:16 /flash/data/array.npy

In this case, the file on disk is about 480 MB large, and it is written in less than a second. This illustrates that writing to disk in this case is mainly hardware-bound, since 480 MB/s represents roughly the advertised writing speed of better SSDs at the time of this writing (512 MB/s). Reading the file/object from disk is even faster (note that caching techniques might also play a role here):


In [64]: %time np.load(path + 'array.npy')
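For arrays too large to hold in memory at once, np.load also accepts a mmap_mode argument, which memory-maps the file instead of reading it in full; only the slices actually accessed are pulled from disk. A brief sketch (with a smaller illustrative array and a temporary path, not the sizes or path from the text):

```python
import os
import tempfile

import numpy as np

# Memory-mapped loading (illustrative size and hypothetical temp path).
path = os.path.join(tempfile.mkdtemp(), 'array')
np.save(path, np.random.standard_normal((10000, 600)))

mm = np.load(path + '.npy', mmap_mode='r')  # maps the file; no full read
print(mm.shape)          # (10000, 600)
row_mean = mm[0].mean()  # only the pages backing this row are touched
```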