In [54]: dtimes = np.arange('2015-01-01 10:00:00', '2021-12-31 22:00:00',
                            dtype='datetime64[m]')  # minute intervals
         len(dtimes)
Out[54]: 3681360
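To see where this length comes from, the same construction can be tried on a much smaller range. The following is a minimal sketch; the name small and the one-hour range are chosen for illustration only and are not part of the original session. Note that np.arange excludes the end point.

```python
import numpy as np

# One hour of minute-resolution timestamps; the stop value is excluded.
small = np.arange('2015-01-01 10:00:00', '2015-01-01 11:00:00',
                  dtype='datetime64[m]')

n_small = len(small)  # 60 one-minute steps
first = small[0]
last = small[-1]      # 10:59, because the end point is not included

# The full range from the text spans 2,556 days plus 12 hours, in minutes.
n_full = 2556 * 24 * 60 + 12 * 60

print(n_small, first, last, n_full)
```

The last line reproduces the count of 3,681,360 minutes by plain arithmetic, without allocating the full array.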
The NumPy counterpart of a table in a SQL database is a structured array. We use a special
dtype object mirroring the SQL table from before:
In [55]: dty = np.dtype([('Date', 'datetime64[m]'), ('No1', 'f'), ('No2', 'f')])
         data = np.zeros(len(dtimes), dtype=dty)
With the dtimes object, we populate the Date column:
In [56]: data['Date'] = dtimes
The other two columns are populated as before with pseudorandom numbers:
In [57]: a = np.random.standard_normal((len(dtimes), 2)).round(5)
         data['No1'] = a[:, 0]
         data['No2'] = a[:, 1]
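The analogy to a SQL table can be made more concrete with a small example. The sketch below uses the same dtype as in the text, but a hypothetical three-row array called table with hand-picked values; the Boolean mask plays roughly the role of a WHERE clause.

```python
import numpy as np

# Same layout as in the text: one timestamp column, two float32 columns.
dty = np.dtype([('Date', 'datetime64[m]'), ('No1', 'f'), ('No2', 'f')])

# Hypothetical three-row table with hand-picked values for illustration.
table = np.zeros(3, dtype=dty)
table['Date'] = np.arange('2015-01-01 10:00:00', '2015-01-01 10:03:00',
                          dtype='datetime64[m]')
table['No1'] = [0.5, -1.0, 2.0]
table['No2'] = [1.5, 0.25, -0.75]

# Roughly "SELECT * FROM table WHERE No1 > 0": a Boolean mask on one field.
positive = table[table['No1'] > 0]

# Each record occupies 8 + 4 + 4 = 16 bytes.
print(dty.itemsize, len(positive), positive['No2'])
```

With 16 bytes per record, the 3,681,360 rows of the full array account for just under 59 MB of raw data.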
Saving ndarray objects is highly optimized and therefore quite fast. Almost 60 MB of
data takes less than 0.1 seconds to save to disk (here using an SSD):
In [58]: %time np.save(path + 'array', data)  # suffix .npy is added
Out[58]: CPU times: user 0 ns, sys: 77 ms, total: 77 ms
         Wall time: 77.1 ms
In [59]: ll $path*
Out[59]: -rw-r--r-- 1 root 58901888 28. Sep 15:16 /flash/data/array.npy
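The %time magic is only available in IPython. In a plain Python script, a comparable measurement of the save operation might be sketched as follows; the array arr, the temporary directory, and the variable names are assumptions made for this illustration, and the timings will differ from machine to machine.

```python
import os
import tempfile
import time

import numpy as np

# Hypothetical stand-in data: about 8 MB of float64 values.
arr = np.random.standard_normal((1000, 1000))

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'array.npy')
    start = time.perf_counter()
    np.save(target, arr)
    elapsed = time.perf_counter() - start
    size_mb = os.path.getsize(target) / 1e6

# Throughput depends on the hardware and on operating system caching.
print(f'{size_mb:.1f} MB written in {elapsed * 1000:.1f} ms')
```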
Reading is even faster:
In [60]: %time np.load(path + 'array.npy')
Out[60]: CPU times: user 10 ms, sys: 29 ms, total: 39 ms
Wall time: 37.8 ms
array([ (datetime.datetime(2015, 1, 1, 9, 0), -1.4985100030899048,
0.9664400219917297),
(datetime.datetime(2015, 1, 1, 9, 1), -0.2501699924468994,
-0.9184499979019165),
(datetime.datetime(2015, 1, 1, 9, 2), 1.2026900053024292,
0.49570000171661377),
...,
(datetime.datetime(2021, 12, 31, 20, 57), 0.8927800059318542,
-1.0334899425506592),
(datetime.datetime(2021, 12, 31, 20, 58), 1.0062999725341797,
-1.3476499915122986),
(datetime.datetime(2021, 12, 31, 20, 59), -0.08011999726295471,
0.4992400109767914)],
dtype=[('Date', '<M8[m]'), ('No1', '<f4'), ('No2', '<f4')])
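That the round trip preserves both the values and the structured dtype can be checked on a small scale. The following sketch assumes a hypothetical 60-row array rec and a temporary directory instead of the path used in the text; it also shows that the file is only slightly larger than the raw data, the difference being the .npy header.

```python
import os
import tempfile

import numpy as np

dty = np.dtype([('Date', 'datetime64[m]'), ('No1', 'f'), ('No2', 'f')])

# Hypothetical 60-row array: one hour of minute data with random numbers.
rec = np.zeros(60, dtype=dty)
rec['Date'] = np.arange('2015-01-01 10:00:00', '2015-01-01 11:00:00',
                        dtype='datetime64[m]')
rec['No1'] = np.random.standard_normal(60)
rec['No2'] = np.random.standard_normal(60)

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'array.npy')
    np.save(target, rec)
    loaded = np.load(target)
    # Bytes on disk beyond the raw data: the .npy header.
    overhead = os.path.getsize(target) - rec.nbytes

# Same dtype and identical raw bytes mean a lossless round trip.
same = loaded.dtype == rec.dtype and loaded.tobytes() == rec.tobytes()
print(same, rec.nbytes, overhead)
```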
A data set of 60 MB is not that large. Therefore, let us try a somewhat larger ndarray
object:
In [61]: data = np.random.standard_normal((10000, 6000))
In [62]: %time np.save(path + 'array', data)
Out[62]: CPU times: user 0 ns, sys: 631 ms, total: 631 ms
Wall time: 633 ms
In [63]: ll $path*
Out[63]: -rw-r--r-- 1 root 480000080 28. Sep 15:16 /flash/data/array.npy
In this case, the file on disk is about 480 MB in size and is written in less than a second.
This illustrates that writing to disk is mainly hardware-bound here: the observed throughput
(480 MB in roughly 0.63 seconds) is of the same order of magnitude as the advertised writing
speed of better SSDs at the time of this writing (512 MB/s). Reading the file/object from disk
is even faster (note that caching techniques might also play a role here):
In [64]: %time np.load(path + 'array.npy')