small arrays, this has hardly any measurable impact on the performance of array
operations. However, when arrays get large, the story is somewhat different, depending
on the operations to be performed on the arrays.
To illustrate this important point about the memory layout of arrays in science and
finance, consider the following construction of multidimensional numpy.ndarray objects:
In [133]: x = np.random.standard_normal((5, 10000000))
          y = 2 * x + 3  # linear equation y = a * x + b
          C = np.array((x, y), order='C')  # C-like (row-major) memory layout
          F = np.array((x, y), order='F')  # Fortran-like (column-major) memory layout
          x = 0.0; y = 0.0  # memory cleanup
In [134]: C[:2].round(2)
Out[134]: array([[[-0.51, -1.14, -1.07, ...,  0.2 , -0.18,  0.1 ],
                  [-1.22,  0.68,  1.83, ...,  1.23, -0.27, -0.16],
                  [ 0.45,  0.15,  0.01, ..., -0.75,  0.91, -1.12],
                  [-0.16,  1.4 , -0.79, ..., -0.33,  0.54,  1.81],
                  [ 1.07, -1.07, -0.37, ..., -0.76,  0.71,  0.34]],

                 [[ 1.98,  0.72,  0.86, ...,  3.4 ,  2.64,  3.21],
                  [ 0.55,  4.37,  6.66, ...,  5.47,  2.47,  2.68],
                  [ 3.9 ,  3.29,  3.03, ...,  1.5 ,  4.82,  0.76],
                  [ 2.67,  5.8 ,  1.42, ...,  2.34,  4.09,  6.63],
                  [ 5.14,  0.87,  2.27, ...,  1.48,  4.43,  3.67]]])
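The order parameter only determines how the same data is arranged in memory; the values in C and F are identical. This can be verified via the flags attribute of the two objects; the following is a small check that is not part of the session above:

C.flags['C_CONTIGUOUS']  # True: row-major (C-like) layout
F.flags['F_CONTIGUOUS']  # True: column-major (Fortran-like) layout

With the C-like layout, elements along the last axis lie next to each other in memory; with the Fortran-like layout, elements along the first axis do.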
Let's look at some fundamental examples and use cases for both types of ndarray
objects:
In [135]: %timeit C.sum()
Out[135]: 10 loops, best of 3: 123 ms per loop
In [136]: %timeit F.sum()
Out[136]: 10 loops, best of 3: 123 ms per loop
When summing up all elements of the arrays, there is no performance difference between
the two memory layouts. However, consider the following example with the C-like
memory layout:
In [137]: %timeit C[0].sum(axis=0)
Out[137]: 10 loops, best of 3: 102 ms per loop
In [138]: %timeit C[0].sum(axis=1)
Out[138]: 10 loops, best of 3: 61.9 ms per loop
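One way to see where this asymmetry comes from is to inspect the strides of the array, i.e., the number of bytes separating consecutive elements along each axis. The following small check is not part of the session above; the stride values in the comment follow from the shape (5, 10000000) and the 8-byte float64 elements:

C[0].strides  # (80000000, 8): along axis 1 (within a row) neighboring
              # elements are 8 bytes apart; along axis 0 (from row to row)
              # they are 80,000,000 bytes apart

A reduction that moves along the small-stride axis can therefore read memory sequentially.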
Summing over the first axis (In [137]), i.e., adding up the five large row vectors element
by element and getting back a single result vector of 10,000,000 values, is slower in this
case than summing over the second axis (In [138]), which reduces each of the five rows to
a single value. This is due to the fact that, with the C-like layout, the elements of each
row are stored next to each other in memory, so the row-wise reduction sweeps through
contiguous memory, whereas the axis-0 sum has to build up a result vector of 10,000,000
elements. With the Fortran-like memory layout, the relative performance changes
considerably:
In [139]: %timeit F.sum(axis=0)
Out[139]: 1 loops, best of 3: 801 ms per loop
In [140]: %timeit F.sum(axis=1)
Out[140]: 1 loops, best of 3: 2.23 s per loop
In [141]: F = 0.0; C = 0.0  # memory cleanup
In this case, the sum over axis 0 performs considerably better than the sum over axis 1,
the opposite of what we observed with the C-like layout. With the Fortran-like layout it
is the elements along the first axis that are stored next to each other in memory, which
explains the relative performance advantage of the axis-0 reduction. Overall, however,
both operations are much slower in absolute terms than the comparable operations on the
C-ordered data.
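As a practical consequence, it can pay off to choose, or explicitly convert, the memory layout so that it matches the dominant access pattern of the subsequent computations. The following self-contained sketch (the array a and its size are purely illustrative) shows how the layout of an existing array can be inspected and converted:

import numpy as np

a = np.random.standard_normal((5, 1000000))  # C-like (row-major) by default
a.flags['C_CONTIGUOUS']      # True

f = np.asfortranarray(a)     # copy with Fortran-like (column-major) layout
f.flags['F_CONTIGUOUS']      # True

c = np.ascontiguousarray(f)  # copy back to C-like layout
c.flags['C_CONTIGUOUS']      # True

Both conversion functions copy the data, so converting only pays off when the array is subsequently used in many layout-sensitive operations.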