Memory Layout and Performance
NumPy lets you specify a dtype for each ndarray object: for example, np.int32 or f8
(shorthand for an 8-byte float). NumPy also lets you choose between two memory layouts
when initializing an ndarray object. Depending on the structure of the object, one layout
can have advantages over the other. This is illustrated in the following:
In [21]: import numpy as np
In [22]: np.zeros((3, 3), dtype=np.float64, order='C')
Out[22]: array([[ 0., 0., 0.],
               [ 0., 0., 0.],
               [ 0., 0., 0.]])
The way you initialize a NumPy ndarray object can have a significant influence on the
performance of operations on the array, at least once the array is large enough. In
summary, the initialization of an ndarray object (e.g., via np.zeros or np.array) takes
as input:
shape
Either an int or a sequence of ints (for np.zeros); np.array instead takes the data
itself, for example a nested list or another numpy.ndarray
dtype (optional)
A numpy.dtype — these are NumPy-specific basic data types for numpy.ndarray
objects
order (optional)
The order in which to store elements in memory: C for C-like (i.e., row-wise) or F for
Fortran-like (i.e., column-wise)
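The three inputs listed above can be tried out together. The following is a minimal sketch, not taken from the original text; the variable names a_c and a_f are chosen here purely for illustration. It creates two arrays with the same shape and dtype but different order values and then asks each array which layout it uses via its flags attribute.

```python
import numpy as np

# Same shape and dtype, two different memory layouts.
# (a_c and a_f are illustrative names, not from the text.)
a_c = np.zeros((2, 3), dtype=np.float64, order='C')  # row-wise (C-like)
a_f = np.zeros((2, 3), dtype='f8', order='F')        # column-wise (Fortran-like)

# 'f8' is shorthand for an 8-byte float, i.e., the same dtype as np.float64.
same_dtype = a_c.dtype == a_f.dtype

# The flags attribute reports which layout an array actually uses.
c_is_c = a_c.flags['C_CONTIGUOUS']
f_is_f = a_f.flags['F_CONTIGUOUS']

print(same_dtype, c_is_c, f_is_f)
```

Both arrays compare equal element by element; only the arrangement of the elements in memory differs.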
Consider the C-like (i.e., row-wise) storage:
In [23]: c = np.array([[ 1., 1., 1.],
                       [ 2., 2., 2.],
                       [ 3., 3., 3.]], order='C')
In this case, the 1s, the 2s, and the 3s are stored next to each other. By contrast, consider
the Fortran-like (i.e., column-wise) storage:
In [24]: f = np.array([[ 1., 1., 1.],
                       [ 2., 2., 2.],
                       [ 3., 3., 3.]], order='F')
Now, the data is stored in such a way that 1, 2, and 3 are next to each other in each
column. Let’s see whether the memory layout makes a measurable difference when the
array is large:
In [25]: x = np.random.standard_normal((3, 1500000))
         C = np.array(x, order='C')
         F = np.array(x, order='F')
         x = 0.0  # rebind x so that the original array can be freed
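Before timing anything, it can help to confirm how the two layouts actually sit in memory. The sketch below is an illustration rather than part of the original session: it rebuilds the small 3 × 3 arrays c and f from above and inspects their strides attribute, which gives the number of bytes to step over to reach the next element along each axis. It then reads the elements back in memory order with ravel(order='K').

```python
import numpy as np

data = [[1., 1., 1.],
        [2., 2., 2.],
        [3., 3., 3.]]
c = np.array(data, order='C')
f = np.array(data, order='F')

# strides: bytes to skip to reach the next element along (axis 0, axis 1).
# Each float64 element occupies 8 bytes.
print(c.strides)  # (24, 8): the next element in a row is 8 bytes away
print(f.strides)  # (8, 24): the next element in a column is 8 bytes away

# ravel(order='K') returns the elements in the order they occur in memory.
c_mem = c.ravel(order='K')  # 1, 1, 1, 2, 2, 2, 3, 3, 3
f_mem = f.ravel(order='K')  # 1, 2, 3, 1, 2, 3, 1, 2, 3
print(c_mem)
print(f_mem)
```

The smaller stride marks the axis along which neighboring elements are also neighbors in memory: axis 1 for the C-like layout, axis 0 for the Fortran-like layout.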
Now let’s implement some standard operations on the C-like layout array. First, calculating
sums:
In [26]: %timeit C.sum(axis=0)
Out[26]: 100 loops, best of 3: 11.3 ms per loop
In [27]: %timeit C.sum(axis=1)
Out[27]: 100 loops, best of 3: 5.84 ms per loop
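Calculating sums over the first axis (axis=0) is roughly two times slower than over the
second axis (axis=1): 11.3 ms versus 5.84 ms. With the C-like layout, the elements of each
row lie next to each other in memory, so summing along a row reads memory sequentially.
The exact timings depend on the hardware and the NumPy version.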
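The %timeit magic is only available inside IPython. As a rough sketch of how the same comparison could be run in a plain Python script, the following uses the standard library's timeit module. The helper best_ms and the dictionary timings are hypothetical names introduced here for illustration, and no particular timings are implied; the ratio between the two axes may differ from the figures above.

```python
import timeit
import numpy as np

x = np.random.standard_normal((3, 1500000))
C = np.array(x, order='C')  # C-like (row-wise) layout


def best_ms(func, repeat=3, number=10):
    """Hypothetical helper: best average time per call, in milliseconds."""
    totals = timeit.repeat(func, repeat=repeat, number=number)
    return min(totals) / number * 1e3


# Time the sum over each axis of the C-like array.
timings = {}
for axis in (0, 1):
    timings[axis] = best_ms(lambda: C.sum(axis=axis))
    print(f"C.sum(axis={axis}): {timings[axis]:.2f} ms per call")
```

Taking the minimum over several repeats mirrors the "best of 3" figure that %timeit reports and reduces the influence of other processes running on the machine.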