One gets similar results for calculating standard deviations:
In [28]: %timeit C.std(axis=0)
Out[28]: 10 loops, best of 3: 70.6 ms per loop
In [29]: %timeit C.std(axis=1)
Out[29]: 10 loops, best of 3: 32.6 ms per loop
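One way to see why the choice of axis matters is to inspect how each layout steps through memory. The following is a minimal sketch, not the setup used for the timings: it assumes a small, hypothetical 3 x 4 array of float64 values and compares the strides (bytes to skip per step along each axis) of a C-ordered and a Fortran-ordered copy. The different stride patterns are one plausible factor behind the timing differences.

```python
import numpy as np

# Hypothetical small array; the arrays timed in the text are much larger.
x = np.arange(12, dtype=np.float64).reshape(3, 4)

C = np.array(x, order='C')  # row-major: elements of a row are adjacent
F = np.array(x, order='F')  # column-major: elements of a column are adjacent

# Strides give the number of bytes to skip to move one step along each axis.
c_strides = C.strides
f_strides = F.strides

print(C.flags['C_CONTIGUOUS'], F.flags['F_CONTIGUOUS'])
print(c_strides)  # 4 * 8 bytes to the next row, 8 bytes to the next column
print(f_strides)  # 8 bytes to the next row, 3 * 8 bytes to the next column
```

Both arrays hold the same values; only the order of the elements in memory differs.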
For comparison, consider the Fortran-like layout. Sums first:
In [30]: %timeit F.sum(axis=0)
Out[30]: 10 loops, best of 3: 29.2 ms per loop
In [31]: %timeit F.sum(axis=1)
Out[31]: 10 loops, best of 3: 37 ms per loop
Although slower in absolute terms than the C-like layout, the Fortran-like layout shows
only a modest relative difference between the two axes (29.2 ms versus 37 ms). Now,
standard deviations:
In [32]: %timeit F.std(axis=0)
Out[32]: 10 loops, best of 3: 107 ms per loop
In [33]: %timeit F.std(axis=1)
Out[33]: 10 loops, best of 3: 98.8 ms per loop
Again, this layout performs worse than the C-like layout. There is a small difference
between the two axes, but it is again less pronounced than with the C-like layout. The
results indicate that, in general, the C-like option performs better, which is also why
NumPy ndarray objects default to this memory layout unless otherwise specified. Finally,
the two large arrays are released by rebinding their names:
In [34]: C = 0.0; F = 0.0
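The comparison can also be run as a plain script, outside IPython. The following is a sketch under stated assumptions: the array shape (3 rows, 150,000 columns), the random seed, and the helper best_ms are all hypothetical choices made for illustration, and the standard library's timeit module stands in for the %timeit magic. Absolute and relative timings will vary with hardware and NumPy version, so they need not match the figures above. The last lines check that an array created without an explicit order is C-contiguous.

```python
import timeit
import numpy as np

# Assumed shape (few rows, many columns); the arrays in the text may differ.
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 150_000))
C = np.array(x, order='C')  # C-like (row-major) copy
F = np.array(x, order='F')  # Fortran-like (column-major) copy


def best_ms(func, repeat=3, number=10):
    """Return the best average time per call in milliseconds."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number * 1e3


# Time sum and std along both axes for both layouts.
timings = {}
for name, arr in (('C', C), ('F', F)):
    for axis in (0, 1):
        timings[(name, 'sum', axis)] = best_ms(lambda: arr.sum(axis=axis))
        timings[(name, 'std', axis)] = best_ms(lambda: arr.std(axis=axis))

for key, ms in timings.items():
    print(key, f'{ms:.3f} ms')

# An array created without an explicit order uses the C-like layout.
default = np.zeros((3, 4))
print(default.flags['C_CONTIGUOUS'])
```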