Functional Python Programming

(Wang) #1

More Itertools Techniques


The final selection of colors uses the groupby() function and the min(choices,...)
expression, as shown in the following command snippet:


for _, choices in groupby(distances, key=lambda xy_p_c_d:


xy_p_c_d[0]):


print(min(choices, key=lambda xypcd: xypcd[3])))]))


The overall product of pixels and colors is a long, flat iterable. We grouped the iterable
into smaller collections where the coordinates match. This will break the big iterable
into smaller iterables of just colors associated with a single pixel. We can then pick the
minimal color distance for each color.


In a picture that's 3,648×2,736 with 133 Crayola colors, we have an iterable with
1,327,463,424 items to be evaluated. Yes. That's a billion combinations created by this
distances expression. The number is not necessarily impractical. It's well within the
limits of what Python can do. However, it reveals an important flaw in the naïve use
of the product() function.


We can't trivially do this kind of large-scale processing without some analysis to
see how large it is. Here are some timeit numbers for these that do each of these
calculations only 1,000,000 times:



  • Euclidean 2.8

  • Manhattan 1.8


Scaling up from 1 million to 1 billion means 1,800 seconds, that is, about half an
hour for the Manhattan distance and 46 minutes to calculate the Euclidean distance.
It appears that the core arithmetic operations of Python are too slow for this kind of
naïve bulk processing.


More importantly, we're doing it wrong. This kind of width×height×color processing is
simply a bad design. In many cases, we can do much better.


Performance analysis


A key feature of any big data algorithm is locating a way to execute some kind of a
divide-and-conquer strategy. This is true of functional programming design as well
as imperative design.


We have three options to speed up this processing; they are as follows:



  • We can try to use parallelism to do more of the calculations concurrently.
    On a four-core processor, the time can be cut to approximately ¼. This cuts
    the time to 8 minutes for Manhattan distances.

Free download pdf