Functional Python Programming


The Multiprocessing and Threading Modules


This function produces a Counter object that shows the frequency of each path
in a sequence of AccessDetails objects. To focus on a particular set of paths, we'll
use the reduce_book_total(book_filter(details)) expression. This provides a
summary of only the items that are passed by the given filter.
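The filter-then-count idea can be illustrated with a minimal sketch. The Detail tuple, the "/book/" prefix, and both function names below are assumptions made for illustration; they are not the definitions used elsewhere in this chapter.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for AccessDetails; only a path field is modeled.
Detail = namedtuple("Detail", ["path"])

def book_filter_sketch(details):
    """Pass only details whose path starts with an assumed '/book/' prefix."""
    return (d for d in details if d.path.startswith("/book/"))

def count_paths_sketch(details):
    """Count how often each path occurs in the given details."""
    return Counter(d.path for d in details)

sample = [Detail("/book/a"), Detail("/home"), Detail("/book/a"), Detail("/book/b")]
totals = count_paths_sketch(book_filter_sketch(sample))
```

Because the filter is a generator expression, the details are consumed lazily; only the Counter holds accumulated state.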


The complete analysis process


Here is the composite analysis() function that digests a collection of logfiles:


def analysis(filename):
    details = path_filter(
        access_detail_iter(access_iter(local_gzip(filename))))
    books = book_filter(details)
    totals = reduce_book_total(books)
    return totals


The preceding function works with a single filename or file pattern.
It applies a standard set of parsing functions, path_filter(),
access_detail_iter(), access_iter(), and local_gzip(), to a filename or file
pattern and returns an iterable sequence of AccessDetails objects. It then applies
our analytical filter and reduction to that sequence. The result is a
Counter object that shows the frequency of access for certain paths.
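To make the innermost stage of that pipeline concrete, here is a rough sketch of how a function like local_gzip() might expand a file pattern and yield lines lazily. The body is an assumption, not the chapter's actual definition, and non_empty() is a hypothetical downstream stage added only to show the nesting.

```python
import glob
import gzip
import os
import tempfile

def local_gzip_sketch(pattern):
    """Yield newline-stripped text lines from every gzip file matching pattern."""
    for filename in sorted(glob.glob(pattern)):
        with gzip.open(filename, "rt", encoding="utf-8") as log:
            for line in log:
                yield line.rstrip("\n")

def non_empty(lines):
    """Hypothetical downstream stage: drop blank lines."""
    return (line for line in lines if line)

# Demonstration with a throwaway gzip file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.log.gz")
    with gzip.open(path, "wt", encoding="utf-8") as out:
        out.write("GET /book/a\n\nGET /home\n")
    lines = list(non_empty(local_gzip_sketch(os.path.join(tmp, "*.gz"))))
```

Each stage takes an iterable and returns an iterable, so stages can be nested in the same way as in analysis() without building intermediate lists.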


A specific collection of saved .gzip format logfiles totals about 51 MB. Processing
the files serially with this function takes over 140 seconds. Can we do better using
concurrent processing?


Using a multiprocessing pool for concurrent processing


One elegant way to make use of the multiprocessing module is to create a
processing Pool object and assign work to the various processes in that pool.
We rely on the OS to interleave execution among the various processes. If each
of the processes has a mixture of I/O and computation, we should be able to ensure
that our processor is kept busy. While some processes are waiting for I/O to
complete, other processes can do their computation. When an I/O operation completes,
the waiting process becomes ready to run and competes with the others for processing time.


The recipe for mapping work to a separate process looks as follows:


import glob
import multiprocessing
# pattern is a file pattern matching the logfiles to analyze
with multiprocessing.Pool(4) as workers:
    workers.map(analysis, glob.glob(pattern))
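Since analysis() returns a Counter, the map() method of the pool returns a list of Counter objects, one per file. One possible way to fold them into a single summary is sketched below. The merge_counts() name and the sample data are assumptions for illustration, and the per_file list stands in for the value returned by workers.map().

```python
from collections import Counter
from functools import reduce
import operator

def merge_counts(counters):
    """Fold an iterable of Counter objects into a single Counter."""
    return reduce(operator.add, counters, Counter())

# Hypothetical per-file results, standing in for the list from workers.map().
per_file = [
    Counter({"/book/a": 2, "/book/b": 1}),
    Counter({"/book/a": 3}),
]
combined = merge_counts(per_file)
```

Addition of Counter objects keeps only positive counts, which is acceptable here because access frequencies are never negative.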
