Functional Python Programming


The Multiprocessing and Threading Modules


There are several other ways to produce similar output. For example, here is an
alternative version of the inner for loop in the preceding example. The line_iter()
function will also emit lines of a given file:


import gzip

def line_iter(zip_file):
    # Open the compressed log and lazily yield each decoded, stripped line.
    log = gzip.open(zip_file, "rb")
    return (line.decode('us-ascii').rstrip() for line in log)


The line_iter() function applies the gzip.open() function and some line cleanup.
We can use a mapping to apply the line_iter() function to all files that match a
pattern as follows:


map(line_iter, glob.glob(pattern))
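
Note that map() here produces one generator per file rather than a single stream of
lines. The following is a minimal sketch of how the pieces might fit together,
assuming the files are ASCII-encoded gzip logs. The all_lines() and demo() helpers
and the use of itertools.chain are illustrative additions, not part of the original
example:

```python
import glob
import gzip
import itertools
import os
import tempfile


def line_iter(zip_file):
    """Return a generator of the decoded, right-stripped lines of one gzip file."""
    log = gzip.open(zip_file, "rb")
    return (line.decode("us-ascii").rstrip() for line in log)


def all_lines(pattern):
    """Hypothetical helper: flatten the per-file generators into one stream."""
    # sorted() only makes the file order predictable; glob.glob() promises no order.
    per_file = map(line_iter, sorted(glob.glob(pattern)))
    return itertools.chain.from_iterable(per_file)


def demo():
    """Write two small gzip files and read them back through all_lines()."""
    with tempfile.TemporaryDirectory() as tmp:
        for name, text in [("a.gz", b"one\ntwo\n"), ("b.gz", b"three\n")]:
            with gzip.open(os.path.join(tmp, name), "wb") as target:
                target.write(text)
        return list(all_lines(os.path.join(tmp, "*.gz")))
```

Because both map() and the generator expressions are lazy, no file is opened until
the combined stream is actually consumed.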


While this alternative mapping is succinct, it leaves open file objects lying
around until they are garbage-collected, which happens only when no references
remain. When processing a large number of files, this is a needless bit of
overhead. For this reason, we'll focus on the local_gzip() function
shown previously.
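
One way to avoid lingering file objects is to wrap the open file in a with statement
inside a generator function, so that each file is closed once its lines have been
consumed. This is a minimal sketch; the name closing_line_iter() is hypothetical
and is not the local_gzip() function referred to above:

```python
import gzip
import os
import tempfile


def closing_line_iter(zip_file):
    """Yield cleaned lines; the with statement closes the file when iteration ends."""
    with gzip.open(zip_file, "rb") as log:
        for line in log:
            yield line.decode("us-ascii").rstrip()


def demo():
    """Write a small gzip file and read it back through closing_line_iter()."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.gz")
        with gzip.open(path, "wb") as target:
            target.write(b"first \nsecond\n")
        return list(closing_line_iter(path))
```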


That said, the alternative mapping has one distinct advantage: it fits well with the
way the multiprocessing module works. We can create a worker pool and map
tasks (such as file reading) to the pool of processes. If we do this, we can read these
files in parallel; the open file objects will be part of separate processes.
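
A minimal sketch of that idea follows. It assumes each worker returns an ordinary
list, because a generator object cannot be pickled and sent back to the parent
process. The names read_lines() and parallel_read(), and the default pool size of 4,
are illustrative assumptions:

```python
import glob
import gzip
import multiprocessing


def read_lines(zip_file):
    """Worker task: open one gzip file and return its cleaned lines as a list."""
    with gzip.open(zip_file, "rb") as log:
        return [line.decode("us-ascii").rstrip() for line in log]


def parallel_read(pattern, workers=4):
    """Read all files matching pattern using a pool of worker processes."""
    names = sorted(glob.glob(pattern))
    with multiprocessing.Pool(workers) as pool:
        # pool.map() returns one list of lines per file, in the order of names.
        return dict(zip(names, pool.map(read_lines, names)))
```

On platforms that start workers by launching a fresh interpreter, parallel_read()
should be called from under an if __name__ == "__main__": guard so that the workers
do not re-run the calling code when they import the module.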


An extension to this design will include a second function to transfer files from
the web host using FTP. As the files are collected from the web server, they can be
analyzed using the local_gzip() function.


The results of the local_gzip() function are used by the access_iter() function
to create a namedtuple for each row in the source file; each row describes one
file access.


Parsing log lines into namedtuples


Once we have access to all of the lines of each logfile, we can extract the details
of the access that each line describes. We'll use a regular expression to decompose
the line. From there, we can build a namedtuple object.


Here is a regular expression to parse lines in a CLF file:


format_pat= re.compile(
r"(?P[\d.]+)\s+"
r"(?P\S+)\s+"
r"(?P\S+)\s+"
r"[(?P
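
The approach described in this section can be sketched as follows. This is a minimal
sketch under stated assumptions: the group names (host, identity, user, time,
request, status, bytes), the seven-field layout of the basic Common Log Format, and
the names Access, clf_pat, and parse_line() are illustrative choices, not
necessarily the definitions used in the original listing:

```python
import re
from collections import namedtuple

# Assumed field names for the seven fields of a basic Common Log Format line.
Access = namedtuple(
    "Access", ["host", "identity", "user", "time", "request", "status", "bytes"]
)

# Each named group captures one field; adjacent string literals are concatenated.
clf_pat = re.compile(
    r"(?P<host>[\d.]+)\s+"
    r"(?P<identity>\S+)\s+"
    r"(?P<user>\S+)\s+"
    r"\[(?P<time>.+?)\]\s+"
    r'"(?P<request>.+?)"\s+'
    r"(?P<status>\d+)\s+"
    r"(?P<bytes>\S+)"
)


def parse_line(line):
    """Return an Access namedtuple for a matching line, or None otherwise."""
    match = clf_pat.match(line)
    if match is None:
        return None
    # groupdict() maps each group name to its matched text.
    return Access(**match.groupdict())
```

All fields come back as strings; converting status or bytes to integers is left to
the caller.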
