Functional Python Programming

Chapter 12

        except ValueError as e:
            print(e, repr(access))
    return filter(None, map(access_detail_builder, iterable))


We've changed the construction of the AccessDetails object to be a function
that returns a single value. We can map that function to the iterable input stream
of the Access objects. This also fits nicely with the way the multiprocessing
module works.
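
The pattern can be sketched as follows. This is a minimal illustration, not the chapter's full listing: the two-field Access and AccessDetails tuples, the timestamp format, the body of access_detail_builder(), and the wrapper name access_detail_iter() are all assumptions made for the example. Only the final filter(None, map(...)) line is taken from the text above.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical, cut-down stand-ins for the chapter's Access and AccessDetails.
Access = namedtuple("Access", ["host", "time"])
AccessDetails = namedtuple("AccessDetails", ["access", "time"])


def access_detail_builder(access):
    """Build one AccessDetails from one Access; return None if parsing fails."""
    try:
        return AccessDetails(
            access=access,
            # The timestamp layout is an assumption for this sketch.
            time=datetime.strptime(access.time, "%d/%b/%Y:%H:%M:%S %z"),
        )
    except ValueError as e:
        print(e, repr(access))
        return None


def access_detail_iter(iterable):
    # filter(None, ...) drops the None results left by unparseable rows.
    return filter(None, map(access_detail_builder, iterable))


rows = [
    Access("10.0.0.1", "01/Jun/2012:22:17:54 -0400"),
    Access("10.0.0.2", "not a timestamp"),
]
details = list(access_detail_iter(rows))
```

Because the builder is an ordinary one-argument function defined at module level, it could also be passed to a multiprocessing pool's map() method in place of the built-in map().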


In an object-oriented programming environment, these additional parsers might be
method functions or properties of a class definition. The advantage of that
object-oriented design is that items aren't parsed unless they're needed. This
particular functional design parses everything up front, assuming that it's going
to be used.
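
To make the object-oriented alternative concrete, here is a small sketch of a property that defers parsing until first use. The LazyAccess class, its attribute names, and the timestamp format are hypothetical and are not part of the chapter's code; functools.cached_property requires Python 3.8 or later.

```python
from datetime import datetime
from functools import cached_property


class LazyAccess:
    """Hypothetical wrapper: stores raw text and parses it only on demand."""

    def __init__(self, host, time_text):
        self.host = host
        self.time_text = time_text

    @cached_property
    def time(self):
        # Runs on first access only; the result is then cached on the instance.
        return datetime.strptime(self.time_text, "%d/%b/%Y:%H:%M:%S %z")


entry = LazyAccess("10.0.0.1", "01/Jun/2012:22:17:54 -0400")
parsed_before = "time" in vars(entry)  # nothing has been parsed yet
hour = entry.time.hour                 # first access triggers the parse
parsed_after = "time" in vars(entry)   # the parsed value is now cached
```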


A different functional design might rely on the three parser functions to extract and
parse the various elements from a given Access object as needed. Rather than using
the details.time attribute, we'd use the parse_time(access.time) expression.
The syntax is longer, but the attribute is only parsed when it's needed.
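
A sketch of this on-demand style, under stated assumptions: the reduced Access tuple, the body of parse_time(), and the helper functions after_hour() and hosts() are invented for illustration. The point is that only the code path that needs the time pays for parsing it.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical, cut-down stand-in for the chapter's Access tuple.
Access = namedtuple("Access", ["host", "time"])


def parse_time(ts):
    """Parse a raw timestamp; the format is an assumption for this sketch."""
    return datetime.strptime(ts, "%d/%b/%Y:%H:%M:%S %z")


def after_hour(access_iter, hour):
    # The timestamp is parsed here because this analysis needs it.
    return (a for a in access_iter if parse_time(a.time).hour >= hour)


def hosts(access_iter):
    # This analysis never touches the time field, so nothing is parsed.
    return {a.host for a in access_iter}


rows = [
    Access("10.0.0.1", "01/Jun/2012:22:17:54 -0400"),
    Access("10.0.0.2", "01/Jun/2012:08:03:10 -0400"),
]
late_hosts = [a.host for a in after_hour(rows, 12)]
all_hosts = hosts(rows)
```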


Filtering the access details


We'll look at several filters for the AccessDetails objects. The first is a collection of
filters that reject a lot of overhead files that are rarely interesting. The second filter
will be part of the analysis functions, which we'll look at later.


The path_filter() function is a combination of three functions:



  1. Exclude empty paths.

  2. Exclude some specific filenames.

  3. Exclude files that have a given extension.
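
The three rules above can be sketched as three separate predicates chained with filter(). This is an illustrative decomposition, not necessarily the author's code: the predicate names, the Detail tuple, the shortened exclusion sets, and the example URLs are assumptions. The only thing taken from the text is that each detail exposes a url.path attribute.

```python
from collections import namedtuple
from urllib.parse import urlparse

# Hypothetical stand-in for AccessDetails: only the url attribute is modeled.
Detail = namedtuple("Detail", ["url"])

# Shortened exclusion sets, for illustration only.
NAME_EXCLUDE = {"favicon.ico", "robots.txt", "humans.txt", "crossdomain.xml"}
EXT_EXCLUDE = {".png", ".js", ".css"}


def non_empty_path(detail):
    # "/".split("/") yields ["", ""], so any() is False for an empty path.
    return any(detail.url.path.split("/"))


def non_excluded_names(detail):
    return not any(p in NAME_EXCLUDE for p in detail.url.path.split("/"))


def non_excluded_ext(detail):
    final = detail.url.path.split("/")[-1]
    return not any(final.endswith(ext) for ext in EXT_EXCLUDE)


def path_filter_sketch(access_details_iter):
    # Each stage is lazy; a detail must pass all three predicates to survive.
    non_empty = filter(non_empty_path, access_details_iter)
    nx_name = filter(non_excluded_names, non_empty)
    return filter(non_excluded_ext, nx_name)


samples = [Detail(urlparse(u)) for u in (
    "http://example.com/",
    "http://example.com/robots.txt",
    "http://example.com/static/site.css",
    "http://example.com/book/chapter12.html",
)]
kept = [d.url.path for d in path_filter_sketch(samples)]
```

Each predicate splits the path again, which is simple to read but repeats work for every detail.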


An optimized version of the path_filter() function looks as follows:


def path_filter(access_details_iter):
    name_exclude = {
        'favicon.ico', 'robots.txt', 'humans.txt',
        'crossdomain.xml',
        '_images', 'search.html', 'genindex.html',
        'searchindex.js', 'modindex.html', 'py-modindex.html',
    }
    ext_exclude = {
        '.png', '.js', '.css',
    }
    for detail in access_details_iter:
        path = detail.url.path.split('/')
