The Multiprocessing and Threading Modules
There are some alternative ways to do this. For example, we can revise the use of the
map() function as follows:
def access_builder(line):
    # Parse one log line; return an Access, or None when the pattern fails.
    match = format_pat.match(line)
    if match:
        return Access(**match.groupdict())
    return None
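To make the function's two possible results concrete, here is a self-contained sketch. The Access namedtuple and format_pat regular expression below are hypothetical stand-ins for the ones defined earlier in the chapter; they assume a Common Log Format line with the referrer and user-agent fields appended, giving nine fields in total.

```python
import re
from collections import namedtuple
from typing import Optional

# Hypothetical stand-ins for the chapter's Access and format_pat:
# Common Log Format plus referrer and user agent (nine fields).
Access = namedtuple('Access', ['host', 'identity', 'user', 'time',
    'request', 'status', 'bytes', 'referer', 'user_agent'])

format_pat = re.compile(
    r'(?P<host>\S+)\s+'
    r'(?P<identity>\S+)\s+'
    r'(?P<user>\S+)\s+'
    r'\[(?P<time>.+?)\]\s+'
    r'"(?P<request>.*?)"\s+'
    r'(?P<status>\d+)\s+'
    r'(?P<bytes>\S+)\s+'
    r'"(?P<referer>.*?)"\s+'
    r'"(?P<user_agent>.*?)"\s*'
)

def access_builder(line: str) -> Optional[Access]:
    """Parse one log line; return an Access, or None if it doesn't match."""
    match = format_pat.match(line)
    if match:
        # Each named group becomes the matching Access field (all strings).
        return Access(**match.groupdict())
    return None

sample = ('127.0.0.1 - - [10/Oct/2015:13:55:36 -0700] '
          '"GET /index.html HTTP/1.1" 200 2326 '
          '"http://example.com/start" "Mozilla/5.0"')
result = access_builder(sample)
print(result.host, result.status)   # 127.0.0.1 200
print(access_builder("not a log line"))   # None
```

Note that every field stays a string at this stage; converting status or bytes to numbers would be a separate step.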
The preceding alternative function embodies just the essential processing: parsing a line and building an Access object. It returns either an Access object or None. This differs from the earlier version, which also filtered out lines that didn't match the regular expression.
Here is how we can use this function to flatten logfiles into a single stream of the
Access objects:
map(access_builder, (line for log in source_iter for line in log))
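This shows how we can transform the output from the local_gzip() function into a sequence of Access instances. In this case, we apply the access_builder() function to a flattened stream of lines. The generator expression flattens the nested structure that results from reading a collection of files: an iterable of logs, each of which is an iterable of lines. Lines that don't match the pattern appear as None in the result, so this version still needs a separate filtering step.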
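The flatten-then-map pattern can be sketched with in-memory data. Here source_iter is a hypothetical list of lists standing in for the output of local_gzip(), and builder() is a hypothetical toy parser that, like access_builder(), returns None for lines it can't handle. The sketch also shows itertools.chain.from_iterable() as an equivalent way to flatten one level, and filter(None, ...) as one way to drop the None results.

```python
from itertools import chain
from typing import Optional

# Hypothetical stand-in for access_builder(): a value, or None on no match.
def builder(line: str) -> Optional[str]:
    return line.upper() if line.startswith("GET") else None

# Hypothetical stand-in for local_gzip() output: one iterable of lines per file.
source_iter = [
    ["GET /a", "junk", "GET /b"],
    ["GET /c"],
]

# Nested generator expression, as in the text: logs are flattened into lines.
flat = map(builder, (line for log in source_iter for line in log))

# chain.from_iterable() performs the same one-level flattening.
flat2 = map(builder, chain.from_iterable(source_iter))

# map() is lazy; nothing is parsed until these lists are built.
parsed = list(filter(None, flat))   # drops the None from "junk"
with_nones = list(flat2)            # keeps the None placeholder

print(parsed)       # ['GET /A', 'GET /B', 'GET /C']
print(with_nones)   # ['GET /A', None, 'GET /B', 'GET /C']
```

A list can be iterated twice, as here; a real generator of files supports only a single pass, so only one of these pipelines could consume it. Also, filter(None, ...) drops every falsy item, which is safe for Access tuples because a non-empty namedtuple is always truthy.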
Our point here is to show that we have a number of functional styles for parsing
files. In Chapter 4, Working with Collections, we showed very simple parsing. Here, we're performing more complex parsing using a variety of techniques.
Parsing additional fields of an Access object
The initial Access object created previously doesn't decompose some inner elements of the nine fields that comprise an access log line. We'll parse those items separately from the overall decomposition into high-level fields. Breaking this down into separate parsing operations keeps the regular expressions somewhat simpler.
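As an illustration of parsing one inner element separately, the sketch below splits the request field (for example "GET /index.html HTTP/1.1") with its own small pattern instead of complicating the line-level expression. The names request_pat and parse_request() are hypothetical, and the assumption that a request has exactly three whitespace-separated parts is ours.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern for the request field alone, e.g. "GET /index.html HTTP/1.1".
request_pat = re.compile(r'(?P<method>\S+)\s+(?P<url>\S+)\s+(?P<protocol>\S+)')

def parse_request(request: str) -> Tuple[Optional[str], Optional[str], Optional[str]]:
    """Split a request into (method, url, protocol); all None if malformed."""
    match = request_pat.fullmatch(request)
    if match:
        # group() with several names returns a tuple of those groups.
        return match.group('method', 'url', 'protocol')
    return None, None, None

print(parse_request('GET /index.html HTTP/1.1'))  # ('GET', '/index.html', 'HTTP/1.1')
print(parse_request('-'))                         # (None, None, None)
```

Each small pattern can be tested on its own, which is the practical benefit of splitting the parse into stages.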
The resulting object is a namedtuple object that will wrap the original Access tuple.
It will have some additional fields for the details parsed separately:
from collections import namedtuple

AccessDetails = namedtuple('AccessDetails', ['access', 'time',
    'method', 'url', 'protocol', 'referrer', 'agent'])
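One possible way to populate these fields is sketched below. The per-item function access_detail() is hypothetical, as is the nine-field Access definition; the sketch assumes an Apache-style timestamp such as "10/Oct/2015:13:55:36 -0700" and leaves the user agent unparsed. Applying the function with map() to a stream of Access objects would yield a stream of AccessDetails.

```python
import datetime
import urllib.parse
from collections import namedtuple

# Hypothetical nine-field Access, assumed for this sketch.
Access = namedtuple('Access', ['host', 'identity', 'user', 'time',
    'request', 'status', 'bytes', 'referer', 'user_agent'])

AccessDetails = namedtuple('AccessDetails', ['access', 'time',
    'method', 'url', 'protocol', 'referrer', 'agent'])

def access_detail(access):
    """Wrap an Access in an AccessDetails with parsed inner fields (a sketch)."""
    try:
        method, url, protocol = access.request.split()
    except ValueError:
        method, url, protocol = None, None, None  # malformed request field
    return AccessDetails(
        access=access,
        # Assumes timestamps like "10/Oct/2015:13:55:36 -0700".
        time=datetime.datetime.strptime(access.time, "%d/%b/%Y:%H:%M:%S %z"),
        method=method,
        url=urllib.parse.urlparse(url) if url else None,
        protocol=protocol,
        referrer=urllib.parse.urlparse(access.referer),
        agent=access.user_agent,  # left unparsed in this sketch
    )

a = Access('127.0.0.1', '-', '-', '10/Oct/2015:13:55:36 -0700',
           'GET /index.html?q=1 HTTP/1.1', '200', '2326',
           'http://example.com/start', 'Mozilla/5.0')
d = access_detail(a)
print(d.method, d.url.path, d.url.query)   # GET /index.html q=1
```

Keeping the original Access as the first field means no information is lost: the raw strings remain available alongside the parsed values.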