Functional Python Programming

Chapter 12

The access attribute is the original Access object. The time attribute is the
parsed access.time string. The method, url, and protocol attributes come from
decomposing the access.request field. The referrer attribute is a parsed URL.
The agent attribute can also be broken down into fine-grained fields. Here are the
fields that comprise agent details:


AgentDetails = namedtuple('AgentDetails', ['product', 'system',
    'platform_details_extensions'])


These fields reflect the most common syntax for agent descriptions. There is
considerable variation in this area, but this particular subset of values seems to be
reasonably common.
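
To make the three fields concrete, here is a minimal sketch of how one user-agent string might break down. The regular expression is an assumption inferred from the field names (a product token, a parenthesized system description, then everything else), and the sample string is illustrative rather than taken from a real log.

```python
import re
from collections import namedtuple

AgentDetails = namedtuple(
    'AgentDetails', ['product', 'system', 'platform_details_extensions'])

# Assumed pattern: product token, parenthesized system info, then the rest.
agent_pat = re.compile(
    r"(?P<product>\S*?)\s+"
    r"\((?P<system>.*?)\)\s*"
    r"(?P<platform_details_extensions>.*)")

# Illustrative user-agent string, not taken from a real log.
sample = ("Mozilla/5.0 (Windows NT 6.1; WOW64) "
          "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0 Safari/537.36")

match = agent_pat.match(sample)
details = AgentDetails(**match.groupdict())
print(details.product)
print(details.system)
print(details.platform_details_extensions)

# A string with no parenthesized system section does not match at all.
no_match = agent_pat.match("curl/7.30.0")
print(no_match)
```

Agents that omit the parenthesized section fall outside this subset, so any code that relies on the pattern needs to handle a failed match.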


We'll combine three detailed parser functions into a single overall parsing function.
Here is the first part with the various detail parsers:


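import datetime
import re

def access_detail_iter(iterable):
    # Split "METHOD path PROTOCOL" into its three parts.
    def parse_request(request):
        words = request.split()
        return words[0], ' '.join(words[1:-1]), words[-1]

    # Parse an access-log timestamp, including the UTC offset.
    def parse_time(ts):
        return datetime.datetime.strptime(ts, "%d/%b/%Y:%H:%M:%S %z")

    # The group names match the AgentDetails field names.
    agent_pat = re.compile(r"(?P<product>\S*?)\s+"
        r"\((?P<system>.*?)\)\s*"
        r"(?P<platform_details_extensions>.*)")

    # Returns None when the user-agent string does not match.
    def parse_agent(user_agent):
        agent_match = agent_pat.match(user_agent)
        if agent_match:
            return AgentDetails(**agent_match.groupdict())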


We've written three parsers for the HTTP request, the timestamp, and the user agent
information. The request is usually a three-word string such as GET /some/path
HTTP/1.1. The parse_request() function extracts these three space-separated values.
In the unlikely event that the path has spaces in it, we'll extract the first word and the
last word as the method and protocol; all the remaining words are part of the path.
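
The behavior described above can be checked with a short standalone sketch. The function is copied out of its enclosing function so it can be called directly, and both request strings are made-up examples.

```python
def parse_request(request):
    # First word is the method, last word is the protocol,
    # and everything in between is rejoined as the path.
    words = request.split()
    return words[0], ' '.join(words[1:-1]), words[-1]

# The usual three-word case.
simple = parse_request("GET /some/path HTTP/1.1")
print(simple)

# A path containing a space: the middle words are rejoined.
spaced = parse_request("GET /some path/with spaces HTTP/1.1")
print(spaced)
```

Because split() with no argument splits on any run of whitespace, a path that originally contained several consecutive spaces comes back with single spaces. This approach therefore may not reproduce such a path exactly.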


Time parsing is delegated to the datetime module. We've simply provided the
proper format in the parse_time() function.
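
As a quick illustration of that format string, the sketch below parses one made-up timestamp. It assumes the default C/English locale, because the %b directive matches locale-dependent month abbreviations.

```python
import datetime

def parse_time(ts):
    # day/month-abbreviation/year:hour:minute:second, then a UTC offset
    return datetime.datetime.strptime(ts, "%d/%b/%Y:%H:%M:%S %z")

# Illustrative timestamp in the access-log layout.
stamp = parse_time("01/Mar/2012:10:51:17 -0500")
print(stamp.isoformat())
print(stamp.utcoffset())
```

The %z directive makes the result a timezone-aware datetime, so timestamps logged with different offsets can be compared directly.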
