Learning Python Network Programming

(Sean Pound) #1

HTTP and Working with the Web


Then, we check the Content-Type value of our response, and extract the
character set:





>>> format, params = response.getheader('Content-Type').split(';')
>>> params
' charset=utf-8'
>>> charset = params.split('=')[1]
>>> charset
'utf-8'





Lastly, we decode our response content by using the supplied character set:





>>> content = response.read().decode(charset)





Note that quite often, the server either doesn't supply a charset in the
Content-Type header, or it supplies the wrong charset. So, this value should be
taken as a suggestion. This is one of the reasons that we look at the Requests
library later in this chapter. It will automatically gather all the hints that it
can find about what character set should be used for decoding a response body and
make a best guess for us.
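
Because the header may be missing or wrong, a more defensive version of the steps above might look like the following. This is only a minimal sketch: the helper names charset_from_content_type and decode_body, and the choice of UTF-8 as the fallback, are assumptions made for illustration and do not come from urllib. It leans on the standard library's email.message.Message to parse the header value, rather than splitting the string by hand.

```python
from email.message import Message


def charset_from_content_type(content_type, default='utf-8'):
    """Return the charset named in a Content-Type value, or a default.

    Hypothetical helper: falls back to `default` when the header is
    missing or carries no charset parameter.
    """
    if not content_type:
        return default
    msg = Message()
    msg['Content-Type'] = content_type
    # get_content_charset() returns the charset lowercased, or failobj.
    return msg.get_content_charset(failobj=default)


def decode_body(raw, content_type, default='utf-8'):
    """Decode bytes using the advertised charset, treated as a hint only."""
    charset = charset_from_content_type(content_type, default)
    try:
        return raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown or incorrect charset: fall back, replacing bad bytes.
        return raw.decode(default, errors='replace')
```

With a response object from urlopen(), this could be called as decode_body(response.read(), response.getheader('Content-Type')).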


User agents


Another request header worth knowing about is the User-Agent header. Any client
that communicates using HTTP can be referred to as a user agent. RFC 7231 suggests
that user agents should use the User-Agent header to identify themselves in every
request. What goes in there is up to the software that makes the request, though it
usually comprises a string that identifies the program and version, and possibly the
operating system and the hardware that it's running on. For example, the user agent
for my current version of Firefox is shown here:


Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20140722
Firefox/24.0 Iceweasel/24.7.0

Although it has been broken over two lines here, it is a single long string. As you can
probably decipher, I'm running Iceweasel (Debian's version of Firefox) version 24 on
a 64-bit Linux system. User agent strings aren't intended for identifying individual
users. They only identify the product that was used for making the request.


We can view the user agent that urllib uses by creating a request, sending it,
and then inspecting the request's headers:





>>> from urllib.request import Request, urlopen
>>> req = Request('http://www.python.org')
>>> urlopen(req)
>>> req.get_header('User-agent')
'Python-urllib/3.4'
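
The default value can also be inspected without sending anything over the network. The following is a minimal sketch, assuming that an opener created by build_opener() carries the same default headers that urlopen() applies; the variable names are illustrative only.

```python
from urllib.request import build_opener

# An opener keeps its default request headers in the addheaders list
# as (name, value) pairs.
opener = build_opener()
default_headers = dict(opener.addheaders)

user_agent = default_headers['User-agent']
print(user_agent)
```

The version number at the end of the string follows the running Python version, so the exact value will differ between installations.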



