Learning Python Network Programming

(Sean Pound) #1
Chapter 2

Here, we have created a request and submitted it using urlopen, and urlopen
added the user agent header to the request. We can examine this header by using the
get_header() method. This header and its value are included in every request made
by urllib, so every server we make a request to can see that we are using Python 3.4
and the urllib library.
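As a minimal sketch of where this default value comes from, the following inspects the headers that a standard opener adds to outgoing requests, without contacting any server. The variable names are illustrative only, and the exact version string depends on the Python release in use.

```python
from urllib.request import Request, build_opener

# A default opener carries the headers that urllib adds to every request.
opener = build_opener()
default_headers = dict(opener.addheaders)

# urllib stores the header name as 'User-agent' (only the first letter capitalized).
default_user_agent = default_headers["User-agent"]
print(default_user_agent)

# A freshly created request has no user agent of its own yet;
# the opener adds one when the request is actually sent.
req = Request("http://www.debian.org")
print(req.get_header("User-agent"))
```

The second print shows `None` because the header is only attached at the moment the request is submitted.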


Webmasters can inspect the user agents of requests and then use the information for
various things, including the following:



  • Classifying visits for their website statistics

  • Blocking clients with certain user agent strings

  • Sending alternative versions of resources for user agents with known
    problems, such as bugs when interpreting certain languages like CSS,
    or not supporting some languages at all, such as JavaScript


The last two can cause problems for us because they can stop or interfere with us
accessing the content that we're after. To work around this, we can try setting our
user agent so that it mimics a well-known browser. This is known as spoofing, as
shown here:





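    from urllib.request import Request, urlopen

    req = Request('http://www.debian.org')
    # Adjacent string literals are joined, so the long value can span two lines.
    req.add_header('User-Agent',
                   'Mozilla/5.0 (X11; Linux x86_64; rv:24.0) '
                   'Gecko/20140722 Firefox/24.0 Iceweasel/24.7.0')
    response = urlopen(req)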
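The spoofed header can also be checked before anything is sent. The following is a small sketch, assuming we pass the header through the `headers` argument of `Request` rather than calling `add_header()`. The constant name `FIREFOX_UA` is our own choice.

```python
from urllib.request import Request

# The same Firefox/Iceweasel user agent string used above.
FIREFOX_UA = ("Mozilla/5.0 (X11; Linux x86_64; rv:24.0) "
              "Gecko/20140722 Firefox/24.0 Iceweasel/24.7.0")

# Headers can be supplied up front instead of calling add_header() afterwards.
req = Request("http://www.debian.org", headers={"User-Agent": FIREFOX_UA})

# urllib normalizes the stored name to 'User-agent', so that is the key to look up.
stored = req.get_header("User-agent")
print(stored == FIREFOX_UA)

# Looking the header up under a differently capitalized name finds nothing.
print(req.get_header("User-Agent"))
```

Because the request already carries a user agent, urllib does not add its own default value when the request is submitted.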





The server will respond as if our application is a regular Firefox client. User agent
strings for different browsers are available on the web. I have yet to come across a
comprehensive resource for them, but searching for a browser name and version number
will usually turn something up. Alternatively, you can use Wireshark to capture an
HTTP request made by the browser you want to emulate and look at the captured
request's user agent header.


Cookies


A cookie is a small piece of data that the server sends in a Set-Cookie header as a
part of the response. The client stores cookies locally and includes them in any future
requests that are sent to the server.
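The round trip described above can be sketched offline with the standard library's `http.cookiejar` module. The `FakeResponse` class, the `www.example.com` URLs, and the cookie value are all hypothetical stand-ins for a real server response. A real program would normally let an opener handle this automatically.

```python
from email.message import Message
from http.cookiejar import CookieJar
from urllib.request import Request


class FakeResponse:
    """Hypothetical stand-in for a server response; CookieJar only needs info()."""

    def __init__(self, set_cookie_value):
        self._headers = Message()
        self._headers["Set-Cookie"] = set_cookie_value

    def info(self):
        return self._headers


jar = CookieJar()

# First request: the (simulated) server sends a cookie in a Set-Cookie header.
first_request = Request("http://www.example.com/")
jar.extract_cookies(FakeResponse("session_id=abc123; Path=/"), first_request)

# Later request to the same site: the jar adds the stored cookie for us.
second_request = Request("http://www.example.com/account")
jar.add_cookie_header(second_request)

cookie_header = second_request.get_header("Cookie")
print(cookie_header)
```

The jar plays the role of the client's local cookie store: it records what arrived in `Set-Cookie` and fills in the `Cookie` header on later requests to the same site.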


Servers use cookies in various ways. They can add a unique ID to them, which
enables them to track a client as it accesses different areas of a site. They can store
a login token, which will automatically log the client in, even if the client leaves
the site and then accesses it later. They can also be used for storing the client's user
preferences, snippets of personalization data, and so on.
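To make these uses concrete, here is a rough sketch of how a server might build the corresponding `Set-Cookie` headers using `http.cookies.SimpleCookie`. The cookie names and values (`visitor_id`, `login_token`, `theme`) are invented for illustration and do not come from any particular site.

```python
from http.cookies import SimpleCookie

cookies = SimpleCookie()

# A unique ID that lets the server recognize the same client across pages.
cookies["visitor_id"] = "7f3a9c"

# A login token that stays valid for 30 days and is hidden from page scripts.
cookies["login_token"] = "s3cr3t-token"
cookies["login_token"]["max-age"] = 60 * 60 * 24 * 30
cookies["login_token"]["httponly"] = True

# A user preference.
cookies["theme"] = "dark"

# output() renders one 'Set-Cookie: ...' header line per cookie.
set_cookie_lines = cookies.output().splitlines()
for line in set_cookie_lines:
    print(line)
```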
