Learning Python Network Programming

(Sean Pound) #1
Chapter 2

Redirects


Sometimes servers move their content around. They also make some content obsolete
and put up new stuff in a different location. Sometimes they'd like us to use the
more secure HTTPS protocol instead of HTTP. In all these cases, they may get traffic
that asks for the old URLs, and in all these cases they'd probably prefer to be able to
automatically send visitors to the new ones.


The 300 range of HTTP status codes is designed for this purpose. These codes
indicate to the client that further action is required on their part to complete the
request. The most commonly encountered action is to retry the request at a different
URL. This is called a redirect.
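To get a feel for what lives in the 300 range, the short sketch below lists the 3xx codes that Python's standard library knows about. It uses the http.HTTPStatus enum; the variable name redirect_statuses is our own choice for illustration.

```python
from http import HTTPStatus

# Collect every status code in the 3xx range known to the standard library,
# mapping the numeric code to its reason phrase.
redirect_statuses = {
    status.value: status.phrase
    for status in HTTPStatus
    if 300 <= status.value < 400
}

for code, phrase in sorted(redirect_statuses.items()):
    print(code, phrase)
```

Not every code in this range is a redirect to a new URL (304 Not Modified, for example, relates to caching), which is why the text says that a redirect is only the most commonly encountered action.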


We'll see how this works when using urllib. Let's make a request:

>>> from urllib.request import Request, urlopen
>>> req = Request('http://www.gmail.com')
>>> response = urlopen(req)

Simple enough, but now look at the URL of the response:

>>> response.url
'https://accounts.google.com/ServiceLogin?service=mail&passive=true&rm=false...'


This is not the URL that we requested! If we open this new URL in a browser,
then we'll see that it's actually the Google login page (you may need to clear your
browser cookies to see this if you already have a cached Google login session).
Google redirected us from http://www.gmail.com to its login page, and urllib
automatically followed the redirect. Moreover, we may have been redirected more
than once. Look at the redirect_dict attribute of our request object:

>>> req.redirect_dict
{'https://accounts.google.com/ServiceLogin?service=...': 1,
 'https://mail.google.com/mail/': 1}


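The urllib package adds every URL that we were redirected to as a key in this dict;
each value counts how many times that URL was visited, which urllib uses to detect
redirect loops. Note that redirect_dict is not part of urllib's documented interface,
so it is best used for inspection only. We can see that we have actually been
redirected twice, first to https://mail.google.com, and second to the login page.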
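Because Google's behavior may change over time, the same effect can be reproduced offline. The following is a minimal sketch, not a definitive recipe: it starts a throwaway local server whose paths (/start, /mail, /login) and handler class are hypothetical names chosen to mimic the two-step chain above, then inspects the final URL and redirect_dict.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, Request, build_opener

# Hypothetical paths for this sketch: /start -> /mail -> /login
REDIRECTS = {'/start': '/mail', '/mail': '/login'}


class ChainHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = REDIRECTS.get(self.path)
        if target is not None:
            # Redirect: a 302 status plus a Location header
            self.send_response(302)
            self.send_header('Location', target)
            self.send_header('Content-Length', '0')
            self.end_headers()
        else:
            body = b'login page'
            self.send_response(200)
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def follow_chain():
    server = HTTPServer(('127.0.0.1', 0), ChainHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        base = 'http://127.0.0.1:%d' % server.server_address[1]
        # An opener with an empty ProxyHandler ignores system proxy settings;
        # opener.open() otherwise follows redirects just as urlopen() does.
        opener = build_opener(ProxyHandler({}))
        req = Request(base + '/start')
        with opener.open(req) as response:
            final_url = response.url
        # redirect_dict is undocumented, so read it defensively.
        visited = getattr(req, 'redirect_dict', {})
        return base, final_url, visited
    finally:
        server.shutdown()
        server.server_close()


base, final_url, visited = follow_chain()
print(final_url)
print(visited)
```

Reading the attribute with getattr() is a deliberate choice here: a request that was never redirected has no redirect_dict at all, so direct attribute access would raise an AttributeError.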


When we send our first request, the server sends a response with a redirect status
code, typically one of 301, 302, 303, or 307 (HTTP also defines 308, Permanent
Redirect). All of these indicate a redirect. This response includes a Location
header, which contains the new URL. The urllib package then automatically submits a
new request to that URL, and in the aforementioned case, it receives yet another
redirect, which leads it to the Google login page.
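To see the raw redirect response rather than its result, we can ask urllib not to follow it. The sketch below is one possible approach, under stated assumptions: the NoRedirect and MovedHandler classes, the peek_at_redirect() function, and the /old and /new paths are all hypothetical names for this illustration. It relies on the fact that when a redirect handler's redirect_request() returns None, urllib gives up on the redirect and raises an HTTPError carrying the original status code and headers.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.error import HTTPError
from urllib.request import HTTPRedirectHandler, ProxyHandler, build_opener


class NoRedirect(HTTPRedirectHandler):
    """Decline every redirect so the raw 3xx response surfaces as an HTTPError."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


class MovedHandler(BaseHTTPRequestHandler):
    """Hypothetical server side: everything has moved to /new."""

    def do_GET(self):
        self.send_response(301)
        self.send_header('Location', '/new')
        self.send_header('Content-Length', '0')
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence per-request logging


def peek_at_redirect():
    server = HTTPServer(('127.0.0.1', 0), MovedHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = 'http://127.0.0.1:%d/old' % server.server_address[1]
        # Passing our subclass replaces the default redirect handler.
        opener = build_opener(ProxyHandler({}), NoRedirect())
        try:
            opener.open(url)
        except HTTPError as err:
            result = (err.code, err.headers['Location'])
            err.close()
            return result
        return None  # reached only if the server did not redirect
    finally:
        server.shutdown()
        server.server_close()


print(peek_at_redirect())
```

The status code and the Location header returned here are exactly the two pieces of information that urllib uses when it follows a redirect on our behalf.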
