Foundations of Python Network Programming

Chapter 9 ■ http Clients


•    405 Method Not Allowed: The server recognizes the method and path, but this particular
method does not make sense when run against this particular path.

•    500 Internal Server Error: Another familiar status. The server wants to fulfill the request but cannot at
the moment because of some internal error.

•    501 Not Implemented: The server does not recognize your HTTP verb.

•    502 Bad Gateway: The server is a gateway or proxy (see Chapter 10), but it cannot contact the
server behind it that is supposed to provide the response for this path.
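The status names listed above do not need to be memorized, because the Standard Library's http.HTTPStatus enumeration carries the reason phrase for each code. The following is only a small sketch; the helper names describe() and is_server_error() are made up for this illustration and are not part of any library.

```python
from http import HTTPStatus


def describe(code):
    """Return the numeric code and its standard reason phrase as one string."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"


def is_server_error(code):
    """True for 5xx codes, where the fault lies with the server, not the request."""
    return 500 <= code <= 599


if __name__ == "__main__":
    for code in (405, 500, 501, 502):
        kind = "server error" if is_server_error(code) else "client error"
        print(describe(code), "-", kind)
```

The split between 4xx and 5xx matters in practice: a 4xx response suggests that the request itself should change, while a 5xx response may succeed if the same request is retried later.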

While responses with 3xx status codes are not expected to carry a body, both 4xx and 5xx responses usually
do so—generally offering some kind of human-readable description of the error. The less informative examples are
typically unmodified error pages for the language or framework in which the web server has been written. Server
authors have often handcrafted more informative pages to help users or developers know how to recover from the error.
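To see that an error response really does carry a body, here is a minimal sketch using only the Standard Library. With urllib, a 4xx or 5xx response arrives as an HTTPError exception, and that exception object can be read like a response. The tiny local server, its AlwaysMissing handler, and the message it returns are assumptions invented for this demonstration so that the example does not depend on any outside web site.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.error import HTTPError
from urllib.request import ProxyHandler, build_opener


class AlwaysMissing(BaseHTTPRequestHandler):
    """Hypothetical demo server: answers every GET with a 404 and a helpful body."""

    def do_GET(self):
        body = b"No such page; try /index.html instead.\n"
        self.send_response(404)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demonstration quiet


def fetch_status_and_body(url):
    """Return (status, text) whether the request succeeds or fails with 4xx/5xx."""
    opener = build_opener(ProxyHandler({}))  # ignore any proxy environment settings
    try:
        with opener.open(url) as response:
            return response.status, response.read().decode("utf-8")
    except HTTPError as error:
        # The exception doubles as the response, so its body is still readable.
        return error.code, error.read().decode("utf-8")


def demo():
    """Start the demo server on a free local port, fetch one page, shut down."""
    server = HTTPServer(("127.0.0.1", 0), AlwaysMissing)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_status_and_body(
            "http://127.0.0.1:%d/missing" % server.server_port)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```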
As you learn a particular Python HTTP client, there are two important questions to ask regarding
status codes.

The first question is whether the library automatically follows redirects. If it does not, you have to detect 3xx status
codes yourself and follow their Location header. While the low-level http.client module built into the Standard Library
(named httplib in Python 2) makes you follow redirects yourself, the urllib module follows them for you in conformance
with the standard. The Requests library does the same, and it additionally presents you with a history attribute that
lists the whole series of redirects that brought you to the final location.





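>>> import requests
>>> from urllib.request import urlopen
>>> # urllib follows the redirects silently and reports only the final response
>>> r = urlopen('http://httpbin.org/status/301')
>>> r.status, r.url
(200, 'http://httpbin.org/get')
>>> # Requests does the same but also remembers each intermediate response
>>> r = requests.get('http://httpbin.org/status/301')
>>> (r.status_code, r.url)
(200, 'http://httpbin.org/get')
>>> r.history
[<Response [301]>, <Response [302]>]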
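For comparison, here is roughly what following redirects by hand looks like with the low-level http.client module. This is a sketch under several assumptions rather than a complete implementation: it handles only plain http URLs and GET requests, the function name get_following_redirects() is invented here, and the RedirectDemo server with its /old, /middle, and /new paths exists only so that the example can run without contacting a real site.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urljoin, urlsplit


class RedirectDemo(BaseHTTPRequestHandler):
    """Hypothetical demo server: /old -> 301 -> /middle -> 302 -> /new."""

    hops = {"/old": (301, "/middle"), "/middle": (302, "/new")}

    def do_GET(self):
        if self.path in self.hops:
            code, target = self.hops[self.path]
            self.send_response(code)
            self.send_header("Location", target)
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"final page\n"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demonstration quiet


def get_following_redirects(url, max_hops=5):
    """GET url over plain HTTP, following Location headers ourselves.

    Returns (status, final_url, history, body), where history is a list of
    (status, url) pairs, one for each redirect that was followed.
    """
    history = []
    for _ in range(max_hops + 1):
        parts = urlsplit(url)
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn = http.client.HTTPConnection(parts.hostname, parts.port)
        try:
            conn.request("GET", target)
            response = conn.getresponse()
            status = response.status
            location = response.getheader("Location")
            body = response.read()
        finally:
            conn.close()
        if status in (301, 302, 303, 307, 308) and location:
            history.append((status, url))
            url = urljoin(url, location)  # Location may be a relative path
            continue
        return status, url, history, body
    raise RuntimeError("gave up after %d redirects" % max_hops)


def demo():
    """Run the demo server on a free local port and fetch /old through it."""
    server = HTTPServer(("127.0.0.1", 0), RedirectDemo)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        base = "http://127.0.0.1:%d" % server.server_port
        return get_following_redirects(base + "/old")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

The limit on the number of hops is there because nothing stops a misconfigured server from redirecting in a circle.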





The Requests library additionally lets you turn redirection off, if you prefer, with a simple keyword argument—a
maneuver that is possible but much more difficult if attempted with urllib.





>>> r = requests.get('http://httpbin.org/status/301',
...                  allow_redirects=False)
>>> r.raise_for_status()  # raises nothing: a 3xx status is not an error
>>> (r.status_code, r.url, r.headers['Location'])
(301, 'http://httpbin.org/status/301', '/redirect/1')
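To give a sense of the extra work that urllib requires, the sketch below shows one way to switch off redirection there: subclass HTTPRedirectHandler and have its redirect_request() method return None, which declines the redirect and lets the 3xx response surface as an HTTPError. The class name NoRedirect, the function fetch_without_redirects(), and the small local server are assumptions made for this illustration.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.error import HTTPError


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Decline every redirect instead of building a follow-up request."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # with no handler willing to follow, urllib raises HTTPError


class MovedDemo(BaseHTTPRequestHandler):
    """Hypothetical demo server: every GET is answered with a 301."""

    def do_GET(self):
        self.send_response(301)
        self.send_header("Location", "/redirect/1")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demonstration quiet


def fetch_without_redirects(url):
    """Return (status, Location header or None) without following redirects."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),  # ignore any proxy environment settings
        NoRedirect,                       # replaces the default redirect handler
    )
    try:
        with opener.open(url) as response:
            return response.status, None
    except HTTPError as error:
        return error.code, error.headers.get("Location")


def demo():
    """Run the demo server on a free local port and request one page from it."""
    server = HTTPServer(("127.0.0.1", 0), MovedDemo)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_without_redirects(
            "http://127.0.0.1:%d/status/301" % server.server_port)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

Note that with this approach the redirect arrives as an exception rather than as an ordinary response, which is part of why the single keyword argument in Requests is the more comfortable option.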





It will reduce load on the servers that you query if your Python program takes the time to detect 301 redirects and
attempts to avoid those URLs in the future. A 301 is not an error; it tells you that the resource has moved permanently.
If your program maintains persistent state, then it might be able to cache 301 redirects to avoid revisiting the old
paths, or directly rewrite the URL wherever you have it stored. If a user requested the URL interactively, then you
might print a helpful message informing them of the new location of the page.
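One way to cache permanent redirects is a simple mapping from each old URL to its replacement, consulted before every request. The RedirectCache class below is a hypothetical sketch, not part of any library; it keeps its state in memory only, so a real program would also need to save the mapping somewhere persistent.

```python
from urllib.parse import urljoin


class RedirectCache:
    """Remember 301 redirects so that later requests can skip the old URL."""

    def __init__(self):
        self._moved = {}  # old URL -> new absolute URL

    def record(self, old_url, status_code, location):
        """Store a redirect, but only if it was a permanent (301) one."""
        if status_code == 301 and location:
            # A Location header may be relative, so resolve it against the old URL.
            self._moved[old_url] = urljoin(old_url, location)

    def resolve(self, url):
        """Return the newest known location for url, following chained entries."""
        seen = set()
        while url in self._moved and url not in seen:
            seen.add(url)  # guards against a cycle of cached redirects
            url = self._moved[url]
        return url


if __name__ == "__main__":
    cache = RedirectCache()
    cache.record("http://example.com/a", 301, "/b")
    cache.record("http://example.com/b", 301, "http://www.example.com/b")
    cache.record("http://example.com/c", 302, "/temporary")  # ignored: not permanent
    print(cache.resolve("http://example.com/a"))
    print(cache.resolve("http://example.com/c"))
```

Temporary redirects such as 302 are deliberately left out of the cache, since the server has not promised that the new location will last.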
Two of the most common redirections involve whether the prefix www belongs at the front of the hostname you
use to contact a server.





>>> r = requests.get('http://google.com/')
>>> r.url
'http://www.google.com/'
>>> r = requests.get('http://www.twitter.com/')
>>> r.url
'https://twitter.com/'
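If your program stores URLs, it can be useful to recognize when a redirect did nothing more than add or remove the www prefix. The sketch below compares the hostnames of the requested and final URLs; the helper names strip_www() and is_www_redirect() are invented for this example.

```python
from urllib.parse import urlsplit


def strip_www(hostname):
    """Return the hostname in lowercase without a leading 'www.' label."""
    hostname = hostname.lower()
    return hostname[4:] if hostname.startswith("www.") else hostname


def is_www_redirect(requested_url, final_url):
    """True if the two hostnames differ only by the presence of 'www.'."""
    requested = (urlsplit(requested_url).hostname or "").lower()
    final = (urlsplit(final_url).hostname or "").lower()
    return requested != final and strip_www(requested) == strip_www(final)


if __name__ == "__main__":
    print(is_www_redirect("http://google.com/", "http://www.google.com/"))
    print(is_www_redirect("http://www.twitter.com/", "https://twitter.com/"))
    print(is_www_redirect("http://example.com/", "http://example.org/"))
```

The check looks only at the hostname, so a redirect that also switches from http to https, as in the second example, still counts as a www redirect.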



