Foundations of Python Network Programming


Chapter 9 ■ http Clients




Here, two popular sites have taken opposite stances on whether the www prefix should be part of their official
hostname. However, in both cases they are willing to use a redirect to enforce their preference and also to prevent the
chaos of their site appearing to live at two different URLs. If your URLs are built from the wrong hostname, then unless
your application is careful to learn these redirections and avoid repeating them, you will wind up making two HTTP
requests instead of one for every resource you fetch.
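One way to learn a redirection instead of repeating it is sketched below, under stated assumptions. It uses the Standard Library urlopen(), which follows redirects automatically and records the URL it finally reached in the response's url attribute. It also assumes that a redirect which keeps the same path and query string but changes the scheme or hostname applies to every other path on that host. The names canonical_hosts, rewrite(), and fetch() are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen

# Hypothetical cache: (scheme, host) we asked for -> (scheme, host) the
# server redirected us to.
canonical_hosts = {}

def rewrite(url):
    """Apply any host-level redirect already learned for this URL's host."""
    parts = urlsplit(url)
    scheme, netloc = canonical_hosts.get((parts.scheme, parts.netloc),
                                         (parts.scheme, parts.netloc))
    return urlunsplit((scheme, netloc, parts.path, parts.query, parts.fragment))

def fetch(url):
    """Fetch url, remembering host-level redirects so they are not repeated."""
    asked = urlsplit(url)
    with urlopen(rewrite(url)) as response:
        final = urlsplit(response.url)  # urlopen() has already followed redirects
        body = response.read()
    same_resource = (final.path, final.query) == (asked.path, asked.query)
    moved_host = (final.scheme, final.netloc) != (asked.scheme, asked.netloc)
    if same_resource and moved_host:
        # Assumption: the redirect applies to every path on the original host.
        canonical_hosts[(asked.scheme, asked.netloc)] = (final.scheme, final.netloc)
    return body
```

This sketch does not distinguish permanent redirects (301, 308) from temporary ones, which a real client should do before trusting what it has learned.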
The other question to investigate regarding your HTTP client is how it chooses to alert you if an attempt to fetch a
URL fails with a 4xx or 5xx status code. For all such codes, the Standard Library urlopen() raises an exception, making it
impossible for your code to accidentally process an error page returned from the server as though it were normal data.

>>> from urllib.request import urlopen
>>> urlopen('http://localhost:8000/status/500')
Traceback (most recent call last):
  ...
urllib.error.HTTPError: HTTP Error 500: INTERNAL SERVER ERROR

How can you ever examine the details of the response if urlopen() interrupts you with an exception? The answer
is by examining the exception object, which performs double duty by being both an exception and also a response
object with headers and a body.
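Because the exception doubles as a response, a small helper can hand back the status, headers, and body uniformly whether or not the request failed. The name fetch_any_status() is hypothetical; the sketch relies only on urlopen(), urllib.error.HTTPError, and the status, headers, and read() members that both the normal response and the exception provide.

```python
import urllib.error
from urllib.request import urlopen

def fetch_any_status(url):
    """Return (status, headers, body) for url, even for 4xx and 5xx responses."""
    try:
        response = urlopen(url)
    except urllib.error.HTTPError as e:
        response = e  # the exception doubles as a response object
    with response:
        return response.status, response.headers, response.read()
```

The caller must now check the status itself, which is exactly the safety net that urlopen() normally provides, so a helper like this is best reserved for code that genuinely needs to inspect error pages.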

>>> import urllib.error
>>> from urllib.request import urlopen
>>> try:
...     urlopen('http://localhost:8000/status/500')
... except urllib.error.HTTPError as e:
...     print(e.status, repr(e.headers['Content-Type']))
...
500 'text/html; charset=utf-8'

The situation presented by the Requests library is more surprising: even error status codes result in a response
object being returned, without comment, to the caller. It is the responsibility of the caller either to test the status code
of the response or to volunteer to call its raise_for_status() method, which raises an exception if the status code is
4xx or 5xx.

>>> import requests
>>> r = requests.get('http://localhost:8000/status/500')
>>> r.status_code
500
>>> r.raise_for_status()
Traceback (most recent call last):
  ...
requests.exceptions.HTTPError: 500 Server Error: INTERNAL SERVER ERROR

If you are worried about having to remember to perform a status check every time you call requests.get, then
you might consider writing a wrapper function of your own that performs the check automatically.
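A minimal sketch of such a wrapper follows, assuming the Requests library is installed. The name get_checked() is hypothetical, and any keyword arguments are passed through unchanged to requests.get().

```python
import requests

def get_checked(url, **kwargs):
    """Like requests.get(), but raise HTTPError for any 4xx or 5xx status.

    Keyword arguments such as params, headers, or timeout are passed
    through to requests.get() unchanged.
    """
    response = requests.get(url, **kwargs)
    response.raise_for_status()  # raises requests.exceptions.HTTPError on 4xx/5xx
    return response
```

With this wrapper, get_checked('http://localhost:8000/status/500') raises requests.exceptions.HTTPError instead of quietly returning the error page, and the exception's response attribute still gives access to the status code and body.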


Caching and Validation


HTTP includes several well-designed mechanisms that let clients avoid repeatedly fetching, with GET, resources that
they use frequently, but these mechanisms operate only if the server chooses to add headers to the resource permitting
them. It is important for server authors to think through caching and allow it whenever possible, since it reduces both
network traffic and server load while also letting client applications run faster.
RFCs 7231, 7232, and 7234 describe all of these mechanisms in exhaustive detail. This section attempts only to provide
a basic introduction.
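As one illustration of validation, the sketch below performs a conditional GET. After a first successful fetch it stores the response body together with the server's ETag header, and on later fetches it sends that tag back in an If-None-Match header; if the server replies 304 Not Modified, the stored body is reused. Note that urlopen() treats 304 like any other non-2xx status and raises HTTPError, so the 304 case is handled in the except clause. The in-memory cache and the name get_with_validation() are hypothetical, and the sketch ignores Cache-Control, Last-Modified, and Vary entirely.

```python
import urllib.error
from urllib.request import Request, urlopen

# Hypothetical in-memory cache: url -> (etag, body) saved from a 200 response.
_etag_cache = {}

def get_with_validation(url):
    """GET url, revalidating any cached copy with an If-None-Match header."""
    request = Request(url)
    cached = _etag_cache.get(url)
    if cached is not None:
        etag, _ = cached
        request.add_header('If-None-Match', etag)
    try:
        with urlopen(request) as response:
            body = response.read()
            new_etag = response.headers.get('ETag')
    except urllib.error.HTTPError as e:
        if e.code == 304 and cached is not None:
            e.close()
            return cached[1]  # 304 Not Modified: our stored copy is still valid
        raise  # any other error status propagates to the caller
    if new_etag is not None:
        _etag_cache[url] = (new_etag, body)
    return body
```

The second request still costs a round trip, but a 304 response carries no body, so the savings grow with the size of the resource.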
The most important question that a service architect can ask when they want to add headers to turn on caching
is whether two requests should really return the same document merely because their paths are identical. Is there
anything else about a pair of requests that might result in their needing to return two different resources? If so,
