Learning Python Network Programming

(Sean Pound) #1

HTTP and Working with the Web


...     print('reason', e.reason)
...     print('url', e.url)
status 404
reason Not Found
url http://www.ietf.org/rfc/rfc0.txt


Here we've requested RFC 0, which doesn't exist. So the server has returned a 404
status code, and urllib has spotted this and raised an HTTPError.


You can see that HTTPError provides useful attributes regarding the request. In the
preceding example, we used the status, reason, and url attributes to get some
information about the response.


If something goes wrong lower in the network stack, then the appropriate module
will raise an exception. The urllib package catches these exceptions and then wraps
them as URLErrors. For example, we might have specified a host or an IP address
that doesn't exist, as shown here:


>>> urlopen('http://192.0.2.1/index.html')
urllib.error.URLError: <urlopen error [Errno 110] Connection timed out>


In this instance, we have asked for index.html from the host 192.0.2.1. The
192.0.2.0/24 IP address range is reserved for use in documentation only, so
you will never encounter a host using the preceding IP address. Hence the TCP
connection times out and socket raises a timeout exception, which urllib catches,
re-wraps, and re-raises for us. We can catch these exceptions in the same way as we
did in the preceding example.
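

As a minimal sketch of handling both kinds of failure in one place, the following
relies on the fact that HTTPError is a subclass of URLError, so a single except
clause covers both. The helper names describe_error and fetch, and the 10-second
default timeout, are assumptions made for illustration and do not come from the
text above.

```python
import urllib.error
from urllib.request import urlopen


def describe_error(exc):
    """Summarize a urllib exception as a dict (hypothetical helper)."""
    if isinstance(exc, urllib.error.HTTPError):
        # HTTPError carries the response status code, reason phrase, and URL.
        return {'kind': 'http', 'status': exc.code,
                'reason': exc.reason, 'url': exc.url}
    # A plain URLError wraps a lower-level failure; reason may be an
    # exception object or a string, so normalize it to text.
    return {'kind': 'url', 'reason': str(exc.reason)}


def fetch(url, timeout=10):
    """Return the response body as bytes, or an error summary dict."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.URLError as exc:  # also catches HTTPError
        return describe_error(exc)
```

Calling fetch('http://www.ietf.org/rfc/rfc0.txt') would be expected to return a
summary of kind 'http', while a request to an unreachable host such as 192.0.2.1
would be expected to return one of kind 'url' once the connection attempt gives up.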


HTTP headers


Requests and responses are made up of two main parts: headers and a body.
We briefly saw some HTTP headers when we used our TCP RFC downloader in
Chapter 1, Network Programming and Python. Headers are the lines of protocol-specific
information that appear at the beginning of the raw message that is sent over the
TCP connection. The body is the rest of the message. It is separated from the headers
by a blank line. The body is optional; its presence depends on the type of request or
response. Here's an example of an HTTP request:


GET / HTTP/1.1
Accept-Encoding: identity