Chapter 9
HTTP Clients
This is the first of three chapters about HTTP. In this chapter, you will learn how to use the protocol from the point of
view of a client program that wants to fetch and cache documents and perhaps submit queries or data to the server as
well. In the process, you will learn the rules of how the protocol operates. Chapter 10 will then look at the design and
deployment of HTTP servers. Both chapters will consider the protocol in its most pristine conceptual form, that is,
simply as a mechanism for fetching or posting documents.
While HTTP can deliver many kinds of documents, including images, PDFs, music, and video, Chapter 11 examines the
particular class of document that has made HTTP and the Internet world famous: the World Wide Web of hypertext
documents, which are interlinked thanks to the invention of the URL, also described in Chapter 11. There you
will learn about the programming patterns enabled by template libraries, forms, and Ajax, as well as about web
frameworks that try to bring all of these patterns together into an easy-to-program form.
HTTP version 1.1, the most common version in use today, is defined in RFCs 7230–7235, to which you should
refer in any case where the text of these chapters seems ambiguous or leaves you wanting to know more. For a more
technical introduction to the theory behind the protocol’s design, you can consult Chapter 5 of Roy Thomas Fielding’s
famous PhD dissertation “Architectural Styles and the Design of Network-based Software Architectures.”
For now your journey begins here, where you will learn to query a server and to get documents in response.
Python Client Libraries
The HTTP protocol and the massive data resources that it makes available are a perennially popular topic for Python
programmers, and this has been reflected through the years in a long parade of third-party clients purporting to do a
better job than the urllib built into the Standard Library.
Today, however, a single third-party solution stands alone, not only having thoroughly swept the field of
contenders but also having replaced urllib as the go-to tool of the Python programmer who wants to speak HTTP.
That library is Requests, written by Kenneth Reitz and backed by the connection pooling logic of urllib3, which is
maintained by Andrey Petrov.
As you learn about HTTP in this chapter, you will return to both urllib and Requests to see what they do well,
and what they do poorly, when faced with each HTTP feature. Their basic interfaces are quite similar—they provide
a callable that opens an HTTP connection, makes a request, and waits for the response headers before returning a
response object that presents them to the programmer. The response body is left queued on the incoming socket and
read only when the programmer asks.
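The pattern just described can be sketched with the Standard Library alone. This is a minimal illustration rather than a definitive recipe: the HelloHandler class, the fetch() and demo() helpers, and the choice to serve from a throwaway local server on a free port are all assumptions made so that the example runs without network access. Only urllib is exercised here; Requests offers a comparable callable, requests.get(), which is not shown so that the sketch has no third-party dependency.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in server so the sketch needs no network access."""

    def do_GET(self):
        body = b'Hello, HTTP client\n'
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress the default per-request log line


def fetch(url):
    """Open the URL, look at the status and headers, then read the body."""
    # An empty ProxyHandler ignores any proxy environment variables, so the
    # loopback request in demo() goes straight to the local server.
    opener = build_opener(ProxyHandler({}))
    with opener.open(url) as response:
        # The response object is returned once the headers have arrived.
        status = response.status
        content_type = response.headers['Content-Type']
        # The body is pulled from the socket only when read() is called.
        body = response.read()
    return status, content_type, body


def demo():
    """Start the stand-in server on a free port and fetch one document."""
    server = HTTPServer(('127.0.0.1', 0), HelloHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return fetch('http://127.0.0.1:{}/'.format(server.server_port))
    finally:
        server.shutdown()
        server.server_close()


if __name__ == '__main__':
    print(demo())
```

Calling demo() returns the status code, the Content-Type header, and the body as bytes. Notice that the status and headers are inspected before read() is ever called, which mirrors the two-step shape of the interface described above.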
In most of the examples in this chapter, I will be testing the two HTTP client libraries against a small test web site
named http://httpbin.org, which was designed by Kenneth Reitz and which you can run locally by installing it with
pip and then running it inside a WSGI container (see Chapter 10) like Gunicorn. To run it on localhost port 8000