Foundations of Python Network Programming

Chapter 9 ■ HTTP Clients


The two fundamental methods, GET and POST, provide the basic “read” and “write” operations of HTTP.
GET is the method performed when you type an HTTP URL into your web browser: it asks for the document
named by the request path to be transmitted as the server’s response. It cannot include a body. The standard insists
that servers cannot, under any circumstances, let clients modify data with this method. Any parameters attached to
the path (see Chapter 11 to learn about URLs) can only modify the document that is being returned, as in ?q=python
or ?results=10, not ask that changes take place on the server. The restriction that GET cannot modify data lets a client
safely re-attempt a GET if a first attempt is interrupted, allows GET responses to be cached (you learn about caching
later in this chapter), and makes it safe for web scraping programs (see Chapter 11) to visit as many URLs as they want
without fearing that they are creating or deleting content on the sites they are traversing.
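As a small illustration of parameters that only shape the document being returned, the following sketch builds a GET URL with the Standard Library. The host example.com, the /search path, and the helper name build_search_url() are placeholders invented for this example; nothing is sent over the network.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def build_search_url(base, **params):
    """Attach query parameters to a base URL intended for a GET request."""
    return base + '?' + urlencode(params)

# Hypothetical URL: the parameters narrow what comes back, nothing more.
url = build_search_url('http://example.com/search', q='python', results=10)
print(url)

# The server side can recover the same parameters from the query string.
print(parse_qs(urlsplit(url).query))
```

Because the parameters travel in the URL itself, repeating the request simply asks for the same document again.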
POST is used when the client wants to submit new data to the server. Traditional web forms, if they do not simply
copy your form fields into the URL, usually use POST to deliver your request. Programmer-oriented APIs also use POST
for submitting new documents, comments, and database rows. Because running the same POST twice might perform
an action on the server twice, like giving a merchant a second $100 payment, the results of a POST cannot be
cached to satisfy future repeats of the POST, nor can a POST be retried automatically if the response does not arrive.
The remaining HTTP methods can be categorized as being basically like GET or basically like POST.
The methods like GET are OPTIONS and HEAD. The OPTIONS method asks which communication options, such as the
permitted methods, the server supports for a particular path, and the HEAD method asks the server to go through the
process of getting ready to transmit the resource but then to stop and transmit only the headers instead. This lets a
client check on things such as Content-Type without incurring the cost of downloading the body.
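To see the difference between HEAD and GET in practice, here is a minimal sketch that starts a throwaway server on the loopback interface and asks it for the same path with both methods. The handler class, the demo() function, and the tiny HTML body are all invented for this illustration; a real server would do far more.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b'<html>hello</html>'   # made-up document for the demonstration

class Handler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header('Content-Type', 'text/html')
        self.send_header('Content-Length', str(len(BODY)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(BODY)

    def do_HEAD(self):
        self._send_headers()    # same headers as GET, but no body follows

    def log_message(self, *args):
        pass                    # keep the demonstration quiet

def fetch(method, port):
    """Return (Content-Type, number of body bytes received) for a request."""
    conn = http.client.HTTPConnection('127.0.0.1', port)
    conn.request(method, '/')
    response = conn.getresponse()
    body = response.read()
    conn.close()
    return response.getheader('Content-Type'), len(body)

def demo():
    server = HTTPServer(('127.0.0.1', 0), Handler)   # port 0: pick any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return fetch('HEAD', port), fetch('GET', port)
    finally:
        server.shutdown()
        server.server_close()

print(demo())
```

Both requests report the same Content-Type, but only the GET actually carries the bytes of the document.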
The methods like POST are PUT and DELETE, in that they are expected to perform what might be irreversible
changes to the content stored by the server. As you would expect from their names, PUT is intended to deliver a new
document that will henceforth live at the path that the request specifies, and DELETE asks the server to destroy the
path and any content associated with it. Interestingly, these two methods, while requesting “writes” of the server
content, are safe to retry in a way that POST is not: they are idempotent, so a client can repeat them as many times
as it wants because the effect of running either of them once ought to be the same as the effect of running it many times.
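The retry rules described so far can be sketched as a small helper. This is only an illustration under stated assumptions: send is a hypothetical callable standing in for whatever actually performs the request, the function name send_with_retry() is invented here, and the set of retryable methods simply follows this chapter's classification.

```python
# Methods this chapter describes as safe to repeat; POST is deliberately absent.
IDEMPOTENT_METHODS = {'GET', 'HEAD', 'OPTIONS', 'PUT', 'DELETE'}

def send_with_retry(send, method, attempts=3):
    """Call send(), retrying after ConnectionError only for idempotent methods."""
    if method.upper() not in IDEMPOTENT_METHODS:
        attempts = 1    # a POST might charge the customer twice: never repeat it
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except ConnectionError:
            if attempt == attempts:
                raise   # out of attempts, so let the caller decide what to do

# A stand-in for a flaky network: fails twice, then succeeds.
calls = []
def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError('response never arrived')
    return 'stored'

print(send_with_retry(flaky, 'PUT'))
```

With 'POST' in place of 'PUT', the same helper would make a single attempt and let the ConnectionError propagate, leaving the decision to repeat the action with the caller.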
Finally, the standard specifies both a debugging method TRACE and a method CONNECT for switching protocols
to something besides HTTP (which, as you will see in Chapter 11, is used to turn on WebSockets). They are, however,
rarely used, and neither has anything to do with the delivery of documents, which is the core duty of HTTP
and the subject of this chapter. Refer to the standard for more information about them.
Note that one quirk of the Standard Library’s urlopen() is that it chooses its HTTP verb invisibly: POST if the
caller specifies a data parameter, or GET otherwise. This is an unfortunate choice because the correct use of HTTP
verbs is crucial to safe client and server design. The Requests choice of get() and post() is much better for these
essentially different methods.
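You can observe this quirk without touching the network by building the Request objects that urlopen() works with and asking each one which verb it would use. The URL below is a placeholder that is never contacted, and the form data is made up for the example.

```python
from urllib.request import Request

url = 'http://example.com/documents'   # placeholder URL, never contacted

implicit_get = Request(url)                          # no data: GET
implicit_post = Request(url, data=b'title=Hello')    # data present: POST
explicit_put = Request(url, data=b'title=Hello', method='PUT')

for request in (implicit_get, implicit_post, explicit_put):
    print(request.get_method())
```

Passing method explicitly, as in the last case, is one way to keep the verb visible in your own code when you stay with the Standard Library.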


Paths and Hosts


The first versions of HTTP allowed the request to consist solely of a verb and path.


GET /html/rfc7230


This worked well in the early era when every server hosted exactly one web site, but it broke down as soon as
administrators wanted to be able to deploy large HTTP servers that could serve dozens or hundreds of sites. Given
only a path, how could the server guess which hostname the user had typed in the URL—especially for a path
like / that typically exists on every web site?
The solution was to make at least one header, the Host header, mandatory. Modern versions of the protocol also
include the protocol version in a minimally correct request, which would read as follows:


GET /html/rfc7230 HTTP/1.1
Host: tools.ietf.org
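
For illustration, here is a minimal sketch of how a client might assemble those bytes by hand. The function name build_request() is invented for this example, and a real client would normally send further headers as well.

```python
def build_request(host, path='/'):
    """Return the bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        'GET {} HTTP/1.1'.format(path),
        'Host: {}'.format(host),
        '',     # the empty line that ends the headers
        '',
    ]
    return '\r\n'.join(lines).encode('ascii')

print(build_request('tools.ietf.org', '/html/rfc7230').decode('ascii'))
```

Each line ends with a carriage return and linefeed, and the blank line tells the server that no more headers follow.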
