Foundations of Python Network Programming


Chapter 9 ■ HTTP Clients




The request is the block of text that begins with GET. The response begins with the version HTTP/1.1, and it
continues through the blank line below the headers to include the three lines of JSON text. Both the request and the
response are called an HTTP message in the standard, and each message is composed of three parts.


•    A first line that names a method and document in the request and names a return code and
description in the response. The line ends with a carriage return and linefeed (CR-LF, ASCII
codes 13 and 10).

•    Zero or more headers that consist of a name, a colon, and a value. Header names are
case-insensitive, so they can be capitalized however a client or server desires. Each header
ends with a CR-LF. A blank line then terminates the entire list of headers—the four bytes
CR-LF-CR-LF that form a pair of end-of-line sequences with nothing in between them.
This blank line is mandatory whether any headers appear above it or not.

•    An optional body that immediately follows the blank line that ends the headers. There are
several options for framing the body, as you will learn shortly.
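
The three parts listed above can be pulled apart with a few lines of Python. This is only a minimal sketch, assuming the complete message is already in memory as bytes; the function name split_message() and the sample message are hypothetical, and repeated headers are simply overwritten, which a real client would need to handle more carefully.

```python
def split_message(raw: bytes):
    """Split a complete HTTP message into (first_line, headers, body)."""
    # The blank line (CR-LF-CR-LF) separates the first line and headers from the body.
    head, separator, body = raw.partition(b"\r\n\r\n")
    if not separator:
        raise ValueError("no blank line found; the header block is incomplete")
    first_line, *header_lines = head.split(b"\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(b":")
        # Header names are case-insensitive, so normalize them to lowercase.
        key = name.strip().lower().decode("ascii")
        headers[key] = value.strip().decode("latin-1")
    return first_line.decode("ascii"), headers, body


# A made-up response used only for illustration.
example = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: application/json\r\n"
    b"CONTENT-LENGTH: 2\r\n"
    b"\r\n"
    b"{}"
)
first_line, headers, body = split_message(example)
```

Because the names are lowercased, headers["content-length"] finds the value even though the sender wrote CONTENT-LENGTH.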

The first line and the headers are each framed by their terminal CR-LF sequences, and the whole assembly is
framed as a unit by the blank line at the end, so the end can be discovered by a server or client by calling recv()
until the four-character sequence CR-LF-CR-LF appears. No prior warning is provided about how long the line and
headers might be, so many servers set commonsense maximums on their length to avoid running out of RAM when a
troublemaker connects and sends infinite-length headers.
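
The recv() loop just described might be sketched as follows. The function name recv_head() and the 64KB ceiling are assumptions made for illustration, not values taken from the standard or from any particular server.

```python
import socket

MAX_HEAD = 65536  # hypothetical ceiling on the first line plus headers


def recv_head(sock: socket.socket, max_bytes: int = MAX_HEAD):
    """Read until CR-LF-CR-LF arrives; return (head, leftover).

    head includes the terminating blank line. leftover holds any bytes
    that arrived after it, which are the start of the body if there is one.
    """
    data = b""
    while b"\r\n\r\n" not in data:
        if len(data) > max_bytes:
            raise ValueError("first line and headers exceed the size limit")
        more = sock.recv(4096)
        if not more:
            raise EOFError("socket closed before the blank line arrived")
        data += more
    head, _, leftover = data.partition(b"\r\n\r\n")
    return head + b"\r\n\r\n", leftover
```

Since recv() makes no promise about where its blocks of data end, the final call may well return the first bytes of the body too, which is why the sketch hands back the leftover bytes instead of discarding them.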
There are three different options for framing a body, if one has been attached to the message.
The most common framing is the presence of a Content-Length header, whose value should be a decimal integer
giving the length of the body in bytes. This is simple to implement: the receiver loops on a repeated
recv() call until the accumulated bytes equal the stated length. But declaring a Content-Length is sometimes
not feasible when data is being generated dynamically and its length cannot be known until the process is complete.
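
A Content-Length loop of this kind could look like the sketch below. The name recv_exactly() and its already parameter (for body bytes that were read along with the headers) are hypothetical conveniences, not part of any library.

```python
import socket


def recv_exactly(sock: socket.socket, length: int, already: bytes = b"") -> bytes:
    """Return a body of exactly `length` bytes, as Content-Length promises.

    `already` holds body bytes that happened to arrive with the headers.
    """
    body = already
    while len(body) < length:
        # Never ask for more than is still missing, so that bytes belonging
        # to a following message stay in the socket buffer.
        more = sock.recv(min(4096, length - len(body)))
        if not more:
            raise EOFError("socket closed before the full body arrived")
        body += more
    # If `already` was longer than the body, this sketch drops the surplus;
    # a real client would keep it as the start of the next message.
    return body[:length]
```

The length argument would come from converting the header value with int(), which parses decimal by default.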
A more complicated scheme is activated if the headers specify a Transfer-Encoding of “chunked.” Instead of the
body having its length specified up front, it is delivered in a series of smaller pieces that are each separately prefixed
by their length. Each chunk consists of at least a hexadecimal (in contrast to the Content-Length header, which
is decimal!) length field, the two characters CR-LF, a block of data of exactly the stated length, and again the two
characters CR-LF. The chunks end with a final chunk that declares that it has zero length—minimally, the digit zero, a
CR-LF, and then another CR-LF.
After the chunk length but before the CR-LF, the sender can insert a semicolon and then specify an “extension”
option that applies to that chunk. At the end, after the last chunk has given its length of zero and its CR-LF, the sender
can append a few final HTTP headers, known as trailers, before the closing CR-LF. You can refer to RFC 7230 for these
details if you are implementing HTTP yourself.
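
To make the chunk layout concrete, here is a rough sketch of a decoder. It assumes a binary file-like object, such as an io.BytesIO or the result of calling makefile("rb") on a socket; the name read_chunked() is hypothetical, chunk extensions are parsed only far enough to be ignored, and int() is more lenient about the length field than a strict parser should be.

```python
import io


def read_chunked(stream):
    """Decode a chunked body; return (body, trailer_lines)."""
    body = b""
    while True:
        size_line = stream.readline()
        if not size_line.endswith(b"\r\n"):
            raise EOFError("stream ended inside a chunk-length line")
        # Discard any ";extension" that follows the hexadecimal length.
        size_text = size_line.split(b";", 1)[0].strip()
        size = int(size_text.decode("ascii"), 16)
        if size == 0:
            break  # the zero-length chunk marks the end of the data
        chunk = stream.read(size)
        if len(chunk) != size:
            raise EOFError("stream ended inside a chunk")
        if stream.read(2) != b"\r\n":
            raise ValueError("chunk data was not followed by CR-LF")
        body += chunk
    # Optional trailing headers follow, terminated by a blank line.
    trailers = []
    while True:
        line = stream.readline()
        if line in (b"\r\n", b""):
            break
        trailers.append(line.rstrip(b"\r\n"))
    return body, trailers


# A made-up chunked body: two chunks, one extension, one trailing header.
wire = b"5\r\nHello\r\n7;name=value\r\n, world\r\n0\r\nExpires: never\r\n\r\n"
decoded, trailer_lines = read_chunked(io.BytesIO(wire))
```

Note the base of 16 passed to int(): a length field of "a" means ten bytes of data, not an error.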
The other alternative to Content-Length is quite abrupt: the server can specify “Connection: close,” send as
much or as little body as it wants, and then close the TCP socket. This introduces the danger that the client cannot
tell whether the socket closed because the entire body was successfully delivered or whether the socket closed
prematurely because of a server or network error, and it also makes the protocol less efficient by forcing the client to
re-connect for every single request.
(The standard says that the “Connection: close” trick cannot be attempted by the client because then it could not
receive the server’s response. Had its authors not heard of a unidirectional shutdown() on the socket, which lets
the client end its direction while still being able to read data back from the server?)
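
Both ideas, reading a body that is framed only by the close of the connection and shutting down a single direction of a socket, can be tried without a network by using socket.socketpair(). This is an illustrative sketch; the name recv_until_close() and the stand-in "client" and "server" sockets are assumptions, and no real HTTP is exchanged.

```python
import socket


def recv_until_close(sock: socket.socket) -> bytes:
    """Collect bytes until the peer closes its sending direction."""
    chunks = []
    while True:
        more = sock.recv(4096)
        if not more:  # empty bytes: the other end will send nothing further
            break
        chunks.append(more)
    return b"".join(chunks)


# Two connected sockets stand in for a client and a server.
client, server = socket.socketpair()

client.sendall(b"request body")
client.shutdown(socket.SHUT_WR)  # end only the client-to-server direction

received = recv_until_close(server)  # the server sees end-of-file
server.sendall(b"response")
server.close()

reply = recv_until_close(client)  # the client can still read the reply
client.close()
```

Notice that recv_until_close() cannot tell a deliberate close from a failure partway through the body, which is exactly the danger described above.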


Methods


The first word of an HTTP request specifies the action that the client is requesting of the server. There are two
common methods, GET and POST, and a number of less common methods defined for servers that want to present a
full document API to other computer programs that may be accessing them (typically, JavaScript that they themselves
have delivered to a browser).
