Foundations of Python Network Programming

Chapter 9 ■ HTTP Clients


Caching is also important to think about if you are configuring or running a proxy, a topic that I will discuss in
Chapter 10.


Content Encoding


It is crucial to understand the difference between an HTTP transfer encoding and content encoding.
A transfer encoding is simply a scheme for turning a resource into an HTTP response body. By definition, the
choice of transfer encoding makes no difference in the end. As an example, the client ought to find that the same document
or image has been delivered whether the response was framed with a Content-Length header or with chunked encoding.
The resource should look the same whether the bytes were sent raw or compressed to make transmission faster.
A transfer encoding is simply a wrapper used for data delivery, not a change in the underlying data itself.
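To make the "wrapper" idea concrete, here is a minimal sketch of undoing chunked framing by hand. The function name decode_chunked and the sample document are illustrative assumptions, and the sketch ignores trailers and malformed input; in real code the HTTP library normally does this for you.

```python
def decode_chunked(body: bytes) -> bytes:
    """Reassemble a payload from an HTTP/1.1 chunked message body."""
    payload = bytearray()
    pos = 0
    while True:
        # Each chunk starts with its size in hexadecimal, ended by CRLF.
        line_end = body.index(b"\r\n", pos)
        size_field = body[pos:line_end].split(b";", 1)[0]  # drop chunk extensions
        size = int(size_field, 16)
        pos = line_end + 2
        if size == 0:
            break  # a zero-length chunk marks the end of the body
        payload += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(payload)


document = b"Hello, HTTP framing!"

# The same document framed as two chunks plus the terminating zero chunk.
chunked_body = b"7\r\nHello, \r\n" + b"d\r\nHTTP framing!\r\n" + b"0\r\n\r\n"

print(decode_chunked(chunked_body) == document)  # True
```

Whether the 20 bytes arrive behind a Content-Length: 20 header or split into chunks as above, the client ends up holding the identical document.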
Though modern web browsers support several compression schemes, the most popular with programmers is probably
gzip. A client able to accept it must declare so in an Accept-Encoding header and be prepared to
examine the response headers to determine whether the server took it up on its offer. Although compression acts
as a delivery wrapper, nearly all servers announce it with a Content-Encoding header. The Transfer-Encoding header
can also name gzip, but that form is negotiated through the separate TE request header and is rarely seen.


GET / HTTP/1.1
Accept-Encoding: gzip
...


HTTP/1.1 200 OK
Content-Length: 3913
Content-Encoding: gzip
...


The urllib library has no support for this mechanism, so your own code must produce and detect
these headers and then decompress the response body itself if you want to take advantage of compressed
responses.
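As a rough sketch of what that do-it-yourself handling might look like with urllib, the following offers gzip in the request and decompresses the reply by hand. The helper names decode_body and fetch are illustrative, not part of urllib. The sketch inspects the Content-Encoding response header, since that is the header servers use in practice to announce a gzip-compressed body, and it handles gzip only.

```python
import gzip
import urllib.request


def decode_body(content_encoding, raw_body):
    """Return the body bytes, undoing gzip compression if it was applied."""
    encoding = (content_encoding or "").strip().lower()
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(raw_body)
    return raw_body  # the server declined the offer, so the bytes are as-is


def fetch(url):
    """Fetch a URL, offering gzip and decompressing the reply by hand."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request) as response:
        raw_body = response.read()
        return decode_body(response.headers.get("Content-Encoding"), raw_body)
```

Calling fetch() with any URL you like returns the uncompressed bytes either way, because decode_body() passes the body through untouched when the server ignores the Accept-Encoding offer.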
The Requests library automatically declares an Accept-Encoding of gzip,deflate, and it decompresses the body
automatically if the server marks the response as compressed, which in practice it does with a Content-Encoding
header. This makes compression both automatic when servers support it and invisible to the user of Requests.


Content Negotiation


Content type and the character encoding of text, in contrast to transfer encoding, are entirely visible to the end user or client
program that is performing an HTTP request. They determine both what file format will be selected to represent a
given resource and, if the format is text, what encoding will be used to turn text code points into bytes.
Content negotiation headers such as Accept and Accept-Language allow an old browser that cannot display newfangled
PNG images to indicate that it prefers GIF and JPEG instead, and they allow resources to be delivered in the language
that users have told their web browser they prefer. Here is a sample of what such headers might look like when generated by a modern web browser:


GET / HTTP/1.1
Accept: text/html;q=0.9,text/plain,image/jpg,*/*;q=0.8
Accept-Charset: unicode-1-1;q=0.8
Accept-Language: en-US,en;q=0.8,ru;q=0.6
User-Agent: Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML)
...
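To show how a server might read the q values in headers like these, here is a minimal sketch that ranks the alternatives in an Accept-style header. The function name parse_preferences is an illustrative assumption, and the logic is deliberately simplified: it orders by q alone and does not apply the additional rule that more specific media ranges take precedence over wildcards.

```python
def parse_preferences(header_value):
    """Return (value, q) pairs from an Accept-style header, best first."""
    preferences = []
    for item in header_value.split(","):
        parts = [part.strip() for part in item.split(";")]
        value = parts[0]
        if not value:
            continue
        quality = 1.0  # an omitted q parameter means full preference
        for parameter in parts[1:]:
            name, _, number = parameter.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(number)
                except ValueError:
                    quality = 0.0  # assumption: treat a malformed q as "not wanted"
        preferences.append((value, quality))
    # sorted() is stable, so equal q values keep their order in the header.
    return sorted(preferences, key=lambda pair: pair[1], reverse=True)


languages = parse_preferences("en-US,en;q=0.8,ru;q=0.6")
print(languages)  # [('en-US', 1.0), ('en', 0.8), ('ru', 0.6)]
```

Given the Accept-Language value from the sample request, a server following this ranking would try US English first, then any English, and fall back to Russian last.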
