Learning Python Network Programming

(Sean Pound) #1

HTTP and Working with the Web


Content compression


The Accept-Encoding request header and the Content-Encoding response header
work together to let a server temporarily encode the body of a response for
transmission over the network. This is typically used to compress the response
and so reduce the amount of data that needs to be transferred.
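To get a feel for the savings, here is a minimal sketch that compresses a body with the standard gzip module and compares the sizes. The sample body is made up for illustration and does not come from any real server.

```python
import gzip

# A made-up, repetitive HTML body (hypothetical sample data).
body = b'<li class="item">Hello, world</li>\n' * 200

# Encode the body for transmission, as a server might.
compressed = gzip.compress(body)

# Decode it again, as a client would after receiving it.
restored = gzip.decompress(compressed)

print(len(body), len(compressed))  # original size vs. compressed size
print(restored == body)            # the round trip is lossless
```

Repetitive markup like this compresses very well. Small or already-compressed bodies (images, archives) may gain little or nothing.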


This process follows these steps:



  • The client sends a request with acceptable encodings listed in an
    Accept-Encoding header

  • The server picks an encoding method that it supports

  • The server encodes the body using this encoding method

  • The server sends the response, specifying the encoding it has used in a
    Content-Encoding header

  • The client decodes the response body using the specified encoding method
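The steps above can be sketched without touching the network. In this simplified illustration, the names SERVER_ENCODERS, choose_encoding(), server_respond() and client_decode() are hypothetical helpers invented for the example, and the header parsing deliberately ignores q-values and wildcards, which a real server would have to honor.

```python
import gzip
import zlib

# Encodings this hypothetical server can produce, in order of preference.
SERVER_ENCODERS = {
    'gzip': gzip.compress,
    'deflate': zlib.compress,  # HTTP "deflate" uses the zlib format
}


def choose_encoding(accept_encoding):
    """Return the first server-supported encoding the client lists, or None.

    Simplification: q-values (e.g. 'gzip;q=0') and '*' are not interpreted.
    """
    accepted = [token.split(';')[0].strip().lower()
                for token in accept_encoding.split(',')]
    for name in SERVER_ENCODERS:
        if name in accepted:
            return name
    return None


def server_respond(accept_encoding, body):
    """Server side: pick an encoding, encode the body, label the response."""
    headers = {}
    encoding = choose_encoding(accept_encoding)
    if encoding is not None:
        body = SERVER_ENCODERS[encoding](body)
        headers['Content-Encoding'] = encoding
    return headers, body


def client_decode(headers, body):
    """Client side: decode the body using the encoding the server named."""
    encoding = headers.get('Content-Encoding')
    if encoding == 'gzip':
        return gzip.decompress(body)
    if encoding == 'deflate':
        return zlib.decompress(body)
    return body  # no Content-Encoding header: body was sent as-is


original = b'<html><body>Hello</body></html>'
headers, wire_body = server_respond('br, gzip', original)
print(headers)                                       # {'Content-Encoding': 'gzip'}
print(client_decode(headers, wire_body) == original)  # True
```

Note that the client only decodes when the server says it encoded. If no listed encoding is supported, the sketch sends the body unchanged and omits the Content-Encoding header.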


Let's see how to request a document and ask the server to use gzip compression
for the response body. First, let's construct the request:





from urllib.request import Request, urlopen
req = Request('http://www.debian.org')





Next, add the Accept-Encoding header:





req.add_header('Accept-Encoding', 'gzip')





And then, submit it with the help of urlopen():





response = urlopen(req)





We can check whether the server used gzip compression by looking at the
response's Content-Encoding header (if the server chose not to compress the
body, the header is absent and getheader() returns None):





response.getheader('Content-Encoding')





'gzip'


We can then decompress the body data by using the gzip module:





import gzip
content = gzip.decompress(response.read())
content.splitlines()[:5]





[b'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">',
 b'',
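Putting the pieces together, here is one possible sketch that wraps the request, the header check and the decompression in a pair of functions. The names decode_body() and fetch() are hypothetical helpers, not part of urllib. The sketch assumes that a server may ignore the Accept-Encoding header, so it only decompresses when the response is actually labeled as gzip.

```python
import gzip
from urllib.request import Request, urlopen


def decode_body(raw, content_encoding):
    """Decompress raw bytes only if the server labeled them as gzip."""
    if content_encoding is not None and content_encoding.strip().lower() == 'gzip':
        return gzip.decompress(raw)
    return raw  # sent uncompressed (or in an encoding this sketch ignores)


def fetch(url):
    """Request url with gzip allowed and return the decoded body as bytes."""
    req = Request(url)
    req.add_header('Accept-Encoding', 'gzip')
    with urlopen(req) as response:
        return decode_body(response.read(),
                           response.getheader('Content-Encoding'))
```

With this in place, fetch('http://www.debian.org').splitlines()[:5] would be expected to give the same kind of result as the session above, whether or not the server chose to compress the response.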
