HTTP and Working with the Web
Content compression
The Accept-Encoding request header and the Content-Encoding response header
can work together to allow us to temporarily encode the body of a response for
transmission over the network. This is typically used for compressing the response
and reducing the amount of data that needs to be transferred.
This process follows these steps:
- The client sends a request with acceptable encodings listed in an
Accept-Encoding header
- The server picks an encoding method that it supports
- The server encodes the body using this encoding method
- The server sends the response, specifying the encoding it has used in a
Content-Encoding header
- The client decodes the response body using the specified encoding method
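The steps above can be sketched offline, without a network connection. This is a minimal illustration rather than how any real server is implemented: the function names pick_encoding, encode_body and decode_body are hypothetical, the "server" supports only gzip, and q-values in the Accept-Encoding header are ignored for simplicity.

```python
import gzip

def pick_encoding(accept_encoding, supported=("gzip",)):
    """Return the first client-listed encoding the server supports, or None.

    Simplification: any ';q=...' weight is stripped and ignored.
    """
    for item in accept_encoding.split(","):
        name = item.split(";")[0].strip().lower()
        if name in supported:
            return name
    return None

def encode_body(body, accept_encoding):
    """Server side: compress the body if the client accepts gzip."""
    headers = {}
    if pick_encoding(accept_encoding) == "gzip":
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    return headers, body

def decode_body(headers, body):
    """Client side: undo the encoding named in Content-Encoding."""
    if headers.get("Content-Encoding") == "gzip":
        return gzip.decompress(body)
    return body

original = b"<html>hello</html>" * 100
headers, wire = encode_body(original, "gzip, deflate")
restored = decode_body(headers, wire)
print(headers, len(original), len(wire), restored == original)
```

The compressed body travels only "on the wire"; once decoded, the client holds the same bytes the server started with.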
Let's discuss how to request a document and get the server to use gzip compression
for the response body. First, let's construct the request:
from urllib.request import Request, urlopen
req = Request('http://www.debian.org')
Next, add the Accept-Encoding header:
req.add_header('Accept-Encoding', 'gzip')
And then, submit it with the help of urlopen():
response = urlopen(req)
We can check if the server is using gzip compression by looking at the response's
Content-Encoding header:
response.getheader('Content-Encoding')
'gzip'
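This check matters because a server is free to ignore Accept-Encoding and send the body uncompressed. One way to guard against that, sketched here with the standard library's gzip module, is to decode only when the header says gzip was used. The helper name read_body and the FakeResponse class are hypothetical; FakeResponse merely stands in for the object returned by urlopen() so that the sketch runs without network access.

```python
import gzip

def read_body(response):
    """Return the body, gunzipping it only if the server reports gzip."""
    body = response.read()
    if response.getheader('Content-Encoding') == 'gzip':
        return gzip.decompress(body)
    return body

class FakeResponse:
    """Hypothetical stand-in for the response object returned by urlopen()."""
    def __init__(self, body, headers):
        self._body = body
        self._headers = headers

    def read(self):
        return self._body

    def getheader(self, name, default=None):
        return self._headers.get(name, default)

page = b'<!DOCTYPE html>\n<html></html>\n'
compressed = FakeResponse(gzip.compress(page), {'Content-Encoding': 'gzip'})
plain = FakeResponse(page, {})

# Both responses yield the same bytes, whether or not gzip was applied.
assert read_body(compressed) == page
assert read_body(plain) == page
```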
We can then decompress the body data by using the gzip module:
import gzip
content = gzip.decompress(response.read())
content.splitlines()[:5]
[b'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">',
b'',