Foundations of Python Network Programming

Chapter 9 ■ HTTP Clients


then a service needs to include a Vary header in every response, listing the other request headers on which the document
content depends. Common choices are Host, Accept-Encoding, and especially Cookie if the designer is returning different
documents to different users.
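A cache has to honor Vary when deciding whether a stored response can answer a new request: the two requests must agree on every header that Vary names. The following is a minimal Python sketch, assuming request headers arrive as a plain dict; the function name cache_key is hypothetical, and the special value "Vary: *" (which means no stored response can ever match) is not handled.

```python
def cache_key(url, vary_value, request_headers):
    """Hypothetical helper: key a cached response by URL plus Vary'd headers.

    Two requests share a cache entry only if they agree on every request
    header named in Vary. Header names are case-insensitive in HTTP, so
    everything is lowercased before comparison.
    """
    lowered = {name.lower(): value for name, value in request_headers.items()}
    # Vary is a comma-separated list of header names; dedupe and sort them
    # so that "Cookie, Host" and "host,cookie" produce the same key.
    varied = sorted({name.strip().lower() for name in vary_value.split(',')
                     if name.strip()})
    # A missing header is recorded as None so it differs from an empty value.
    return (url, tuple((name, lowered.get(name)) for name in varied))
```

With "Vary: Cookie", a request carrying user=alice's cookie and one carrying user=bob's cookie produce different keys, so neither user is ever shown the other's cached page.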
Once the Vary header is set correctly, the server can choose among several levels of caching.
At the most restrictive level, a resource can be forbidden from being stored in a client cache at all, which prevents the
client from making any kind of automatic copy of the response on nonvolatile storage. The intention is to leave the user in
control: a copy reaches the disk only if they select “save” to archive the resource themselves.


HTTP/1.1 200 OK
Cache-control: no-store
...
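Before writing a response to disk, a client cache has to look for no-store among the Cache-Control directives. A minimal sketch, with hypothetical helper names; it splits on commas and therefore does not handle the rare quoted directive argument that itself contains a comma.

```python
def parse_cache_control(value):
    """Hypothetical helper: split a Cache-Control value into a dict.

    Directive names are lowercased; directives without an argument
    (such as no-store) map to None.
    """
    directives = {}
    for item in value.split(','):
        item = item.strip()
        if not item:
            continue
        name, sep, arg = item.partition('=')
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def may_store(cache_control_value):
    """Return False if the response forbids keeping any cached copy."""
    return 'no-store' not in parse_cache_control(cache_control_value)
```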


If the server instead allows caching, it will usually want to prevent the client from presenting the same cached copy
every time the user asks for the resource, long after that copy has become out-of-date.
The one case in which the server need not worry about whether a resource gets cached forever is when it is careful to
use a given path only for a single permanent version of a document or image. If a version number or hash at the end
of the URL is incremented or changed every time the designers come out with a new version of the corporate logo, for
example, then any given version of the logo can be delivered with permission to store it forever.
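One common way to give each version its own path is to embed a hash of the file's contents in its name. A sketch, where the /static/logo stem and the truncation of a SHA-256 digest to 12 hex characters are arbitrary illustrative choices, not anything the standard requires.

```python
import hashlib


def versioned_path(stem, suffix, content):
    """Hypothetical helper: build a path such as /static/logo-<hash>.png.

    The name changes whenever the bytes change, so each path only ever
    refers to one permanent version of the file.
    """
    digest = hashlib.sha256(content).hexdigest()[:12]
    return f'{stem}-{digest}{suffix}'
```

Pages link to whatever path versioned_path() returns for the current logo. When the logo changes, the pages link to a new path, and old cached copies under the old path are simply never requested again.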
There are two ways that the server can prevent the client copy of the resource from being used forever. First, it can
specify an expiration date and time after which the resource cannot be reused without a request back to the server.


HTTP/1.1 200 OK
Expires: Thu, 01 Dec 1994 16:00:00 GMT
...
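A client deciding whether an Expires date has passed can only compare it against its own clock, which is why a misconfigured clock matters here. A minimal sketch using the standard library's email.utils.parsedate_to_datetime to read the HTTP date; the function name is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_fresh_by_expires(expires_value, now=None):
    """Hypothetical helper: True while the client clock reads before Expires."""
    try:
        expires = parsedate_to_datetime(expires_value)
    except (TypeError, ValueError):
        # An unparseable Expires (such as "0") counts as already expired.
        return False
    if expires.tzinfo is None:
        expires = expires.replace(tzinfo=timezone.utc)
    if now is None:
        # This is the client's own clock: if it is wrong, so is the answer.
        now = datetime.now(timezone.utc)
    return now < expires
```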


But the use of a date and time introduces the danger that an incorrectly set client clock will result in the cached
copy of the resource being used for far too long. A much better method is the modern mechanism of specifying the
number of seconds for which the resource can be cached once it has been received. Because this interval is measured
from the moment of receipt, it works even if the client clock is set to the wrong time, as long as the clock is not simply stalled.


HTTP/1.1 200 OK
Cache-control: max-age=3600
...
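Because max-age is relative to when the response arrived, a client can measure elapsed time with a monotonic clock that never needs to be set correctly. A minimal sketch; the class name CachedResponse is hypothetical, and a complete cache would also account for any Age header that arrived with the response.

```python
import re
import time


class CachedResponse:
    """Hypothetical cache entry that honors Cache-Control: max-age."""

    def __init__(self, body, cache_control, received_at=None):
        self.body = body
        # Pull the number of seconds out of a value like "public, max-age=3600".
        match = re.search(r'\bmax-age\s*=\s*(\d+)', cache_control, re.IGNORECASE)
        # With no max-age, treat the copy as immediately stale (a simplification).
        self.max_age = int(match.group(1)) if match else 0
        # time.monotonic() counts elapsed seconds and ignores changes to the
        # wall-clock time, so a wrongly set clock does not matter here.
        self.received_at = time.monotonic() if received_at is None else received_at

    def is_fresh(self, now=None):
        """True while fewer than max_age seconds have passed since receipt."""
        now = time.monotonic() if now is None else now
        return now - self.received_at < self.max_age
```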


The two headers shown here grant the client the unilateral ability, for a limited period of time, to keep using an
old copy of a resource without any consultation with the server.
But what if a server wants to retain a veto over whether a cached resource is used or a new version is fetched? In
that case, it will have to require the client to use an HTTP request to check back every time it wants to use the resource.
This will be more expensive than letting the client use the cached copy silently and without a network operation, but it
can still save time because the server will have to send a new copy of the resource only if the old copy possessed by the
client indeed proves to be out-of-date.
There are two mechanisms by which a server can make the client check back about every use of a resource but
let the client reuse its cached copy of the resource if possible. These are called conditional requests in the standard
because they will result in the transmission of a body only if the tests reveal the client cache to be out-of-date.
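On the client side, a conditional request ends in one of two ways: a 304 Not Modified reply with no body, meaning the cached copy may be used, or a 200 OK reply carrying a fresh copy. A sketch of that decision, with a hypothetical function name; error and redirect statuses are left to the caller.

```python
def resolve_conditional(cached_body, status, response_body):
    """Hypothetical helper: choose the body to use after checking back.

    Returns (body, replaced), where replaced tells the caller whether the
    cache entry should be overwritten with the new copy.
    """
    if status == 304:
        # Not Modified: the server sent no body; the cached copy is current.
        return cached_body, False
    if status == 200:
        # The cached copy was out-of-date, so the reply carries a new one.
        return response_body, True
    # Errors and redirects are outside the scope of this sketch.
    raise ValueError(f'unexpected status for a conditional request: {status}')
```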
The first mechanism requires the server to know when resources were last modified. This can be easy to
determine if the resources are backed by, say, a file on the file system, but it can be difficult or impossible to determine
if the resources are pulled from a database table that does not feature an audit log or a date of last modification. If the
information is available, the server can include it in every response.
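For a file-backed resource, the modification time from os.stat(path).st_mtime can be formatted as a Last-Modified header, and a client that later sends that date back in an If-Modified-Since header can be answered with 304 Not Modified if the file is unchanged. A minimal sketch; respond and its (status, headers) return value are hypothetical stand-ins for whatever web framework actually serves the file.

```python
from datetime import timezone
from email.utils import formatdate, parsedate_to_datetime


def last_modified_value(mtime):
    """Format a timestamp such as os.stat(path).st_mtime as an HTTP date."""
    return formatdate(mtime, usegmt=True)


def unchanged_since(mtime, if_modified_since):
    """True if a resource modified at mtime is no newer than the client's date."""
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False  # unparseable date: play safe and send the full body
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # treat a missing offset as UTC
    # HTTP dates have one-second resolution, so drop fractional seconds.
    return int(mtime) <= since.timestamp()


def respond(mtime, if_modified_since=None):
    """Hypothetical handler logic returning (status, headers)."""
    headers = {'Last-Modified': last_modified_value(mtime)}
    if if_modified_since and unchanged_since(mtime, if_modified_since):
        return 304, headers  # client's copy is current; send no body
    return 200, headers      # caller would attach the file contents here
```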
