Foundations of Python Network Programming

Chapter 9 ■ HTTP Clients



HTTP/1.1 200 OK
Last-Modified: Tue, 15 Nov 1994 12:45:26 GMT
...


A client that wants to reuse a cached copy of the resource can also cache this date and then repeat it back to the
server the next time it needs to use the resource. If the server sees that the resource has not been modified since the
client last received it, then the server can opt out of transmitting a body by instead simply transmitting headers and
the special status code 304.


GET / HTTP/1.1
If-Modified-Since: Tue, 15 Nov 1994 12:45:26 GMT
...


HTTP/1.1 304 Not Modified
...
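
As a rough illustration of this exchange, the following sketch starts a throwaway server on the loopback interface and
fetches from it twice with urllib. The Handler class, the fetch() and demo() helpers, and the resource body are all
assumptions made up for the demonstration. Note that urllib reports a 304 by raising HTTPError, so the sketch
catches it.

```python
import threading
import urllib.error
import urllib.request
from email.utils import parsedate_to_datetime
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical resource used only for this demonstration.
BODY = b"Hello, cache!\n"
LAST_MODIFIED = "Tue, 15 Nov 1994 12:45:26 GMT"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        since = self.headers.get("If-Modified-Since")
        # Unchanged since the client's copy: send headers only, with status 304.
        # (A sturdier server would also tolerate malformed dates here.)
        if since and parsedate_to_datetime(since) >= parsedate_to_datetime(LAST_MODIFIED):
            self.send_response(304)
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("Last-Modified", LAST_MODIFIED)
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


# An opener with no proxies, so the loopback request is never rerouted.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch(url, last_modified=None):
    """Return (status, Last-Modified value, body or None)."""
    request = urllib.request.Request(url)
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)
    try:
        with opener.open(request) as response:
            return response.status, response.headers.get("Last-Modified"), response.read()
    except urllib.error.HTTPError as e:
        if e.code == 304:  # urllib surfaces 304 as an exception
            return 304, last_modified, None
        raise


def demo():
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: let the OS pick one
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/" % server.server_port
    try:
        first = fetch(url)                           # unconditional request
        second = fetch(url, last_modified=first[1])  # conditional request
    finally:
        server.shutdown()
        server.server_close()
    return first, second
```

Calling demo() should produce a 200 result that carries the body, followed by a 304 result that carries none.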


The second mechanism deals with resource identity instead of modification time. In this case, the server needs some
way to create a unique tag for every version of a resource, one that is guaranteed to change to a new unique value
every time the resource changes. Checksums or database UUIDs are possible sources of such information. Whenever
the server builds a reply, it delivers the tag in an ETag header.


HTTP/1.1 200 OK
ETag: "d41d8cd98f00b204e9800998ecf8427e"
...


A client that has cached this version of the resource can, when it wants to reuse the copy to satisfy a user action,
request the resource from the server and include the cached tag in case it still names the current version of the
resource.


GET / HTTP/1.1
If-None-Match: "d41d8cd98f00b204e9800998ecf8427e"
...


HTTP/1.1 304 Not Modified
...
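
The server's side of this decision might look something like the following sketch. It assumes the tag is an MD5
checksum of the body, which is only one of the possible sources mentioned above, and the make_etag() and respond()
names are hypothetical. The header parsing is deliberately naive.

```python
import hashlib


def make_etag(body):
    """Build a quoted tag from an MD5 checksum of the body (one possible choice)."""
    return '"%s"' % hashlib.md5(body).hexdigest()


def respond(body, request_headers):
    """Return (status, headers, payload) for a GET of a resource whose content is body."""
    etag = make_etag(body)
    # Naive parsing: split the header on commas and ignore any weak-validator prefix.
    candidates = []
    for tag in request_headers.get("If-None-Match", "").split(","):
        tag = tag.strip()
        if tag.startswith("W/"):
            tag = tag[2:]
        candidates.append(tag)
    if etag in candidates or "*" in candidates:
        return 304, {"ETag": etag}, b""   # client's copy is current: no body
    return 200, {"ETag": etag}, body      # send the full resource and its tag
```

For an empty body, make_etag(b"") yields the same tag that appears in the exchange above, because that value is the
MD5 checksum of zero bytes.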


The quotation marks used in ETag and If-None-Match reflect the fact that the scheme can do more powerful
comparisons than simply testing two strings for equality: a tag can be marked as a weak validator with a W/ prefix,
and If-None-Match accepts a comma-separated list of tags. Consult RFC 7232 Section 3.2 if you want the details.

Note again that both If-Modified-Since and If-None-Match save bandwidth, and thus transmission time, only by
preventing the resource from being transmitted again. They still incur at least one round-trip to the server and back
before the client can proceed to use the resource.

Caching is powerful and crucial to the performance of the modern Web. However, neither of the client libraries
for Python that you are looking at will perform caching by default. Both urllib and Requests believe that their job is to
perform a real live network HTTP request when the time comes that you need one, not to manage a cache that might
exempt you from needing to talk over the network in the first place. You will have to seek out third-party libraries if you
want a wrapper that, when pointed at some form of local persistent storage that you provide, uses Expires and
Cache-Control headers, modification dates, and ETags to try to minimize the latency and network traffic that your
client incurs.
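
To give a feel for what such a wrapper does, here is a minimal sketch of the ETag part only. It is not the design of
any particular library: the ValidatingCache class and the fake_origin() stand-in are hypothetical, the storage is an
in-memory dictionary rather than persistent storage, and Expires and Cache-Control are ignored entirely.

```python
class ValidatingCache:
    """Remember bodies by URL and revalidate them with If-None-Match."""

    def __init__(self, send):
        # send is any callable(url, headers) -> (status, headers, body);
        # a real wrapper would perform the HTTP request here.
        self._send = send
        self._entries = {}  # url -> (etag, body)

    def get(self, url):
        headers = {}
        entry = self._entries.get(url)
        if entry is not None:
            headers["If-None-Match"] = entry[0]
        status, response_headers, body = self._send(url, headers)
        if status == 304 and entry is not None:
            return entry[1]  # the server confirmed the cached copy is current
        etag = response_headers.get("ETag")
        if status == 200 and etag:
            self._entries[url] = (etag, body)  # remember this version
        return body


def fake_origin(url, headers):
    """Hypothetical stand-in for a server whose resource never changes."""
    etag = '"v1"'
    if headers.get("If-None-Match") == etag:
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, b"payload"
```

With cache = ValidatingCache(fake_origin), the first cache.get() call receives the full body, and the second call sends
the remembered tag and is answered from the stored copy after a 304. Every call still costs a round-trip, which is
exactly the limitation of validation noted earlier.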
