Chapter 10 ■ HTTP Servers
Early descriptions of the Web seem to have imagined that forward proxies would be the most common proxying
pattern. An employer, for example, might provide an HTTP proxy to which employees’ web browsers send their requests instead
of speaking to remote servers directly. A hundred employee web browsers asking for the Google logo first thing in the
morning might result in the proxy making but a single request to Google for the logo, which could then be cached
and used to satisfy all of the subsequent employee requests. If Google was generous enough with its Expires and
Cache-Control headers, then the employer would consume less bandwidth, and the employees would experience
a faster Web.
But with the emergence of TLS as a universal best practice to protect user privacy and credentials, caching forward
proxies have become largely impractical. A proxy cannot inspect or cache a request that it cannot read.
Reverse proxies, on the other hand, are now ubiquitous among large HTTP services. A reverse proxy is operated as
part of a web service itself and is invisible to HTTP clients. When clients think they are connecting to python.org, they
are in fact speaking with a reverse proxy. The proxy can serve many resources, both static and dynamic, directly out of
its cache if the core python.org servers were careful to include Expires or Cache-Control headers. A reverse proxy can
often bear most of the load of running a service because HTTP requests need to be forwarded to the core servers only
if a resource is either uncacheable or has expired from the proxy’s cache.
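The forwarding decision described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how Varnish or nginx is implemented: the helper names max_age_seconds() and needs_forwarding() are hypothetical, response headers are assumed to arrive as a plain dictionary keyed by 'Cache-Control', and the Expires header, s-maxage, and revalidation are all ignored.

```python
def max_age_seconds(headers):
    """Return how many seconds a shared cache may reuse a response (0 = never).

    Simplified sketch: only the Cache-Control header is consulted.
    """
    directives = {}
    for item in headers.get('Cache-Control', '').split(','):
        name, _, value = item.strip().partition('=')
        if name:
            directives[name.lower()] = value.strip('"')
    # These directives forbid a shared cache from reusing the response as-is.
    if directives.keys() & {'no-store', 'no-cache', 'private'}:
        return 0
    value = directives.get('max-age', '')
    return int(value) if value.isdigit() else 0


def needs_forwarding(entry, now):
    """Decide whether a request must go to the core servers.

    `entry` is None when nothing is cached, otherwise a tuple of
    (time_stored, lifetime_in_seconds). `now` uses the same clock.
    """
    if entry is None:
        return True                       # never fetched, or uncacheable
    stored_at, lifetime = entry
    return now - stored_at >= lifetime    # expired from the cache


if __name__ == '__main__':
    lifetime = max_age_seconds({'Cache-Control': 'public, max-age=60'})
    print(needs_forwarding((100.0, lifetime), now=130.0))  # still fresh
    print(needs_forwarding((100.0, lifetime), now=170.0))  # expired
```

A response with a lifetime of zero would simply never be stored, so every request for it falls into the "uncacheable" branch and reaches the core servers.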
A reverse proxy must perform TLS termination, and it must be the component that holds the certificate and
private key for the service it proxies. Unless a proxy can examine each incoming HTTP request, it cannot perform
either caching or forwarding.
If you adopt the use of a reverse proxy, either in the form of a front-end web server like Apache or nginx or with a
dedicated daemon like Varnish, then caching-related headers such as Expires and Cache-Control become even more
important than normal. Instead of being relevant only to the end user’s browser, they become crucial signals between
tiers of your own service architecture.
Reverse proxies can even help with data that you might think should not be cached, like a headline page or event
log that needs up-to-the-second accuracy, as long as you can tolerate the results being up to a second or two old. After
all, it often takes clients a good fraction of a second to retrieve a resource anyway. Could it really hurt if the resource
is one extra second old? Imagine putting a one-second maximum age in the Cache-Control header of a critical feed
or event log that receives, say, a hundred requests per second. Your reverse proxy will go into action and, potentially,
reduce your server load a hundredfold: it will need to fetch the resource only once at the beginning of every
second, and then it can reuse that cached result for all of the other clients that ask.
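The arithmetic behind that hundredfold figure can be checked with a small simulation. This is a sketch under idealized assumptions: requests arrive evenly spaced, the origin answers instantly, and the function name count_origin_fetches() is hypothetical. Times are integer milliseconds so that no floating-point rounding creeps in.

```python
def count_origin_fetches(request_times, max_age):
    """Count the requests that a cache must forward to the origin server.

    `request_times` is an ascending sequence of arrival times and
    `max_age` is the freshness lifetime, both in the same units.
    """
    fetches = 0
    fetched_at = None
    for t in request_times:
        # Forward the request if nothing is cached yet or the copy is stale.
        if fetched_at is None or t - fetched_at >= max_age:
            fetches += 1
            fetched_at = t
    return fetches


if __name__ == '__main__':
    # One request every 10 ms (100 per second) for 10 seconds.
    request_times_ms = range(0, 10_000, 10)
    fetches = count_origin_fetches(request_times_ms, max_age=1000)
    print(len(request_times_ms), 'client requests,', fetches, 'origin fetches')
    # prints "1000 client requests, 10 origin fetches"
```

A real proxy would fall short of this ideal whenever requests arrive in bursts while a fetch is still in flight, which is why the reduction is described as potential rather than guaranteed.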
If you will be designing and deploying a large HTTP service behind a proxy, you will want to consult RFC 7234
and its extended discussion of the design of HTTP caching and its intended benefits. You will find options and settings
that are specifically targeted at intermediary caches such as Varnish rather than at the end user’s HTTP client, like
proxy-revalidate and s-maxage, which you should have in your toolbox as you approach a service design.
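As an illustration of how these directives might be emitted, here is a minimal WSGI sketch. The application name event_log_app(), the helper call_app(), the /events path, and the particular lifetimes are all assumptions chosen for the example; only the directive names come from the RFC. The header tells browsers to treat their copy as immediately stale while allowing a shared cache to reuse the response for one second.

```python
from wsgiref.util import setup_testing_defaults


def event_log_app(environ, start_response):
    """Hypothetical WSGI app whose response targets an intermediary cache."""
    body = b'latest events\n'
    headers = [
        ('Content-Type', 'text/plain; charset=utf-8'),
        ('Content-Length', str(len(body))),
        # max-age applies to every cache, including the browser;
        # s-maxage overrides it for shared caches such as a reverse proxy;
        # proxy-revalidate tells shared caches not to serve a stale copy.
        ('Cache-Control', 'max-age=0, s-maxage=1, proxy-revalidate'),
    ]
    start_response('200 OK', headers)
    return [body]


def call_app(app, path='/'):
    """Invoke a WSGI app directly, without a server, and return its response."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured['status'] = status
        captured['headers'] = dict(headers)

    environ = {}
    setup_testing_defaults(environ)   # fills in a minimal valid environ
    environ['PATH_INFO'] = path
    body = b''.join(app(environ, start_response))
    return captured['status'], captured['headers'], body


if __name__ == '__main__':
    status, headers, body = call_app(event_log_app, '/events')
    print(status, headers['Cache-Control'])
```

Whether a given proxy honors these directives depends on its configuration, so it is worth confirming the behavior against the proxy you actually deploy.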
■ Warning The content of a page often depends not just on its path and method but also on things such as the Host
header, the identity of the user making the request, and perhaps the headers describing what content types their client
can support. Review carefully the Vary header description in RFC 7231 Section 7.1.4, as well as the description of the
Vary header in Chapter 9. The value Vary: Cookie is, for reasons that will become clear, often necessary to ensure
correct behavior.
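To see why Vary matters to a cache, consider how a cache might build the key under which it stores a response. The following is a simplified sketch, not the algorithm of any particular proxy: the function cache_key() and its arguments are hypothetical, and a real cache has to learn the Vary value from a stored response before it can compute the key for a new request.

```python
def cache_key(method, host, path, request_headers, vary):
    """Build a cache key that honors a response's Vary header value.

    Returns None for "Vary: *", which means a stored response can
    never be matched to a later request.
    """
    names = sorted(n.strip().lower() for n in vary.split(',') if n.strip())
    if '*' in names:
        return None
    lowered = {k.lower(): v for k, v in request_headers.items()}
    # Requests that differ in any header named by Vary get different keys.
    varying = tuple((n, lowered.get(n, '')) for n in names)
    return (method.upper(), host.lower(), path, varying)


if __name__ == '__main__':
    alice = {'Cookie': 'session=alice'}
    bob = {'Cookie': 'session=bob'}
    # With Vary: Cookie, each user's page is cached separately.
    print(cache_key('GET', 'example.com', '/inbox', alice, 'Cookie') ==
          cache_key('GET', 'example.com', '/inbox', bob, 'Cookie'))
    # Without Vary, both users would share one cached copy.
    print(cache_key('GET', 'example.com', '/inbox', alice, '') ==
          cache_key('GET', 'example.com', '/inbox', bob, ''))
```

The second comparison shows the danger: if a personalized page omits Vary: Cookie, a cache using this kind of key would hand one user's page to another.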
Four Architectures
While architects seem capable of producing an unlimited number of complicated schemes for assembling an HTTP
service from smaller parts, there are four primary designs that have become established as habits in the Python
community (see Figure 10-1). What are your options for putting an HTTP service online if you have written Python
code to produce the dynamic content and have chosen an API or framework that can speak WSGI?
