Chapter 9 ■ http Clients
The types and languages listed first have the strongest preference value of 1.0, while the ones that are listed later
in the header are often demoted to q=0.9 or q=0.8 to make sure the server knows that they are not preferred over the
best choices.
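To make the weighting concrete, here is a minimal sketch of how a program might read the q values out of such a header. The function name parse_accept is invented for this illustration, and the parsing is deliberately simplified: it ignores every parameter except q and does not attempt the full grammar given in the RFC.

```python
def parse_accept(header):
    """Return (value, q) pairs from an Accept-style header, best first.

    A simplified illustration, not a complete RFC-compliant parser.
    """
    choices = []
    for item in header.split(','):
        parts = [p.strip() for p in item.split(';')]
        value = parts[0]
        if not value:
            continue
        q = 1.0  # a choice listed without a q parameter gets the top weight
        for param in parts[1:]:
            name, _, raw = param.partition('=')
            if name.strip().lower() == 'q':
                try:
                    q = float(raw)
                except ValueError:
                    q = 0.0  # assumption: treat a malformed weight as "not wanted"
        choices.append((value, q))
    # sorted() is stable, so choices with equal weight keep their header order.
    return sorted(choices, key=lambda pair: pair[1], reverse=True)


print(parse_accept('en-US,en;q=0.8'))
```

Run against the header value 'en-US,en;q=0.8', the function ranks en-US first with the implicit weight of 1.0 and en second with 0.8.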
Many simple HTTP services and sites ignore these headers entirely and instead fall back to using a separate
URL for each version of a resource they possess. A site’s front page, for example, might exist in the two versions
/en/index.html and /fr/index.html if the site supports both English and French. The same corporate logo might
be located at both of the paths /logo.png and /logo.gif, and the user might be offered both for download when
browsing the corporation’s press kit. The documentation for a RESTful web service (see Chapter 10) will often specify
that different URL query parameters, like ?f=json and ?f=xml, be used to select the representation that is returned.
But that is not how HTTP was designed to work.
The intention of HTTP was that a resource should have one path at which it lives, regardless of how many
different machine formats (or human languages) might be used to render it, and that the server should use the content
negotiation headers to select which representation of that resource to return.
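The difference between the two approaches is easy to see from the client's side. The following sketch builds, but never sends, two urllib requests for JSON: one selects the format through a query parameter, and the other leaves the URL alone and states its preference in an Accept header. The address http://example.com/api/report is a made-up placeholder, as is the f parameter name borrowed from the example above.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE = 'http://example.com/api/report'  # hypothetical resource URL

# URL-based selection: the desired format becomes part of the address.
by_url = Request(BASE + '?' + urlencode({'f': 'json'}))

# Content negotiation: one address, with the preference carried in a header.
by_header = Request(BASE, headers={'Accept': 'application/json'})

print(by_url.full_url)
print(by_header.full_url, by_header.get_header('Accept'))
```

Neither Request object touches the network until it is passed to urlopen(), so the sketch can be run safely to inspect what each request would contain.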
Why is content negotiation often ignored?
First, the use of content negotiation can leave the user with little control over their user experience. Imagine
again a site that offers its pages in both English and French. If it displays a language based on the Accept-Language
header and the user wants to see the other language, the server has no control over the situation—it would have to
suggest to the user that they bring up the control panel for their web browser and change their default language. What
if the user cannot find that setting? What if they are browsing from a public terminal and do not have permission to set
preferences in the first place?
Instead of turning control of language selection over to a browser that might not be well written, coherent, or
easily configurable, many sites simply build several redundant sets of paths, one for each human language that they
want to support. They might, when the user first arrives, examine the Accept-Language header in order to autodirect
the browser to the language most likely to be appropriate. But they want the user to be able to browse back in the
other direction if the selection was inappropriate.
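The autodirect step might look something like the following sketch. Everything in it is an assumption made for illustration: the SUPPORTED and DEFAULT values, the function names, and the /en/ and /fr/ path layout borrowed from the earlier example. It matches only on the primary language tag and makes no attempt to handle wildcards.

```python
SUPPORTED = ('en', 'fr')  # hypothetical: the languages this site offers
DEFAULT = 'en'            # hypothetical fallback when nothing matches


def preferred_language(accept_language):
    """Pick the best supported language named in an Accept-Language value."""
    ranked = []
    for position, item in enumerate(accept_language.split(',')):
        tag, _, params = item.strip().partition(';')
        q = 1.0  # no q parameter means the strongest preference
        name, _, raw = params.strip().partition('=')
        if name.strip().lower() == 'q':
            try:
                q = float(raw)
            except ValueError:
                q = 0.0
        # Sort by descending weight, then by position in the header.
        ranked.append((-q, position, tag.strip().lower()))
    for neg_q, _, tag in sorted(ranked):
        primary = tag.split('-')[0]  # 'fr-CA' is served by the 'fr' pages
        if neg_q < 0 and primary in SUPPORTED:  # q=0 means "never send this"
            return primary
    return DEFAULT


def landing_path(accept_language):
    """Return the path to which a first-time visitor would be redirected."""
    return '/{}/index.html'.format(preferred_language(accept_language))


print(landing_path('fr-CA,fr;q=0.9,en;q=0.8'))
```

Because the result is only a redirect target, the user remains free to follow a link to the other language's set of paths if the guess was wrong.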
Second, content negotiation is often ignored (or sits alongside a URL-based mechanism for forcing the return
of the correct version of the content) because HTTP client APIs (whether the API is used by JavaScript in a browser
or the API is offered by other languages in their own runtimes) often make it difficult to control the Accept headers.
The pleasant thing about placing control elements into the path inside the URL is that anyone using even the most
primitive tool for fetching a URL will be able to twiddle the knob by adjusting the URL.
Finally, content negotiation means that HTTP servers have to generate or select content by making choices
along several axes. You might assume that server logic can always access the Accept headers, which, alas, is not
always the case. Programming on the server side is often easier if content negotiation is left off the table.
But for sophisticated services that want to support it, content negotiation can help prune the possible space of
URLs while still offering a mechanism by which an intelligent HTTP client can get content that has been rendered
with its data formatting or human reader’s needs in mind. If you plan on using it, consult RFC 7231 for the details of
the various Accept headers’ syntax.
One final annoyance is the User-Agent string.
The User-Agent was not supposed to be part of content negotiation at all, but to serve only as an emergency
stop-gap for working around the limitations of particular browsers. It was, in other words, a mechanism for targeting
carefully designed fixes at specific clients while letting any other clients through to the page without any problem.
But the developers of applications backed by customer call centers quickly discovered that they could make
compatibility problems impossible and reduce the number of support calls up front by forbidding any browser except,
say, a single version of Internet Explorer from accessing their site. The arms race that ensued between browsers and
servers resulted in the very long User-Agent strings you have today, as recounted somewhat fancifully at
http://webaim.org/blog/user-agent-string-history/.
Both of the client libraries you are exploring, urllib and Requests, allow you to put any Accept headers into
your request that you please. They also both support patterns for creating a client that will use your favorite headers
automatically. Requests builds this feature right into its idea of a Session.
import requests
s = requests.Session()
s.headers.update({'Accept-Language': 'en-US,en;q=0.8'})
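With urllib, one way to get a similar effect is to build an opener and add to its addheaders list, whose entries are sent with every request that the opener makes. This is a sketch of that pattern rather than the only possible approach, and the URL in the commented-out line is a placeholder.

```python
import urllib.request

opener = urllib.request.build_opener()

# addheaders is a list of (name, value) pairs sent with each request.
opener.addheaders.append(('Accept-Language', 'en-US,en;q=0.8'))

# Uncommenting the next line would perform a real request with the header set.
# response = opener.open('http://example.com/')
```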