Foundations of Python Network Programming

Chapter 9 ■ http Clients

All subsequent calls to methods like s.get() will use this default value for the header unless they override it with
a different value.
The urllib library offers its own patterns for setting up default handlers that can inject default headers, but, as
they are labyrinthine and, alas, object-oriented, I refer you to the documentation.
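
For the simplest case of one fixed default header, a minimal sketch is possible without custom handlers, assuming the addheaders list that urllib.request opener objects carry. The helper name make_opener and the Accept-Language value are illustrative assumptions, not part of urllib.

```python
import urllib.request

def make_opener(name, value):
    """Return an opener that sends the given header by default.

    Hypothetical helper for illustration only.
    """
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) pairs. The opener adds each one
    # to an outgoing request unless the request already sets that header.
    opener.addheaders.append((name, value))
    return opener

opener = make_opener('Accept-Language', 'en-US,en;q=0.8')
# opener.open(url) would now include the header on every request.
```

Appending rather than assigning a new list keeps the opener's existing User-agent entry in place.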


Content Type


Once a server has inspected the various Accept headers from the client and decided which representation of a
resource to deliver, it sets the Content-Type header of the outgoing response accordingly.
Content types are selected from among the various MIME types that were already established for multimedia
that is transmitted as part of e-mail messages (see Chapter 12). The types text/plain and text/html are both
common, along with image formats such as image/gif, image/jpeg, and image/png. Documents can be delivered as
types including application/pdf. A plain sequence of bytes for which the server can guarantee no more specific
interpretation is given the content type of application/octet-stream.
There is one complication of which you should be aware when dealing with a Content-Type header delivered
over HTTP. If the major type (the word to the left of the slash) is text, then the server has a number of options about
how those text characters can be encoded for transmission to the client. It states its choice by appending to the
Content-Type header a semicolon followed by a declaration of the character encoding used to turn the text into bytes.


Content-Type: text/html; charset=utf-8


This means you cannot simply compare the Content-Type header to a list of MIME types without first checking
for the semicolon character and splitting it into two pieces. Most libraries will give you no help here. Whether you use
urllib or whether you use Requests, you will have to be responsible for splitting on the semicolon if you write code that
needs to inspect the content type (although Requests will at least use, if not tell you about, the content type’s charset
setting if you ask its Response object for its already-decoded text attribute).
The only library that you will examine in this book that allows the content type and character set to be
manipulated separately by default is Ian Bicking’s WebOb library (Chapter 10), whose Response objects offer separate
attributes called content_type and charset that get put together with a semicolon in the Content-Type header per
the standard.
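
The semicolon split described above can be sketched in a few lines. This is a minimal illustration rather than a full header parser: the function name split_content_type is hypothetical, and it only looks for a charset parameter.

```python
def split_content_type(header_value):
    """Split a Content-Type value into (mime_type, charset).

    The charset is None when the header does not declare one.
    """
    mime_type, _, params = header_value.partition(';')
    charset = None
    for param in params.split(';'):
        name, _, value = param.partition('=')
        if name.strip().lower() == 'charset':
            # Strip whitespace and any optional surrounding quotes.
            charset = value.strip().strip('"')
    return mime_type.strip().lower(), charset

mime_type, charset = split_content_type('text/html; charset=utf-8')
```

With the pieces separated, mime_type can be compared directly against a list of MIME types.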


HTTP Authentication


Just as the word authentic denotes something that is genuine, real, actual, or true, authentication describes any
procedures for determining whether a request really comes from someone authorized to make it. Just as your
telephone conversation with a bank or airline will be prefixed with questions about your address and personal
identity in order to establish that it is really the account holder calling, so too an HTTP request often needs to carry
built-in proof as to the identity of the machine or person making it.
The error code 401 Unauthorized is used by servers that want to signal formally, through the protocol itself,
either that they cannot authenticate your identity or that the identity is fine but is not one authorized to view this
particular resource.
Many real-world HTTP servers never actually deign to return a 401 because they are designed purely for human
users. On these servers, an attempt to fetch a resource without the proper identification is likely to return a 303 See
Other to their login page. This is helpful for a human but far less so for your Python program, which will have to learn
to distinguish a 303 See Other that truly indicates a failure to authenticate from an innocent redirection that is
really just trying to take you to the resource.
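
One way a program might draw that distinction is to compare the redirect target with the site's known login page. The sketch below is only a heuristic under stated assumptions: the function name classify_response and the '/login' path are hypothetical, and a real site may use a different login URL or a different redirect code.

```python
from urllib.parse import urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)

def classify_response(status, location=None, login_path='/login'):
    """Return 'unauthorized', 'login-redirect', 'redirect', or 'other'."""
    if status == 401:
        return 'unauthorized'
    if status in REDIRECT_CODES and location:
        # Compare only the path, since Location may be absolute or relative
        # and may carry a query string naming the page to return to.
        if urlsplit(location).path.rstrip('/') == login_path:
            return 'login-redirect'
        return 'redirect'
    return 'other'
```

Here classify_response(303, 'https://example.com/login?next=/report') is treated as an authentication failure, while a 303 pointing anywhere else is treated as an ordinary redirect.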
Because every HTTP request is stand-alone and independent of all other requests, even those that come right
before and after it on the same socket, any authenticating information must be carried separately in every single
request. This independence is what makes it safe for proxy servers and load balancers to distribute HTTP requests,
even requests that arrive over the same socket, among as many servers as they want.