Learning Python Network Programming

(Sean Pound) #1
Chapter 2

There are registered media types for many of the types of data that are transmitted
across the Internet. Some common ones are:


Media type               Description
text/html                HTML document
text/plain               Plain text document
image/jpeg               JPG image
application/pdf          PDF document
application/json         JSON data
application/xhtml+xml    XHTML document

Another media type of interest is application/octet-stream, which in practice is
used for files that don't have an applicable media type. An example of this would
be a pickled Python object. It is also used for files whose format is not known by
the server. In order to handle responses with this media type correctly, we need to
discover the format in some other way. Possible approaches are as follows:



  • Examine the filename extension of the downloaded resource, if it has one.
    The mimetypes module can then be used for determining the media type
    (go to Chapter 3, APIs in Action to see an example of this).

  • Download the data and then use a file type analysis tool. The Python
    standard library imghdr module can be used for images (note that it was
    deprecated in Python 3.11 and removed in Python 3.13), and the third-party
    python-magic package, or the GNU file command, can be used for other types.

  • Check the website that we're downloading from to see if the file type has
    been documented anywhere.
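The first two approaches can be combined into one helper. The sketch below is a minimal illustration, not the book's code: guess_media_type is a hypothetical name, and the short SIGNATURES table of leading "magic" bytes is an assumption standing in for a real analysis tool such as python-magic (it does not use that package's API). It tries the filename extension with mimetypes first, then checks the first few bytes of the data, and otherwise keeps application/octet-stream.

```python
import mimetypes

# A handful of well-known file signatures ("magic numbers").
# Illustrative assumption only: real tools such as python-magic or the
# GNU file command use far larger signature databases.
SIGNATURES = [
    (b'\xff\xd8\xff', 'image/jpeg'),
    (b'\x89PNG\r\n\x1a\n', 'image/png'),
    (b'GIF87a', 'image/gif'),
    (b'GIF89a', 'image/gif'),
    (b'%PDF-', 'application/pdf'),
]


def guess_media_type(filename, data):
    """Guess a media type for a resource served as application/octet-stream."""
    # Approach 1: use the filename extension, if mimetypes recognizes it.
    media_type, _encoding = mimetypes.guess_type(filename)
    if media_type is not None:
        return media_type
    # Approach 2: inspect the leading bytes of the downloaded data.
    for signature, sig_type in SIGNATURES:
        if data.startswith(signature):
            return sig_type
    # Still unknown: fall back to the generic binary type.
    return 'application/octet-stream'
```

For example, guess_media_type('photo', b'\xff\xd8\xff\xe0') relies on the byte signature because the name has no extension, while guess_media_type('report.pdf', b'') is settled by the extension alone.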


Content type values can contain optional additional parameters that provide further
information about the type. This is usually used to supply the character set that the
data uses. For example:


Content-Type: text/html; charset=UTF-8

In this case, we're being told that the character set of the document is UTF-8.
The parameter is included after a semicolon, and it always takes the form of a
key/value pair.
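To make the structure concrete, here is a minimal sketch of splitting a Content-Type value into its media type and its parameters. The function name parse_content_type is hypothetical, and the parser is deliberately naive: it does not handle semicolons inside quoted parameter values. In practice, the headers object returned by urlopen() is an http.client.HTTPMessage, a subclass of email.message.Message, which already provides get_content_type() and get_content_charset().

```python
def parse_content_type(value):
    """Split a Content-Type value into (media_type, params).

    Minimal sketch: does not handle semicolons inside quoted values.
    """
    # Everything before the first semicolon is the media type itself.
    media_type, _, rest = value.partition(';')
    params = {}
    for part in rest.split(';'):
        key, sep, val = part.partition('=')
        if sep:  # skip empty or malformed segments
            # Parameter names are case-insensitive; values may be quoted.
            params[key.strip().lower()] = val.strip().strip('"')
    return media_type.strip().lower(), params


print(parse_content_type('text/html; charset=UTF-8'))
# ('text/html', {'charset': 'UTF-8'})
```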


Let's work through an example: downloading the Python home page and using the
Content-Type value it returns. First, we submit our request:





from urllib.request import urlopen

response = urlopen('http://www.python.org')
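Once a response is in hand, its charset parameter tells us how to turn the raw bytes into text. The sketch below is one possible way to do that, not necessarily the approach the chapter takes: decode_body is a hypothetical helper, and falling back to UTF-8 when no charset is declared is an assumption. It relies on get_content_charset(), which exists on email.message.Message and therefore on the http.client.HTTPMessage that urlopen() responses expose as response.headers. The demonstration builds headers by hand so it runs without network access.

```python
from email.message import Message


def decode_body(body, headers, default_charset='utf-8'):
    """Decode raw response bytes using the charset in the Content-Type header.

    `headers` is any email.message.Message; the headers of a urlopen()
    response (an http.client.HTTPMessage) qualify. The UTF-8 fallback is
    an assumption for responses that declare no charset.
    """
    # get_content_charset() returns the lowercased charset parameter,
    # or failobj when the header or parameter is missing.
    charset = headers.get_content_charset(failobj=default_charset)
    return body.decode(charset)


# Offline demonstration with a hand-built header block:
headers = Message()
headers['Content-Type'] = 'text/html; charset=ISO-8859-1'
print(decode_body('café'.encode('iso-8859-1'), headers))  # café

# With a live response (requires network access), this would be:
# html = decode_body(response.read(), response.headers)
```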



