Learning Python Network Programming

(Sean Pound) #1

HTTP and Working with the Web


There is a caveat for path encoding. If an element of a path itself contains a
slash, then quote() leaves that slash unescaped by default and we run into
problems. The following commands show an example:





    >>> from urllib.parse import quote
    >>> username = '+Zoot/Dingo+'
    >>> path = 'images/users/{}'.format(username)
    >>> quote(path)
    'images/users/%2BZoot/Dingo%2B'


Notice that the slash in the username doesn't get escaped. It will be incorrectly
interpreted as an extra level of directory structure, which is not what we want. To
get around this, we first need to individually escape any path elements that may
contain slashes, and then join the elements manually:





    >>> username = '+Zoot/Dingo+'
    >>> user_encoded = quote(username, safe='')
    >>> path = '/'.join(('', 'images', 'users', user_encoded))
    >>> path
    '/images/users/%2BZoot%2FDingo%2B'


Notice that the slash in the username is now percent-encoded. We encode the username
separately, telling quote() to escape slashes as well by supplying the safe=''
argument, which overrides its default safe list of '/'. Then, we combine the path
elements by using the str.join() method.
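
The per-element approach described above can be wrapped in a small helper. This is only a minimal sketch: build_path is a hypothetical name chosen for illustration and is not part of urllib.parse. It assumes every argument is a single path element that should never be split further.

```python
from urllib.parse import quote, unquote

def build_path(*segments):
    """Percent-encode each segment separately, then join with '/'.

    build_path is a hypothetical helper for illustration only.
    """
    # safe='' makes quote() escape '/' as %2F inside a segment
    encoded = [quote(segment, safe='') for segment in segments]
    # The leading empty string produces the leading slash
    return '/'.join([''] + encoded)

path = build_path('images', 'users', '+Zoot/Dingo+')
print(path)  # /images/users/%2BZoot%2FDingo%2B

# Splitting on '/' and unquoting recovers the original elements intact
decoded = [unquote(part) for part in path.split('/')[1:]]
print(decoded)  # ['images', 'users', '+Zoot/Dingo+']
```

Because each element is encoded before joining, the only literal slashes left in the result are the ones that separate path levels.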


Here, it's worth mentioning that hostnames sent over the wire must be strictly
ASCII. However, the socket and http modules support transparent encoding of
Unicode hostnames to an ASCII-compatible encoding, so in practice we don't need
to worry about encoding hostnames. There are more details about this process in the
encodings.idna section of the codecs module documentation.
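
To see what this ASCII-compatible form looks like, we can apply the built-in idna codec by hand. This is just an illustrative sketch; the hostname below is a made-up example, and in normal use the library modules perform this conversion for us.

```python
# 'bücher.example' is a hypothetical Unicode hostname used for illustration
hostname = 'bücher.example'

# The built-in 'idna' codec converts each label to its ASCII-compatible form
ascii_hostname = hostname.encode('idna')
print(ascii_hostname)  # b'xn--bcher-kva.example'

# Decoding with the same codec restores the Unicode hostname
restored = ascii_hostname.decode('idna')
print(restored)  # bücher.example
```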


URLs in summary


We've used quite a few functions in the preceding sections. Let's review what we
have used each function for. All of these functions can be found in the
urllib.parse module. They are as follows:



  • Splitting a URL into its components: urlparse

  • Combining an absolute URL with a relative URL: urljoin

  • Parsing a query string into a dict: parse_qs

  • URL-encoding a path: quote

  • Creating a URL-encoded query string from a dict: urlencode

  • Creating a URL from components (reverse of urlparse): urlunparse
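
As a quick recap, the following sketch exercises each of the functions listed above once. The URLs, paths and query values are invented for illustration and do not refer to a real service.

```python
from urllib.parse import (urlparse, urljoin, parse_qs, quote,
                          urlencode, urlunparse)

# A made-up URL used only for illustration
url = 'http://www.example.com/search/index.html?q=urlparse&area=default'

# Splitting a URL into its components
parts = urlparse(url)
print(parts.netloc)  # www.example.com
print(parts.path)    # /search/index.html

# Combining an absolute URL with a relative URL
joined = urljoin(url, 'help.html')
print(joined)  # http://www.example.com/search/help.html

# Parsing a query string into a dict (each value is a list)
params = parse_qs(parts.query)
print(params)  # {'q': ['urlparse'], 'area': ['default']}

# URL-encoding a path (slashes are left alone by default)
encoded_path = quote('/images/my cat.jpg')
print(encoded_path)  # /images/my%20cat.jpg

# Creating a URL-encoded query string from a dict
query = urlencode({'q': 'url encoding', 'page': 2})
print(query)  # q=url+encoding&page=2

# Creating a URL from components (reverse of urlparse)
rebuilt = urlunparse(('http', 'www.example.com', '/search', '', query, ''))
print(rebuilt)  # http://www.example.com/search?q=url+encoding&page=2
```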
