Programming Python, 4th Edition, by Mark Lutz (text edition)


translated to such escape sequences, and spaces are replaced by + signs. Technically,
this convention is known as the application/x-www-form-urlencoded query string format,
and it’s part of the magic behind those bizarre URLs you often see at the top of
your browser as you surf the Web.
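
As a rough sketch of this format in action, the standard library's urllib.parse.urlencode can build such a query string from a dictionary, and parse_qs can take it apart again. The field names and values here are made up purely for illustration:

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical form fields, chosen only to exercise the escapes.
fields = {'name': 'Bob Smith', 'job': 'R&D'}

# Spaces become + signs; the reserved & inside a value becomes %26.
query = urlencode(fields)
print(query)              # name=Bob+Smith&job=R%26D

# parse_qs reverses the encoding; each value comes back as a list.
decoded = parse_qs(query)
print(decoded)            # {'name': ['Bob Smith'], 'job': ['R&D']}
```

Notice that the & separating the two fields is left alone, while the & inside a value is escaped so that it cannot be mistaken for a separator.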


Python HTML and URL Escape Tools


If you’re like me, you probably don’t have the hexadecimal value of the ASCII code for
& committed to memory (though Python’s hex(ord(c)) can help). Luckily, Python provides
tools that automatically implement URL escapes, just as cgi.escape does for
HTML escapes. The main thing to keep in mind is that HTML code and URL strings
are written with entirely different syntax, and so employ distinct escaping conventions.
Web users don’t generally care, unless they need to type complex URLs explicitly—
browsers handle most escape code details internally. But if you write scripts that must
generate HTML or URLs, you need to be careful to escape characters that are reserved
in either syntax.
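
To make the hex(ord(c)) hint concrete, here is a minimal sketch that builds the URL escape for & by hand and compares it with what urllib.parse.quote produces. The variable names are illustrative only:

```python
from urllib.parse import quote

c = '&'
code = ord(c)               # integer character code: 38
print(hex(code))            # 0x26

# A URL escape is a % followed by the two-digit hexadecimal code.
manual = '%%%02X' % code
print(manual)               # %26

# The library tool yields the same escape without the arithmetic.
print(manual == quote(c))   # True
```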


Because HTML and URLs have different syntaxes, Python provides two distinct sets
of tools for escaping their text. In the standard Python library:



  • cgi.escape escapes text to be embedded in HTML.

  • urllib.parse.quote and quote_plus escape text to be embedded in URLs.


The urllib.parse module also has tools for undoing URL escapes (unquote,
unquote_plus), but HTML escapes are undone during HTML parsing at large (e.g., by
Python’s html.parser module). To illustrate the two escape conventions and tools, let’s
apply each tool set to a few simple examples.
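
As a quick preview of the URL side, the following sketch contrasts quote with quote_plus and shows that the matching unquote calls restore the original text. The sample string is arbitrary:

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

text = 'a & b/c'            # arbitrary sample text

# quote leaves / alone by default and escapes spaces as %20.
q = quote(text)
print(q)                    # a%20%26%20b/c

# quote_plus also escapes / and turns spaces into + signs.
qp = quote_plus(text)
print(qp)                   # a+%26+b%2Fc

# Each unquote tool undoes its matching quote tool.
print(unquote(q) == text)         # True
print(unquote_plus(qp) == text)   # True

# Only unquote_plus maps + back to a space.
print(unquote('a+b'))       # a+b
print(unquote_plus('a+b'))  # a b
```

As a rule of thumb, quote_plus suits query parameter values, while quote suits text that may legitimately contain path slashes.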


Somewhat inexplicably, Python 3.2 developers have opted to move and
rename the cgi.escape function used throughout this book to
html.escape, to deprecate its longstanding original name,
and to alter its quoting behavior slightly. This is despite the fact that this
function has been around for ages and is used in almost every Python
CGI-based web script: a glaring case of a small group’s notion of aesthetics
trouncing widespread practice in 3.X and breaking working code
in the process. You may need to use the new html.escape name in a
future Python version; that is, unless Python users complain loudly
enough (yes, hint!).
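
For readers on a Python where cgi.escape is no longer available (it was removed in Python 3.8), the following is a minimal sketch of the html.escape replacement. It also illustrates the altered quoting behavior: html.escape escapes quote characters by default, including single quotes, whereas cgi.escape did so only on request:

```python
import html

text = 'a < b > c & d "spam"'

# By default, quotes are escaped along with <, >, and &.
full = html.escape(text)
print(full)      # a &lt; b &gt; c &amp; d &quot;spam&quot;

# Passing quote=False leaves quote characters untouched.
partial = html.escape(text, quote=False)
print(partial)   # a &lt; b &gt; c &amp; d "spam"

# Single quotes are escaped too when quote is true.
single = html.escape("it's")
print(single)    # it&#x27;s
```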

Escaping HTML Code


As we saw earlier, cgi.escape translates code for inclusion within HTML. We normally
call this utility from a CGI script, but it’s just as easy to explore its behavior interactively:


>>> import cgi
>>> cgi.escape('a < b > c & d "spam"', 1)
