[Programming Python (4th Edition)].(Programming.Python.4th.Edition).Mark.Lutz.Text Edition


hyperlink tags; the file languages2.py, for instance, prints HTML that includes
a URL:


<a href="getfile.py?filename=cgi-bin\languages2.py">

Because the URL here is embedded in HTML, it must at least be escaped according to
HTML conventions (e.g., any < characters must become &lt;), and any spaces should
be translated to + signs per URL conventions. A cgi.escape(url) call followed by the
string method call url.replace(" ", "+") would take us this far, and would probably
suffice for most cases.
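The following is a minimal sketch of that naive approach. cgi.escape was deprecated in Python 3.2 and removed in 3.8, so the sketch uses its standard-library replacement, html.escape, which by default also escapes quote characters. The helper name naive_link and the filename are hypothetical.

```python
import html


def naive_link(url: str) -> str:
    """HTML-escape a URL, then turn spaces into '+' signs."""
    escaped = html.escape(url)        # &, <, >, " and ' become HTML entities
    return escaped.replace(" ", "+")  # spaces become '+' per URL conventions


# Hypothetical filename containing a space and HTML-special characters
url = "getfile.py?filename=my file<1>.py"
print('<a href="%s">' % naive_link(url))
# prints <a href="getfile.py?filename=my+file&lt;1&gt;.py">
```

A filename such as a&b.py shows the weakness. html.escape yields a&amp;b.py, which is valid HTML. However, once the browser decodes the entity, the URL reads filename=a&b.py, and the & now splits the query into two parameters. Only URL escaping (%26) keeps it part of the value.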


That approach is not quite enough in general, though, because HTML escaping conventions
are not the same as URL conventions. To robustly escape URLs embedded in
HTML code, you should instead call urllib.parse.quote_plus on the URL string, or
at least most of its components, before adding it to the HTML text. The escaped result
also satisfies HTML escape conventions, because urllib.parse translates more characters
than cgi.escape, and the % in URL escapes is not special to HTML.
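Below is a minimal sketch of the quote_plus approach. It escapes only the query value and assumes the script name is already safe as-is; the helper name file_link is hypothetical. quote_plus leaves only ASCII letters, digits, and _.-~ unescaped and encodes spaces as +. Its output therefore contains no &, <, >, or quote characters, so running html.escape over it changes nothing.

```python
import html
from urllib.parse import quote_plus, unquote_plus


def file_link(script: str, filename: str) -> str:
    """Return an <a> tag whose filename query value is URL-escaped."""
    # quote_plus escapes everything except letters, digits and '_.-~',
    # and encodes spaces as '+'; '%' escapes are harmless in HTML
    return '<a href="%s?filename=%s">' % (script, quote_plus(filename))


print(file_link("getfile.py", r"cgi-bin\languages2.py"))
# prints <a href="getfile.py?filename=cgi-bin%5Clanguages2.py">

value = quote_plus("a&b <c>.py")
print(value)                        # a%26b+%3Cc%3E.py
print(html.escape(value) == value)  # True: nothing left for HTML to escape
print(unquote_plus(value))          # a&b <c>.py
```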


HTML and URL conflicts: &


But there is one more astonishingly subtle (and thankfully rare) wrinkle: you may also
have to be careful with & characters in URL strings that are embedded in HTML code
(e.g., within <a> hyperlink tags). The & symbol is both a query parameter separator in
URLs (?a=1&b=2) and the start of escape codes in HTML (&lt;). Consequently, there is
a potential for collision if a query parameter name happens to be the same as an HTML
escape sequence code. The query parameter name amp, for instance, which shows up as
&amp=1 in parameters two and beyond on the URL, may be treated as an HTML escape
by some HTML parsers and translated to &=1.
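The collision can be reproduced in a few lines. urllib.parse.urlencode joins parameters with a bare &. html.unescape follows the HTML5 character-reference rules and accepts a handful of legacy names such as amp without a trailing semicolon. HTML5 does exempt attribute values in which such a name is followed by = or an alphanumeric character, so a given browser may behave differently from this sketch.

```python
import html
from urllib.parse import urlencode

# A query string that happens to use a parameter named "amp"
query = urlencode([("name", "a"), ("amp", "1")])
print(query)  # name=a&amp=1

# Read as HTML text, "&amp" (no semicolon) decodes to a lone "&"
decoded = html.unescape("file.py?" + query)
print(decoded)  # file.py?name=a&=1
```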


Even if parts of the URL string are URL-escaped, when more than one parameter is
separated by a &, the & separator might also have to be escaped as &amp; according to
HTML conventions. To see why, consider the following HTML hyperlink tag with
query parameter names name, job, amp, sect, and lt:


<A HREF="file.py?name=a&job=b&amp=c&sect=d&lt=e">hello</a>

When rendered in most browsers tested, including Internet Explorer on Windows 7,
this URL link winds up looking incorrectly like this (the S character in the first of these
is really a non-ASCII section marker):


file.py?name=a&job=b&=cS=d<=e result in IE
file.py?name=a&job=b&=c%A7=d%3C=e result in Chrome (0x3C is <)

The first two parameters are retained as expected (name=a, job=b), because name is not
preceded by an & and &job is not recognized as a valid HTML character escape code.
However, the &amp, &sect, and &lt parts are interpreted as special characters because
they do name valid HTML escape codes, even without a trailing semicolon.
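Python's html module can reproduce the IE-style reading of the link above. It can also show the remedy of escaping each & separator as &amp;. This is only a sketch: html.unescape applies HTML5's rules for text content. Those rules decode semicolon-less legacy names such as amp, sect, and lt, but leave unknown ones such as job alone.

```python
import html

raw = "file.py?name=a&job=b&amp=c&sect=d&lt=e"

# Unescaped: amp, sect and lt are taken as character references
print(html.unescape(raw))  # file.py?name=a&job=b&=c§=d<=e

# Escaped: every & becomes &amp;, so decoding restores the URL exactly
safe = html.escape(raw)
print(safe)  # file.py?name=a&amp;job=b&amp;amp=c&amp;sect=d&amp;lt=e
print(html.unescape(safe) == raw)  # True
print('<A HREF="%s">hello</a>' % safe)
```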


To see this for yourself, open the example package’s test-escapes.html file in your
browser, and highlight or select its link; the query names may be taken as HTML


1206 | Chapter 15: Server-Side Scripting
