Programming Python, 4th Edition, Mark Lutz (text version)


escapes. This text appears to parse correctly in Python’s own HTML parser module
described earlier (unless the parts in question also end in a semicolon); that might help
for replies fetched manually with urllib.request, but not when rendered in browsers:


>>> from html.parser import HTMLParser
>>> html = open('test-escapes.html').read()
>>> HTMLParser().unescape(html)
'<HTML>\n<A HREF="file.py?name=a&job=b&amp=c&sect=d&lt=e">hello</a>\n</HTML>'

Avoiding conflicts


What to do then? To make this work as expected in all cases, the & separators should
generally be escaped if your parameter names may clash with an HTML escape code:


<A HREF="file.py?name=a&amp;job=b&amp;amp=c&amp;sect=d&amp;lt=e">hello</a>

Browsers render this fully escaped link as expected (open test-escapes2.html to test),
and Python’s HTML parser does the right thing as well:


file.py?name=a&job=b&amp=c&sect=d&lt=e        result in both IE and Chrome

>>> h = '<A HREF="file.py?name=a&amp;job=b&amp;amp=c&amp;sect=d&amp;lt=e">hello</a>'
>>> HTMLParser().unescape(h)
'<A HREF="file.py?name=a&job=b&amp=c&sect=d&lt=e">hello</a>'
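
For readers on newer Pythons, a hedged aside: the HTMLParser.unescape method used above was removed in Python 3.9, and the standard library's html module provides escape and unescape functions instead. The following is a minimal sketch, not part of the book's example set, that uses html.escape to generate the fully escaped link from a raw URL; the helper name make_link is hypothetical. Note that html.unescape follows HTML5 rules, so, like the browsers, it decodes &amp, &sect, and &lt even without a trailing semicolon.

```python
import html

def make_link(url, text):
    """Build an anchor tag with the URL's & separators HTML-escaped.

    Hypothetical helper for illustration only.
    """
    return '<A HREF="%s">%s</a>' % (html.escape(url), html.escape(text))

raw = 'file.py?name=a&job=b&amp=c&sect=d&lt=e'

# Escaping turns every & into &amp; so no parameter name can be
# mistaken for an HTML entity when the page is rendered
link = make_link(raw, 'hello')
print(link)

# Unescaping the escaped form recovers the raw URL intact
roundtrip = html.unescape(html.escape(raw))
print(roundtrip == raw)

# Unescaping the raw, unescaped URL reproduces the browser problem:
# amp, sect, and lt are decoded even though no semicolon follows them
mangled = html.unescape(raw)
print(mangled)
```

Generating links through a helper like this, rather than typing the escapes by hand, keeps the escaping step from being forgotten when parameter names change.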

Because of this conflict between HTML and URL syntax, most server tools (including
Python’s urllib.parse query-parameter parsing tools employed by Python’s cgi module)
also allow a semicolon to be used as a separator instead of &. The following link,
for example, works the same as the fully escaped URL, but does not require an extra
HTML escaping step (at least not for the ;):


file.py?name=a;job=b;amp=c;sect=d;lt=e
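
A minimal sketch of how such a query string might be split on the server side, assuming Python 3.9.2 or later, where urllib.parse.parse_qs accepts a separator argument. In those releases the default separator is & only, so the semicolon must be requested explicitly; older releases accepted both characters by default.

```python
from urllib.parse import parse_qs, urlsplit

url = 'file.py?name=a;job=b;amp=c;sect=d;lt=e'

# Isolate the part after the ? and split it on semicolons
query = urlsplit(url).query
params = parse_qs(query, separator=';')
print(params)

# Each value is a list, because a parameter may appear more than once
print(params['amp'][0])
```

Because parse_qs takes a single separator, a script that must accept both link forms would need to pick one convention or normalize the query string first.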

Python’s html.parser unescape tool allows the semicolons to pass unchanged, too,
simply because they are not significant in HTML code. To fully test all three of these
link forms for yourself at once, place them in an HTML file, open the file in your browser
using its http://localhost/badlink.html URL, and view the links when followed. The
HTML file in Example 15-24 will suffice.


Example 15-24. PP4E\Internet\Web\badlink.html



<A HREF="cgi-bin/badlink.py?name=a&job=b&amp=c&sect=d&lt=e">unescaped</a>

<A HREF="cgi-bin/badlink.py?name=a&amp;job=b&amp;amp=c&amp;sect=d&amp;lt=e">escaped</a>

<A HREF="cgi-bin/badlink.py?name=a;job=b;amp=c;sect=d;lt=e">alternative</a>
