[Python编程(第4版)].(Programming.Python.4th.Edition).Mark.Lutz.文字版

The urllib module package gives us a file-like interface to the server’s reply for a URL.
Notice that the output we read from the server is raw HTML code (normally rendered
by a browser). We can process this text with any of Python’s text-processing tools,
including:

String methods to search and split

The re regular expression pattern-matching module

Full-blown HTML and XML parsing support in the standard library, including
html.parser, as well as SAX-, DOM-, and ElementTree–style XML parsing tools.

When combined with such tools, the urllib package is a natural for a variety of
techniques—ad-hoc interactive testing of websites, custom client-side GUIs, “screen
scraping” of web page content, and automated regression testing systems for remote
server-side CGI scripts.

Formatting Reply Text

One last fine point: because CGI scripts use text to communicate with clients, they
need to format their replies according to a set of rules. For instance, notice how Ex-
ample 1-31 adds a blank line between the reply’s header and its HTML by printing an
explicit newline (\n) in addition to the one print adds automatically; this is a required
separator.

Also note how the text inserted into the HTML reply is run through the cgi.escape
(a.k.a. html.escape in Python 3.2; see the note under “Python HTML and URL Escape
Tools” on page 1203) call, just in case the input includes a character that is special in
HTML. For example, Figure 1-13 shows the reply we receive for form input Bob
Smith—the in the middle becomes </i> in the reply, and so doesn’t interfere
with real HTML code (use your browser’s view source option to see this for yourself);
if not escaped, the rest of the name would not be italicized.

Figure 1-13. Escaping HTML characters

Step 6: Adding a Web Interface | 59

[Python编程(第4版)].(Programming.Python.4th.Edition).Mark.Lutz.文字版

Formatting Reply Text

Get our desktop app

Company

Features

Documentation

Resources