Programming Python, 4th Edition, by Mark Lutz (text edition)


for any URL address. When the URL names a simple HTML file, we simply download
its contents. But when it names a CGI script, the effect is to run the remote script and
fetch its output. This notion opens the door to web services, which generate useful XML
in response to input parameters; in simpler roles, this allows us to test remote scripts.


For example, we can trigger the script in Example 15-8 directly, without either going
through the tutor3.html web page or typing a URL in a browser’s address field:


C:\...\PP4E\Internet\Web> python
>>> from urllib.request import urlopen
>>> reply = urlopen('http://localhost/cgi-bin/tutor3.py?user=Brian').read()
>>> reply
b'<TITLE>tutor3.py</TITLE>\n<H1>Greetings</H1>\n<HR>\n<P>Hello, Brian.</P>\n<HR>\n'

>>> print(reply.decode())
<TITLE>tutor3.py</TITLE>
<H1>Greetings</H1>
<HR>
<P>Hello, Brian.</P>
<HR>

>>> url = 'http://localhost/cgi-bin/tutor3.py'
>>> conn = urlopen(url)
>>> reply = conn.read()
>>> print(reply.decode())
<TITLE>tutor3.py</TITLE>
<H1>Greetings</H1>
<HR>
<P>Who are you?</P>
<HR>

Recall from Chapter 13 that urllib.request.urlopen gives us a file-like object connected
to the generated reply stream. Reading from this object returns the HTML that would
normally be intercepted by a web browser and rendered into a reply page. The reply
comes off the underlying socket as bytes in 3.X, but can be decoded to str strings
as needed.
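
As a minimal sketch of the same fetch-and-decode steps in script form, the following wraps them in two small functions. The names build_query_url and fetch_text are hypothetical helpers introduced here for illustration, not part of the book's examples, and the UTF-8 default is an assumption about the reply's encoding. Building the query string with urllib.parse.urlencode also escapes characters that would be invalid if typed into the URL by hand.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(script_url, **params):
    """Append URL-encoded query parameters to a script address.

    Hypothetical helper: returns script_url unchanged if no parameters
    are given, else script_url + '?' + the encoded parameters.
    """
    if not params:
        return script_url
    return script_url + '?' + urlencode(params)


def fetch_text(url, encoding='utf-8'):
    """Fetch a URL and return its reply as a str.

    urlopen returns a file-like object; read() yields bytes in 3.X,
    which we decode here (the UTF-8 default is an assumption).
    """
    with urlopen(url) as conn:
        return conn.read().decode(encoding)


if __name__ == '__main__':
    # Same address as the interactive session above
    url = build_query_url('http://localhost/cgi-bin/tutor3.py', user='Brian')
    print(url)
    # With a local web server running, the reply could then be fetched:
    # print(fetch_text(url))
```

The fetch itself is left commented out because it only succeeds when a server is listening on localhost, as in the session above.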


When fetched directly this way, the HTML reply can be parsed with Python text processing
tools, including string methods like split and find, the re pattern-matching module, or
the html.parser HTML parsing module, all tools we’ll explore in Chapter 19. Extracting
text from the reply like this is sometimes informally called screen scraping, a way to
use website content in other programs. Screen scraping is an alternative to more complex
web services frameworks, though a brittle one: small changes in the page’s format can
often break scrapers that rely on it. The reply text can also be simply inspected;
urllib.request allows us to test CGI scripts from the Python interactive prompt or
other scripts, instead of a browser.
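
To make the three parsing options concrete, here is a rough sketch that pulls the greeting paragraph out of the reply shown in the earlier session. The reply bytes are copied from that session so the code runs without a server; the ParagraphGrabber class and the variable names are illustrative assumptions, not code from the book. All three approaches depend on the page keeping its current format, which is the brittleness noted above.

```python
import re
from html.parser import HTMLParser

# Reply bytes as returned by the tutor3.py session shown earlier
reply = (b'<TITLE>tutor3.py</TITLE>\n<H1>Greetings</H1>\n<HR>\n'
         b'<P>Hello, Brian.</P>\n<HR>\n')
text = reply.decode()

# 1) String methods: locate the text between <P> and </P> (case-sensitive)
start = text.find('<P>') + len('<P>')
end = text.find('</P>', start)
by_find = text[start:end]

# 2) The re module: one non-greedy group between the paragraph tags
match = re.search(r'<P>(.*?)</P>', text, re.IGNORECASE | re.DOTALL)
by_re = match.group(1) if match else None


# 3) html.parser: collect the character data inside each <p> element
class ParagraphGrabber(HTMLParser):          # hypothetical class name
    def __init__(self):
        super().__init__()
        self.in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'p':                       # tag names arrive lowercased
            self.in_p = True
            self.paragraphs.append('')

    def handle_endtag(self, tag):
        if tag == 'p':
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.paragraphs[-1] += data


parser = ParagraphGrabber()
parser.feed(text)
parser.close()
by_parser = parser.paragraphs[0]

print(by_find, by_re, by_parser, sep='\n')
```

The string-method version is the shortest but also the most fragile, since a change in tag case or an added attribute would break it; the html.parser version tolerates both.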


More generally, this technique allows us to use a server-side script as a sort of function
call. For instance, a client-side GUI can call the CGI script and parse the generated reply
page. Similarly, a CGI script that updates a database may be invoked programmatically
with urllib.request, outside the context of an input form page. This also opens the


1156 | Chapter 15: Server-Side Scripting
