Programming Python, 4th Edition, by Mark Lutz (text edition)

b'</PRE></P><BR>\n'
b'<HR>\n'

This book has much more to say later about HTML, CGI scripts, and the meaning of
the HTTP GET request used in Example 13-29 (along with POST, one of two ways to
format information sent to an HTTP server), so we’ll skip additional details here.


Suffice it to say, though, that we could use the HTTP interfaces to write our own web
browsers and to build scripts that use websites as though they were subroutines. When a
script sends parameters to remote programs and parses their results, websites can take on
the role of simple in-process functions (albeit much more slowly and indirectly).
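
To make the "website as subroutine" idea concrete, here is a minimal sketch built on the same http.client calls used above. The function names build_request_path and call_website are hypothetical, introduced only for illustration, and decoding the reply as UTF-8 is an assumption that may not suit every server.

```python
import http.client
from urllib.parse import urlencode

def build_request_path(path, **params):
    """Return path, plus an escaped ?name=value query string if params are given."""
    if not params:
        return path
    return path + '?' + urlencode(params)       # escapes spaces and special characters

def call_website(server, path, **params):
    """Send params to a remote program with GET; return (status, reply text)."""
    conn = http.client.HTTPConnection(server)
    try:
        conn.request('GET', build_request_path(path, **params))
        reply = conn.getresponse()
        # assumption: the reply body is UTF-8 text
        return reply.status, reply.read().decode('utf-8', errors='replace')
    finally:
        conn.close()
```

For instance, build_request_path('/cgi-bin/greet.py', user='Bob Smith') yields '/cgi-bin/greet.py?user=Bob+Smith'; the script path and parameter name here are made up. A caller would then parse the text returned by call_website much as it would use the result of a local function.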


The urllib Package Revisited


The http.client module we just met provides low-level control for HTTP clients.
When dealing with items available on the Web, though, it’s often easier to code downloads
with Python’s standard urllib.request module, introduced in the FTP section
earlier in this chapter. Since this module is another way to talk HTTP, let’s expand on
its interfaces here.


Recall that given a URL, urllib.request either downloads the requested object over
the Net to a local file or gives us a file-like object from which we can read the requested
object’s contents. As a result, the script in Example 13-30 does the same work as the
http.client script we just wrote but requires noticeably less code.
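
The two modes just described can be sketched side by side. In this illustration the helper name fetch_both_ways is hypothetical, and a file: URL stands in for an http: address so that the demo runs without a network connection; urlopen and urlretrieve accept either kind of URL.

```python
import os
import tempfile
from pathlib import Path
from urllib.request import urlopen, urlretrieve

def fetch_both_ways(url, savename):
    """Fetch url twice: read it via a file-like object, then save it to a local file."""
    remotefile = urlopen(url)                          # mode 1: file-like object
    try:
        data = remotefile.read()                       # contents as bytes
    finally:
        remotefile.close()
    localname, headers = urlretrieve(url, savename)    # mode 2: download to a file
    return data, localname

if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / 'index.html'
        source.write_bytes(b'<HTML>\n<TITLE>demo</TITLE>\n')
        data, saved = fetch_both_ways(source.as_uri(), os.path.join(tmp, 'copy.html'))
        print(data == Path(saved).read_bytes())        # both modes see the same bytes
```

The file-like object is the better fit when a script wants to process the contents directly, while urlretrieve is convenient when the goal is simply a local copy.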


Example 13-30. PP4E\Internet\Other\http-getfile-urllib1.py


"""
fetch a file from an HTTP (web) server over sockets via urllib; urllib supports
HTTP, FTP, files, and HTTPS via URL address strings; for HTTP, the URL can name
a file or trigger a remote CGI script; see also the urllib example in the FTP
section, and the CGI script invocation in a later chapter; files can be fetched
over the net with Python in many ways that vary in code and server requirements:
over sockets, FTP, HTTP, urllib, and CGI outputs; caveat: should run filename
through urllib.parse.quote to escape properly unless hardcoded--see later chapters;
"""


import sys
from urllib.request import urlopen
showlines = 6
try:
    servername, filename = sys.argv[1:]              # cmdline args?
except:
    servername, filename = 'learning-python.com', '/index.html'

remoteaddr = 'http://%s%s' % (servername, filename)  # can name a CGI script too
print(remoteaddr)
remotefile = urlopen(remoteaddr)                     # returns input file object
remotedata = remotefile.readlines()                  # read data directly here
remotefile.close()
for line in remotedata[:showlines]: print(line)      # bytes with embedded \n
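
Two details noted in this example's comments lend themselves to a short sketch: escaping the filename with urllib.parse.quote, and the fact that the lines read are bytes with embedded line ends. The helper names make_address and show_lines below are hypothetical, and the UTF-8 default is an assumption, since the right encoding depends on the page fetched.

```python
from urllib.parse import quote

def make_address(servername, filename):
    """Build an http: URL, escaping characters that are unsafe in the path part."""
    return 'http://%s%s' % (servername, quote(filename))   # '/' is left unescaped

def show_lines(lines, limit=6, encoding='utf-8'):
    """Decode the first few bytes lines to str and drop their line ends."""
    # assumption: the page is encoded per the encoding argument
    return [line.decode(encoding, errors='replace').rstrip('\r\n')
            for line in lines[:limit]]
```

With these, a filename such as '/my docs/index.html' becomes '/my%20docs/index.html' in the address, and show_lines(remotedata) would give text lines suitable for printing without the b'...' wrapper.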

