Programming Python, 4th Edition, by Mark Lutz (text edition)


Other urllib Interfaces


One last mutation: the following urllib.request downloader script uses the slightly
higher-level urlretrieve interface in that module to automatically save the downloaded
file or script output to a local file on the client machine. This interface is handy if we
really mean to store the fetched data (e.g., to mimic the FTP protocol). If we plan on
processing the downloaded data immediately, though, this form may be less convenient
than the version we just met: we need to open and read the saved file. Moreover, we
need to provide an extra protocol for specifying or extracting a local filename, as in
Example 13-31.
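
To make this trade-off concrete, here is a minimal sketch, not part of the book's examples package, that fetches the same data both ways. It assumes a file:// URL on a temporary file as a stand-in for a web server, so it runs without a network connection; the function names fetch_in_memory and fetch_to_file are hypothetical.

```python
import os
import tempfile
import urllib.request
from pathlib import Path

def fetch_in_memory(url):
    """Read the whole reply into memory as bytes (the urlopen style)."""
    with urllib.request.urlopen(url) as reply:
        return reply.read()

def fetch_to_file(url, localname):
    """Save the reply to a local file, then reopen the file to get the data."""
    urllib.request.urlretrieve(url, localname)
    with open(localname, 'rb') as saved:
        return saved.read()

def demo():
    # Assumption: a file:// URL on a temporary file stands in for a web server
    with tempfile.TemporaryDirectory() as tmpdir:
        source = Path(tmpdir) / 'source.html'
        source.write_bytes(b'<HTML>\n<HEAD>\n</HEAD>\n')
        url = source.as_uri()
        copy = os.path.join(tmpdir, 'copy.html')
        direct = fetch_in_memory(url)            # data only, nothing saved
        saved = fetch_to_file(url, copy)         # data plus a local copy
        return direct, saved, os.path.exists(copy)

if __name__ == '__main__':
    direct, saved, copied = demo()
    print(direct == saved, copied)
```

Both calls return the same bytes; the second simply takes an extra open and read, and leaves a copy of the data on disk.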


Example 13-31. PP4E\Internet\Other\http-getfile-urllib2.py


"""
fetch a file from an HTTP (web) server over sockets via urlllib; this version
uses an interface that saves the fetched data to a local binary-mode file; the
local filename is either passed in as a cmdline arg or stripped from the URL with
urllib.parse: the filename argument may have a directory path at the front and query
parameters at end, so os.path.split is not enough (only splits off directory path);
caveat: should urllib.parse.quote filename unless known ok--see later chapters;
"""


import sys, os, urllib.request, urllib.parse
showlines = 6
try:
servername, filename = sys.argv[1:3] # first 2 cmdline args?
except:
servername, filename = 'learning-python.com', '/index.html'


remoteaddr = 'http://%s%s' % (servername, filename) # any address on the Net
if len(sys.argv) == 4: # get result filename
localname = sys.argv[3]
else:
(scheme, server, path, parms, query, frag) = urllib.parse.urlparse(remoteaddr)
localname = os.path.split(path)[1]


print(remoteaddr, localname)
urllib.request.urlretrieve(remoteaddr, localname) # can be file or script
remotedata = open(localname, 'rb').readlines() # saved to local file
for line in remotedata[:showlines]: print(line) # file is bytes/binary
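
The filename logic in this script can be isolated as a small function, which makes it easier to see why urlparse must run before os.path.split. The following is a sketch under stated assumptions: the function name local_name_from_url, the example.com addresses, and the index.html fallback for URLs with no file component are illustrative and not part of the original script.

```python
import os
import urllib.parse

def local_name_from_url(url, default='index.html'):
    """Derive a local filename from the path component of a URL."""
    path = urllib.parse.urlparse(url).path     # drops scheme, server, and query
    name = os.path.split(path)[1]              # drops any directory path
    return name or default                     # assumption: fallback if no file part

if __name__ == '__main__':
    # hypothetical addresses, for illustration only
    for url in ('http://example.com/index.html',
                'http://example.com/cgi-bin/languages.py?language=Python',
                'http://example.com/'):
        print(url, '=>', local_name_from_url(url))
```

Splitting the raw filename argument with os.path.split alone would leave the query string attached: os.path.split('/cgi-bin/languages.py?language=Python')[1] yields 'languages.py?language=Python', which is rarely what we want as a local filename.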


Let’s run this last variant from a command line. Its basic operation is the same as the
last two versions: like the prior one, it builds a URL, and like both of the last two, it
lets us list an explicit target server and file path on the command line:


C:\...\PP4E\Internet\Other> http-getfile-urllib2.py
http://learning-python.com/index.html index.html
b'<HTML>\n'
b' \n'
b'<HEAD>\n'
b"<TITLE>Mark Lutz's Python Training Services</TITLE>\n"
b'<!--mstheme--><link rel="stylesheet" type="text/css" href="_themes/blends/blen...'
b'</HEAD>\n'
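
Because the saved file is opened in binary mode, each line prints as a bytes object, with a b'...' prefix and a visible \n. If we would rather see text, we can decode each line before printing. A minimal sketch, assuming the page is UTF-8 encoded (the server's reply headers may say otherwise) and using a hypothetical helper name, decode_lines:

```python
def decode_lines(rawlines, count=6, encoding='utf-8'):
    """Decode the first count bytes lines to str, minus line-end characters."""
    # assumption: the data really is in the given encoding
    return [line.decode(encoding).rstrip('\r\n') for line in rawlines[:count]]

if __name__ == '__main__':
    # sample lines taken from the start of the output shown above
    sample = [b'<HTML>\n', b' \n', b'<HEAD>\n']
    for text in decode_lines(sample):
        print(text)
```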
