Programming Python, 4th Edition, by Mark Lutz (text version)


data = reply.readlines()              # file obj for data received
reply.close()                         # show lines with eoln at end
for line in data[:showlines]:         # to save, write data to file
    print(line)                       # line already has \n, but bytes


Desired server names and filenames can be passed on the command line to override
hardcoded defaults in the script. You need to know something of the HTTP protocol
to make the most sense of this code, but it’s fairly straightforward to decipher. When
run on the client, this script makes an HTTP object to connect to the server, sends it a
GET request along with acceptable reply types, and then reads the server’s reply. Much
like raw email message text, the HTTP server’s reply usually begins with a set of
descriptive header lines, followed by the contents of the requested file. In Python 3.X,
the reply object returned by the HTTP object’s getresponse method is itself file-like, so
we can read the downloaded data from it with calls such as readlines.
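
The request-and-reply sequence just described can be sketched with Python 3.X's
http.client roughly as follows. This is a minimal illustration under stated assumptions,
not the book's script: the names fetch_lines, DemoHandler, and demo are hypothetical,
and so that the example runs without an Internet connection it talks to a throwaway
local server built with the standard http.server module.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def fetch_lines(servername, filename, port=80, showlines=6):
    """Hypothetical helper: GET filename from servername over HTTP.

    Returns (status code, reply headers as a dict, first showlines body lines).
    """
    server = http.client.HTTPConnection(servername, port, timeout=10)
    try:
        server.putrequest('GET', filename)        # request line
        server.putheader('Accept', 'text/html')   # acceptable reply types
        server.endheaders()                       # end of headers: request is sent
        reply = server.getresponse()              # parses status line and headers
        headers = dict(reply.getheaders())        # descriptive header lines
        lines = reply.readlines()                 # reply is file-like; lines are bytes
        return reply.status, headers, lines[:showlines]
    finally:
        server.close()


class DemoHandler(BaseHTTPRequestHandler):
    """Assumed stand-in for a real web server: serves one fixed page."""
    PAGE = b'<HTML>\n<HEAD>\n<TITLE>Demo</TITLE>\n</HEAD>\n</HTML>\n'

    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-Type', 'text/html; charset=utf-8')
        self.send_header('Content-Length', str(len(self.PAGE)))
        self.end_headers()
        self.wfile.write(self.PAGE)

    def log_message(self, format, *args):
        pass                                      # keep the demo quiet


def demo():
    """Start the local server on a free port, fetch from it, then shut it down."""
    httpd = HTTPServer(('127.0.0.1', 0), DemoHandler)
    threading.Thread(target=httpd.serve_forever, daemon=True).start()
    try:
        return fetch_lines('127.0.0.1', '/index.html',
                           port=httpd.server_port, showlines=3)
    finally:
        httpd.shutdown()
        httpd.server_close()


if __name__ == '__main__':
    status, headers, lines = demo()
    print(status, headers.get('Content-Type'))
    for line in lines:
        print(line)                               # bytes, each with its own \n
```

Against a real site, the same helper would be called with just a server name and a
file path, for example fetch_lines('learning-python.com', '/index.html'); what comes
back then depends on the server.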


Let’s fetch a few files with this script. Like all Python client-side scripts, this one works
on any machine with Python and an Internet connection (here it runs on a Windows
client). Assuming that all goes well, the first few lines of the downloaded file are printed;
in a more realistic application, the text we fetch would probably be saved to a local file,
parsed with Python’s html.parser module (introduced in Chapter 19), and so on.
Without arguments, the script simply fetches the HTML index page at
http://learning-python.com, a domain name I host at a commercial service provider:


C:\...\PP4E\Internet\Other> http-getfile.py
learning-python.com /index.html
b'<HTML>\n'
b' \n'
b'<HEAD>\n'
b"<TITLE>Mark Lutz's Python Training Services</TITLE>\n"
b'<!--mstheme--><link rel="stylesheet" type="text/css" href="_themes/blends/blen...'
b'</HEAD>\n'

Notice that in Python 3.X the fetched data comes back as bytes strings again, not str.
Because the html.parser HTML parser we’ll meet in Chapter 19 expects str text strings
instead of bytes, you’ll likely need to resolve a Unicode encoding choice here in order
to parse, much the same as we did for email message text earlier in this chapter. As
there, we might decode from bytes to str per a default, user preferences or selections,
headers inspection, or byte structure analysis. Because sockets send raw bytes, we
confront this choice point whenever data shipped over them is text in nature; unless
that text’s type is known or always simple in form, Unicode implies extra steps.
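
As one illustration of the "headers inspection" and "default" options above, the
following sketch decodes fetched bytes using the charset named in a reply's
Content-Type header when one is present, and falls back to a default otherwise. The
function name decode_reply and the UTF-8 default are assumptions of this example, not
part of the book's script; a real program might instead consult user preferences or
analyze the bytes themselves.

```python
from email.message import Message


def decode_reply(data, content_type=None, default='utf-8'):
    """Hypothetical helper: decode reply bytes to str.

    Uses the charset parameter of the Content-Type header value if there is one,
    else the default encoding (an assumption of this sketch).
    """
    charset = None
    if content_type:
        msg = Message()                          # reuse the email package's header parsing
        msg['Content-Type'] = content_type
        charset = msg.get_content_charset()      # None if no charset parameter
    try:
        return data.decode(charset or default)
    except (LookupError, UnicodeDecodeError):
        # Unknown encoding name, or bytes that don't fit it: fall back to the
        # default and substitute a replacement character for undecodable bytes
        return data.decode(default, errors='replace')


if __name__ == '__main__':
    print(decode_reply(b'<TITLE>caf\xc3\xa9</TITLE>\n', 'text/html; charset=UTF-8'))
    print(decode_reply(b'<TITLE>caf\xe9</TITLE>\n', 'text/html; charset=ISO-8859-1'))
```

With an http.client reply object, the header value would typically come from
reply.getheader('Content-Type'), and the bytes from the reply's read or readlines
calls.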


We can also list a server and file to be fetched on the command line, if we want to be
more specific. In the following code, we use the script to fetch files from two different
websites by listing their names on the command lines (I’ve truncated some of these
lines so they fit in this book). Notice that the filename argument can include an arbitrary
remote directory path to the desired file, as in the last fetch here:


C:\...\PP4E\Internet\Other> http-getfile.py www.python.org /index.html
www.python.org /index.html
b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3....'
