Programming Python, 4th Edition, by Mark Lutz (text edition)


HTTP: Accessing Websites


Python’s standard library (the modules that are installed with the interpreter) also
includes client-side support for HTTP (the Hypertext Transfer Protocol), a message
structure and port standard used to transfer information on the World Wide Web. In
short, this is the protocol that your web browser (e.g., Internet Explorer, Firefox,
Chrome, or Safari) uses to fetch web pages and run applications on remote servers as
you surf the Web. Essentially, it’s just formatted text and bytes sent over a socket,
by default on port 80.
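
To make the "just bytes" point concrete, here is a minimal sketch of what a simple GET request might look like before it is written to a socket. The helper name build_get_request and the particular header lines are assumptions chosen for illustration, not code from the book; real browsers send many more headers.

```python
def build_get_request(host, path='/'):
    """Hypothetical helper: format a bare-bones HTTP GET request as bytes."""
    lines = [
        'GET %s HTTP/1.1' % path,     # request line: method, path, protocol version
        'Host: %s' % host,            # HTTP/1.1 requires a Host header
        'Accept: text/html',          # the kind of reply we would like
        'Connection: close',          # ask the server to close when done
        '',                           # the two empty strings produce the blank
        '',                           # line that ends the header section
    ]
    return '\r\n'.join(lines).encode('ascii')

request = build_get_request('learning-python.com', '/index.html')
print(request.decode('ascii'))
```

A client would send these bytes over a socket connected to the server's HTTP port and then read the reply headers and data back from the same socket.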


To really understand HTTP-style transfers, you need to know some of the server-side
scripting topics covered in Chapter 15 (e.g., script invocations and Internet address
schemes), so this section may be less useful to readers with no such background.
Luckily, though, the basic HTTP interfaces in Python are simple enough for a cursory
understanding even at this point in the book, so let’s take a brief look here.


Python’s standard http.client module automates much of the protocol defined by
HTTP and allows scripts to fetch web pages as clients much like web browsers; as we’ll
see in Chapter 15, http.server also allows us to implement web servers to handle the
other side of the dialog. For instance, the script in Example 13-29 can be used to grab
any file from any server machine running an HTTP web server program. As usual, the
file (and descriptive header lines) is ultimately transferred as formatted messages over
a standard socket port, but most of the complexity is hidden by the http.client module
(see our raw socket dialog with a port 80 HTTP server in Chapter 12 for a comparison).
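
As a rough illustration of both sides of this dialog, the sketch below starts a throwaway http.server instance on the local machine and fetches a page from it with the single-call request method of http.client. The names HelloHandler and fetch, the fixed page text, and the use of an OS-assigned local port are all assumptions made for this sketch so that it runs without a remote site; it is not the book's example.

```python
import http.client
import http.server
import threading

class HelloHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with one fixed HTML page."""
    def do_GET(self):
        body = b'<html><body>Hello from a local server</body></html>'
        self.send_response(200)
        self.send_header('Content-Type', 'text/html')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass                                     # silence per-request logging

def fetch(servername, port, filename):
    """Hypothetical helper: GET one file and return (status, reason, data)."""
    conn = http.client.HTTPConnection(servername, port, timeout=5)
    try:
        # request() sends the request line, headers, and blank line in one call
        conn.request('GET', filename, headers={'Accept': 'text/html'})
        reply = conn.getresponse()               # read reply headers
        return reply.status, reply.reason, reply.read()
    finally:
        conn.close()

# port 0 asks the operating system for any free port
httpd = http.server.HTTPServer(('127.0.0.1', 0), HelloHandler)
port = httpd.server_address[1]
threading.Thread(target=httpd.serve_forever, daemon=True).start()
try:
    status, reason, data = fetch('127.0.0.1', port, '/index.html')
finally:
    httpd.shutdown()                             # stop the serve_forever loop
    httpd.server_close()

print(status, reason)
print(data.decode())
```

Pointing fetch at a real site is a matter of passing its server name and port 80 in place of the local address used here.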


Example 13-29. PP4E\Internet\Other\http-getfile.py


"""
fetch a file from an HTTP (web) server over sockets via http.client; the filename
parameter may have a full directory path, and may name a CGI script with ?query
parameters on the end to invoke a remote program; fetched file data or remote
program output could be saved to a local file to mimic FTP, or parsed with str.find
or html.parser module; also: http.client request(method, url, body=None, headers={})
"""


import sys, http.client
showlines = 6
try:
    servername, filename = sys.argv[1:]           # cmdline args?
except ValueError:                                # wrong arg count: use defaults
    servername, filename = 'learning-python.com', '/index.html'


print(servername, filename)
server = http.client.HTTPConnection(servername)   # connect to http site/server
server.putrequest('GET', filename)                # send request and headers
server.putheader('Accept', 'text/html')           # POST requests work here too
server.endheaders()                               # as do CGI script filenames


reply = server.getresponse()                      # read reply headers + data
if reply.status != 200:                           # 200 means success
    print('Error sending request', reply.status, reply.reason)
else:


994 | Chapter 13: Client-Side Scripting
