Learning Python Network Programming

(Sean Pound) #1
Chapter 2

Requests with urllib


We have already seen some examples of HTTP exchanges while discussing the RFC
downloaders in Chapter 1, Network Programming and Python. The urllib package
is broken into several submodules for dealing with the different tasks that we may
need to perform when working with HTTP. For making requests and receiving
responses, we employ the urllib.request module.


Retrieving the contents of a URL is a straightforward process when done using
urllib. Load your Python interpreter and do the following:

>>> from urllib.request import urlopen
>>> response = urlopen('http://www.debian.org')
>>> response
<http.client.HTTPResponse object at 0x7fa3c53059b0>
>>> response.readline()
b'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">\n'


We use the urllib.request.urlopen() function to send a request and receive a
response for the resource at http://www.debian.org, in this case an HTML page.
We then print the first line of the HTML that we receive.
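
Outside the interactive interpreter, the same request is often wrapped in a with statement so that the connection is closed once we are done with it. The following is a minimal sketch of that pattern, not a definitive implementation. So that it runs without Internet access, it assumes a throwaway local server in place of http://www.debian.org; the names PAGE, DemoHandler, and fetch_first_line are hypothetical and do not come from urllib.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# Hypothetical page body served by the local demo server.
PAGE = b"<!DOCTYPE html>\n<html><body>Hello</body></html>\n"


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with the same page."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # suppress per-request logging to keep the demo quiet


def fetch_first_line(url):
    """Send a GET request to url and return the first line of the body."""
    with urlopen(url, timeout=10) as response:
        return response.readline()  # bytes, including the trailing newline


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    first_line = fetch_first_line("http://127.0.0.1:%d/" % server.server_address[1])
finally:
    server.shutdown()
    server.server_close()

print(first_line)
```

As in the interpreter session, readline() returns a bytes object rather than a str, so the result needs decoding before it can be treated as text.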


Response objects


Let's take a closer look at our response object. We can see from the preceding
example that urlopen() returns an http.client.HTTPResponse instance. The
response object gives us access to the data of the requested resource, and the
properties and the metadata of the response. To view the URL for the response
that we received in the previous section, do this:

>>> response.url
'http://www.debian.org'
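
Besides url, an HTTPResponse also carries the status code and the headers of the response. The sketch below gathers a few of these into a dictionary. It is an illustration under assumptions: the local test server, DemoHandler, BODY, and the describe() helper are hypothetical names introduced here, while url, status, and getheader() are attributes of the response object itself.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

BODY = b"<html><body>demo</body></html>\n"  # hypothetical page body


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with the same HTML page."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, format, *args):
        pass  # suppress per-request logging


def describe(url):
    """Return the URL, status code, and Content-Type of the response."""
    with urlopen(url, timeout=10) as response:
        return {
            "url": response.url,
            "status": response.status,
            "content_type": response.getheader("Content-Type"),
        }


server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
demo_url = "http://127.0.0.1:%d/" % server.server_address[1]
try:
    info = describe(demo_url)
finally:
    server.shutdown()
    server.server_close()

print(info)
```

Pointing describe() at a real site such as http://www.debian.org should work the same way, although the values returned then depend on that server.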


We get the data of the requested resource through a file-like interface using the
readline() and read() methods. We saw the readline() method in the previous
section. This is how we use the read() method:

>>> response = urlopen('http://www.debian.org')
>>> response.read(50)
b'g="en">\n\n <meta http-equiv="Content-Type" c'
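
Because the interface is file-like, each call to read() continues from where the previous call stopped, and calling read() with no argument returns whatever remains. The sketch below illustrates this. To stay runnable offline it assumes a data: URL, which urlopen() also accepts, instead of an HTTP server; the object returned in that case is not an HTTPResponse, but it offers the same read() calls. PAGE and DATA_URL are hypothetical names.

```python
import base64
from urllib.request import urlopen

# Hypothetical page body, embedded in a data: URL so no network is needed.
PAGE = b'<!DOCTYPE html>\n<html lang="en">\n<head><title>Demo</title></head>\n</html>\n'
DATA_URL = "data:text/html;base64," + base64.b64encode(PAGE).decode("ascii")

with urlopen(DATA_URL) as response:
    first = response.read(15)     # the first 15 bytes of the body
    second = response.read(10)    # the next 10 bytes, not the first 10 again
    rest = response.read()        # no argument: everything that is left
    after_end = response.read()   # nothing remains, so this is b''

print(first)
print(second)
print(len(rest))
print(after_end)
```

A consequence is that a response can only be read through once; to read the body from the start again, we have to request the URL again.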
