Learning Python Network Programming

(Sean Pound) #1

HTTP and Working with the Web


The read() method returns up to the specified number of bytes from the data. Here
it's the first 50 bytes. A call to the read() method with no argument will return all
the remaining data in one go.
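
As a rough sketch of the difference between the two calls, the helper below reads a
fixed-size preview and then everything that is left. The name read_preview_and_rest
and its preview_size parameter are our own, chosen for illustration; they are not
part of urllib.

```python
from urllib.request import urlopen


def read_preview_and_rest(url, preview_size=50):
    """Return (preview, rest): the first bytes of the body, then the remainder."""
    with urlopen(url) as response:
        # read(n) returns at most n bytes and advances the read position
        preview = response.read(preview_size)
        # read() with no argument returns whatever has not been read yet
        rest = response.read()
    return preview, rest
```

Calling read_preview_and_rest('http://www.debian.org') would return the first 50
bytes of the page followed by the rest of the body, so the two pieces never overlap.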


The file-like interface is limited. Once the data has been read, it's not possible to go
back and re-read it by using either read() or readline(). To demonstrate this, try
doing the following:

>>> response = urlopen('http://www.debian.org')
>>> response.read()
b'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">\n<html
lang="en">\n\n <meta http-equiv
>>> response.read()
b''


We can see that when we call the read() method a second time, it returns an empty
bytes object. The response does not support seek(), and there is no rewind() method,
so we cannot reset the position. Hence, it's best to capture the read() output in a
variable.
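
One way to follow this advice is sketched below: the body is read exactly once, and
all later work is done on the captured bytes. The function names fetch_body and
summarize are hypothetical helpers introduced only for this illustration.

```python
from urllib.request import urlopen


def fetch_body(url):
    """Read the response body exactly once and return it as bytes."""
    with urlopen(url) as response:
        return response.read()


def summarize(body):
    """Inspect the captured bytes as often as needed, without a second request."""
    return {
        "length": len(body),
        "preview": body[:50],
        "line_count": body.count(b"\n") + 1,
    }
```

After body = fetch_body('http://www.debian.org'), the body variable can be sliced,
searched, or passed to summarize(body) repeatedly, because it is an ordinary bytes
object rather than a stream.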


Both the readline() and read() methods return bytes objects, and neither http nor
urllib will make any effort to decode the data that they receive to Unicode. Later on
in the chapter, we'll be looking at a way in which we can handle this with the help of
the Requests library.
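
Until then, the bytes can be decoded by hand. The following is only a sketch under
stated assumptions: fetch_text is a hypothetical helper, and falling back to UTF-8
when the server declares no charset is our own choice, not something urllib
prescribes. The charset lookup uses the get_content_charset() method of the response
headers.

```python
from urllib.request import urlopen


def fetch_text(url, fallback_encoding="utf-8"):
    """Fetch a URL and decode the body to str.

    fallback_encoding is an assumption used when no charset is declared.
    """
    with urlopen(url) as response:
        body = response.read()
        # Charset from the Content-Type header, or None if it is not declared
        charset = response.headers.get_content_charset() or fallback_encoding
    # errors="replace" avoids an exception if the declared charset is wrong
    return body.decode(charset, errors="replace")
```

This trusts the Content-Type header only. A charset declared inside the HTML itself
is not consulted, so the result may be wrong for pages that rely on it.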


Status codes


What if we wanted to know whether anything unexpected had happened to our
request? Or what if we wanted to know whether our response contained any data
before we read the data out? Maybe we're expecting a large response, and we want
to quickly see if our request has been successful without reading the whole response.


HTTP responses provide a means for us to do this through status codes. We can read
the status code of a response by using its status attribute.





>>> response.status
200


Status codes are integers that tell us how the request went. The 200 code informs us
that everything went fine.
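
To illustrate checking the status before reading, here is a small sketch. The helper
name read_if_ok is hypothetical, and it assumes a response object with the status
attribute and read() method described above. Note that urlopen raises
urllib.error.HTTPError for 4xx and 5xx responses instead of returning them, so a
check like this mainly separates 200 from other non-error codes.

```python
def read_if_ok(response):
    """Return the body only when the status code is 200, otherwise None."""
    if response.status == 200:
        # Everything went fine, so it is worth reading the data
        return response.read()
    # Any other status: skip the read and let the caller decide what to do
    return None
```

With response = urlopen('http://www.debian.org'), calling read_if_ok(response)
would read the page only after confirming that the request succeeded.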
