Learning Python Network Programming

(Sean Pound) #1
Chapter 1

The request string that we create to send to the server is also much more complicated
than the URL that we used before: it's a full HTTP request. In the next chapter, we'll
be looking at HTTP requests in detail.


Next, we deal with the network communication over the TCP connection.
We send the entire request string to the server using the sendall() call.
The data sent through TCP must be in raw bytes, so we have to encode the
request text as ASCII before sending it.
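
As a minimal sketch of this step, the following shows a request string being encoded as ASCII and handed to sendall(). The helper names build_request() and send_request(), and the particular headers in the request, are assumptions made for illustration and are not taken from the program listing itself.

```python
import socket

def build_request(host, path):
    """Build a minimal HTTP/1.1 GET request as text (hypothetical helper)."""
    lines = [
        "GET {} HTTP/1.1".format(path),
        "Host: {}".format(host),
        "Connection: close",  # ask the server to close the socket when done
        "",                   # blank line that ends the header section
        "",
    ]
    return "\r\n".join(lines)

def send_request(sock, request_text):
    """Encode the request text as ASCII bytes and send all of it."""
    sock.sendall(request_text.encode("ascii"))
```

Unlike send(), which may transmit only part of the buffer, sendall() keeps sending until every byte has been handed to the operating system or an error occurs.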


Then, we piece together the server's response as it arrives in the while loop.
Bytes that are sent to us through a TCP socket are presented to our application in
a continuous stream. So, like any stream of unknown length, we have to read it
iteratively. The recv() call will return an empty bytes object (b'') after the server
sends all its data and closes the connection. Hence, we can use this as a condition
for breaking out of the loop and printing the response.
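
The read loop described here can be sketched roughly as follows. The function name read_response() and the 4096-byte buffer size are assumptions chosen for illustration; the sketch also assumes that the server closes the connection when it has finished sending.

```python
import socket

def read_response(sock, bufsize=4096):
    """Read from sock until the peer closes the connection (hypothetical helper)."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)  # returns at most bufsize bytes
        if not chunk:               # b'' means the peer has closed the connection
            break
        chunks.append(chunk)
    # Join once at the end rather than concatenating inside the loop.
    return b"".join(chunks)
```

Each recv() call may return fewer bytes than requested, so the number of iterations depends on how the data happens to arrive, not on how the server sent it.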


Our program is clearly more complicated than our previous one, which is not
good in terms of maintenance. Also, if you run the program and look at the start
of the output RFC text, you'll notice that there are some extra lines at the
beginning, and these are as follows:


HTTP/1.1 200 OK
Date: Thu, 07 Aug 2014 15:47:13 GMT
Content-Type: text/plain
Transfer-Encoding: chunked
Connection: close
Set-Cookie: __cfduid=d1983ad4f7...
Last-Modified: Fri, 27 Mar 1998 22:45:31 GMT
ETag: W/"8982977-4c9a-32a651f0ad8c0"

Because we're now dealing with a raw HTTP protocol exchange, we're seeing the
extra header data that HTTP includes in a response. This has a similar purpose to
the lower-level packet headers. The HTTP header contains HTTP-specific metadata
about the response that tells the client how to interpret it. Before, urllib parsed this
for us, added the data as attributes to the response object, and removed the header
data from the output data. We would need to add code to do this as well to make
this program as capable as our first one.
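
To give an idea of what that extra code might involve, here is a minimal sketch that separates the header section from the body and collects the headers into a dictionary. The function name split_response() is hypothetical, and the sketch makes simplifying assumptions: repeated header names overwrite each other, and a body sent with Transfer-Encoding: chunked (as in the output above) is returned still in its chunked form.

```python
def split_response(raw):
    """Split raw HTTP response bytes into (status_line, headers, body)."""
    # The header section ends at the first blank line (CRLF CRLF).
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; values such as dates contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body
```

Header names are lowercased here because HTTP treats them as case-insensitive, which makes later lookups such as headers["content-type"] predictable.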


What can't immediately be seen from the code is that we're also missing out on the
urllib module's error checking and handling. Although low-level network errors
will still generate exceptions, we will no longer catch any problems in the HTTP
layer, which urllib would have done.
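
As a rough illustration of the kind of HTTP-layer check that is now missing, the sketch below inspects the status line and raises an exception for 4xx and 5xx responses. Both HTTPStatusError and check_status() are hypothetical names introduced here; this is a simplified stand-in and not a reproduction of how urllib does its error handling.

```python
class HTTPStatusError(Exception):
    """Raised when the server reports a client or server error (hypothetical)."""

    def __init__(self, code, reason):
        super().__init__("HTTP error {}: {}".format(code, reason))
        self.code = code
        self.reason = reason

def check_status(status_line):
    """Return the status code, raising HTTPStatusError for 4xx and 5xx codes."""
    # A status line looks like: HTTP/1.1 200 OK
    parts = status_line.split(" ", 2)
    code = int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    if code >= 400:
        raise HTTPStatusError(code, reason)
    return code
```

Without a check of this kind, a 404 or 500 response would simply be printed as if it were the document we asked for.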
