Foundations of Python Network Programming


Chapter 9 ■ HTTP Clients




So that you can try the examples in this chapter on your own machine without needing to hit the public version of
httpbin.org, simply type the following:


$ pip install gunicorn httpbin requests
$ gunicorn httpbin:app


You should then be able to fetch one of its pages with both urllib and Requests to see how their interfaces, at first
glance, are similar.





>>> import requests
>>> r = requests.get('http://localhost:8000/headers')
>>> print(r.text)
{
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Host": "localhost:8000",
    "User-Agent": "python-requests/2.3.0 CPython/3.4.1 Linux/3.13.0-34-generic"
  }
}
>>> from urllib.request import urlopen
>>> import urllib.error
>>> r = urlopen('http://localhost:8000/headers')
>>> print(r.read().decode('ascii'))
{
  "headers": {
    "Accept-Encoding": "identity",
    "Connection": "close",
    "Host": "localhost:8000",
    "User-Agent": "Python-urllib/3.4"
  }
}





Two differences are already visible, and they foreshadow what is to come in this chapter. Requests
has declared up front that it supports gzip- and deflate-compressed HTTP responses, while urllib knows nothing about
them. Furthermore, while Requests has been able to determine the correct decoding to turn this HTTP response from
raw bytes into text, the urllib library has simply returned bytes and made you perform the decoding yourself.
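
To make those two differences concrete, here is a minimal sketch of the work that urllib leaves to you: asking for gzip, decompressing the body, and choosing a character set from the Content-Type header. It is not how Requests is implemented internally. The EchoHeaders handler is a hypothetical stand-in for httpbin's /headers page so that the example runs without gunicorn, and fetch_text is likewise a name invented for this illustration.

```python
import gzip
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, Request, build_opener


class EchoHeaders(BaseHTTPRequestHandler):
    """Hypothetical stand-in for httpbin's /headers page."""

    def do_GET(self):
        payload = {'headers': dict(self.headers.items())}
        body = json.dumps(payload).encode('utf-8')
        # Compress only when the client has advertised gzip support.
        use_gzip = 'gzip' in self.headers.get('Accept-Encoding', '')
        if use_gzip:
            body = gzip.compress(body)
        self.send_response(200)
        self.send_header('Content-Type', 'application/json; charset=utf-8')
        if use_gzip:
            self.send_header('Content-Encoding', 'gzip')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demonstration quiet


def fetch_text(url):
    """Fetch url with urllib, undoing gzip and decoding to str by hand."""
    # An empty ProxyHandler ignores proxy settings for this local demo.
    opener = build_opener(ProxyHandler({}))
    request = Request(url, headers={'Accept-Encoding': 'gzip'})
    with opener.open(request) as response:
        raw = response.read()
        if response.headers.get('Content-Encoding') == 'gzip':
            raw = gzip.decompress(raw)
        # Assumption: fall back to UTF-8 if no charset is declared.
        charset = response.headers.get_content_charset() or 'utf-8'
    return raw.decode(charset)


# Port 0 asks the operating system for any free port.
server = HTTPServer(('127.0.0.1', 0), EchoHeaders)
threading.Thread(target=server.serve_forever, daemon=True).start()
text = fetch_text('http://127.0.0.1:%d/headers' % server.server_port)
server.shutdown()
server.server_close()
print(text)
```

With Requests, the decompression and decoding steps inside fetch_text happen behind the r.text attribute shown earlier, which is why the two sessions above look so similar at first glance.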

There have been other attempts at powerful Python HTTP clients, many of them focused on trying to be more
browser-like. These wanted to go beyond the HTTP protocol described in this chapter and launch into concepts that
you will learn about in Chapter 11, bringing together the structure of HTML, the semantics of its forms, and the rules
of what a browser is supposed to do when you have completed a form and clicked Submit. The library mechanize, for
example, enjoyed a period of popularity.

In the end, however, web sites are often too sophisticated to interact with using anything less than a full browser, as
forms are often valid today only because of annotations or adjustments made by JavaScript. Many modern forms do
not even have a real Submit button but activate a script to do their work. Technologies for controlling browsers have
proved more useful than mechanize, and I cover some of them in Chapter 11.

The goal of this chapter is to help you understand HTTP, to show how many of its features are accessible through
Requests and urllib, and to make clear the boundaries within which you will operate if you instead use the urllib
package built into the Standard Library. If you do ever find yourself in a situation where you cannot install third-party
