Foundations of Python Network Programming


Chapter 9 ■ HTTP Clients




More subtly, a login page that is not a true web form but that uses Ajax to stay on the same page (see Chapter 11)
can still enjoy the benefit of cookies if the API lives at the same hostname. When the API call to do the login confirms
the username and password and returns 200 OK along with a Set-Cookie header, it is empowering all subsequent requests
to the same site—not just API calls but requests for pages, images, and data—to supply the cookie and be recognized
as coming from an authenticated user.
Note that cookies should be designed to be opaque. They should be either random UUID strings that lead the
server to a database record giving the real username or encrypted strings that the server alone can decrypt to learn
user identity. If they were user-parsable—if, for example, a cookie had the value THIS-USER-IS-brandon—then a
clever user could edit the cookie to produce a forged value and submit it with their next request to impersonate some
other user whose username they knew or were able to guess.
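The opaque-token approach can be sketched in a few lines of server-side logic. This is a hypothetical illustration, not code from this book: the names `sessions`, `issue_cookie`, and `user_for_cookie` are invented, and a real server would persist and expire these records rather than hold them in a dictionary.

```python
import secrets

# Hypothetical in-memory session store mapping opaque tokens to usernames.
sessions = {}

def issue_cookie(username):
    # The token is unguessable randomness; it reveals nothing about the user.
    token = secrets.token_urlsafe(32)
    sessions[token] = username
    return token

def user_for_cookie(token):
    # A forged value like 'THIS-USER-IS-brandon' simply fails the lookup.
    return sessions.get(token)
```

Because the only link between token and user lives in the server's own store, editing the cookie gains an attacker nothing.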
Real-world Set-Cookie headers can be much more complicated than the example given, as described at length in
RFC 6265. I should mention the secure attribute. It instructs the HTTP client not to present the cookie when making
unencrypted requests to the site. Without this attribute, a cookie could be exposed, allowing anyone else sharing the
coffee-shop wi-fi with a user to learn the cookie’s value and use it to impersonate the user. Some web sites give you a
cookie simply for visiting. This lets them track your visit as you move around the site. The history collected can already
be used to target ads as you browse and then can be copied into your permanent account history if you later log in
with a username.
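On the server side, the Standard Library's `http.cookies` module can assemble such a header. Here is a minimal sketch, using a made-up token value, that marks a session cookie as Secure, and also HttpOnly, which additionally hides it from page JavaScript:

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie['session'] = 'd41d8cd98f00'    # hypothetical opaque token
cookie['session']['secure'] = True    # present only over encrypted HTTPS
cookie['session']['httponly'] = True  # invisible to page JavaScript
print(cookie.output())                # emits a complete Set-Cookie header
```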
Many user-directed HTTP services will not operate without cookies keeping track of your identity and proving
that you have authenticated. Tracking cookies with urllib requires object orientation; please read its documentation.
Tracking cookies in Requests happens automatically if you create, and consistently use, a Session object.
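With urllib, that object-oriented setup amounts to building an opener around a cookie jar. A minimal sketch follows; no request is actually made here, and in real use you would call `opener.open(url)` for every request so that the same jar is consulted each time.

```python
import http.cookiejar
import urllib.request

# The jar stores every cookie the server sets; the opener presents them
# on each subsequent request, much as a requests Session object does.
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(jar))
```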


Connections, Keep-Alive, and http.client


The three-way handshake that starts a TCP connection (see Chapter 3) can be avoided if a connection is already
open, which even in the early days provided the impetus for HTTP to allow connections to stay open as a browser
downloaded an HTTP resource, then its JavaScript, and then its CSS and images. With the emergence of TLS (see
Chapter 6) as a best practice for all HTTP connections, the cost of setting up a new connection is even greater,
increasing the benefit of connection reuse.
Protocol version HTTP/1.1 has made it the default for an HTTP connection to stay open after a request. Either
the client or the server can specify Connection: close if they plan on hanging up once a request is completed, but
otherwise a single TCP connection can be used repeatedly to pull as many resources from the server as the client
wants. Web browsers often create four or more simultaneous TCP connections per site so that a page and all of its
support files and images can be downloaded in parallel to try to get them in front of the user as quickly as possible.
Section 6 of RFC 7230 should be consulted to learn the complete connection control scheme, if you are an
implementer who is interested in the details.
It is unfortunate that the urllib module makes no provision for connection reuse. Making two requests on the
same socket is possible through the Standard Library only by using the lower-level http.client module.





>>> import http.client
>>> h = http.client.HTTPConnection('localhost:8000')
>>> h.request('GET', '/ip')
>>> r = h.getresponse()
>>> r.status
200
>>> h.request('GET', '/user-agent')
>>> r = h.getresponse()
>>> r.status
200



