Learning Python Network Programming

(Sean Pound) #1

HTTP and Working with the Web


The parse_qs() function reads the query string and converts it into a
dictionary. See how the dictionary values are actually lists? This is because
a parameter can appear more than once in a query string. Try it with a
repeated parameter:





>>> result = urlparse('http://docs.python.org/3/search.html?q=urlparse&q=urljoin')
>>> parse_qs(result.query)
{'q': ['urlparse', 'urljoin']}


See how both of the values have been added to the list? It's up to the server to
decide how it interprets this. If we send this query string, the server may just
pick one of the values and ignore the repeat. The only way to find out is to try
it and see what happens.
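
As a minimal sketch of working with repeated parameters, the following compares parse_qs() with parse_qsl(), which is also in urllib.parse and returns the key/value pairs as a list in their original order. The helper first_value() is hypothetical: it only imitates a server that keeps the first occurrence of a parameter, which is an assumption for illustration, since real servers differ.

```python
from urllib.parse import urlparse, parse_qs, parse_qsl

url = 'http://docs.python.org/3/search.html?q=urlparse&q=urljoin'
query = urlparse(url).query

# parse_qs groups repeated keys into a list of values
grouped = parse_qs(query)

# parse_qsl keeps each key/value pair separate, in its original order
pairs = parse_qsl(query)


def first_value(params, key, default=None):
    """Hypothetical helper: imitate a server that keeps only the first value."""
    values = params.get(key)
    return values[0] if values else default


print(grouped)                    # {'q': ['urlparse', 'urljoin']}
print(pairs)                      # [('q', 'urlparse'), ('q', 'urljoin')]
print(first_value(grouped, 'q'))  # urlparse
```

If the order of the pairs matters for the server you are testing, parse_qsl() is the safer of the two to inspect, because it does not merge anything.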


You can usually figure out what you need to put in a query string for a given page
by submitting a query through the web interface using your web browser, and
inspecting the URL of the results page. You should be able to spot the text of your
search and from that deduce the corresponding key for the search text. Quite
often, many of the other parameters in the query string aren't actually needed for
getting a basic result. Try requesting the page using only the search text parameter
and see what happens. If it does not work as expected, add the other parameters
back.
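
The approach described above can be sketched without sending any requests. This example uses urlencode() from urllib.parse to build a query string containing only the search text, and then a fuller one to fall back on. The extra parameter names (check_keywords and area) and the helper with_query() are assumptions for illustration; use whatever keys you actually see in the URL of the results page.

```python
from urllib.parse import urlencode, urlparse, urlunparse

base = 'http://docs.python.org/3/search.html'

# Start with just the search text parameter
minimal_query = urlencode({'q': 'urlparse'})

# Hypothetical extra parameters, to add only if the minimal query fails
full_query = urlencode({'q': 'urlparse',
                        'check_keywords': 'yes',
                        'area': 'default'})


def with_query(url, query):
    """Hypothetical helper: return url with its query component replaced."""
    return urlunparse(urlparse(url)._replace(query=query))


minimal_url = with_query(base, minimal_query)
full_url = with_query(base, full_query)

print(minimal_url)  # http://docs.python.org/3/search.html?q=urlparse
print(full_url)
```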


If you submit a form to a page and the resulting page's URL doesn't have a query
string, then the form data was sent using a different method. We'll look at this
in the HTTP methods section that follows, when we discuss the POST method.


URL encoding


URLs are restricted to ASCII characters, and within this set a number of
characters are reserved and need to be escaped in different components of a URL.
We escape them by using URL encoding. It is often called percent encoding,
because it uses the percent sign as an escape character. Let's URL-encode
a string:





>>> from urllib.parse import quote
>>> quote('A duck?')
'A%20duck%3F'


The special characters ' ' and '?' have been replaced by escape sequences. The
numbers in the escape sequences are the characters' ASCII codes in hexadecimal.
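
To check the link between the escape sequences and the ASCII codes, here is a small sketch that builds the same escapes by hand with ord() and compares them with the output of quote(). It also uses unquote(), the counterpart of quote() in urllib.parse, to reverse the encoding. The variable names are only illustrative.

```python
from urllib.parse import quote, unquote

text = 'A duck?'
encoded = quote(text)

# Build the escapes by hand: '%' followed by the two-digit hex ASCII code
space_escape = '%{:02X}'.format(ord(' '))     # ord(' ') is 32, which is 0x20
question_escape = '%{:02X}'.format(ord('?'))  # ord('?') is 63, which is 0x3F

print(encoded)                        # A%20duck%3F
print(space_escape, question_escape)  # %20 %3F

# unquote() reverses the encoding and restores the original string
print(unquote(encoded))               # A duck?
```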
