Learning Python Network Programming

(Sean Pound) #1
Chapter 2

It works as we'd expect it to. Note the difference between the base URL having and
not having a trailing slash.


Lastly, what if the 'relative' URL is actually an absolute URL?





>>> urljoin('http://www.debian.org/about', 'http://www.python.org')
'http://www.python.org'


The relative URL completely replaces the base URL. This is handy, as it means that
we don't need to worry about testing whether a URL is relative or not before using it
with urljoin.
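
As a minimal sketch of why this is convenient, suppose we have a list of links scraped from a page, some relative and some absolute. The helper name resolve_links and the sample links below are hypothetical, chosen only for illustration; the only library call used is urljoin.

```python
from urllib.parse import urljoin


def resolve_links(base_url, links):
    """Resolve each link against base_url.

    Absolute links pass through unchanged, so no relative/absolute
    check is needed before calling urljoin.
    """
    return [urljoin(base_url, link) for link in links]


# Hypothetical mix of relative and absolute links found on a page.
links = ['contact', '/intro/about', 'http://www.python.org']

resolved = resolve_links('http://www.debian.org/about/', links)
for url in resolved:
    print(url)
# http://www.debian.org/about/contact
# http://www.debian.org/intro/about
# http://www.python.org

# Without the trailing slash, the last path segment of the base is replaced.
print(urljoin('http://www.debian.org/about', 'contact'))
# http://www.debian.org/contact
```

The same call handles all three kinds of link, which is the property described above.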


Query strings


RFC 3986 defines another property of URLs. They can contain additional parameters
in the form of key/value pairs that appear after the path. They are separated from
the path by a question mark, as shown here:


http://docs.python.org/3/search.html?q=urlparse&area=default


This string of parameters is called a query string. Multiple parameters are separated
by ampersands (&). Let's see how urlparse handles it:





>>> urlparse('http://docs.python.org/3/search.html?q=urlparse&area=default')
ParseResult(scheme='http', netloc='docs.python.org',
path='/3/search.html', params='', query='q=urlparse&area=default',
fragment='')


So, urlparse recognizes the query string as the query component.
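
In a script, we would typically read the query component through the named attribute on the result rather than from the printed tuple. The following is a small sketch of that; the second URL is simply the same address with the query string dropped, used here to show what an absent query looks like.

```python
from urllib.parse import urlparse

url = 'http://docs.python.org/3/search.html?q=urlparse&area=default'
result = urlparse(url)

# Components are available as named attributes on the result.
print(result.path)   # /3/search.html
print(result.query)  # q=urlparse&area=default

# A URL with no question mark has an empty query component.
no_query = urlparse('http://docs.python.org/3/search.html')
print(repr(no_query.query))  # ''
```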


Query strings supply parameters to the resource that we wish to retrieve, and
these usually customize the resource in some way. In the preceding example, our
query string tells the Python docs search page that we want to run a search for
the term urlparse.


The urllib.parse module has a function that helps us turn the query component
returned by urlparse into something more useful:





>>> from urllib.parse import parse_qs
>>> result = urlparse('http://docs.python.org/3/search.html?q=urlparse&area=default')
>>> parse_qs(result.query)
{'area': ['default'], 'q': ['urlparse']}
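
Each value in the dictionary is a list because the same key may appear more than once in a query string. The sketch below illustrates this with a hypothetical URL in which the area key is repeated; it also shows parse_qsl, a sibling function in urllib.parse that returns the parameters as a list of pairs instead.

```python
from urllib.parse import urlparse, parse_qs, parse_qsl

# Hypothetical URL for illustration: the 'area' key appears twice.
url = 'http://docs.python.org/3/search.html?q=urlparse&area=default&area=library'
query = urlparse(url).query

params = parse_qs(query)
print(params['q'])     # ['urlparse']
print(params['area'])  # ['default', 'library']

# parse_qsl keeps every (key, value) pair in order of appearance.
pairs = parse_qsl(query)
print(pairs)
# [('q', 'urlparse'), ('area', 'default'), ('area', 'library')]
```

When we know that a key appears only once, we can take the first element of its list, for example params['q'][0].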
