Although this covers most URLs you’re likely to encounter in the wild, the full format
of URLs is slightly richer:
protocol://networklocation/path;parameters?querystring#fragment
For instance, the fragment part may name a section within a page (e.g., #part1).
Moreover, each part can have formats of its own, and some are not used in all protocols.
The ;parameters part is omitted for HTTP, for instance (it gives an explicit file type for
FTP), and the networklocation part may also specify optional user login parameters for
some protocol schemes (its full format is user:password@host:port for FTP and Telnet,
but just host:port for HTTP). We used a complex FTP URL in Chapter 13, for example,
which included a username and password, as well as a binary file type (the server may
guess if no type is given):
ftp://lutz:password@host/filename;type=i
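As a rough illustration of the format above, the following sketch uses the standard library's urllib.parse.urlparse function to split two URLs into their parts. Both URLs, including the user, password, host, and port values, are made-up placeholders rather than real addresses.

```python
from urllib.parse import urlparse

# Hypothetical FTP URL: login details in the network location,
# and an explicit file type in the ;parameters part
ftp = urlparse('ftp://user:password@host:21/filename;type=i')
print(ftp.scheme, ftp.netloc, ftp.path, ftp.params)
print(ftp.username, ftp.password, ftp.hostname, ftp.port)

# Hypothetical HTTP URL: no ;parameters, but a query string and a fragment
http = urlparse('http://www.server.com:80/dirpath/page.html?name=value#part1')
print(http.netloc, http.path, http.query, http.fragment)
```

The result objects also behave as 6-item tuples, in the same left-to-right order as the parts of the full URL format.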
We’ll ignore additional URL formatting rules here. If you’re interested in more details,
you might start by reading the urllib.parse module’s entry in Python’s library manual,
as well as its source code in the Python standard library. You may also notice that a
URL you type to access a page looks a bit different after the page is fetched (spaces
become + characters, % characters are added, and so on). This is simply because
browsers must also generally follow URL escaping (i.e., translation) conventions, which we’ll
explore later in this chapter.
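As a brief preview of those escaping conventions, here is a minimal sketch that uses urllib.parse.quote_plus and unquote_plus from the standard library. The input string is an arbitrary made-up value, and browsers may differ in exactly which characters they escape.

```python
from urllib.parse import quote_plus, unquote_plus

text = 'Hello World & more'       # made-up value, as might be typed into a form
escaped = quote_plus(text)        # spaces become +, and & becomes %26
print(escaped)

restored = unquote_plus(escaped)  # the translation is reversible
print(restored == text)
```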
Using minimal URLs
Because browsers remember the prior page’s Internet address, URLs embedded in
HTML files can often omit the protocol and server names, as well as the file’s directory
path. If missing, the browser simply uses these components’ values from the last page’s
address. This minimal syntax works for URLs embedded in hyperlinks and for form
actions (we’ll meet forms later in this tutorial). For example, within a page that was
fetched from the directory dirpath on the server http://www.server.com, minimal
hyperlinks and form actions such as:
<A HREF="more.html">
<FORM ACTION="next.py" ...>
are treated exactly as if we had specified a complete URL with explicit server and path
components, like the following:
<A HREF="http://www.server.com/dirpath/more.html">
<FORM ACTION="http://www.server.com/dirpath/next.py" ...>
The first minimal URL refers to the file more.html on the same server and in the same
directory from which the page containing this hyperlink was fetched; it is expanded to
a complete URL within the browser. URLs can also employ Unix-style relative path
syntax in the file path component. A hyperlink tag whose HREF begins with ../ and ends
in a GIF filename, for instance, names a GIF file on the server machine, in the parent
directory of the file that contains this link’s URL.
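The expansion a browser performs on minimal URLs can be approximated with urllib.parse.urljoin. This is only a sketch of the idea, not a description of any browser's actual code. The page name page.html and the relative link ../image.gif are hypothetical, chosen to match the server and dirpath names used in the example above.

```python
from urllib.parse import urljoin

# Address of the page that contains the links (page.html is a hypothetical name)
base = 'http://www.server.com/dirpath/page.html'

# Minimal URLs as they might appear in HREF and ACTION attributes
links = ['more.html', 'next.py', '../image.gif']

# Expand each minimal URL relative to the containing page's address
resolved = {link: urljoin(base, link) for link in links}
for link, full in resolved.items():
    print(link, '=>', full)
```

The first two links pick up the server name and the dirpath directory from the base address, while the ../ link resolves to the parent directory on the same server.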