invocations—the filename can contain both a remote directory path at the front and
query parameters at the end for a remote program invocation.
Given a script invocation URL and no explicit output filename, the script extracts the
base filename in the middle of the URL in two steps: it first uses the standard
urllib.parse module to pull out the file path, and then uses os.path.split to strip off
the directory path. However, the resulting filename is the remote script’s name, which
may or may not be an appropriate place to store the data locally. In the first run that
follows, for example, the script’s output goes in a local file called languages.py, the
script name in the middle of the URL. In the second, we instead name the output
CxxSyntax.html explicitly to suppress filename extraction:
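Here is a minimal sketch of that two-step extraction, using only standard library calls. The helper name local_filename is hypothetical and is not taken from http-getfile-urllib2.py itself:

```python
import os
from urllib.parse import urlparse

def local_filename(url):
    """Return the last path component of url, ignoring any query string."""
    path = urlparse(url).path      # e.g. '/cgi-bin/languages.py' (query is split off)
    return os.path.split(path)[1]  # keep only the part after the last '/'

print(local_filename('http://localhost/cgi-bin/languages.py?language=Scheme'))
```

Because urlparse separates the query from the path, both the Scheme and C++ requests map to the same local name, languages.py.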
C:\...\PP4E\Internet\Other> python http-getfile-urllib2.py localhost /cgi-bin/languages.py?language=Scheme
http://localhost/cgi-bin/languages.py?language=Scheme languages.py
b'<TITLE>Languages</TITLE>\n'
b'<H1>Syntax</H1><HR>\n'
b'<H3>Scheme</H3><P><PRE>\n'
b' (display "Hello World") (newline) \n'
b'</PRE></P><BR>\n'
b'<HR>\n'
C:\...\PP4E\Internet\Other> python http-getfile-urllib2.py localhost /cgi-bin/languages.py?language=C++ CxxSyntax.html
http://localhost/cgi-bin/languages.py?language=C++ CxxSyntax.html
b'<TITLE>Languages</TITLE>\n'
b'<H1>Syntax</H1><HR>\n'
b'<H3>C </H3><P><PRE>\n'
b"Sorry--I don't know that language\n"
b'</PRE></P><BR>\n'
b'<HR>\n'
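The mangled “C” heading in the output above is easy to reproduce locally. The standard urllib.parse module applies the same query-string decoding rule, turning each “+” into a space. This small check assumes nothing beyond the standard library; the languages.py script itself is not involved:

```python
from urllib.parse import unquote_plus, parse_qs

# In a query string, '+' decodes to a space, so 'C++' becomes 'C' plus two spaces
decoded = unquote_plus('C++')
print(repr(decoded))

# parse_qs applies the same rule when it splits a full query string
fields = parse_qs('language=C++')
print(fields)
```

Whatever the page echoes back, the decoded value no longer equals “C++”, so the script’s lookup fails.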
The remote script returns a not-found message when passed “C++” in the last command
here. It turns out that “+” is a special character in URL query strings (it stands for a
space), so the server received “C” followed by spaces rather than “C++”. To be robust,
both of the urllib scripts we’ve just written should really run the filename string
through urllib.parse.quote, a tool that escapes special characters for transmission.
We will talk about this in depth in Chapter 15, so consider this a preview for now. To
make this invocation work by hand, though, we need to use escape sequences in the
constructed URL. Each “+” becomes %2b, the hexadecimal code of the “+” character:
C:\...\PP4E\Internet\Other> python http-getfile-urllib2.py localhost /cgi-bin/languages.py?language=C%2b%2b CxxSyntax.html
http://localhost/cgi-bin/languages.py?language=C%2b%2b CxxSyntax.html
b'<TITLE>Languages</TITLE>\n'
b'<H1>Syntax</H1><HR>\n'
b'<H3>C++</H3><P><PRE>\n'
b' cout << "Hello World" << endl; \n'
b'</PRE></P><BR>\n'
b'<HR>\n'
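The hand-typed escapes in the command above can also be generated. The following is a minimal sketch, assuming the goal is to build the same query URL: quote escapes a single value, while urlencode builds a whole name=value query string. The variable names are illustrative only:

```python
from urllib.parse import quote, urlencode

# Values taken from the example command lines
host = 'localhost'
script = '/cgi-bin/languages.py'
language = 'C++'

# quote() escapes a single value: each '+' becomes %2B (equivalent to %2b)
value = quote(language)
url1 = 'http://%s%s?language=%s' % (host, script, value)

# urlencode() builds an entire name=value query string from a dict
url2 = 'http://%s%s?%s' % (host, script, urlencode({'language': language}))

# quote() leaves only '/' unescaped by default, so quoting the whole
# path-plus-query string also escapes the '?' and '=' separators
whole = quote(script + '?language=' + language)

print(url1)
print(url2)
print(whole)
```

Both url1 and url2 come out as http://localhost/cgi-bin/languages.py?language=C%2B%2B. Hex digits in percent escapes are case-insensitive, so this matches the command typed by hand. The last line prints /cgi-bin/languages.py%3Flanguage%3DC%2B%2B. So when a filename already carries a query string, the escaping is safest when applied to the parameter value alone.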
The odd %2b strings in this command line are not entirely magical: the escaping required
for URLs can be seen by running standard Python tools manually—this is what these