MIME-encoded, text of any encoding, and even arbitrary combinations of these. The
current email package’s requirement to decode this to str for parsing is utterly incompatible,
though the cgi module’s own code seems suspect for some cases as well.
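To make the decoding problem concrete, here is a minimal sketch, not taken from the examples package. It assumes UTF-8 as the decoding scheme and uses made-up data: a short text field followed by the first bytes of a typical JPEG file. Decoding the combined data to str fails, while processing the pieces as bytes does not:

```python
# Hypothetical upload data: encoded text followed by raw binary bytes
text_part = 'spam, Ni!'.encode('utf-8')          # text in one encoding
binary_part = bytes([0xFF, 0xD8, 0xFF, 0xE0])    # first bytes of a typical JPEG file
upload = text_part + b'\r\n' + binary_part

# Decoding the whole stream to str (UTF-8 is an assumption here) fails
try:
    upload.decode('utf-8')
    decoded = True
except UnicodeDecodeError as exc:
    decoded = False
    print('cannot decode:', exc.reason)

# Splitting as bytes first, and decoding only the text piece, works
head, _, tail = upload.partition(b'\r\n')
print(head.decode('utf-8'), tail)
```

The point is only that a single decode step over the entire stream cannot work in general; which parts are text, and in which encoding, has to be determined piece by piece.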
If you want to see for yourself how data is actually uploaded by browsers, see and run
the HTML and Python files named test-cgi-uploads-bug* in the examples package to
upload text, binary, and mixed type files:
- test-cgi-uploads-bug.html/py attempts to parse normally, which works for some
  text files but always fails for binary files with a Unicode decoding error
- test-cgi-uploads-bug0.html/py tries binary mode for the input stream, but always
  fails with type errors for both text and binary because of email’s str requirement
- test-cgi-uploads-bug1.html/py saves the input stream for a single file
- test-cgi-uploads-bug2.html/py saves the input stream for multiple files
The last two of these scripts simply read the data in binary mode and save it in binary
mode to a file for inspection, and display two headers passed in environment variables
which are used for parsing (a “multipart/form-data” content type and boundary, along
with a content length). Trying to parse the saved input data with the cgi module fails
unless the data is entirely text that is compatible with that module’s encoding assumptions.
Really, because the data can mix text and raw binary arbitrarily, a correct parser
will need to read it as bytes and switch between text and binary processing freely.
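As a rough illustration of that approach, the following sketch saves an input stream in binary mode and then splits it into parts without ever decoding the payloads. It is not the code of the test scripts and not a replacement for the cgi module: the function names, the sample data, and the XyZ boundary are all hypothetical, and it ignores details a real parser must handle (nested multiparts, quoted header parameters, very large files). In a real CGI script, the stream would presumably be sys.stdin.buffer and the environment would be os.environ.

```python
import io
import os
import tempfile

def save_raw_upload(stream, environ, path):
    """Copy the request body to a file as bytes, with no decoding step."""
    length = int(environ.get('CONTENT_LENGTH') or 0)
    data = stream.read(length)                 # stream must be in binary mode
    with open(path, 'wb') as out:              # save in binary mode too
        out.write(data)
    return data

def get_boundary(content_type):
    """Pull the boundary out of a 'multipart/form-data; boundary=...' value."""
    value = content_type.partition('boundary=')[2]
    return value.split(';')[0].strip().strip('"').encode('ascii')

def split_multipart(data, boundary):
    """Return a list of (headers, payload): headers are str, payloads stay bytes."""
    chunks = (b'\r\n' + data).split(b'\r\n--' + boundary)
    parts = []
    for chunk in chunks[1:]:                   # chunks[0] is the preamble
        if chunk.startswith(b'--'):            # closing delimiter: all done
            break
        chunk = chunk.partition(b'\r\n')[2]    # drop the rest of the boundary line
        rawhead, _, payload = chunk.partition(b'\r\n\r\n')
        headers = {}
        for line in rawhead.split(b'\r\n'):    # only the headers are decoded
            name, _, value = line.decode('latin-1').partition(':')
            headers[name.strip().lower()] = value.strip()
        parts.append((headers, payload))
    return parts

# Made-up request body: one text field and one binary "file"
sample = (b'--XyZ\r\n'
          b'Content-Disposition: form-data; name="note"\r\n\r\n'
          b'hello\r\n'
          b'--XyZ\r\n'
          b'Content-Disposition: form-data; name="img"; filename="a.bin"\r\n'
          b'Content-Type: application/octet-stream\r\n\r\n'
          b'\xff\xfe\x00\r\n\x80\r\n'
          b'--XyZ--\r\n')
environ = {'CONTENT_TYPE': 'multipart/form-data; boundary=XyZ',
           'CONTENT_LENGTH': str(len(sample))}

path = os.path.join(tempfile.mkdtemp(), 'upload.raw')
data = save_raw_upload(io.BytesIO(sample), environ, path)
parts = split_multipart(data, get_boundary(environ['CONTENT_TYPE']))
for headers, payload in parts:
    print(headers['content-disposition'], len(payload))
```

Only the part headers are converted to str here; each payload is left as bytes so that the caller can decode it as text or write it out as binary, depending on what its headers say.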
It seems likely that this will be improved in the future, but perhaps not until Python
3.3 or later. Nearly two years after 3.0’s release, though, this book project has found
itself playing the role of beta tester more often than it probably should. This primarily
derives from the fact that implications of the Python 3.X str/bytes dichotomy were not
fully resolved in Python’s own libraries prior to release. This isn’t meant to disparage
people who have contributed much time and effort to 3.X already, of course. As someone
who remembers 0.X, though, I find this situation less than ideal.
Writing a replacement for the cgi module and the email package code it uses (the only
truly viable workaround) is not practical given this book project’s constraints. For now,
the CGI scripts that perform file uploads in this book will only work with text files, and
then only with text files of compatible encodings. This extends to email attachments
uploaded to the PyMailCGI webmail case study of the next chapter—yet another reason
why that example was not expanded with new functionality in this edition as much as
the preceding chapter’s PyMailGUI. Being unable to attach images to emails this way
is a severe functional limitation, which limits scope in general.
For updates on the probable fix for this issue in the future, watch this book’s website
(described in the Preface). A fix seems likely to be incompatible with current library
module APIs, but short of writing every new system from scratch, such is reality in the
real world of software development. (And no, “running away more” is not an option...)