str differentiation came online. Prior to that, the email encoder worked in Python 2.X,
because bytes was really str. In 3.X, though, because base64 returns bytes, the normal
mail encoder in email also leaves the payload as bytes, even though it’s been encoded
to Base64 text form. This in turn breaks email text generation, because it assumes the
payload is text in this case, and requires it to be str. As is common in large-scale
software systems, the effects of some 3.X changes may have been difficult to anticipate
or accommodate in full.
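To make the type issue concrete, here is a minimal sketch of what the standard base64 module returns in 3.X, and of the one-step decode that turns its result into the str that text generation expects. The `raw` value is a made-up stand-in for attachment data, not anything from this book's examples:

```python
import base64

raw = b'\x00\x01binary attachment data\xff'   # hypothetical attachment bytes

encoded = base64.b64encode(raw)       # a bytes object in 3.X, though its content is ASCII text
print(type(encoded))

as_text = encoded.decode('ascii')     # Base64 output is pure ASCII, so this decode cannot fail
print(type(as_text))

# The str form still decodes back to the original binary data
assert base64.b64decode(as_text) == raw
```

The decode to ASCII is all that separates the bytes payload from the str the generator needs.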
By contrast, parsing binary attachments (as opposed to generating text for them) works
fine in 3.X, because the parsed message payload is saved in message objects as a Base64-
encoded str string, not bytes, and is converted to bytes only when fetched. This bug
seems likely to also go away in a future Python and email package (perhaps even as a
simple patch in Python 3.2), but it’s more serious than the other Unicode decoding
issues described here, because it prevents mail composition for all but trivial mails.
The flexibility afforded by the package and the Python language allows such a workaround
to be developed external to the package, rather than hacking the package’s code
directly. With open source and forgiving APIs, you are rarely truly stuck.
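The parsing side of this story can be checked with a short sketch like the following. The message text here is a hypothetical, minimal single-part mail built by hand for illustration. It assumes only the standard `email.parser.Parser` class and the `decode` argument of `get_payload`:

```python
import base64
from email.parser import Parser

data = b'\x00\x01\x02spam\xff'                    # hypothetical binary attachment content
b64 = base64.encodebytes(data).decode('ascii')    # Base64 text form, as a str

# A made-up, minimal message with a Base64-encoded body
text = (
    'MIME-Version: 1.0\n'
    'Content-Type: application/octet-stream\n'
    'Content-Transfer-Encoding: base64\n'
    '\n' + b64
)

msg = Parser().parsestr(text)
stored = msg.get_payload()                # as saved in the message object: Base64 str
fetched = msg.get_payload(decode=True)    # converted to bytes only when fetched this way

print(type(stored), type(fetched))
```

Because the payload is held as str until it is explicitly decoded, the parser never trips over the bytes issue that affects composition.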
Late-breaking news: This section’s bug is scheduled to be fixed in Python
3.2, making our workaround here unnecessary in this and later Python
releases. This is per communications with members of Python’s email
special interest group (on the “email-sig” mailing list).
Regrettably, this fix didn’t appear until after this chapter and its examples
had been written. I’d like to remove the workaround and its description
entirely, but this book is based on Python 3.1, both before and
after the fix was incorporated.
So that it also works under Python 3.2 alpha, though, the workaround
code ahead was specialized just before publication to check for bytes
prior to decoding. Moreover, the workaround must still manually split
lines in Base64 data, because 3.2 still does not.
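The two steps this note names, checking for bytes before decoding and splitting Base64 data into lines, might be sketched roughly as follows. This is an illustration under stated assumptions rather than the workaround code itself: the helper name `encode_base64_text` and the 76-character line length are choices made here for the example.

```python
from email import encoders
from email.mime.base import MIMEBase

def encode_base64_text(msgobj, linelen=76):
    """Hypothetical helper: Base64-encode a payload, leaving it as line-split str."""
    encoders.encode_base64(msgobj)           # standard encoder; also sets the transfer-encoding header
    payload = msgobj.get_payload()
    if isinstance(payload, bytes):           # bytes in some releases, already str in others
        payload = payload.decode('ascii')
    payload = payload.replace('\n', '')      # drop any existing breaks, then resplit uniformly
    lines = [payload[i:i + linelen] for i in range(0, len(payload), linelen)]
    msgobj.set_payload('\n'.join(lines))

data = bytes(range(256))                     # made-up binary content for the demo
part = MIMEBase('application', 'octet-stream')
part.set_payload(data)
encode_base64_text(part)

print(part['Content-Transfer-Encoding'])
print(part.get_payload())
```

The `isinstance` test is what lets a single function run whether or not the installed encoder has already converted the payload to str.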
Workaround: Message composition for non-ASCII text parts is broken
Our final email Unicode issue is as severe as the prior one: changes like that of the prior
section introduced yet another regression for mail composition. In short, it’s impossible
to make text message parts today without specializing for different Unicode encodings.
Some types of text are automatically MIME-encoded for transmission. Unfortunately,
because of the str/bytes split, the MIME text message class in email now requires
different string object types for different Unicode encodings. The net effect is that you
now have to know how the email package will process your text data when making a
text message object, or repeat most of its logic redundantly.
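As a rough illustration of the automatic MIME encoding mentioned above, the following sketch asks the standard `MIMEText` class which transfer encoding it selects for three common charsets. It assumes a recent Python 3 release, where `MIMEText` accepts str for each of these charsets. The string-type requirements under Python 3.1 are the ones this section describes, and differ.

```python
from email.mime.text import MIMEText

# Same text, three charset declarations; str input assumes a recent Python 3
for charset in ('us-ascii', 'latin-1', 'utf-8'):
    part = MIMEText('spam', _charset=charset)
    print(charset, '=>', part['Content-Transfer-Encoding'])

ascii_part = MIMEText('spam', _charset='us-ascii')
latin_part = MIMEText('spam', _charset='latin-1')
utf8_part = MIMEText('spam', _charset='utf-8')
```

The transfer encoding is chosen from the declared charset, not from the characters actually present in the text, which is why the caller must know in advance how each charset will be handled.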
For example, to properly generate Unicode encoding headers and apply required MIME
encodings, here’s how we must proceed today for common Unicode text types: