prior to parsing; in worst cases other than email that may mandate mixed data types,
the current package cannot be used at all. Here’s the issue live:
>>> text # from prior example in this section
'Content-Type: multipart/mixed; boundary="===============1574823535=="\nMIME-Ver...'
>>> btext = text.encode()
>>> btext
b'Content-Type: multipart/mixed; boundary="===============1574823535=="\nMIME-Ve...'
>>> msg = Parser().parsestr(text) # email parser expects Unicode str
>>> msg = Parser().parsestr(btext) # but poplib fetches email as bytes!
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python31\lib\email\parser.py", line 82, in parsestr
    return self.parse(StringIO(text), headersonly=headersonly)
TypeError: initial_value must be str or None, not bytes
>>> msg = Parser().parsestr(btext.decode()) # okay per default
>>> msg = Parser().parsestr(btext.decode('utf8')) # ascii encoded (default)
>>> msg = Parser().parsestr(btext.decode('latin1')) # ascii is same in all 3
>>> msg = Parser().parsestr(btext.decode('ascii'))
This is less than ideal, as a bytes-based email package would be able to handle message
encodings more directly. As mentioned, though, the email package is not fully functional
in Python 3.1, because of its legacy str focus and the sharp distinction that
Python 3.X makes between Unicode text and byte strings. In this case, its parser should
accept bytes and not expect clients to know how to decode them.
Because of that, this book’s email clients take simplistic approaches to decoding fetched
message bytes to be parsed by email. Specifically, full-text decoding will try a user-
configurable encoding name, then fall back on trying common types as a heuristic, and
finally attempt to decode just message headers.
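The three-step policy just described can be sketched roughly as follows. This is a minimal illustration, not the book's actual client code: the function name decode_full_text, its parameters, and the choice of fallback encodings are all assumptions made for this example.

```python
import re
from email.parser import Parser

def decode_full_text(raw_bytes, user_encoding=None, fallbacks=('ascii', 'utf8')):
    """
    Hypothetical helper: decode fetched message bytes to str for the parser.
    Tries a user-configured encoding name first, then some common encodings,
    and finally falls back on decoding just the header lines.
    """
    candidates = ([user_encoding] if user_encoding else []) + list(fallbacks)
    for name in candidates:
        try:
            return raw_bytes.decode(name)
        except (UnicodeError, LookupError):    # bad bytes or unknown codec name
            continue
    # Last resort: keep only the headers (text before the first blank line),
    # replacing any undecodable bytes so that this step cannot fail
    headers = re.split(rb'\r?\n\r?\n', raw_bytes, maxsplit=1)[0]
    return headers.decode('ascii', errors='replace') + '\n\n'

raw = b'Subject: hi\n\n\xff\xfe body'          # body is neither ASCII nor UTF-8
msg = Parser().parsestr(decode_full_text(raw)) # headers-only fallback applies
print(msg['Subject'])
```

Note that a fallback such as latin1 decodes any byte sequence without error, so if it appears in the list, the headers-only step is never reached; whether that is desirable depends on the client.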
This will suffice for the examples shown but may need to be enhanced for broader
applicability. In some cases, encoding may have to be determined by other schemes
such as inspecting email headers (if present at all), guessing from bytes structure
analysis, or dynamic user feedback. Adding such enhancements in a robust fashion is
likely too complex to attempt in a book’s example code, and it is better performed in
common standard library tools in any event.
Really, robust decoding of mail text may not be possible today at all, if it requires
header inspection: we can’t inspect a message’s encoding information headers unless
we parse the message, but we can’t parse a message with 3.1’s email package unless
we already know the encoding. That is, scripts may need to parse in order to decode,
but they need to decode in order to parse! The byte strings of poplib and Unicode strings
of email in 3.1 are fundamentally at odds. Even within its own libraries, Python 3.X’s
changes have created a chicken-and-egg dependency problem that still exists nearly
two years after 3.0’s release.
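For comparison only, and outside the 3.1 package discussed here: Python 3.2 and later added email.message_from_bytes (and email.parser.BytesParser), which accept bytes directly. The sketch below assumes such a later release, and the sample message is made up for illustration. It shows the parse-then-decode order that 3.1 does not allow.

```python
import email

# A made-up message whose body is Latin-1 encoded, as raw bytes
raw = (b'Content-Type: text/plain; charset="latin-1"\n'
       b'Content-Transfer-Encoding: 8bit\n'
       b'Subject: test\n'
       b'\n'
       b'caf\xe9\n')

msg = email.message_from_bytes(raw)         # parse first: no decoding needed (3.2+)
charset = msg.get_content_charset()         # then read the declared encoding
payload = msg.get_payload(decode=True)      # body as bytes
body = payload.decode(charset or 'ascii')   # finally decode per the headers
print(charset, repr(body))
```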