Foundations of Python Network Programming


Chapter 12 ■ Building and Parsing E-Mail




    try:
        body = message.get_body(preferencelist=('plain', 'html'))
    except KeyError:
        print('')
    else:
        print(body.get_content())


    for part in message.walk():
        cd = part['Content-Disposition']
        is_attachment = cd and cd.split(';')[0].lower() == 'attachment'
        if not is_attachment:
            continue
        content = part.get_content()
        print('* {} attachment named {!r}: {} object of length {}'.format(
            part.get_content_type(), part.get_filename(),
            type(content).__name__, len(content)))


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Parse and print an email')
    parser.add_argument('filename', nargs='?', help='File containing an email')
    args = parser.parse_args()
    if args.filename is None:
        main(sys.stdin.buffer)
    else:
        with open(args.filename, 'rb') as f:
            main(f)


The script falls quite naturally into two parts once its command-line arguments have been parsed and the
message itself has been read and turned into an EmailMessage. Because you want the email module to have access
to the message’s exact binary representation on disk, you either open its file in binary mode 'rb' or use the binary
buffer attribute of Python’s standard input object, which will return raw bytes.
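
The binary-input requirement can be sketched in a few lines. This is only an illustration: the sample message, its addresses, and the helper name parse_message() are made up for the example, and email.policy.default is assumed here as the policy that yields an EmailMessage.

```python
import email
import email.message
import email.policy
import io

# Hypothetical sample message, held as the raw bytes it would have on disk.
RAW = (b'From: alice@example.com\r\n'
       b'To: bob@example.com\r\n'
       b'Subject: Greetings\r\n'
       b'MIME-Version: 1.0\r\n'
       b'Content-Type: text/plain; charset="utf-8"\r\n'
       b'Content-Transfer-Encoding: 8bit\r\n'
       b'\r\n'
       b'Caf\xc3\xa9 at noon?\r\n')


def parse_message(binary_file):
    """Parse a file object opened in binary mode into an EmailMessage."""
    return email.message_from_binary_file(binary_file,
                                          policy=email.policy.default)


# io.BytesIO stands in for open(filename, 'rb') or sys.stdin.buffer.
message = parse_message(io.BytesIO(RAW))
subject = message['Subject']
text = message.get_content()   # decoded using the charset the message declares
print(subject)
print(text)
```

Because the parser sees the undecoded bytes, it can apply the character set that the message itself declares instead of whatever encoding a text-mode file object would have guessed.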
The first crucial step is the call to the get_body() method, which sends Python on a search deeper and deeper
into the message’s MIME structure looking for the part best qualified to serve as the body. The preferencelist that
you specify should be ordered with the formats that you prefer preceding the formats that you are less likely to want to
display. Here plain text is preferred over an HTML version of the body, but either can be accepted. The listing
guards the call with except KeyError in case no suitable body can be found; be aware that the standard library
documentation describes get_body() as returning None when no part matches the preferencelist, so a test for
None is the more dependable check.
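
To see how the order of preferencelist steers the search, here is a small sketch. The message contents and the helper best_body() are invented for the illustration; the check for None reflects the documented return value of get_body() when nothing matches.

```python
from email.message import EmailMessage

# Hypothetical message carrying both a plain-text and an HTML rendering.
message = EmailMessage()
message['Subject'] = 'Two renderings'
message.set_content('Hello in plain text.\n')
message.add_alternative('<p>Hello in <b>HTML</b>.</p>\n', subtype='html')

# Hypothetical message with no text body at all.
binary_only = EmailMessage()
binary_only.set_content(b'\x00\x01', maintype='application',
                        subtype='octet-stream')


def best_body(msg, preferencelist):
    """Return the content type of the best body part, or None if none matches."""
    body = msg.get_body(preferencelist=preferencelist)
    if body is None:            # no part matched any preference
        return None
    return body.get_content_type()


plain_first = best_body(message, ('plain', 'html'))
html_first = best_body(message, ('html', 'plain'))
no_body = best_body(binary_only, ('plain', 'html'))
print(plain_first, html_first, no_body)
```

The same message yields a different part depending only on which subtype comes first in the tuple.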
Note that the default preferencelist, used if you fail to specify one of your own, has three elements because it
puts multipart/related as its first preference ahead of both HTML and plain text. This default is suitable if you are
writing a sophisticated e-mail client—perhaps a webmail service or an application with a built-in WebKit pane—that
can not only format HTML correctly but can also display inline images and supports style sheets. The object you get
back will be the related-content MIME part itself, and you will then have to look inside it to find both the HTML and
all of the multimedia that it needs. Because the small script here is simply printing the resulting body to the standard
output, however, I have skipped this possibility.
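
For readers curious about the possibility skipped here, the following sketch shows roughly what looking inside a related-content part involves. The message, the Content-ID, and the placeholder image bytes are all invented for the example; a real client would hand the HTML and images to its rendering engine.

```python
from email.message import EmailMessage

PNG_BYTES = b'\x89PNG\r\n\x1a\n'   # placeholder bytes, not a real image

# Hypothetical message: plain-text fallback plus HTML with an inline image.
message = EmailMessage()
message.set_content('Plain-text fallback.\n')
message.add_alternative('<p>Logo: <img src="cid:logo@example.com"></p>\n',
                        subtype='html')
html_alternative = message.get_payload()[1]
html_alternative.add_related(PNG_BYTES, maintype='image', subtype='png',
                             cid='<logo@example.com>')

# With no preferencelist given, multipart/related is the first preference.
related = message.get_body()

# Look inside the related part for the HTML and for the media it refers to.
html_part = related.get_body(preferencelist=('html',))
images = [part for part in related.iter_parts()
          if part.get_content_maintype() == 'image']

print(related.get_content_type())
print(html_part.get_content_type(), [img['Content-ID'] for img in images])
```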
Having displayed the best body that can be found, it is then time to search for any attachments the user might
want displayed or saved. Note that the example script asks for all of the essential information that MIME specifies for
an attachment: its content type, file name, and then the data itself. In a real application, you would probably open a
file for writing and save the data there instead of just printing its length and type to the screen.
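
A minimal sketch of the saving step might look like the following. The helper save_attachments(), the fallback name attachment.bin, and the sample message are assumptions made for the illustration, and the sketch only handles attachments whose content arrives as str or bytes.

```python
import os
import tempfile
from email.message import EmailMessage


def save_attachments(message, directory):
    """Write each attachment in `message` into `directory`; return the paths."""
    saved = []
    for part in message.walk():
        if part.get_content_disposition() != 'attachment':
            continue
        # Never trust a sender-supplied path: keep only its final component.
        name = os.path.basename(part.get_filename() or '') or 'attachment.bin'
        content = part.get_content()
        if isinstance(content, str):      # text attachments come back as str
            content = content.encode('utf-8')
        path = os.path.join(directory, name)
        with open(path, 'wb') as f:
            f.write(content)
        saved.append(path)
    return saved


# Hypothetical message with one binary attachment.
message = EmailMessage()
message.set_content('See attached.\n')
message.add_attachment(b'\x00\x01\x02', maintype='application',
                       subtype='octet-stream', filename='data.bin')

with tempfile.TemporaryDirectory() as directory:
    paths = save_attachments(message, directory)
    names = [os.path.basename(p) for p in paths]
    sizes = [os.path.getsize(p) for p in paths]
print(names, sizes)
```

Reducing the file name to its last path component matters because the name comes from the sender and could otherwise point outside the target directory.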
Note that because of a bug in Python 3.4, this display script is forced to make its own decision about which
message parts are attachments and which are not. In a future version of Python, you will be able to replace this
manual walk of the tree, which tests every single part’s content disposition, with a simple call to the
iter_attachments() method of your message instead.
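
On a Python release where iter_attachments() behaves as documented, the attachment report could be sketched as follows. The sample message and the helper describe_attachments() are invented for the illustration.

```python
from email.message import EmailMessage

# Hypothetical message with one text attachment and one binary attachment.
message = EmailMessage()
message.set_content('Report attached.\n')
message.add_attachment('a,b\n1,2\n', subtype='csv', filename='report.csv')
message.add_attachment(b'\x89PNG', maintype='image', subtype='png',
                       filename='logo.png')


def describe_attachments(msg):
    """Return (content type, file name, Python type name) per attachment."""
    described = []
    for part in msg.iter_attachments():
        content = part.get_content()
        described.append((part.get_content_type(), part.get_filename(),
                          type(content).__name__))
    return described


summary = describe_attachments(message)
for content_type, filename, type_name in summary:
    print('* {} attachment named {!r}: {} object'.format(
        content_type, filename, type_name))
```

One difference from the manual loop is worth keeping in mind: iter_attachments() examines only the direct subparts of the message it is called on, whereas walk() descends through the whole tree.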
The script will work on any of the MIME messages generated by the earlier scripts, no matter how
complicated. Given the simplest message, it just displays the “interesting” headers and body.
