Its original HTML form is also displayed in its full glory in a popped-up web browser
as before.
This is a prototype. Because PyMailGUI is oriented toward plain text today, this parser
is intended as a temporary workaround until an HTML viewer/editor widget solution is
found. Because of that, this is at best a first cut which has not been polished to any
significant extent. Robustly parsing HTML in its entirety is a task well beyond the scope
of this chapter and book. When this parser fails to render good plain text (and it will!),
users can still view and cut-and-paste the properly formatted text from the web browser.
This is also a preview. HTML parsing is not covered until Chapter 19 of this book, so
you’ll have to take this on faith until we refer back to it in that later chapter. Unfortunately,
this feature was added to PyMailGUI late in the book project, and avoiding this
forward reference didn’t seem to justify omitting the improvement altogether. For more
details on HTML parsing, stay tuned for (or flip ahead to) Chapter 19.
In short, the class here provides handler methods that receive callbacks from an HTML
parser, as tags and their content are recognized; we use this model here to save text we’re
interested in along the way. Besides the parser class, we could also use Python’s
html.entities module to map more entity types than are hardcoded here—another
tool we will meet in Chapter 19.
Despite its limitations, this example serves as a rough guide to help get you started, and
any result it produces is certainly an improvement upon the prior edition’s display and
quoting of raw HTML.
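To make the callback model described above more concrete before the real listing, here is a minimal, standalone sketch. It is not PyMailGUI's code: the names TextSaver and html_to_text are hypothetical, and it assumes Python 3.4 or later so that the convert_charrefs argument is available. It saves text as the parser reports it and uses html.entities to map named entities instead of a hardcoded table.

```python
from html.parser import HTMLParser
from html.entities import name2codepoint      # std lib table: entity name -> code point

class TextSaver(HTMLParser):                  # hypothetical name, for illustration only
    def __init__(self):
        # convert_charrefs=False asks the parser to report entities through
        # handle_entityref/handle_charref rather than decoding them itself
        HTMLParser.__init__(self, convert_charrefs=False)
        self.pieces = []                      # text fragments saved along the way

    def handle_starttag(self, tag, attrs):
        if tag in ('p', 'br'):                # treat a few tags as line breaks
            self.pieces.append('\n')

    def handle_data(self, data):              # plain text between tags
        self.pieces.append(data)

    def handle_entityref(self, name):         # named entity, such as &amp;
        code = name2codepoint.get(name)
        self.pieces.append(chr(code) if code else '&%s;' % name)

    def handle_charref(self, name):           # numeric entity, such as &#33; or &#x21;
        code = int(name[1:], 16) if name[0] in 'xX' else int(name)
        self.pieces.append(chr(code))

def html_to_text(html):                       # hypothetical helper
    parser = TextSaver()
    parser.feed(html)                         # callbacks fire as the text is scanned
    parser.close()
    return ''.join(parser.pieces)

if __name__ == '__main__':
    print(html_to_text('<p>Spam &amp; eggs<br>&lt;ham&gt;&#33;</p>'))
```

Unknown entity names are passed through unchanged in this sketch, which is only one of several reasonable choices; a fuller extractor would also need to deal with scripts, styles, and whitespace.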
Example 14-8. PP4E\Internet\Email\PyMailGui\html2text.py
"""
################################################################
VERY preliminary html-to-text extractor, for text to be
quoted in replies and forwards, and displayed in the main
text display component. Only used when the main text part
is HTML (i.e., no alternative or other text parts to show).
We also need to know if this is HTML or not, but findMainText
already returns the main text's content type.
This is mostly provided as a first cut, to help get you started
on a more complete solution. It hasn't been polished, because
any result is better than displaying raw HTML, and it's probably
a better idea to migrate to an HTML viewer/editor widget in the
future anyhow. As is, PyMailGUI is still plain-text biased.
If (really, when) this parser fails to render well, users can
instead view and cut-and-paste from the web browser popped up
to display the HTML. See Chapter 19 for more on HTML parsing.
################################################################
"""
from html.parser import HTMLParser   # Python std lib parser (sax-like model)

class Parser(HTMLParser):            # subclass parser, define callback methods
    def __init__(self):              # text assumed to be str, any encoding ok