As coded, a timer loop is run only when a grep is in progress, and each grep uses its
own thread, timer loop, and queue. There may be multiple threads and loops running,
and there may be other unrelated threads, queues, and timer loops in the process. For
instance, an attached PyEdit component in Chapter 14’s PyMailGUI program can run
grep threads and loops of its own, while PyMailGUI runs its own email-related threads
and queue checker. Each loop’s handler is dispatched independently from the tkinter
event stream processor. Because the structure here is simpler, the general thread
tools callback queue of Chapter 10 is not used. For more notes on the grep thread
implementation, see the source code ahead, and compare to file _unthreaded-textEditor.py
in the examples package, a nonthreaded version of PyEdit.
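The per-grep thread, queue, and timer loop described above can be sketched roughly as follows. This is a minimal illustration, not PyEdit's actual code: the function names and the `schedule` and `on_done` parameters are hypothetical, and the search works on an in-memory list of lines to keep the example short. In a tkinter GUI, `schedule` would normally be a widget's `after` method.

```python
import queue
import threading

def grep_producer(lines, key, results):
    """Worker thread: collect all matches, then post them on the queue once."""
    matches = [(num, line) for num, line in enumerate(lines, 1) if key in line]
    results.put(matches)

def grep_consumer(results, schedule, on_done, msecs=250):
    """One timer-loop step: poll the queue without blocking the GUI."""
    try:
        matches = results.get(block=False)
    except queue.Empty:
        # Nothing yet: ask the event loop to call us again later.
        schedule(msecs, lambda: grep_consumer(results, schedule, on_done, msecs))
    else:
        # Result arrived: report it and stop; nothing is rescheduled.
        on_done(matches)

def start_grep(lines, key, schedule, on_done):
    """Each grep gets its own queue, thread, and timer loop (hypothetical helper)."""
    results = queue.Queue()
    thread = threading.Thread(target=grep_producer, args=(lines, key, results))
    thread.start()
    grep_consumer(results, schedule, on_done)
    return thread
```

Because every call to `start_grep` creates a fresh queue, concurrent greps cannot consume each other's results, and each timer loop ends on its own as soon as its thread's result arrives.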
If you study the Grep option’s code, you’ll notice that it also allows
input of a tree-wide Unicode encoding, and catches and skips any Unicode decoding
error exceptions generated both when processing file content and walking the tree’s
filenames. As we learned in Chapters 4 and 6, files opened in text mode in Python 3.X
must be decodable per a provided or platform default Unicode encoding. This is
particularly problematic for Grep, as directory trees may contain files of arbitrarily
mixed encoding types.
In fact, it’s common on Windows to have files with content in ASCII, UTF-8, and
UTF-16 form mixed in the same tree (Notepad’s “ANSI,” “Utf-8,” and “Unicode”),
and even others in trees that contain content obtained from the Web or email. Opening
all these with UTF-8 would trigger exceptions in Python 3.X, and opening all these in
binary mode yields encoded text that will likely fail to match a search key string.
Technically, to compare at all, we’d still have to decode the bytes read to text or
encode the search key string to bytes, and the two would only match if the encodings
used both succeed and agree.
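A short sketch makes the mismatch concrete. The helper names `decodes_as` and `bytes_match` are hypothetical, and the UTF-16 data is built by hand with a byte order mark to stand in for a file saved as Notepad’s “Unicode”.

```python
import codecs

text = 'eggs and spam'
key = 'spam'

# The same text as it might be stored in two different files.
utf8_file = text.encode('utf-8')
utf16_file = codecs.BOM_UTF16_LE + text.encode('utf-16-le')

def decodes_as(data, encoding):
    """Return True if the bytes can be decoded with the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeError:
        return False

def bytes_match(key, data, encoding):
    """Binary-mode search: encode the key, then look for it in the raw bytes."""
    return key.encode(encoding) in data

print(decodes_as(utf8_file, 'utf-8'))            # the UTF-8 file decodes
print(decodes_as(utf16_file, 'utf-8'))           # the UTF-16 file does not
print(bytes_match(key, utf16_file, 'utf-8'))     # wrong encoding: no match
print(bytes_match(key, utf16_file, 'utf-16-le')) # agreeing encoding: match
```

The binary comparison succeeds only when the key is encoded the same way as the file content, which is the same information a text-mode search needs in order to decode.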
To allow for mixed encoding trees, the Grep dialog opens in text mode and allows an
encoding name to be input and used to decode file content for all files in the tree
searched. This encoding name is prefilled with the platform content default for
convenience, as this will often suffice. To search trees of mixed encoding types, users
may run multiple Greps with different encoding names. The names of files searched might
fail to decode as well, but this is largely ignored in the current release: filenames
are assumed to satisfy the platform filename convention, and the search ends if they
don’t (see Chapters 4 and 6 for more on filename encoding issues in Python itself, as
well as the find walker reused here).
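For reference, Python exposes several different defaults that are easy to confuse. The sketch below simply reports them; the function name and dictionary keys are made up for illustration, and which of these values the Grep dialog actually uses for its prefill is not stated in this section.

```python
import locale
import sys

def default_encodings():
    """Collect the platform and interpreter defaults that affect text handling."""
    return {
        # Locale-based default, typically what open() uses in text mode
        # when no encoding argument is passed.
        'content': locale.getpreferredencoding(False),
        # Default for str.encode() and bytes.decode() with no argument.
        'python': sys.getdefaultencoding(),
        # Convention used to convert filenames to and from bytes.
        'filenames': sys.getfilesystemencoding(),
    }

for role, name in default_encodings().items():
    print(role, '=>', name)
```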
In addition, Grep must take care to catch and recover from encoding errors, since some
files with matching names that it searches might still not be decodable per the input
encoding, and in fact might not be text files at all. For example, searches in
Python 3.1’s standard library (like the example Grep for % described earlier) run into a
handful of files that do not decode properly on my Windows machine and would
otherwise crash PyEdit. Binary files that happen to match the filename patterns would
fare even worse.
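The catch-and-skip policy might look something like the following sketch. It is not PyEdit’s code: `grep_tree` is a hypothetical name, and the standard library’s `os.walk` and `fnmatch` stand in for the book’s own find walker. As a design choice made only for this example, matches from a file are kept only if the whole file decodes.

```python
import fnmatch
import os

def grep_tree(root, pattern, key, encoding):
    """Search files under root whose names match pattern for key.

    Returns (matches, skipped): matches holds (path, line number, line)
    tuples, and skipped holds paths that could not be read or decoded.
    """
    matches, skipped = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, pattern):
            path = os.path.join(dirpath, name)
            found = []
            try:
                # Text mode: every line must decode per the input encoding.
                with open(path, encoding=encoding) as file:
                    for num, line in enumerate(file, 1):
                        if key in line:
                            found.append((path, num, line.rstrip('\n')))
            except (UnicodeError, OSError):
                # Undecodable, binary, or unreadable file: skip it and go on.
                skipped.append(path)
            else:
                matches.extend(found)
    return matches, skipped
```

A call such as `grep_tree('.', '*.py', '%', 'utf-8')` returns the matching lines along with a list of files that were skipped, so a later run with a different encoding name can retry just those files.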
Grep Unicode model.