Luckily, the Python queue module, described in this section, makes this simple:
realistic threaded programs are usually structured as one or more producer (a.k.a.
worker) threads that add data to a queue, along with one or more consumer threads
that take the data off the queue and process it. In a typical threaded GUI, for
example, producers may download or compute data and place it on the queue; the
consumer—the main GUI thread—checks the queue for data periodically with a
timer event and displays it in the GUI when it arrives. Because the shared queue is
thread-safe, programs structured this way automatically synchronize much cross-
thread data communication.
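As a rough illustration of the producer/consumer structure just described, here is a minimal sketch built on the standard queue and threading modules. The function names (producer, consume_available, main) are hypothetical. A simple polling loop stands in for the GUI timer event that would normally check the queue.

```python
import queue
import threading
import time

data_queue = queue.Queue()  # thread-safe FIFO shared by all threads; no explicit lock needed

def producer(producer_id, count):
    # Producer (worker) thread: "compute" items and hand them to the consumer
    for i in range(count):
        time.sleep(0.01)  # stand-in for downloading or computing data
        data_queue.put((producer_id, i))

def consume_available():
    # Consumer side: drain whatever has arrived without blocking,
    # the way a GUI timer callback would check the queue
    items = []
    while True:
        try:
            items.append(data_queue.get_nowait())
        except queue.Empty:
            return items

def main(num_producers=3, per_producer=4):
    threads = [threading.Thread(target=producer, args=(pid, per_producer))
               for pid in range(num_producers)]
    for t in threads:
        t.start()
    received = []
    # Poll periodically (like a timer event) until every producer has finished
    while any(t.is_alive() for t in threads):
        received.extend(consume_available())
        time.sleep(0.02)
    received.extend(consume_available())  # pick up anything left after the last poll
    return received

if __name__ == "__main__":
    for item in main():
        print("got", item)
```

Because the queue does its own locking, neither side needs to coordinate with the other beyond calling put and get_nowait.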
The global interpreter lock (GIL)
Finally, as we’ll learn in more detail later in this section, Python’s implementation
of threads means that only one thread is ever really running its Python language
code in the Python virtual machine at any point in time. Python threads are true
operating system threads, but all threads must acquire a single shared lock when
they are ready to run, and each thread may be swapped out after running for a short
period of time (in Python 3.1 and earlier, after a set number of virtual machine
instructions; Python 3.2 changed this to a time-based switch interval, 5 milliseconds
by default).
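In CPython 3.2 and later, the switch period is exposed as a time interval through sys.getswitchinterval and sys.setswitchinterval. The sketch below temporarily changes that interval while several threads run pure-Python code. It assumes CPython; the helper names run_with_switch_interval and count_in_threads are hypothetical, and the interval only affects how often threads take turns, not whether their Python code can overlap.

```python
import sys
import threading

def run_with_switch_interval(seconds, func, *args):
    """Run func(*args) with a temporary GIL switch interval, then restore it."""
    previous = sys.getswitchinterval()  # 0.005 seconds unless changed
    sys.setswitchinterval(seconds)
    try:
        return func(*args)
    finally:
        sys.setswitchinterval(previous)

def count_in_threads(num_threads, n):
    # Each thread runs pure-Python bytecode, so with the GIL only one runs at
    # a time; a thread waiting for the GIL asks the running one to release it
    # after roughly one switch interval.
    counts = [0] * num_threads

    def work(idx):
        for _ in range(n):
            counts[idx] += 1  # each thread updates only its own slot

    threads = [threading.Thread(target=work, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts

if __name__ == "__main__":
    print(run_with_switch_interval(0.001, count_in_threads, 4, 100_000))
```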
Because of this structure, the Python language parts of Python threads cannot today
be distributed across multiple CPUs on a multi-CPU computer. To leverage more
than one CPU, you’ll simply need to use process forking, not threads (the amount
and complexity of code required for both are roughly the same). Moreover, the
parts of a thread that perform long-running tasks implemented as C extensions can
run truly independently if they release the GIL to allow the Python code of other
threads to run while their task is in progress. Python code, however, cannot truly
overlap in time.
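The point about C-level tasks that release the GIL can be seen with a minimal sketch. Here time.sleep, which is implemented in C and releases the GIL while it waits, stands in for a long-running C extension call; the helper names are hypothetical. Four threads that each wait half a second finish in about half a second in total, because the waits overlap.

```python
import threading
import time

def blocking_call(seconds):
    # time.sleep releases the GIL while it waits, standing in for a
    # long-running C extension task that does the same
    time.sleep(seconds)

def time_threads(target, args, num_threads=4):
    # Start num_threads threads running target(*args) and time them as a group
    threads = [threading.Thread(target=target, args=args) for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    elapsed = time_threads(blocking_call, (0.5,))
    print(f"4 threads x 0.5 s sleep: {elapsed:.2f} s")
```

On a standard CPython build, replacing blocking_call with a pure-Python loop would not produce the same overlap, since that loop holds the GIL while it runs.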
The advantage of Python’s implementation of threads is performance: when it
was attempted, making the virtual machine truly thread-safe reportedly slowed all
programs by a factor of two on Windows and by an even larger factor on Linux.
Even nonthreaded programs ran at half speed.
Even though the GIL’s multiplexing of Python language code makes Python
threads less useful for leveraging capacity on multi-CPU machines, threads are
still useful as programming tools to implement nonblocking operations, especially
in GUIs. Moreover, the newer multiprocessing module we’ll meet later offers another
solution here, too: it provides a portable thread-like API implemented with
processes, so programs can both leverage the simplicity and programmability of
threads and benefit from the scalability of independent processes across CPUs.
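To give a feel for the process-based alternative, here is a minimal sketch using multiprocessing.Pool to run CPU-bound, pure-Python calls in separate processes. The module also offers multiprocessing.Process, which takes the same target and args arguments as threading.Thread. The function names sum_of_squares and parallel_sums are hypothetical, and the worker count is an arbitrary assumption.

```python
import multiprocessing

def sum_of_squares(n):
    # CPU-bound, pure-Python work; run in threads, these calls would take
    # turns holding the GIL instead of running on separate CPUs
    total = 0
    for i in range(n):
        total += i * i
    return total

def parallel_sums(inputs, workers=4):
    # Each worker is a separate process with its own interpreter and GIL,
    # so the calls can run at the same time on a multi-CPU machine.
    # sum_of_squares is defined at module level so it can be pickled and
    # sent to the worker processes.
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(sum_of_squares, inputs)

if __name__ == "__main__":
    # The __main__ guard is required where workers are started by
    # re-importing this module (the "spawn" start method, as on Windows)
    print(parallel_sums([1_000_000] * 4))
```

The trade-off is that arguments and results must be pickled and copied between processes, which threads sharing one address space avoid.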
Despite what you may think after reading the preceding overview, threads are remarkably
easy to use in Python. In fact, when a program is started it is already running a
thread, usually called the “main thread” of the process. To start new, independent
threads of execution within a process, Python code uses either the low-level _thread
module to run a function call in a spawned thread, or the higher-level threading module
188 | Chapter 5: Parallel System Tools