Chapter 12
Using apply() to make a single request
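In addition to the map() function's variants, a pool also has an apply(func,
args, kwds) method that we can use to pass one request to the worker pool. The
positional arguments are supplied as a tuple and the keyword arguments as a
dictionary. Conceptually, the map() method resembles a for loop wrapped around
the apply() method, although each apply() call blocks until its result is ready,
so the loop handles one item at a time. We can, for example, use the following
command:
list(workers.apply(analysis, (f,)) for f in glob.glob(pattern))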
It's not clear, for our purposes, that this is a significant improvement. Almost
everything we need to do can be expressed as a map() function.
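As a rough illustration of the comparison above, the following sketch runs the same work through apply() and through map(). The analysis() function is not shown in this section, so the built-in Counter class stands in as the worker function and short strings stand in for files; the function names and the pool size of 2 are assumptions made only for this example.

```python
import multiprocessing
from collections import Counter


def count_with_apply(texts):
    """Submit one item per call; each apply() blocks until its result is ready."""
    with multiprocessing.Pool(2) as workers:
        # Positional arguments are passed as a tuple: apply(func, args).
        return [workers.apply(Counter, (t,)) for t in texts]


def count_with_map(texts):
    """Hand over the whole collection; map() spreads it across the workers."""
    with multiprocessing.Pool(2) as workers:
        return workers.map(Counter, texts)


if __name__ == "__main__":
    sample = ["aab", "bcc", "abc"]
    print(count_with_apply(sample) == count_with_map(sample))
```

Both functions return the same list of Counter objects. The apply() version waits for each result before submitting the next request, which is one reason it rarely improves on map().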
Using map_async(), starmap_async(), and apply_async()
The behavior of the map(), starmap(), and apply() functions is to allocate work to
a subprocess in the Pool object and then collect the response from the subprocess
when that response is ready. This means the parent blocks until the child has
finished and the results have been gathered. The _async() variants do not wait for
the child to finish. These functions return an object that can be queried to get the
individual results from the child processes.
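To make the idea of a queryable result object concrete, here is a minimal sketch using apply_async(), which returns one result object per request. As an assumption for illustration, the built-in Counter class stands in for the worker function and short strings stand in for the inputs; the function name, the pool size, and the 30-second timeout are likewise hypothetical.

```python
import multiprocessing
from collections import Counter


def count_with_apply_async(texts):
    """Queue every request first, then collect the individual results."""
    combined = Counter()
    with multiprocessing.Pool(2) as workers:
        # Each call returns immediately with its own result object.
        pending = [workers.apply_async(Counter, (t,)) for t in texts]
        # All requests are queued; get() blocks only while collecting.
        for result in pending:
            combined.update(result.get(timeout=30))
    return combined


if __name__ == "__main__":
    print(count_with_apply_async(["aab", "bcc", "abc"]))
```

Unlike a loop over apply(), all of the requests are submitted before the parent waits for any of them, so the workers can proceed in parallel.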
The following is a variation using the map_async() method:
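import glob
import multiprocessing
from collections import Counter

pattern = "*.gz"
combined = Counter()
with multiprocessing.Pool() as workers:
    # map_async() returns immediately with a result object.
    results = workers.map_async(analysis, glob.glob(pattern))
    # get() blocks until every file has been analyzed.
    data = results.get()
    for c in data:
        combined.update(c)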
We've created a Counter object that we'll use to consolidate the results from
each worker in the pool. We created a pool of subprocesses based on the number of
available CPUs and used this Pool object as a context manager. We then mapped
our analysis() function to each file matching our pattern. The response from
the map_async() function is a MapResult object; we can query this for the results
and for the overall status of the work. In this case, we used the get() method to get
the sequence of Counter objects.
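The status queries mentioned above might look like the following sketch, which uses the ready(), wait(), successful(), and get() methods of the result object. It rests on the same assumptions as before: the Counter class stands in for analysis(), strings stand in for files, and the function name, pool size, and timeout value are hypothetical.

```python
import multiprocessing
from collections import Counter


def count_with_status(texts, timeout=30.0):
    """Return the per-item Counters plus a flag saying whether all calls succeeded."""
    with multiprocessing.Pool(2) as workers:
        pending = workers.map_async(Counter, texts)  # returns immediately
        # The parent could do other work here before checking back.
        pending.wait(timeout)  # block for at most `timeout` seconds
        if not pending.ready():
            raise TimeoutError("workers did not finish in time")
        # successful() may only be called once the result is ready.
        succeeded = pending.successful()
        # get() returns the results, or re-raises an exception from a worker.
        return pending.get(), succeeded


if __name__ == "__main__":
    data, ok = count_with_status(["aab", "bcc", "abc"])
    print(ok, data)
```

Checking ready() before successful() matters because successful() raises an exception if the result is not yet available.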