Chapter 8
Combining iterators with chain()
We can use the chain() function to combine a collection of iterators into a single,
overall iterator. This can be helpful to combine data that was decomposed via the
groupby() function. We can use this to process a number of collections as if they
were a single collection.
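As a minimal sketch of the idea, we can chain three unrelated collections and consume them as one. The names low, mid, and high are illustrative only and are not part of the chapter's examples:

```python
from itertools import chain

# Three separate collections of different types (hypothetical sample data).
low = [1, 2, 3]
mid = (10, 20)
high = range(100, 103)

# chain() visits each iterable in turn, exhausting one before starting the next.
combined = list(chain(low, mid, high))
print(combined)
```

The chain() function does not copy the underlying collections; it simply moves on to the next iterable when the current one is exhausted.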
In particular, we can combine the chain() function with the contextlib.ExitStack
class to process a collection of files as a single iterable sequence of values. We can
do something like this:
from contextlib import ExitStack
from itertools import chain
import csv

def row_iter_csv_tab(*filenames):
    with ExitStack() as stack:
        # Open every file and register it so the stack closes it on exit.
        files = [stack.enter_context(open(name, 'r', newline=''))
                 for name in filenames]
        # Build one tab-delimited reader per open file.
        readers = [csv.reader(f, delimiter='\t') for f in files]
        # Yield rows one at a time so the files stay open while we read.
        yield from chain(*readers)
We've created an ExitStack object that can hold a number of individual open
contexts. When the with statement finishes, all items in the ExitStack object will be
closed properly. We created a simple sequence of open file objects; these objects were
also entered into the ExitStack object.
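The closing behavior can be checked with a small experiment. This is only a sketch: the temporary directory and the part0.txt, part1.txt, and part2.txt file names are assumptions made for the demonstration.

```python
import tempfile
from contextlib import ExitStack
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Create three small throwaway files (hypothetical names).
    paths = [Path(tmp) / f"part{i}.txt" for i in range(3)]
    for p in paths:
        p.write_text("data\n")

    with ExitStack() as stack:
        # Each open file is registered with the stack as it is created.
        files = [stack.enter_context(open(p)) for p in paths]
        open_inside = [f.closed for f in files]

    # Leaving the with statement closes every registered file.
    closed_after = [f.closed for f in files]

print(open_inside, closed_after)
```

Inside the with statement every file reports that it is still open; once the block exits, all of them report that they are closed.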
Given the sequence of files in the files variable, we created a sequence of
CSV readers in the readers variable. In this case, all of our files have a common
tab-delimited format, which makes it very pleasant to open all of the files with a
simple, consistent application of a function to the sequence of files.
We could also create the readers with the map() function instead of a list comprehension:
readers = map(lambda f: csv.reader(f, delimiter='\t'), files)
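To see that the two forms are interchangeable, we can compare them on in-memory data. This sketch uses io.StringIO objects as stand-ins for open files; the make_files() helper is hypothetical and exists only for this comparison.

```python
import csv
import io

def make_files():
    # Fresh in-memory "files" each time, since a reader consumes its source.
    return [io.StringIO("a\t1\n"), io.StringIO("b\t2\n")]

# The list comprehension builds all of the readers immediately.
readers_list = [csv.reader(f, delimiter='\t') for f in make_files()]
# The map() version builds each reader lazily, as it is requested.
readers_map = map(lambda f: csv.reader(f, delimiter='\t'), make_files())

rows_list = [row for reader in readers_list for row in reader]
rows_map = [row for reader in readers_map for row in reader]
print(rows_list == rows_map)
```

The only practical difference is timing: the comprehension creates every reader up front, while map() defers creation until chain(*readers) unpacks it.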
Finally, we chained all of the readers into a single iterator with chain(*readers).
This was used to yield the sequence of rows from all of the files.
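A short usage sketch shows the function reading two files as one sequence of rows. The function is restated here so that the example runs on its own; the first.tsv and second.tsv files and their contents are invented for the demonstration.

```python
import csv
import tempfile
from contextlib import ExitStack
from itertools import chain
from pathlib import Path

def row_iter_csv_tab(*filenames):
    with ExitStack() as stack:
        files = [stack.enter_context(open(name, 'r', newline=''))
                 for name in filenames]
        readers = [csv.reader(f, delimiter='\t') for f in files]
        yield from chain(*readers)

with tempfile.TemporaryDirectory() as tmp:
    # Two small tab-delimited files (hypothetical sample data).
    first = Path(tmp) / "first.tsv"
    second = Path(tmp) / "second.tsv"
    first.write_text("a\t1\nb\t2\n")
    second.write_text("c\t3\n")

    # Consuming the generator completely lets the with statement finish
    # and close both files before the temporary directory is removed.
    rows = list(row_iter_csv_tab(first, second))

print(rows)
```

The rows from the second file follow the rows from the first, with no marker showing where one file ends and the next begins.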
It's important to note that we can't return the chain(*readers) object. Because chain()
is lazy, a return statement would exit the with statement context and close all the
source files before any row had been read. Instead, we must yield individual rows so
that the with statement context is kept active.
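The failure can be reproduced with a deliberately broken variant. This is a sketch only: rows_returned() is a hypothetical name for the incorrect version, and sample.tsv is an invented file.

```python
import csv
import tempfile
from contextlib import ExitStack
from itertools import chain
from pathlib import Path

def rows_returned(*filenames):
    # Incorrect on purpose: return leaves the with statement immediately,
    # so the files are closed before the caller reads a single row.
    with ExitStack() as stack:
        files = [stack.enter_context(open(name, 'r', newline=''))
                 for name in filenames]
        readers = [csv.reader(f, delimiter='\t') for f in files]
        return chain(*readers)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.tsv"
    path.write_text("a\t1\n")
    try:
        list(rows_returned(path))
        outcome = "rows read"
    except ValueError as exc:
        # Reading from a closed file raises ValueError.
        outcome = f"failed: {exc}"

print(outcome)
```

With yield from in place of return, the function becomes a generator whose with statement stays suspended, and the files remain open until the last row has been consumed.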