Functions, Iterators, and Generators
We can tweak the original generator expression as follows:
g_f_x = (g(f(x)) for x in range())
While technically correct, this defeats any idea of reuse: rather than reusing the
original expression, we have rewritten it.
We can also substitute one expression within another expression, as follows:
g_f_x = (g(y) for y in (f(x) for x in range()))
This has the advantage of allowing us to use simple substitution. We can revise
this slightly to emphasize reuse, using the following statements:
f_x = (f(x) for x in range())
g_f_x = (g(y) for y in f_x)
This has the advantage of leaving the initial expression, (f(x) for x in range()),
essentially untouched. All we did was assign the expression to a variable.
The resulting composite is itself a generator expression, so it is also lazy.
This means that extracting the next value from g_f_x will extract one value from
f_x, which will extract one value from the source range() function.
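The lazy behavior described here can be observed directly. The following is a minimal sketch, not a definitive implementation: the functions f and g, the trace list, and the bound range(3) are all assumptions made for illustration, since the text leaves range() schematic.

```python
# Hypothetical scalar functions; each one records its call so that
# the order of evaluation becomes visible.
trace = []

def f(x):
    trace.append(f"f({x})")
    return x * 2

def g(y):
    trace.append(g.__name__ + f"({y})")
    return y + 1

# range(3) is an assumed bound chosen only for this illustration.
f_x = (f(x) for x in range(3))
g_f_x = (g(y) for y in f_x)

# Building the two generator expressions has not called f or g yet.
calls_before = list(trace)

# Extracting one value from g_f_x extracts exactly one value from f_x.
first = next(g_f_x)
calls_after_first = list(trace)

# Consuming the remainder drains f_x as well.
rest = list(g_f_x)
leftover = list(f_x)
```

After the first call to next(), only f(0) and g(0) have been evaluated. Because g_f_x consumes f_x, nothing is left in f_x once g_f_x has been exhausted.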
Cleaning raw data with generator functions
One of the tasks that arise in exploratory data analysis is cleaning up raw source
data. This is often done as a composite operation, applying several scalar functions
to each piece of input data to create a usable dataset.
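As a rough sketch of such a composite operation, a generator function can apply a chain of scalar functions to each raw item. The function names (strip_text, remove_commas, to_float, clean) and the sample strings are hypothetical, chosen only to illustrate the idea.

```python
def strip_text(raw):
    """Scalar function: remove surrounding whitespace."""
    return raw.strip()

def remove_commas(text):
    """Scalar function: drop thousands separators."""
    return text.replace(",", "")

def to_float(text):
    """Scalar function: convert cleaned text to a number."""
    return float(text)

def clean(values):
    """Lazily yield each raw value after applying all three scalar functions."""
    for raw in values:
        yield to_float(remove_commas(strip_text(raw)))

# Hypothetical raw input, not taken from any real file.
raw_values = ["10.0", " 8.04 ", "1,300.5"]
cleaned = list(clean(raw_values))
```

Because clean() is a generator function, no item is converted until a consumer such as list() asks for it.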
Let's look at a simplified set of data. This data is commonly used to show techniques
in exploratory data analysis. It's called Anscombe's Quartet, and it comes from
the article "Graphs in Statistical Analysis" by F. J. Anscombe, which appeared in
The American Statistician in 1973. The following are the first few rows of a
downloaded file with this dataset:
Anscombe's quartet
I             II            III           IV
x      y      x      y      x      y      x      y
10.0   8.04   10.0   9.14   10.0   7.46   8.0    6.58
8.0    6.95   8.0    8.14   8.0    6.77   8.0    5.76
13.0   7.58   13.0   8.74   13.0   12.74  8.0    7.71
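One possible way to pull a single series out of rows like these is sketched below. It assumes the downloaded file is tab-delimited and begins with three heading rows. The names sample, data_rows, and series are hypothetical, and an in-memory string stands in for the file.

```python
import csv
import io

# The rows shown above, assumed here to be separated by tab characters.
sample = (
    "Anscombe's quartet\n"
    "I\tII\tIII\tIV\n"
    "x\ty\tx\ty\tx\ty\tx\ty\n"
    "10.0\t8.04\t10.0\t9.14\t10.0\t7.46\t8.0\t6.58\n"
    "8.0\t6.95\t8.0\t8.14\t8.0\t6.77\t8.0\t5.76\n"
    "13.0\t7.58\t13.0\t8.74\t13.0\t12.74\t8.0\t7.71\n"
)

def data_rows(source, header_lines=3):
    """Yield each data row as a list of strings, skipping the heading rows."""
    reader = csv.reader(source, delimiter="\t")
    for index, row in enumerate(reader):
        if index >= header_lines:
            yield row

def series(rows, n):
    """Yield (x, y) float pairs for series n, where n runs from 0 to 3."""
    for row in rows:
        yield float(row[2 * n]), float(row[2 * n + 1])

# Each series gets its own pass over the source, because generators
# can be consumed only once.
series_1 = list(series(data_rows(io.StringIO(sample)), 0))
series_4 = list(series(data_rows(io.StringIO(sample)), 3))
```

Both data_rows() and series() are generator functions, so rows are read and converted one at a time as the consumer requests them.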