Working with Collections
This correlation function gathers basic statistical summaries of the two sets of samples:
the mean and standard deviation. Given these summaries, we defined two generators
that create normalized values for each set of samples. We can then use the zip()
function (described in the next section) to pair up items from the two sequences of
normalized values and compute the product of each pair. The average of these products
of normalized scores is the correlation.
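A minimal sketch of the correlation function described above, assuming population statistics (dividing by n) for both the mean and the standard deviation. The helpers mean(), stdev(), and z() are hypothetical stand-ins for the module's earlier statistical functions, not necessarily their exact definitions.

```python
from math import sqrt
from typing import Sequence


def mean(samples: Sequence[float]) -> float:
    """Arithmetic mean (hypothetical helper)."""
    return sum(samples) / len(samples)


def stdev(samples: Sequence[float]) -> float:
    """Population standard deviation, dividing by n (hypothetical helper)."""
    m = mean(samples)
    return sqrt(sum((x - m) ** 2 for x in samples) / len(samples))


def z(x: float, m_x: float, s_x: float) -> float:
    """Normalized (standard) score of one value (hypothetical helper)."""
    return (x - m_x) / s_x


def corr(samples1: Sequence[float], samples2: Sequence[float]) -> float:
    # Summaries for each set of samples.
    m_1, s_1 = mean(samples1), stdev(samples1)
    m_2, s_2 = mean(samples2), stdev(samples2)
    # Generator expressions: normalized values are produced lazily.
    z_1 = (z(x, m_1, s_1) for x in samples1)
    z_2 = (z(x, m_2, s_2) for x in samples2)
    # Pair normalized values with zip(), multiply each pair, and average.
    r = sum(zx1 * zx2 for zx1, zx2 in zip(z_1, z_2)) / len(samples1)
    return r
```

With population statistics, dividing the sum of products by n gives Pearson's r. If the sample standard deviation (n - 1) were used instead, the final sum would need to be divided by n - 1 to get the same value.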
The following example computes the correlation between two sets of samples:
>>> xi = [1.47, 1.50, 1.52, 1.55, 1.57, 1.60, 1.63, 1.65,
...       1.68, 1.70, 1.73, 1.75, 1.78, 1.80, 1.83]  # Height (m)
>>> yi = [52.21, 53.12, 54.48, 55.84, 57.20, 58.57, 59.93, 61.29,
...       63.11, 64.47, 66.28, 68.10, 69.92, 72.19, 74.46]  # Mass (kg)
>>> round(corr(xi, yi), 5)
0.99458
We've shown two sequences of data points, xi and yi. The correlation is over 0.99,
which shows a very strong linear relationship between the two sequences.
This shows one of the strengths of functional programming. We've created a handy
statistical module using a half-dozen functions whose definitions are single
expressions. The one apparent exception is the corr() function, and even it can be
reduced to a single, very long expression. Each internal variable in this function is
used just once, so a local variable can be replaced with a copy-and-paste of the
expression that created it. This shows that the corr() function has a functional design
even though it's written out in six separate lines of Python.
Using zip() to structure and flatten sequences
The zip() function interleaves values from several iterators or sequences. Given n input
iterables or sequences, it creates n-tuples, each holding one value from every input, and
it stops when the shortest input is exhausted. We used it in the previous section to
interleave data points from two sets of samples, creating two-tuples.
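A short illustration of the n-tuples that zip() builds: two inputs give two-tuples (pairs), three inputs give three-tuples, and the output stops at the shortest input. The sample values are made up for the example.

```python
# Two inputs: each output item is a two-tuple (a pair).
heights = [1.47, 1.50, 1.52]
masses = [52.21, 53.12, 54.48]
pairs = list(zip(heights, masses))
print(pairs)  # [(1.47, 52.21), (1.5, 53.12), (1.52, 54.48)]

# Three inputs: each output item is a three-tuple.
triples = list(zip("abc", range(3), [True, False, True]))
print(triples)  # [('a', 0, True), ('b', 1, False), ('c', 2, True)]

# Unequal lengths: zip() stops when the shortest input runs out.
short = list(zip([1, 2, 3, 4], "xy"))
print(short)  # [(1, 'x'), (2, 'y')]
```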
The zip() function behaves like a generator: it returns a lazy iterator and does not
materialize a resulting collection.
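A small demonstration of that laziness. Strictly speaking, zip() returns a zip iterator object rather than a generator object, but it shares the same lazy, single-pass behavior: it can pair an infinite iterator with a finite one, it produces nothing until asked, and it can be consumed only once.

```python
from itertools import count

# zip() returns an iterator immediately; count(1) is infinite,
# but no values are paired until they are requested.
lazy = zip(count(1), "abc")
print(type(lazy).__name__)  # zip

# Materialize explicitly when a collection is needed.
first = list(lazy)
print(first)   # [(1, 'a'), (2, 'b'), (3, 'c')]

# Like a generator, the iterator is exhausted after one pass.
second = list(lazy)
print(second)  # []
```

If the paired values are needed more than once, wrapping the zip() call in list() or tuple() is the usual way to keep them.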