Chapter 7
We're going to create a flat namedtuple with multiple peer attributes. This kind
of expansion is often easier to work with than deeply nested structures. In some
applications, we might have a number of transformations. For this application, we
have only two transformations: x-ranking and y-ranking. We'll break this into two
steps. First, we'll look at a simplistic wrapping like the one shown previously and
then a more general unwrap-rewrap.
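To make the contrast concrete, here is a minimal sketch of the two shapes. The Pair and Ranked_XY field names are taken from the output shown in this section; the nested two-tuple layout is an assumption about what a simplistic wrapping would produce.

```python
from collections import namedtuple

Pair = namedtuple("Pair", ("x", "y"))
Ranked_XY = namedtuple("Ranked_XY", ("r_x", "r_y", "raw"))

raw = Pair(x=5.0, y=5.68)

# Nested wrapping (assumed layout): (x rank, (y rank, raw pair)).
# Every extra ranking adds another level of positional indexing.
nested = (2.0, (3.0, raw))
nested_r_y = nested[1][0]

# Flat namedtuple: the rankings are peer attributes with names.
flat = Ranked_XY(r_x=2.0, r_y=3.0, raw=raw)
flat_r_y = flat.r_y
```

Both forms carry the same information, but the flat form is read by attribute name rather than by remembering which position holds which rank.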
The following is how the x-y ranking builds on the y-ranking:
    def rank_xy(pairs):
        return (Ranked_XY(r_x=r_x, r_y=rank_y_raw[0],
                          raw=rank_y_raw[1])
                for r_x, rank_y_raw in rank(rank_y(pairs),
                                            lambda r: r.raw.x))
We've used the rank_y() function to build Rank_Y objects. Then, we applied the
rank() function to those objects to order them by the original x values. The result
of the second rank() call is a sequence of two-tuples, each containing (0) the
x rank and (1) the Rank_Y object. We build a Ranked_XY object from the x ranking
(r_x), the y ranking (rank_y_raw[0]), and the original object (rank_y_raw[1]).
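The following sketch puts the pieces together so that rank_xy() can be run on its own. The rank() and rank_y() functions here are simplified stand-ins, not the versions defined earlier in the chapter: this rank() assigns 1-based float ranks in sorted order and does not average tied values. The r_y field name on Rank_Y is also an assumption.

```python
from collections import namedtuple

Pair = namedtuple("Pair", ("x", "y"))
Rank_Y = namedtuple("Rank_Y", ("r_y", "raw"))  # field name r_y is assumed
Ranked_XY = namedtuple("Ranked_XY", ("r_x", "r_y", "raw"))

def rank(data, key=lambda obj: obj):
    """Simplified stand-in: yield (rank, item) in ascending key order.

    Ranks are 1-based floats. Ties are NOT averaged in this sketch.
    """
    for position, item in enumerate(sorted(data, key=key), start=1):
        yield float(position), item

def rank_y(pairs):
    """Wrap each pair with its y rank."""
    return (Rank_Y(r_y=r, raw=pair)
            for r, pair in rank(pairs, lambda p: p.y))

def rank_xy(pairs):
    """Rank the Rank_Y objects by x, then unwrap and rewrap them flat."""
    return (Ranked_XY(r_x=r_x, r_y=rank_y_raw[0], raw=rank_y_raw[1])
            for r_x, rank_y_raw in rank(rank_y(pairs),
                                        lambda r: r.raw.x))

# Three of the pairs quoted in this section, in their original order.
sample = (Pair(x=10.0, y=8.04), Pair(x=8.0, y=6.95), Pair(x=5.0, y=5.68))
result = list(rank_xy(sample))
```

Because only three pairs are ranked here, the rank values differ from those computed over the full data set; the point is the shape of the result, a list of flat Ranked_XY objects ordered by x.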
What we've shown in this second function is a more general approach to adding
data to a tuple. The construction of the Ranked_XY object shows how to unwrap
the values from one structure and rewrap them to create a second, more complete
structure. This approach can be used generally to introduce new variables to a tuple.
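One way to generalize the unwrap-rewrap step is a small helper built on the namedtuple _asdict() method. The rewrap() function below is hypothetical and is not part of the chapter's code; it assumes the target type has every field of the source plus the new ones. The Rank_Y field name r_y is likewise an assumption.

```python
from collections import namedtuple

Pair = namedtuple("Pair", ("x", "y"))
Rank_Y = namedtuple("Rank_Y", ("r_y", "raw"))  # field name r_y is assumed
Ranked_XY = namedtuple("Ranked_XY", ("r_x", "r_y", "raw"))

def rewrap(source, target_type, **new_fields):
    """Hypothetical helper: unwrap source's fields, add new_fields,
    and build an instance of target_type from the combined values."""
    return target_type(**source._asdict(), **new_fields)

y_ranked = Rank_Y(r_y=3.0, raw=Pair(x=5.0, y=5.68))
xy_ranked = rewrap(y_ranked, Ranked_XY, r_x=2.0)
```

Passing the values by keyword means the new attribute can appear anywhere in the target's field order, at the cost of building an intermediate dictionary for each object.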
The following is some sample data:
    data = (Pair(x=10.0, y=8.04), Pair(x=8.0, y=6.95), ...,
            Pair(x=5.0, y=5.68))
This allows us to create ranking objects as follows:
    >>> list(rank_xy(data))
    [Ranked_XY(r_x=1.0, r_y=1.0, raw=Pair(x=4.0, y=4.26)),
     Ranked_XY(r_x=2.0, r_y=3.0, raw=Pair(x=5.0, y=5.68)), ...,
     Ranked_XY(r_x=11.0, r_y=10.0, raw=Pair(x=14.0, y=9.96))]
Once we have this data with the appropriate x and y rankings, we can compute the
Spearman rank-order correlation value from the r_x and r_y attributes. We can
compute the Pearson correlation from the raw data in the raw attribute.
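As a rough sketch of those two calculations, the functions below work directly on Ranked_XY and Pair objects. The names spearman() and pearson() are chosen here for illustration. The Spearman version uses the shortcut formula 1 - 6*sum(d**2)/(n*(n**2 - 1)), which is exact only when there are no tied ranks.

```python
from collections import namedtuple
from math import sqrt

Pair = namedtuple("Pair", ("x", "y"))
Ranked_XY = namedtuple("Ranked_XY", ("r_x", "r_y", "raw"))

def spearman(ranked):
    """Rank-order correlation from r_x and r_y (assumes no tied ranks)."""
    ranked = list(ranked)
    n = len(ranked)
    d_squared = sum((r.r_x - r.r_y) ** 2 for r in ranked)
    return 1 - 6 * d_squared / (n * (n * n - 1))

def pearson(pairs):
    """Pearson correlation coefficient from the raw x and y values."""
    pairs = list(pairs)
    n = len(pairs)
    mean_x = sum(p.x for p in pairs) / n
    mean_y = sum(p.y for p in pairs) / n
    cov = sum((p.x - mean_x) * (p.y - mean_y) for p in pairs)
    var_x = sum((p.x - mean_x) ** 2 for p in pairs)
    var_y = sum((p.y - mean_y) ** 2 for p in pairs)
    return cov / sqrt(var_x * var_y)

# Three pairs quoted in this section, ranked among themselves only.
ranked = [
    Ranked_XY(r_x=1.0, r_y=1.0, raw=Pair(x=4.0, y=4.26)),
    Ranked_XY(r_x=2.0, r_y=2.0, raw=Pair(x=5.0, y=5.68)),
    Ranked_XY(r_x=3.0, r_y=3.0, raw=Pair(x=14.0, y=9.96)),
]
rho = spearman(ranked)
r = pearson(item.raw for item in ranked)
```

Keeping the raw Pair inside each Ranked_XY is what lets both measures be computed from the same sequence of objects.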
Our multiranking approach involves decomposing a tuple and building a new, flat
tuple with the additional attributes we need. We will often need this kind of design
when computing multiple derived values from source data.