Functional Python Programming


Additional Tuple Techniques


Assigning statistical ranks


We'll break the rank ordering problem into two parts. First, we'll look at a generic,
higher-order function that we can use to assign ranks to either the x or y value of
a Pair object. Then, we'll use this to create a wrapper around the Pair object that
includes both x and y rankings. This will avoid a deeply nested structure.


The following is a function that will create a rank order for each observation
in a dataset:


```python
from collections import defaultdict


def rank(data, key=lambda obj: obj):
    def rank_output(duplicates, key_iter, base=0):
        # Walk the distinct key values in ascending order.
        for k in key_iter:
            dups = len(duplicates[k])
            # Tied items occupy ranks base+1 .. base+dups;
            # each one gets the mean of those ranks.
            for value in duplicates[k]:
                yield (base + 1 + base + dups) / 2, value
            base += dups

    def build_duplicates(duplicates, data_iter, key):
        # Group items by key value; items with equal keys share one list.
        for item in data_iter:
            duplicates[key(item)].append(item)
        return duplicates

    duplicates = build_duplicates(defaultdict(list), iter(data), key)
    return rank_output(duplicates, iter(sorted(duplicates)), 0)
```
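The opening paragraphs describe a two-part plan: rank() for a single key, then a wrapper around each Pair that carries both rankings. The following is a minimal sketch of that second part, not the book's own wrapper. It assumes Pair is a two-field namedtuple; RankedXY and rank_xy() are hypothetical names introduced here. rank() is repeated so the block runs on its own.

```python
from collections import defaultdict, namedtuple

# Assumption: Pair is a simple two-field namedtuple.
Pair = namedtuple("Pair", ("x", "y"))

# Hypothetical wrapper type: both ranks plus the original Pair, side by side.
RankedXY = namedtuple("RankedXY", ("r_x", "r_y", "raw"))


def rank(data, key=lambda obj: obj):
    """Same rank() as above: yields (rank, item); ties share the mean rank."""
    def rank_output(duplicates, key_iter, base=0):
        for k in key_iter:
            dups = len(duplicates[k])
            for value in duplicates[k]:
                yield (base + 1 + base + dups) / 2, value
            base += dups

    def build_duplicates(duplicates, data_iter, key):
        for item in data_iter:
            duplicates[key(item)].append(item)
        return duplicates

    duplicates = build_duplicates(defaultdict(list), iter(data), key)
    return rank_output(duplicates, iter(sorted(duplicates)), 0)


def rank_xy(pairs):
    """Hypothetical wrapper: attach an x rank and a y rank to every Pair."""
    # Rank by x first; each result is a (r_x, pair) tuple.
    by_x = list(rank(pairs, key=lambda p: p.x))
    # Rank those tuples by the pair's y value, then flatten into RankedXY.
    return [
        RankedXY(r_x, r_y, pair)
        for r_y, (r_x, pair) in rank(by_x, key=lambda rp: rp[1].y)
    ]


if __name__ == "__main__":
    print(list(rank([0.8, 1.2, 1.2, 2.3])))
    # [(1.0, 0.8), (2.5, 1.2), (2.5, 1.2), (4.0, 2.3)]
    for row in rank_xy([Pair(1, 10), Pair(2, 30), Pair(2, 20), Pair(3, 20)]):
        print(row)
```

Ranking by x and then ranking the resulting (rank, pair) tuples by y keeps the result flat: each RankedXY holds r_x, r_y, and the original Pair as siblings rather than nesting one rank tuple inside another.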


Our function to create the rank ordering relies on building a Counter-like object to
discover duplicate values. We can't use a plain Counter, as it uses the entire object
as the key. We want to group by a key function applied to each object, which lets us
pick either the x or y value of a Pair object, and we need to keep the grouped items
themselves, not just their counts.
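To see why a plain Counter isn't enough, this sketch compares it with grouping by a key function, which is what build_duplicates() does with a defaultdict(list). The minimal Pair namedtuple and the sample data are assumptions for illustration.

```python
from collections import Counter, defaultdict, namedtuple

# Assumption: a minimal two-field Pair type, for illustration only.
Pair = namedtuple("Pair", ("x", "y"))

data = [Pair(1, 10), Pair(2, 30), Pair(2, 20)]

# Counter keys on the whole object. The two Pairs with x == 2 differ in y,
# so they compare unequal and no duplicate x value shows up.
whole_object_counts = Counter(data)

# Counting the key values does find the duplicate x, but it discards the
# Pair objects that rank() needs to yield alongside each rank.
key_counts = Counter(p.x for p in data)

# Grouping by key, as build_duplicates() does, finds duplicates AND keeps items.
by_x = defaultdict(list)
for item in data:
    by_x[item.x].append(item)

print(max(whole_object_counts.values()))  # 1
print(key_counts[2])                       # 2
print(by_x[2])                             # [Pair(x=2, y=30), Pair(x=2, y=20)]
```

The grouped lists still give the counts, via len(by_x[k]), so nothing Counter offers is lost.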


The duplicates collection in this example is a stateful object. We could have written
a properly recursive function instead, but we'd then have to apply tail-call
optimization ourselves, since Python doesn't perform it, to work with large
collections of data. What we've shown here is the optimized version of that
recursion, with the recursion replaced by a for loop.


As a hint to how this recursion would look, we've given build_duplicates() arguments
that expose its state as argument values. The base case for the recursion is when
data_iter is empty. When data_iter is not empty, a new collection is built from the
old collection and the head, next(data_iter). A recursive evaluation of
build_duplicates() then handles all the items in the tail of data_iter.
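The recursion described above can be sketched as follows. build_duplicates_rec() is a hypothetical illustration, not code from the text: it returns a new dict at each step instead of mutating a defaultdict, and it depends on recursion that Python won't optimize.

```python
def build_duplicates_rec(duplicates, data_iter, key):
    """Hypothetical purely recursive form of build_duplicates().

    Each call returns a new dict rather than mutating its argument.
    Python has no tail-call optimization, so long inputs will hit the
    interpreter's recursion limit (1000 frames by default in CPython).
    """
    try:
        head = next(data_iter)
    except StopIteration:
        # Base case: data_iter is empty, so the collection is complete.
        return duplicates
    k = key(head)
    # Build a new collection from the old one plus the head item.
    new_duplicates = {**duplicates, k: duplicates.get(k, []) + [head]}
    # Recursive case: handle the tail of data_iter.
    return build_duplicates_rec(new_duplicates, data_iter, key)


if __name__ == "__main__":
    groups = build_duplicates_rec({}, iter([3, 1, 3, 2]), key=lambda v: v)
    print(groups)  # {3: [3, 3], 1: [1], 2: [2]}
```

Besides the depth limit, copying the dict at every step can take quadratic time, which is why the for-loop version inside rank() is the practical choice.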
