Functional Python Programming

(Wang) #1
Chapter 7

In Chapter 4, Working with Collections, we showed how to compute the Pearson
correlation coefficient. The function we showed, corr(), worked with two separate
sequences of values. We can use it with our sequence of Pair objects as follows:


import ch04_ex4
def pearson_corr(pairs):
X = tuple(p.x for p in pairs)
Y = tuple(p.y for p in pairs)
return ch04_ex4.corr(X, Y)


We've unwrapped the Pair objects to get the raw values that we can use with the
existing corr() function. This provides a different correlation coefficient. The
Pearson value is based on how well the standardized values compare between two
sequences. For many data sets, the difference between the Pearson and Spearman
correlations is relatively small. For some datasets, however, the differences can be
quite large.


To see the importance of having multiple statistical tools for exploratory data
analysis, compare the Spearman and Pearson correlations for the four sets of
data in the Anscombe's Quartet.


Polymorphism and Pythonic pattern matching


Some functional programming languages offer clever approaches to working with
statically typed function definitions. The issue is that many functions we'd like
to write are entirely generic with respect to data type. For example, most of our
statistical functions are identical for integer or floating-point numbers, as long
as division returns a value that is a subclass of numbers.Real (for example, Decimal,
Fraction, or float). In order to make a single generic definition work for multiple
data types, sophisticated type or pattern-matching rules are used by the compiler.


Instead of the (possibly) complex features of statically typed functional languages,
Python changes the issue using dynamic selection of the final implementation of
an operator based on the data types being used. This means that a compiler doesn't
certify that our functions are expecting and producing the proper data types. We
generally rely on unit testing for this.


In Python, we're effectively writing generic definitions because the code isn't bound
to any specific data type. The Python runtime will locate the appropriate operations
using a simple set of matching rules. The 3.3.7 Coercion rules section of the language
reference manual and the numbers module in the library provide details on how this
mapping from operation to special method name works.

Free download pdf