Functional Python Programming


Additional Tuple Techniques


Computing the Spearman rank-order correlation


The Spearman rank-order correlation is a comparison between the rankings of two
variables. It neatly bypasses the magnitude of the values, and it can often find a
correlation even when the relationship is not linear. The formula is as follows:


$$\rho = 1 - \frac{6 \sum_{i=1}^{n} (r_{x_i} - r_{y_i})^2}{n(n^2 - 1)}$$

This formula shows us that we'll be summing the squared differences between the
ranks of x_i and y_i for all of the pairs of observed values, where n is the number
of pairs. The Python version of this depends on the sum() and len() functions, as
follows:


def rank_corr(pairs):
    # Attach a rank to the x and y value of each pair.
    ranked = rank_xy(pairs)
    # Sum of the squared rank differences.
    sum_d_2 = sum((r.r_x - r.r_y)**2 for r in ranked)
    n = len(pairs)
    return 1 - 6*sum_d_2/(n*(n**2 - 1))


We've created Rank_XY objects for each pair. Given this, we can subtract the r_y
value from the r_x value of each pair to get the difference in rank. We can then
square and sum the differences.
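
To make the function easy to try in isolation, here is a minimal, self-contained sketch. The Pair and Rank_XY definitions, the raw field, and this simplified rank_xy() are assumptions made for illustration only; they stand in for the versions built elsewhere in the chapter, and this stand-in assumes there are no tied values (ties would need averaged ranks).

```python
from collections import namedtuple

# Assumed shapes for illustration; the chapter's own definitions may differ.
Pair = namedtuple("Pair", ("x", "y"))
Rank_XY = namedtuple("Rank_XY", ("r_x", "r_y", "raw"))

def rank_xy(pairs):
    """Hypothetical stand-in: rank each pair by x and by y, starting at 1.

    Assumes no tied values in either variable.
    """
    x_rank = {x: r for r, x in enumerate(sorted(p.x for p in pairs), start=1)}
    y_rank = {y: r for r, y in enumerate(sorted(p.y for p in pairs), start=1)}
    return [Rank_XY(x_rank[p.x], y_rank[p.y], p) for p in pairs]

def rank_corr(pairs):
    ranked = rank_xy(pairs)
    sum_d_2 = sum((r.r_x - r.r_y)**2 for r in ranked)
    n = len(pairs)
    return 1 - 6*sum_d_2/(n*(n**2 - 1))

# Made-up sample: the x ranks are 1..5 and the y ranks are 2, 1, 4, 3, 5.
sample = (
    Pair(1.0, 2.5), Pair(2.0, 1.5), Pair(3.0, 4.0),
    Pair(4.0, 3.0), Pair(5.0, 6.0),
)
result = rank_corr(sample)
```

For this made-up sample the rank differences are -1, 1, -1, 1, and 0, so the sum of squares is 4 and the result is 1 - 24/120, or about 0.8.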


A good article on statistics will provide detailed guidance on what the coefficient
means. A value around 0 means that there is no correlation between the ranks of the
two series of data points; a scatter plot shows a random scattering of points.
A value around +1 or -1 indicates a strong relationship between the two series;
a graph shows a clear line or curve. A positive value means that the two series
rise together, and a negative value means that one falls as the other rises.
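
The three cases can be illustrated with small, made-up series. The ranks() and spearman() helpers below are hypothetical names introduced only for this sketch, and they assume no tied values. Note that the cubic relationship is not linear, yet the coefficient still reaches +1 because only the ordering of the values matters.

```python
def ranks(values):
    """Return the 1-based rank of each value; assumes no ties."""
    order = {v: r for r, v in enumerate(sorted(values), start=1)}
    return [order[v] for v in values]

def spearman(xs, ys):
    """Spearman rank-order correlation of two equal-length series."""
    n = len(xs)
    sum_d_2 = sum((rx - ry)**2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6*sum_d_2/(n*(n**2 - 1))

xs = [1, 2, 3, 4, 5]

# Nonlinear but always rising: the ranks agree exactly.
increasing = spearman(xs, [x**3 for x in xs])    # 1.0

# Always falling: the ranks are exactly reversed.
decreasing = spearman(xs, [-x**3 for x in xs])   # -1.0

# A scrambled ordering chosen so the squared rank differences sum to 20.
scrambled = spearman(xs, [2, 5, 3, 1, 4])        # 0.0
```

With five points the denominator n(n**2 - 1) is 120, so a sum of squared differences of 0, 40, or 20 gives +1, -1, or 0 respectively.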


The following is an example based on Anscombe's Quartet series I:





>>> data = (Pair(x=10.0, y=8.04), Pair(x=8.0, y=6.95), ...,
...     Pair(x=5.0, y=5.68))
>>> round(rank_corr(data), 3)
0.818


For this particular data set, the correlation is strong.
