Functional Python Programming


More Itertools Techniques


More practically, suppose we have a dataset with a number of variables. A common
exploratory technique is to determine the correlation among all pairs of variables in
the dataset. If there are v variables, we can enumerate all the pairs of variables that
must be compared with the following expression:


combinations(range(v), 2)
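As a small illustration of what this expression produces, the sketch below assumes a hypothetical v of 4 (a value chosen only for this example). The combinations() function yields each unordered pair of column indices exactly once:

```python
from itertools import combinations

v = 4  # hypothetical number of variables, chosen only for illustration

# Each pair (p, q) has p < q, so no variable is paired with itself
# and no pair shows up a second time in reversed order.
pairs = list(combinations(range(v), 2))
print(pairs)
```

For v variables this gives v*(v-1)/2 pairs, which is the number of correlations that need to be computed.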


Let's get some sample data from http://www.tylervigen.com to show how this
will work. We'll pick three datasets with the same time range: numbers 7, 43, and



  1. We'll simply laminate the data into a grid, repeating the year column.
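One possible way to laminate the datasets is sketched here. The function name laminate() and the three variable names are assumptions made for this example, not names from the original text; the numbers are the first two data rows of each dataset as they appear in the grid in this section. It also assumes every dataset lists the same years in the same order:

```python
from itertools import chain

# First two rows of each dataset, copied from the grid in this section.
# The variable names are hypothetical.
cheese_bedsheets = [(2000, 29.8, 327), (2001, 30.1, 456)]
mozzarella_doctorates = [(2000, 9.3, 480), (2001, 9.7, 501)]
oil_corn_syrup = [(2000, 446, 62.6), (2001, 471, 62.5)]

def laminate(*datasets):
    """Yield one wide row per position by joining the rows side by side."""
    for rows in zip(*datasets):
        # rows holds one tuple from each dataset; flatten them into one tuple.
        yield tuple(chain.from_iterable(rows))

grid = list(laminate(cheese_bedsheets, mozzarella_doctorates, oil_corn_syrup))
for row in grid:
    print(row)
```

Because zip() stops at the shortest input, datasets of unequal length would be silently truncated, which is one reason to pick datasets with the same time range.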


This is how the header row and the data rows of the yearly data will look:


[('year',
  'Per capita consumption of cheese (US)Pounds (USDA)',
  'Number of people who died by becoming tangled in their '
  'bedsheetsDeaths (US) (CDC)',
  'year',
  'Per capita consumption of mozzarella cheese (US)Pounds (USDA)',
  'Civil engineering doctorates awarded (US)Degrees awarded '
  '(National Science Foundation)',
  'year',
  'US crude oil imports from VenezuelaMillions of barrels '
  '(Dept. of Energy)',
  'Per capita consumption of high fructose corn syrup (US)Pounds (USDA)'),
(2000, 29.8, 327, 2000, 9.3, 480, 2000, 446, 62.6),
(2001, 30.1, 456, 2001, 9.7, 501, 2001, 471, 62.5),
(2002, 30.5, 509, 2002, 9.7, 540, 2002, 438, 62.8),
(2003, 30.6, 497, 2003, 9.7, 552, 2003, 436, 60.9),
(2004, 31.3, 596, 2004, 9.9, 547, 2004, 473, 59.8),
(2005, 31.7, 573, 2005, 10.2, 622, 2005, 449, 59.1),
(2006, 32.6, 661, 2006, 10.5, 655, 2006, 416, 58.2),
(2007, 33.1, 741, 2007, 11, 701, 2007, 420, 56.1),
(2008, 32.7, 809, 2008, 10.6, 712, 2008, 381, 53),
(2009, 32.8, 717, 2009, 10.6, 708, 2009, 352, 50.1)]


This is how we can use the combinations() function to emit all the combinations
of the nine variables in this dataset, taken two at a time:


combinations(range(9), 2)


There are 36 possible combinations. We'll have to reject the combinations that
pair one year column with another year column. These will trivially correlate with
a value of 1.00.
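The count and the rejection step can be checked with a short sketch. It assumes the year columns sit at positions 0, 3, and 6, as they do in the laminated grid shown earlier; the names year_columns, all_pairs, and kept are made up for this example. Comparing the column headers instead of hard-coding positions would be an equally reasonable approach:

```python
from itertools import combinations

# Positions of the repeated 'year' column in the laminated grid (an assumption
# based on the layout of the grid shown in this section).
year_columns = {0, 3, 6}

all_pairs = list(combinations(range(9), 2))

# Reject only the pairs in which both columns are 'year' columns.
kept = [
    (p, q)
    for p, q in all_pairs
    if not (p in year_columns and q in year_columns)
]

print(len(all_pairs), len(kept))
```

Only three pairs, (0, 3), (0, 6), and (3, 6), are year against year, so 33 of the 36 combinations remain.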


Here is a function that picks a column of data out of our dataset:


def column(source, x):
    for row in source:
        yield row[x]
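Here is a hedged sketch of how column() might be used to separate a header from its data and feed two columns into a correlation. The shortened headers 'cheese' and 'bedsheets' and the pearson() helper are assumptions introduced for this example; the three data rows are taken from the grid in this section:

```python
from math import sqrt

def column(source, x):
    for row in source:
        yield row[x]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# A header row followed by three data rows; headers shortened for illustration.
source = [
    ('year', 'cheese', 'bedsheets'),
    (2000, 29.8, 327),
    (2001, 30.1, 456),
    (2002, 30.5, 509),
]

# Star-unpacking splits the first value (the header) from the rest (the data).
header_p, *data_p = column(source, 1)
header_q, *data_q = column(source, 2)

r = pearson(data_p, data_q)
print(header_p, "vs", header_q, round(r, 2))
```

Since column() is a generator, it can only be consumed once; the star-unpacking collects the remaining values into a list so they can be reused.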
