Functional Python Programming

Chapter 3

Sadly, we can't process this file with the csv module alone. We have to do a little bit
of parsing to extract the useful information from it. Since the data is properly
tab-delimited, we can use the csv.reader() function to iterate through the
rows. We can define a data iterator as follows:


import csv

def row_iter(source):
    return csv.reader(source, delimiter="\t")


We simply wrapped a file in the csv.reader() function to create an iterator over rows.
We can use this iterator in the following context:


with open("Anscombe.txt") as source:
    print(list(row_iter(source)))
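
To experiment without the file on disk, the same iterator can be fed from an in-memory buffer. The following is a minimal sketch: the SAMPLE text is a hypothetical stand-in whose layout (one title line, one heading line, one column line, then data) is assumed from the output shown in this section, and it carries only the first data row of the quartet.

```python
import csv
import io

def row_iter(source):
    """Return an iterator over the tab-delimited rows of a file-like object."""
    return csv.reader(source, delimiter="\t")

# Hypothetical stand-in for the first lines of Anscombe.txt (layout assumed).
SAMPLE = (
    "Anscombe's quartet\n"
    "I\tII\tIII\tIV\n"
    "x\ty\tx\ty\tx\ty\tx\ty\n"
    "10.0\t8.04\t10.0\t9.14\t10.0\t7.46\t8.0\t6.58\n"
)

# io.StringIO behaves like an open text file, so row_iter() needs no changes.
rows = list(row_iter(io.StringIO(SAMPLE)))
for row in rows:
    print(row)
```

Every cell comes back as a string, including the numeric ones, because csv.reader() performs no type conversion by default.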


The problem with this is that the first three items in the resulting iterable aren't data.
The rows read from the Anscombe's quartet file begin as follows:


[["Anscombe's quartet"], ['I', 'II', 'III', 'IV'],
['x', 'y', 'x', 'y', 'x', 'y', 'x', 'y'],


We need to filter these rows from the iterable. Here is a function that will neatly
excise three expected title rows, and return an iterator over the remaining rows:


def head_split_fixed(row_iter):
    title = next(row_iter)
    assert len(title) == 1 and title[0] == "Anscombe's quartet"
    heading = next(row_iter)
    assert len(heading) == 4 and heading == ['I', 'II', 'III', 'IV']
    columns = next(row_iter)
    assert len(columns) == 8 and columns == ['x', 'y', 'x', 'y', 'x',
                                             'y', 'x', 'y']
    return row_iter


This function plucks three rows from the iterable. It asserts that each row has
the expected value. If the file doesn't meet these basic expectations, it's a sign
that the file was damaged or perhaps our analysis is focused on the wrong file.
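
The effect of these assertions can be seen by feeding the function one input that matches the expected header and one that doesn't. This is only a sketch: the GOOD and BAD texts and the check() helper are hypothetical names introduced for illustration, and the file layout is assumed from the output shown earlier.

```python
import csv
import io

def row_iter(source):
    return csv.reader(source, delimiter="\t")

def head_split_fixed(row_iter):
    title = next(row_iter)
    assert len(title) == 1 and title[0] == "Anscombe's quartet"
    heading = next(row_iter)
    assert len(heading) == 4 and heading == ['I', 'II', 'III', 'IV']
    columns = next(row_iter)
    assert len(columns) == 8 and columns == ['x', 'y'] * 4
    return row_iter

# Hypothetical inputs: one with the expected three header rows, one without.
GOOD = (
    "Anscombe's quartet\n"
    "I\tII\tIII\tIV\n"
    "x\ty\tx\ty\tx\ty\tx\ty\n"
    "10.0\t8.04\t10.0\t9.14\t10.0\t7.46\t8.0\t6.58\n"
)
BAD = "Some other dataset\nA\tB\n1\t2\n"

def check(text):
    """Return the data rows, or None if a header assertion fails."""
    try:
        return list(head_split_fixed(row_iter(io.StringIO(text))))
    except AssertionError:
        return None

print(check(GOOD))  # only the data row remains
print(check(BAD))   # None: the title row did not match
```

One caveat on this design: assert statements are skipped when Python runs with the -O option, so code that must always validate its input might raise an explicit exception such as ValueError instead.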


Since the row_iter() function returns an iterator over rows and the
head_split_fixed() function accepts one, they can be trivially combined as follows:


with open("Anscombe.txt") as source:
    print(list(head_split_fixed(row_iter(source))))


We've simply applied one iterator to the results of another iterator. In effect, this
defines a composite function. We're not done, of course; we still need to convert the
string values to float values and we also need to pick apart the four parallel
series of data in each row.
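
One possible shape for those remaining steps is sketched below. The float_rows() and series() functions are hypothetical names introduced here for illustration, not necessarily the approach the chapter goes on to take, and the sketch assumes that each data row holds eight columns laid out as four consecutive (x, y) pairs.

```python
def float_rows(rows):
    """Lazily convert each row of strings to a tuple of floats."""
    return (tuple(float(cell) for cell in row) for row in rows)

def series(n, rows):
    """Yield the (x, y) pairs of series n (0 to 3) from eight-column rows."""
    for row in rows:
        yield row[2 * n], row[2 * n + 1]

# A single data row, as produced after the header rows are removed.
raw = [['10.0', '8.04', '10.0', '9.14', '10.0', '7.46', '8.0', '6.58']]

# Materialize the converted rows once so several series can be extracted;
# a generator would be exhausted after the first pass.
data = list(float_rows(raw))
series_I = list(series(0, data))
series_IV = list(series(3, data))
print(series_I, series_IV)
```

Both functions take an iterable and return an iterator, so they compose with row_iter() and head_split_fixed() in the same nested style used above.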
