Sadly, we can't trivially process this with the csv module. We have to do a little bit
of parsing to extract the useful information from this file. Since the data is properly
tab-delimited, we can use the csv.reader() function to iterate through the various
rows. We can define a data iterator as follows:
import csv
def row_iter(source):
    return csv.reader(source, delimiter="\t")
We've simply wrapped a file in the csv.reader() function to create an iterator over rows.
We can use this iterator in the following context:
with open("Anscombe.txt") as source:
    print(list(row_iter(source)))
The problem with this is that the first three items in the resulting iterable aren't data.
The Anscombe's quartet file, parsed this way, begins as follows:
[["Anscombe's quartet"], ['I', 'II', 'III', 'IV'],
['x', 'y', 'x', 'y', 'x', 'y', 'x', 'y'],
We need to filter these rows from the iterable. Here is a function that will neatly
excise the three expected title rows and return an iterator over the remaining rows:
def head_split_fixed(row_iter):
    title = next(row_iter)
    assert len(title) == 1 and title[0] == "Anscombe's quartet"
    heading = next(row_iter)
    assert len(heading) == 4 and heading == ['I', 'II', 'III', 'IV']
    columns = next(row_iter)
    assert len(columns) == 8 and columns == ['x', 'y', 'x', 'y', 'x', 'y', 'x', 'y']
    return row_iter
This function plucks three rows from the iterable. It asserts that each row has
an expected value. If the file doesn't meet these basic expectations, it's a symptom
that the file is damaged, or perhaps that our analysis is focused on the wrong file.
Since row_iter() accepts any iterable of lines and head_split_fixed() accepts the
iterator that row_iter() returns, the two functions can be trivially combined as follows:
with open("Anscombe.txt") as source:
    print(list(head_split_fixed(row_iter(source))))
We've simply applied one function to the result of another. In effect, this
defines a composite function. We're not done, of course; we still need to convert the
string values to float values, and we also need to pick apart the four parallel
series of data in each row.
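As a preview of that remaining work, here is a minimal sketch of one way to do both steps. It reuses the row_iter() and head_split_fixed() functions defined above, and introduces a hypothetical series() helper that picks out the n-th (x, y) pair from each row; it isn't necessarily how we'll proceed, but it shows the shape of the problem:
from typing import Iterator, List, Tuple

def series(n: int, rows: List[List[str]]) -> Iterator[Tuple[float, float]]:
    # Hypothetical helper: each row holds four parallel (x, y) pairs,
    # so pair n occupies columns 2*n and 2*n + 1.
    for row in rows:
        yield float(row[2 * n]), float(row[2 * n + 1])

with open("Anscombe.txt") as source:
    # Materialize the rows so that all four series can be extracted
    # from a single pass over the file.
    rows = list(head_split_fixed(row_iter(source)))

quartet = [list(series(n, rows)) for n in range(4)]
print(quartet[0][:3])  # the first few (x, y) pairs of series I
Materializing the rows into a list is a deliberate choice here: the iterator returned by head_split_fixed() can be consumed only once, but a list can be traversed four times, once for each series.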