

The output will show all 12 combinations of shift and defect type.


In the next section, we'll focus on reading the raw data to create summaries. This
is the kind of context in which Python is particularly powerful: working with raw
source data.


We need to observe and compare shift and defect counts with an overall expectation.
If the difference between observed counts and expected counts can be attributed to
random fluctuation, we have to accept the null hypothesis that nothing interesting
is going wrong. If, on the other hand, the numbers don't fit with random variation,
then we have a problem that requires further investigation.
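To make the comparison concrete, the usual measure of how far observed counts stray from expected counts is a chi-squared statistic, the sum of (observed - expected)**2 / expected over all cells. The following is only a minimal sketch of that arithmetic; the observed and expected values shown are hypothetical placeholders, not data from this example:

# Hypothetical (shift, defect_type) counts for illustration only.
observed = {("1", "A"): 15, ("1", "B"): 21, ("2", "A"): 26, ("2", "B"): 18}
expected = {("1", "A"): 18.0, ("1", "B"): 18.0, ("2", "A"): 22.0, ("2", "B"): 22.0}

# Chi-squared statistic: sum of squared deviations scaled by the expectation.
chi2 = sum(
    (observed[k] - expected[k]) ** 2 / expected[k]
    for k in observed
)

A large value of this statistic suggests the deviations are unlikely to be random fluctuation; a small value is consistent with the null hypothesis.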


Filtering and reducing the raw data with a Counter object


We'll represent the essential defect counts as a collections.Counter object.
We will build counts of defects by shift and defect type from the detailed raw data.
Here's a function to read some raw data from a CSV file:


import csv
from collections import Counter
from types import SimpleNamespace

def defect_reduce(input):
    # Wrap the open file in a dictionary reader.
    rdr = csv.DictReader(input)
    assert sorted(rdr.fieldnames) == [
        "defect_type", "serial_number", "shift"]
    # Convert each row dictionary to a SimpleNamespace for attribute access.
    rows_ns = (SimpleNamespace(**row) for row in rdr)
    # Keep only rows that report a defect; reduce each to a
    # (shift, defect_type) pair.
    defects = (
        (row.shift, row.defect_type)
        for row in rows_ns
        if row.defect_type)
    tally = Counter(defects)
    return tally


The preceding function will create a dictionary reader based on an open file
provided via the input parameter. We've confirmed that the column names match
the three expected column names. In some cases, we'll have extra columns in the file;
in that case, the assertion will be something like all((c in rdr.fieldnames) for
c in [...]). Given a tuple of the required column names, this will assure that all of
them are present in the source. We can also use sets to assure that
set([...]) <= set(rdr.fieldnames).
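To make these alternative checks and the overall flow concrete, here is a brief sketch. The helper check_columns and the file name qc_data.csv are hypothetical illustrations for this sketch, not part of the function shown above:

from pathlib import Path

def check_columns(rdr, required=("defect_type", "serial_number", "shift")):
    # Tolerate extra columns: require only that each needed column is present.
    assert all(c in rdr.fieldnames for c in required)
    # Equivalent set-based form of the same check.
    assert set(required) <= set(rdr.fieldnames)

# Hypothetical usage: read the raw observations and print the tally.
with Path("qc_data.csv").open() as source:
    tally = defect_reduce(source)

for (shift, defect_type), count in sorted(tally.items()):
    print(shift, defect_type, count)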
