Optimizations and Improvements
We can't use the default initial value of 0 for the sum() function. We must provide an empty Counter() object as the initial value.
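A minimal sketch of why the initial value matters; the Counter contents here are hypothetical:

```python
from collections import Counter

counters = [Counter({'A': 2}), Counter({'B': 3})]

# sum(counters) fails: the default start value 0 can't be added to a Counter.
try:
    sum(counters)
except TypeError:
    print("TypeError: 0 + Counter is not defined")

# Supplying an empty Counter() as the start value makes the reduction work.
combined = sum(counters, Counter())
print(combined)
```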
The type totals are created with an expression similar to the one used to create
shift totals:
type_totals = sum(
    (Counter({d: defects[s, d]}) for s, d in defects),
    Counter())
We created a dozen Counter objects using the defect type, d, as the key instead
of the shift, s; otherwise, the processing is identical.
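For comparison, the shift totals can be built with the same reduction keyed by the shift, s. This sketch assumes defects is a mapping from (shift, defect type) pairs to counts; the sample values are hypothetical:

```python
from collections import Counter

# Hypothetical defect counts keyed by (shift, defect type) pairs.
defects = Counter({('1', 'A'): 15, ('1', 'C'): 40,
                   ('2', 'A'): 21, ('2', 'B'): 30,
                   ('3', 'C'): 28, ('3', 'D'): 12})

# One Counter per (shift, defect) pair, keyed by shift, then summed.
shift_totals = sum(
    (Counter({s: defects[s, d]}) for s, d in defects),
    Counter())
print(shift_totals)
```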
The shift totals look like this:
Counter({'3': 119, '2': 96, '1': 94})
The defect type totals look like this:
Counter({'C': 128, 'A': 74, 'B': 69, 'D': 38})
We've kept the summaries as Counter objects, rather than creating simple dict
objects or possibly even list instances. We'll generally use them as simple dicts
from this point forward. However, there are some situations where we will want
proper Counter objects rather than plain dict reductions.
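As an illustration of where a Counter is more convenient than a plain dict, the most_common() method and the zero default for missing keys come for free; the data values are the shift totals shown above:

```python
from collections import Counter

shift_totals = Counter({'3': 119, '2': 96, '1': 94})

# Counter-specific behavior a plain dict lacks:
print(shift_totals.most_common(1))  # the largest shift total

# Missing keys yield 0 instead of raising KeyError.
print(shift_totals['4'])
```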
Alternative summary approaches
We've read the data and computed summaries in two separate steps. In some cases,
we may want to create the summaries while reading the initial data. This is an
optimization that might save a little bit of processing time. We could write a more
complex input reduction that emitted the grand total, the shift totals, and the defect
type totals. These Counter objects would be built one item at a time.
We've focused on using Counter instances because they seem to offer the most
flexibility. Any change to the data acquisition will still create Counter instances
and won't alter the subsequent analysis.
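A one-pass reduction of this kind might look like the following sketch; the observations sequence, the summarize() name, and the sample pairs are all assumptions for illustration:

```python
from collections import Counter

# Hypothetical raw observations: (shift, defect type) pairs, one per defect.
observations = [('1', 'A'), ('1', 'C'), ('2', 'B'), ('3', 'C'), ('3', 'C')]

def summarize(samples):
    """Build the grand total, shift totals, and defect type totals
    in a single pass over the input, one item at a time."""
    total = 0
    shift_totals = Counter()
    type_totals = Counter()
    for s, d in samples:
        total += 1
        shift_totals[s] += 1
        type_totals[d] += 1
    return total, shift_totals, type_totals

total, shift_totals, type_totals = summarize(observations)
```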
Here's how we can compute the probabilities of defect by shift and by defect type:
from fractions import Fraction
P_shift = dict((shift, Fraction(shift_totals[shift], total))
               for shift in sorted(shift_totals))
P_type = dict((type, Fraction(type_totals[type], total))
              for type in sorted(type_totals))
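With the shift totals shown earlier, the probabilities come out as exact fractions. This sketch assumes the grand total is the sum of the shift totals (94 + 96 + 119 = 309):

```python
from collections import Counter
from fractions import Fraction

shift_totals = Counter({'3': 119, '2': 96, '1': 94})
total = sum(shift_totals.values())  # 309

P_shift = dict((shift, Fraction(shift_totals[shift], total))
               for shift in sorted(shift_totals))
print(P_shift['1'])  # 94/309
```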