Functional Python Programming

Chapter 16

Here's a sequence of statements to create the contingency table shown previously:


# Header: one "obs exp" title per defect type
print("obs exp"*len(type_totals))
# One row per shift: observed/expected pairs, then the shift total
for s in sorted(shift_totals):
    pairs = ["{0:3d} {1:5.2f}".format(defects[s,t], float(expected[s,t]))
        for t in sorted(type_totals)]
    print("{0} {1:3d}".format("".join(pairs), shift_totals[s]))
# Footer: defect type totals, then the grand total
footers = ["{0:3d}".format(type_totals[t]) for t in sorted(type_totals)]
print("{0} {1:3d}".format("".join(footers), total))


This spreads the defect types across each line. We've written enough "obs exp"
column titles to cover all defect types. For each shift, we'll emit a line of observed and
expected pairs, followed by a shift total. At the bottom, we'll emit a line of footers with
just the defect type totals and the grand total.
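To try the table layout end to end, here is a minimal, self-contained sketch. The defect counts are assumed sample data (this section does not list them), the way the totals and expected values are built is one plausible approach rather than necessarily the one used earlier in the chapter, and contingency_lines is a hypothetical helper name. The column padding is also chosen here so that the headers line up with the cells.

```python
from collections import Counter
from fractions import Fraction

# Assumed sample data: defect counts keyed by (shift, defect type).
defects = {
    ("1", "A"): 15, ("1", "B"): 21, ("1", "C"): 45, ("1", "D"): 13,
    ("2", "A"): 26, ("2", "B"): 31, ("2", "C"): 34, ("2", "D"): 5,
    ("3", "A"): 33, ("3", "B"): 17, ("3", "C"): 49, ("3", "D"): 20,
}

# Marginal totals by shift and by defect type, plus the grand total.
shift_totals = Counter()
type_totals = Counter()
for (s, t), count in defects.items():
    shift_totals[s] += count
    type_totals[t] += count
total = sum(defects.values())

# Expected count for each cell if shift and defect type were independent.
expected = {
    (s, t): Fraction(shift_totals[s] * type_totals[t], total)
    for s in shift_totals
    for t in type_totals
}

def contingency_lines():
    """Return the table as text lines: header, one row per shift, footer."""
    types = sorted(type_totals)
    # Every cell below is 11 characters wide, so the header uses the same width.
    lines = ["obs   exp  " * len(types)]
    for s in sorted(shift_totals):
        pairs = "".join(
            "{0:3d} {1:5.2f}  ".format(defects[s, t], float(expected[s, t]))
            for t in types
        )
        lines.append("{0}{1:3d}".format(pairs, shift_totals[s]))
    footers = "".join("{0:3d}        ".format(type_totals[t]) for t in types)
    lines.append("{0}{1:3d}".format(footers, total))
    return lines

print("\n".join(contingency_lines()))
```

Returning the lines instead of printing them directly keeps the formatting separate from the output step, which makes the layout easier to check.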


A contingency table like this one helps us to visualize the comparison between
observed and expected values. We can compute a chi-squared value for these two
sets of values. This will help us decide if the data is random or if there's something
that deserves further investigation.


Computing the chi-squared value


The χ² value is based on

$$\chi^2 = \sum_i \frac{(e_i - o_i)^2}{e_i}$$

where the e values are the expected values and the o values are the observed values.


We can compute the specified formula's value as follows:


diff = lambda e, o: (e-o)**2/e

chi2 = sum(diff(expected[s,t], defects[s,t])
    for s in shift_totals
    for t in type_totals
)


We've defined a small lambda to help us optimize the calculation. This allows us to
evaluate the expected[s,t] and defects[s,t] lookups just once, even though the
expected value is used in two places. For this dataset, the final χ² value is 19.18.
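The calculation above can be sketched as a self-contained function. The counts below are assumed sample data (this section does not list them), and chi_squared, row_totals, and col_totals are hypothetical names. Fraction keeps the expected values exact, an assumption suggested by the float(expected[s,t]) conversion in the table-printing statements.

```python
from fractions import Fraction

# Assumed sample counts keyed by (shift, defect type); not listed in this section.
defects = {
    ("1", "A"): 15, ("1", "B"): 21, ("1", "C"): 45, ("1", "D"): 13,
    ("2", "A"): 26, ("2", "B"): 31, ("2", "C"): 34, ("2", "D"): 5,
    ("3", "A"): 33, ("3", "B"): 17, ("3", "C"): 49, ("3", "D"): 20,
}

def chi_squared(observed):
    """Sum of (e - o)**2 / e over every cell of a table keyed by (row, column)."""
    total = sum(observed.values())
    row_totals = {}
    col_totals = {}
    for (r, c), n in observed.items():
        row_totals[r] = row_totals.get(r, 0) + n
        col_totals[c] = col_totals.get(c, 0) + n

    def diff(e, o):
        # e and o are each evaluated once by the caller, then reused here.
        return (e - o) ** 2 / e

    return sum(
        diff(Fraction(row_totals[r] * col_totals[c], total), observed.get((r, c), 0))
        for r in row_totals
        for c in col_totals
    )

chi2 = chi_squared(defects)
print("{0:.2f}".format(float(chi2)))
```

With these assumed counts the result rounds to 19.18, in line with the value quoted above.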


There are a total of six degrees of freedom based on three shifts and four defect
types. Since we're considering them independent, we get (3-1)×(4-1) = 2×3 = 6. A chi-squared
table shows us that, if the data were truly random, a value above 12.5916 would occur only
1 time in 20. Since our value is 19.18, the data is unlikely to be random.
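The table lookup can also be checked numerically. This is a sketch using only the standard library, not a replacement for a statistics package. It relies on the closed form of the chi-squared upper-tail probability, which holds only for an even number of degrees of freedom. The function name chi2_survival_even_df is hypothetical.

```python
import math

def chi2_survival_even_df(x, df):
    """P(X >= x) for a chi-squared variable with a positive, even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2
    # For df = 2m: exp(-x/2) * sum of (x/2)**j / j! for j = 0 .. m-1
    return math.exp(-half) * sum(
        half ** j / math.factorial(j) for j in range(df // 2)
    )

shifts, defect_types = 3, 4
df = (shifts - 1) * (defect_types - 1)  # 2 * 3 = 6

p_critical = chi2_survival_even_df(12.5916, df)  # close to 0.05, or 1 in 20
p_observed = chi2_survival_even_df(19.18, df)    # well below 0.05

print(df, round(p_critical, 3), round(p_observed, 4))
```

The first probability comes out at about 0.05, matching the 1-in-20 threshold, while the probability for 19.18 is roughly 0.004. This is why the observed pattern is unlikely to be random.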
