Chapter 16
Here's a sequence of statements to create the contingency table shown previously:
print("obs exp"*len(type_totals))
for s in sorted(shift_totals):
    pairs= ["{0:3d} {1:5.2f}".format(defects[s,t],
        float(expected[s,t])) for t in sorted(type_totals)]
    print("{0} {1:3d}".format( "".join(pairs), shift_totals[s]))
footers= ["{0:3d}".format(type_totals[t]) for t in
    sorted(type_totals)]
print("{0} {1:3d}".format("".join(footers), total))
This spreads the defect types across each line. We've written enough obs exp
column titles to cover all defect types. For each shift, we emit a line of observed and
expected pairs, followed by the shift total. At the bottom, we emit a line of footers with
just the defect type totals and the grand total.
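The layout described above can be tried end to end with a small, self-contained sketch. The counts below are hypothetical (they are not this chapter's dataset), and the use of Counter and Fraction objects is an assumption about how defects, the totals, and expected were built earlier. The function table_lines() is a helper introduced only for this illustration, and the padding in its format strings was chosen here so that the columns line up.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical counts keyed by (shift, defect_type); not the chapter's data.
defects = Counter({
    ("1", "A"): 10, ("1", "B"): 20,
    ("2", "A"): 30, ("2", "B"): 40,
})

# Row totals (by shift), column totals (by defect type), and the grand total.
shift_totals = Counter()
type_totals = Counter()
for (s, t), count in defects.items():
    shift_totals[s] += count
    type_totals[t] += count
total = sum(defects.values())

# Expected count under independence: row total * column total / grand total.
expected = {
    (s, t): Fraction(shift_totals[s] * type_totals[t], total)
    for s in shift_totals
    for t in type_totals
}

def table_lines():
    """Return the contingency table as a list of text lines (illustrative helper)."""
    types = sorted(type_totals)
    # Each obs/exp pair occupies 11 characters, so the title is padded to match.
    lines = ["obs   exp  " * len(types)]
    for s in sorted(shift_totals):
        pairs = [
            "{0:3d} {1:5.2f}  ".format(defects[s, t], float(expected[s, t]))
            for t in types
        ]
        lines.append("{0}{1:3d}".format("".join(pairs), shift_totals[s]))
    footers = ["{0:3d}        ".format(type_totals[t]) for t in types]
    lines.append("{0}{1:3d}".format("".join(footers), total))
    return lines

print("\n".join(table_lines()))
```

Returning the lines instead of printing them directly makes the layout easy to check in a test; the final print() call reproduces the behavior of the statements above.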
A contingency table like this one helps us to visualize the comparison between
observed and expected values. We can compute a chi-squared value for these two
sets of values. This will help us decide if the data is random or if there's something
that deserves further investigation.
Computing the chi-squared value
The $\chi^2$ value is based on $\chi^2 = \sum_i \frac{(e_i - o_i)^2}{e_i}$, where the
$e_i$ values are the expected values and the $o_i$ values are the observed values.
We can compute the specified formula's value as follows:
diff= lambda e,o: (e-o)**2/e
chi2= sum(diff(expected[s,t], defects[s,t])
    for s in shift_totals
    for t in type_totals
)
We've defined a small lambda to simplify the calculation. This allows us to
look up the expected[s,t] and defects[s,t] values just once each, even though the
expected value is used in two places. For this dataset, the final $\chi^2$ value is 19.18.
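As a quick check of the formula, here is a minimal sketch that applies the same sum to a tiny table. The function name chi_squared() and the 2×2 counts are hypothetical and chosen only for illustration; they are not the chapter's data, so the result is not 19.18.

```python
def chi_squared(expected, observed):
    """Sum (e - o)**2 / e over every cell.

    Both arguments map a (shift, defect_type) key to a count.
    """
    return sum((e - observed[key]) ** 2 / e for key, e in expected.items())

# Hypothetical observed counts and the expected counts that follow from
# their row and column totals (30 and 70 by shift, 40 and 60 by type).
observed = {("1", "A"): 10, ("1", "B"): 20, ("2", "A"): 30, ("2", "B"): 40}
expected = {("1", "A"): 12, ("1", "B"): 18, ("2", "A"): 28, ("2", "B"): 42}

chi2 = chi_squared(expected, observed)
print(round(chi2, 4))  # 4/12 + 4/18 + 4/28 + 4/42, about 0.7937
```

Iterating over the items of expected looks up each expected value once, which serves the same purpose as the lambda in the statements above.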
There are a total of six degrees of freedom, based on three shifts and four defect
types. Since we're considering them independent, we get (3−1)×(4−1) = 2×3 = 6. A
chi-squared table shows us that, if the data were truly random, a value above 12.5916
would occur only 1 time in 20. Since our value is 19.18, the data is unlikely to be random.
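The table lookup can also be approximated in code. The following is a minimal sketch using only the standard library; the function names are hypothetical. It relies on the closed form of the chi-squared upper-tail probability that holds when the number of degrees of freedom is even, so it is not a general-purpose replacement for a statistics library.

```python
import math

def degrees_of_freedom(rows, cols):
    """Degrees of freedom for a rows x cols contingency table."""
    return (rows - 1) * (cols - 1)

def chi2_upper_tail_even_df(x, df):
    """P(X >= x) for a chi-squared variable with an even number of degrees of freedom.

    Uses the closed form exp(-x/2) * sum((x/2)**k / k! for k < df/2),
    which is only valid when df is a positive even integer.
    """
    if df <= 0 or df % 2:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2
    return math.exp(-half) * sum(
        half ** k / math.factorial(k) for k in range(df // 2)
    )

df = degrees_of_freedom(3, 4)                      # three shifts, four defect types
p_critical = chi2_upper_tail_even_df(12.5916, df)  # close to 0.05, or 1 chance in 20
p_observed = chi2_upper_tail_even_df(19.18, df)    # well below 0.05
print(df, round(p_critical, 3), p_observed < p_critical)
```

A probability for the observed value that falls below the 0.05 threshold matches the conclusion drawn from the table: a value as large as 19.18 would rarely arise from random data.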