THE INTEGRATION OF BANKING AND TELECOMMUNICATIONS: THE NEED FOR REGULATORY REFORM

374 JOURNAL OF LAW AND POLICY

corpora will not perform well on messy, unedited texts found in
forensic author identification.


C. Empirically Established Protocol

Stylometric computing methods that work on literary texts or
large collections of electronic text (as in electronic librarianship)
are still untested on forensically feasible data. Importing these
methods wholesale into forensic author identification is not the
same as empirically establishing a protocol that uses them on
forensically feasible data. To know how well the stylometric
computing methods really work, they must be tested on
forensically feasible ground-truth data.
Further, it is essential to make sure that the stylistic features
used in the different components of a technique, and then
subjected to the statistical multiplication rule, are truly
independent. The independence of linguistic features can be
determined only by a linguistic theory, not by school grammar or
literary criticism. Counts of individual words and counts of the
same words within n-grams are not independent counts. Yet
because stylometric features are so unsophisticated
linguistically, dependencies of this kind are both common and
ignored in the statistical manipulations.
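
The dependence between word counts and n-gram counts can be
illustrated with a minimal sketch. The toy sentences, the feature
thresholds, and the function names below are assumptions invented
for illustration and are not drawn from any forensic data set or
published method. The sketch compares the observed rate at which
two features occur together with the rate the multiplication rule
would predict if the features were independent.

```python
from collections import Counter

def count_word(word, tokens):
    """Number of times `word` occurs as a single token."""
    return Counter(tokens)[word]

def count_bigrams_starting_with(word, tokens):
    """Number of two-word sequences (bigrams) whose first word is `word`."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    return sum(n for (first, _), n in bigrams.items() if first == word)

def joint_and_product(pairs):
    """Return (P(A and B), P(A) * P(B)) for a list of (A, B) boolean pairs."""
    n = len(pairs)
    p_a = sum(a for a, _ in pairs) / n
    p_b = sum(b for _, b in pairs) / n
    p_ab = sum(a and b for a, b in pairs) / n
    return p_ab, p_a * p_b

# Hypothetical toy texts, invented for illustration only.
TEXTS = [
    "the cat saw the dog",
    "a cat saw a dog",
    "the dog ran",
    "the cat and the dog and the bird",
]

# Feature A: "the" occurs at least twice.
# Feature B: at least two bigrams start with "the".
pairs = []
for text in TEXTS:
    tokens = text.split()
    a = count_word("the", tokens) >= 2
    b = count_bigrams_starting_with("the", tokens) >= 2
    pairs.append((a, b))

p_joint, p_product = joint_and_product(pairs)
print(pairs)
print(p_joint, p_product)
```

In these toy texts the two features always agree, because every
bigram that starts with "the" requires an occurrence of "the". The
observed joint rate is 0.5, while the multiplication rule yields
0.25, so treating the counts as independent makes the combination
look twice as rare as it actually is.
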
Finally, the number of texts required for a technique, the
number of component statistical tests (with truly independent
features in them, if the multiplication rule is applied), and the
ability to reach a high level of accuracy on forensically feasible
ground-truth data must all be established empirically before a
forensic author identification method based on stylometric
computing can be both legally and scientifically acceptable.
Fancy statistics and vague claims that “research has shown” may
well overwhelm a judge or jury with an aura of expertise when
the statistics are ill-applied and the cited research is
nonforensic, but they may equally be seen as smoke and mirrors,
not a reliable method, once the smoke clears.
