THE INTEGRATION OF BANKING AND TELECOMMUNICATIONS: THE NEED FOR REGULATORY REFORM

372 JOURNAL OF LAW AND POLICY

Drawing on large literary datasets and the tools of computer
science, stylometric computing offers more sophisticated
statistical procedures for comparing documents than traditional
stylometry does. Computer science supplies, for instance,
machine-learning methods for text classification. Like
traditional stylometry, however, stylometric computing relies on
language features that are easy for a computer to process rather
than grounded in linguistic theory, such as character strings,
words, word frequencies, and words common to the texts compared.
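
To make these feature types concrete, the following is a minimal sketch of how character strings, word frequencies, and words common to two texts might be computed. It is an illustration only and is not drawn from any of the studies discussed here; the function names, the tokenization rule, and the default string length of three characters are all assumptions chosen for simplicity.

```python
import re
from collections import Counter


def words(text):
    """Lowercase word tokens (assumed rule: runs of letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def char_ngrams(text, n=3):
    """Counts of overlapping character strings of length n (n=3 is an assumption)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def relative_word_freq(text):
    """Each word's share of all word tokens in the text."""
    counts = Counter(words(text))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def common_words(text_a, text_b):
    """Words that occur in both texts."""
    return set(words(text_a)) & set(words(text_b))
```

None of these functions consults a grammar or any other linguistic analysis; each simply counts surface patterns, which is the limitation noted above.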
Recently, several researchers, including Koppel, Argamon,
Juola, Chen, and their students, have begun to use stylometric
computing for forensic author identification.^107 Measured
against the best practices for forensic author identification
and a recent admissibility ruling, stylometric computing still
needs to incorporate at least three of those practices.


A. Ground-Truth Data

Ground-truth data are all too often overlooked or
undervalued in stylometric computing. One intriguing study of
the “writeprint” reported identifying the authors of emails
with over ninety-seven percent accuracy for English and over
ninety-two percent accuracy for Chinese.^108 This impressive
result is undermined, however, because the dataset was not
ground-truth data, as revealed by the researchers’ own
comment on a substudy of three authors
in their English dataset: “Clearly, Mike’s distinct writeprint
from the other two indicates his unique identity. The high
degree of similarity between the writeprints of Joe and Roy
suggests these two IDs might be the same person.”^109 Joe and
Roy’s “writeprints” are almost identical. Yet it is also possible


^107 See, e.g., Shlomo Argamon & Moshe Koppel, A Systemic Functional
Approach to Automated Authorship Analysis, 21 J.L. & POL’Y 299 (2013);
Patrick Juola, Stylometry and Immigration: A Case Study, 21 J.L. & POL’Y
287 (2013); Moshe Koppel et al., Authorship Attribution: What’s Easy and
What’s Hard?, 21 J.L. & POL’Y 317 (2013).
^108 Jiexun Li et al., From Fingerprint to Writeprint, 49 COMM. ACM 9,
9–10 (2006).
^109 Id. at 82.
