- Run the SynAID module: Markedness Subcategorizer.
- Run the SynAID module: Punctuation Syntactic Edges
Categorizer.
- Manually check punctuation syntactic edges for accuracy.
- Run SynAID’s calculation of syntactic and punctuation
quantification and average word length, for each text,
normalizing so that texts of different sizes can be compared, and
output the ALIAS Quantification vector for each text.
- Input ALIAS Quantification output into SPSS (or DTReg
or Weka or R).
- If there are a large number (50+) of K documents or
multiple Q documents, run K–means clustering for internal
consistency testing. If K–means clustering of K documents
shows maximal subsetting, split K if needed. If K–means
clustering of Q documents shows minimum subsetting, group Q.
- Run Linear Discriminant Function Analysis on pairwise
K authors, with Q held out, using leave-one-out cross-validation
and equal prior probability (not set to number of documents);
use SPSS default options.
- Check classification table.
If the DFA returns high accuracy for differentiating K1
and K2,
then report classification of Q and determinative features.
If the DFA returns low accuracy for differentiating K1
and K2,
then stop. Do not use a low-accuracy model to classify
Q.
High accuracy is no lower than around 80% and is
usually in the 90s.
Average accuracy is lower for multiple authors (3 or 4)
than for author pairs.
- Check documents for nonnative English or dialectal
patterns and report.
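
The validation logic in the steps above (normalize each text, cross-validate on the K documents with Q held out, and classify Q only if the model is accurate) can be sketched in a few lines of Python. This is an illustrative sketch only, not SynAID or SPSS. Several things here are assumptions: normalization is taken to mean dividing each count by the text's word count, which the protocol does not specify; a nearest-centroid rule stands in for the Linear Discriminant Function Analysis (it treats both authors equally, loosely mirroring equal prior probability, but it is a simpler classifier); the "around 80%" floor is treated as a hard cutoff; and all function names are hypothetical.

```python
from statistics import fmean

HIGH_ACCURACY = 0.80  # assumption: the "around 80%" floor used as a hard cutoff


def normalize(counts, n_words):
    """Turn raw feature counts into per-word rates (assumed normalization)."""
    return [c / n_words for c in counts]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [fmean(col) for col in zip(*vectors)]


def fit(docs):
    """docs: list of (author_label, vector). Returns {author_label: centroid}."""
    by_label = {}
    for label, vec in docs:
        by_label.setdefault(label, []).append(vec)
    return {label: centroid(vecs) for label, vecs in by_label.items()}


def nearest_label(vec, centroids):
    """Stand-in classifier: the author whose centroid is closest to vec."""
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(vec, centroids[label]))
    return min(centroids, key=dist)


def leave_one_out_accuracy(k_docs):
    """Hold out each K document in turn, refit on the rest, score the guess."""
    hits = 0
    for i, (label, vec) in enumerate(k_docs):
        training = k_docs[:i] + k_docs[i + 1:]
        hits += nearest_label(vec, fit(training)) == label
    return hits / len(k_docs)


def classify_q(k_docs, q_vec, threshold=HIGH_ACCURACY):
    """Return (accuracy, label); label is None when the model is unreliable."""
    accuracy = leave_one_out_accuracy(k_docs)
    if accuracy < threshold:
        return accuracy, None  # stop: do not classify Q with a weak model
    return accuracy, nearest_label(q_vec, fit(k_docs))
```

Q never enters the cross-validation loop, so the accuracy figure reflects only how well the known authors can be told apart. When that figure falls below the cutoff, `classify_q` returns no label at all, matching the instruction to stop.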
E. Admissibility
Methods in the forensic computational approach to author
identification have been admitted as testimony in three trials,