THE INTEGRATION OF BANKING AND TELECOMMUNICATIONS: THE NEED FOR REGULATORY REFORM



  1. Run the SynAID module: Markedness Subcategorizer.

  2. Run the SynAID module: Punctuation Syntactic Edges
    Categorizer.

  3. Manually check punctuation syntactic edges for accuracy.

  4. Run SynAID’s calculation of syntactic and punctuation
    quantification and average word length, for each text,
    normalizing so that texts of different sizes can be compared, and
    output the ALIAS Quantification vector for each text.

  5. Input the ALIAS Quantification output into SPSS (or DTReg,
    Weka, or R).

  6. If there is a large number (50+) of K documents or
    multiple Q documents, run K–means clustering for internal
    consistency testing. If K–means clustering of K documents
    shows maximal subsetting, split K if needed. If K–means
    clustering of Q documents shows minimum subsetting, group Q.

  7. Run Linear Discriminant Function Analysis on pairwise
    K authors, with Q held out, using leave-one-out cross-validation
    and equal prior probability (not set to number of documents);
    use SPSS default options.

  8. Check the classification table.
    If the DFA returns high accuracy for differentiating K1
    and K2, then report the classification of Q and the
    determinative features.
    If the DFA returns low accuracy for differentiating K1
    and K2, then stop. Do not use a low-accuracy model to
    classify Q.
    High accuracy is no lower than around 80% and is
    usually in the 90s.
    Average accuracy is lower for multiple authors (3 or 4)
    than for author pairs.

  9. Check documents for nonnative English or dialectal
    patterns and report.
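
Steps 7 and 8 above can be illustrated with a minimal sketch. It is not the SPSS procedure or SynAID output: it is a plain two-group linear discriminant in standard-library Python, assuming each document has already been reduced to a numeric feature vector. The function names (fit_lda, loo_accuracy, classify_q) and the 0.80 cutoff parameter are hypothetical, chosen here to mirror the "no lower than around 80%" guideline. Equal prior probability is modeled by placing the decision threshold midway between the two projected group means, and Q is never used when estimating accuracy.

```python
from typing import List, Optional, Tuple

Vector = List[float]
Model = Tuple[Vector, float]  # (discriminant weights, decision threshold)


def mean(rows: List[Vector]) -> Vector:
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def pooled_cov(k1: List[Vector], k2: List[Vector]) -> List[Vector]:
    """Pooled within-group covariance matrix of the two author groups."""
    d = len(k1[0])
    s = [[0.0] * d for _ in range(d)]
    for rows in (k1, k2):
        m = mean(rows)
        for r in rows:
            diff = [x - mu for x, mu in zip(r, m)]
            for i in range(d):
                for j in range(d):
                    s[i][j] += diff[i] * diff[j]
    dof = len(k1) + len(k2) - 2
    return [[v / dof for v in row] for row in s]


def solve(m: List[Vector], v: Vector) -> Vector:
    """Solve m x = v by Gaussian elimination with partial pivoting.

    Assumes m is non-singular (enough documents per group, no
    perfectly redundant features).
    """
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_lda(k1: List[Vector], k2: List[Vector]) -> Model:
    """Two-group linear discriminant with equal prior probabilities."""
    m1, m2 = mean(k1), mean(k2)
    w = solve(pooled_cov(k1, k2), [a - b for a, b in zip(m1, m2)])
    # Equal priors: the cut point is the projection of the midpoint
    # between the group means, regardless of how many documents each has.
    cut = sum(wi * (a + b) / 2 for wi, a, b in zip(w, m1, m2))
    return w, cut


def classify(model: Model, x: Vector) -> str:
    w, cut = model
    return "K1" if sum(wi * xi for wi, xi in zip(w, x)) > cut else "K2"


def loo_accuracy(k1: List[Vector], k2: List[Vector]) -> float:
    """Leave-one-out cross-validated accuracy on the known documents."""
    labelled = [(x, "K1") for x in k1] + [(x, "K2") for x in k2]
    hits = 0
    for i, (x, label) in enumerate(labelled):
        train = labelled[:i] + labelled[i + 1:]
        t1 = [v for v, lab in train if lab == "K1"]
        t2 = [v for v, lab in train if lab == "K2"]
        hits += classify(fit_lda(t1, t2), x) == label
    return hits / len(labelled)


def classify_q(k1: List[Vector], k2: List[Vector], q: Vector,
               cutoff: float = 0.80) -> Tuple[float, Optional[str]]:
    """Classify Q only if the K1-vs-K2 model is accurate enough."""
    acc = loo_accuracy(k1, k2)
    if acc < cutoff:
        return acc, None  # stop: do not classify Q with a weak model
    return acc, classify(fit_lda(k1, k2), q)
```

A call such as classify_q(k1_vectors, k2_vectors, q_vector) returns the cross-validated accuracy together with either a label for Q or None. Returning None rather than a low-confidence label keeps the stopping rule of step 8 explicit in the code path.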


E. Admissibility

Methods in the forensic computational approach to author
identification have been admitted as testimony in three trials,
