436 JOURNAL OF LAW AND POLICY
The models based on word features achieve their best
performance at about 400 features, far lower than the character
n-gram representation. However, the performance of the models
based on more features does not drop dramatically as happens in
cross-topic experiments. Again, this confirms the above
conclusion about the usefulness of thematic-related information
Figure 5: Performance of the cross-topic attribution models
(training on Politics, test on UK).
Figure 6: Performance of the cross-genre attribution models
(training on Politics, test on Book reviews).
30
40
50
60
70
80
90
100
0 5000 10000 15000
Accuracy (%)
Features
Words
Char 3-
grams
30
40
50
60
70
80
90
100
0 5000 10000 15000
Accuracy (%)
Features
Words
Char 3-
grams