optimistic, then the within-class variation is zero, and the across-class variation is
positive. In such a case, where the denominator of the equation above is zero, the word
"dismal" is an infinitely powerful discriminant. It should be given a very large weight in
any word count algorithm.
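A minimal sketch of this kind of per-word Fisher score follows. The exact weighting in the formula above may differ; here, as an assumption for illustration, the across-class term is the mean squared difference between class means of the word's count and the within-class term is the mean per-class variance. The toy documents and labels are hypothetical.

```python
from collections import Counter

import numpy as np


def fisher_score(word, docs, labels):
    """Fisher-style discriminant score for one word.

    docs   : list of token lists (one per message)
    labels : parallel list of class labels, e.g., "opt" / "pes"
    """
    counts = np.array([Counter(d)[word] for d in docs], dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)

    means = np.array([counts[labels == c].mean() for c in classes])
    variances = np.array([counts[labels == c].var() for c in classes])

    # Across-class variation: mean squared difference between class means.
    across = np.mean([(mi - mj) ** 2
                      for i, mi in enumerate(means)
                      for j, mj in enumerate(means) if i != j])
    # Within-class variation: mean of the per-class variances.
    within = variances.mean()

    # A zero denominator means a perfect discriminant (as with "dismal");
    # treat it as an infinite (in practice, very large) score.
    return float("inf") if within == 0 else float(across / within)


# Toy example: "dismal" appears only in pessimistic messages.
docs = [["dismal", "outlook"], ["strong", "buy"], ["dismal", "results"]]
labels = ["pes", "opt", "pes"]
print(fisher_score("dismal", docs, labels))  # inf: zero within-class variance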
In Das and Chen (2007) we looked at stock message board text and determined good
discriminants using the Fisher metric. Here are some words that showed high
discriminant values (with values alongside) in classifying optimistic vs. pessimistic opinions:
bad 0.0405
hot 0.0161
hype 0.0089
improve 0.0123
joke 0.0268
jump 0.0106
killed 0.0160
lead 0.0037
like 0.0037
long 0.0162
lose 0.1211
money 0.1537
overvalue 0.0160
own 0.0031
good__n 0.0485
The last entry in the list ("good__n", i.e., the negated form "not good") is an example of a
negated word showing a higher discriminant value than the word itself without the negative
connotation (recall the discussion of negative tagging earlier in Section 2.3.2). Note also
that the word "bad" has a score of 0.0405, whereas the term "not good" has a higher score
of 0.0485. This is an example where the structure and usage of language, not just the
meaning of a word, matters.
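For concreteness, here is a minimal sketch of the kind of negation tagging this refers to. The "__n" suffix mirrors the tagged token in the list above, but the cue-word set and the three-word tagging window are assumptions for illustration; Section 2.3.2 describes the actual scheme.

```python
NEGATORS = {"not", "no", "never", "none", "cannot"}  # assumed cue words


def tag_negation(tokens, window=3):
    """Append a negation marker to words that follow a negation cue.

    After a cue such as "not", the next `window` tokens are tagged
    (e.g., "good" -> "good__n"), so "not good" is counted separately
    from a plain "good" when discriminant values are computed.
    """
    tagged, remaining = [], 0
    for tok in tokens:
        if tok.lower() in NEGATORS:
            remaining = window
            tagged.append(tok)
        elif remaining > 0:
            tagged.append(tok + "__n")
            remaining -= 1
        else:
            tagged.append(tok)
    return tagged


print(tag_negation("this is not a good stock".split()))
# ['this', 'is', 'not', 'a__n', 'good__n', 'stock__n']
```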
In another example, this time using the Bow Algorithm to examine a database of
conference calls with analysts, the best 20 discriminant words were:
0.030828516377649325 allowing
0.094412331406551059 november
0.044315992292870907 determined
0.225433526011560692 general
0.034682080924855488 seasonality
0.123314065510597301 expanded
0.017341040462427744 rely
0.071290944123314062 counsel
0.044315992292870907 told
0.015414258188824663 easier
0.050096339113680152 drop
0.028901734104046242 synergies
0.025048169556840076 piece
0.021194605009633910 expenditure
0.017341040462427744 requirement
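The Bow toolkit's own feature-scoring internals are not reproduced here. As an illustrative analogue, the sketch below ranks a vocabulary by the information gain of word presence against the class label, which produces the same kind of score/word listing as above; the `docs` and `labels` inputs are hypothetical.

```python
import math
from collections import Counter


def information_gain_ranking(docs, labels):
    """Rank words by the information gain of their presence vs. the class label.

    docs   : list of token lists (e.g., one per conference call)
    labels : parallel list of class names
    """
    n = len(docs)

    def entropy(counts):
        total = sum(counts.values())
        return -sum((c / total) * math.log2(c / total)
                    for c in counts.values() if c > 0)

    h_class = entropy(Counter(labels))
    vocab = {w for d in docs for w in d}

    scores = {}
    for w in vocab:
        present = Counter(lab for d, lab in zip(docs, labels) if w in d)
        absent = Counter(lab for d, lab in zip(docs, labels) if w not in d)
        h_cond = sum((sum(part.values()) / n) * entropy(part)
                     for part in (present, absent) if part)
        scores[w] = h_class - h_cond

    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# e.g., print a top-20 list in the "score word" layout shown above:
# for w, s in information_gain_ranking(docs, labels)[:20]:
#     print(s, w)
```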