The Wiley Finance Series : Handbook of News Analytics in Finance


8.2.5 Informational content of news


Having collected and cleaned the news-flow data, the biggest challenge is deciding
whether news is indeed newsworthy and whether an article reflects good or bad news.
There are three main approaches used in the academic literature:


• Computational linguistics. A relatively new approach in the academic finance
literature is to use the machine-learning techniques for automated text classification
developed by computational linguists. Natural language-processing techniques are
used in a variety of fields, ranging from insurance companies seeking to detect
fraudulent claims to journalists analysing the sentiment of political speeches. One
difficulty in applying such techniques to finance, however, is accounting for the
forward-looking nature of markets: what matters is the expectation of news and the
extent to which stock prices already reflect that expectation. We therefore need not
only to decipher the informational content of news, but also to decide whether it is
“new” news or public information that has already been impounded into stock prices.
Tetlock, Tsechansky, and Macskassy (2008), for example, use the Harvard-IV-4
psychosocial dictionary, which classifies words into categories that include positive
and negative, and then count the proportion of negative words in each story.
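The word-counting measure described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the short word list below is a hypothetical stand-in for the Harvard-IV-4 negative category (the real dictionary is far larger and is not reproduced here), and the tokenizer is a simple assumption that lowercases the text and applies no stemming.

```python
import re

# Hypothetical stand-in for the Harvard-IV-4 negative category; the real
# dictionary is far larger and is not reproduced here.
NEGATIVE_WORDS = frozenset({"loss", "decline", "weak", "lawsuit", "fall", "fail"})

# Lowercase alphabetic tokens only; no stemming, so "declines" would not match "decline".
TOKEN_RE = re.compile(r"[a-z]+")


def negative_fraction(story, negative_words=NEGATIVE_WORDS):
    """Return the share of word tokens in `story` found in `negative_words`."""
    tokens = TOKEN_RE.findall(story.lower())
    if not tokens:
        return 0.0  # empty story: report zero rather than divide by zero
    negatives = sum(1 for token in tokens if token in negative_words)
    return negatives / len(tokens)


if __name__ == "__main__":
    headline = "Profits decline as weak demand and a lawsuit weigh on shares"
    print(f"{negative_fraction(headline):.3f}")  # 3 of 11 tokens -> 0.273
```

Scaling by story length keeps long and short articles comparable. Note that a raw score like this says nothing about whether the news was already expected by the market, which is the difficulty raised above.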
In contrast to recent academic research, which relies on algorithms built on generic
English dictionaries, certain data vendors have designed their classifiers specifically
for financial news. Several vendors use Bayesian classifiers to map key words, phrases,
word combinations, and other word-level definitions to predefined sentiment values.
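Vendors' production models are proprietary, so the following is only a generic sketch of the idea: a textbook multinomial naive Bayes classifier with add-one (Laplace) smoothing. It learns word frequencies from labelled stories and then maps each predicted label to a predefined sentiment value. The training stories, the labels, and the `SENTIMENT_VALUES` mapping are all hypothetical illustrations, and we do not claim that any particular vendor uses this variant.

```python
import math
import re
from collections import Counter, defaultdict

TOKEN_RE = re.compile(r"[a-z]+")

# Hypothetical predefined sentiment values attached to each class label.
SENTIMENT_VALUES = {"positive": 1.0, "negative": -1.0}


def tokenize(text):
    """Lowercase alphabetic tokens; a deliberately simple assumption."""
    return TOKEN_RE.findall(text.lower())


class NaiveBayesSentiment:
    """Multinomial naive Bayes over word counts with Laplace (add-one) smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # number of training stories per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocab = set()

    def fit(self, stories, labels):
        for story, label in zip(stories, labels):
            self.class_counts[label] += 1
            for token in tokenize(story):
                self.word_counts[label][token] += 1
                self.vocab.add(token)
        return self

    def log_posteriors(self, story):
        """Unnormalised log P(label | story) for every label seen in training."""
        total_stories = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        tokens = tokenize(story)
        scores = {}
        for label, n_label in self.class_counts.items():
            denom = sum(self.word_counts[label].values()) + vocab_size
            score = math.log(n_label / total_stories)  # log prior
            for token in tokens:
                if token not in self.vocab:
                    continue  # ignore words never seen in training
                score += math.log((self.word_counts[label][token] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, story):
        scores = self.log_posteriors(story)
        return max(scores, key=scores.get)

    def sentiment_score(self, story):
        """Map the predicted label to its predefined sentiment value."""
        return SENTIMENT_VALUES[self.predict(story)]


# Hypothetical labelled headlines used only for illustration.
TRAIN_STORIES = [
    "Earnings beat forecasts and shares rally",
    "Company raises guidance after strong sales",
    "Profit warning sends shares lower",
    "Regulator fines bank over misconduct",
]
TRAIN_LABELS = ["positive", "positive", "negative", "negative"]

if __name__ == "__main__":
    model = NaiveBayesSentiment().fit(TRAIN_STORIES, TRAIN_LABELS)
    print(model.predict("Strong earnings beat expectations"))      # positive
    print(model.sentiment_score("Bank fined after profit warning"))  # -1.0
```

Because the features are individual words, "fined" is ignored here: only "fines" appears in training. This illustrates how a word-level mapping can miss the context of a phrase.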
However, one of the challenges of such an approach is to identify the context in


216 News and abnormal returns


Figure 8.2. Degree of news overlap (source: RavenPack, Factiva, FactSet, Macquarie Quant Research).
