The Wiley Finance Series : Handbook of News Analytics in Finance


publication of a news article about a particular topic is indicated by a one and the
absence of the event by a zero. Alternatively, we can try to quantify other aspects of
news over time. For example, we could measure news flow (the volume of news), or we
could compute scores (measures) based either on the language sentiment of the text or
on the market’s response to particular language.
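As a minimal sketch of the first two ideas, the Python fragment below builds a daily zero/one event indicator and a daily news-flow count for one topic. The function name daily_news_series, the (date, topic) input layout and the sample items are illustrative assumptions, not part of any vendor's data format.

```python
from collections import Counter
from datetime import date, timedelta


def daily_news_series(items, topic, start, end):
    """Build daily series for one topic over the inclusive range [start, end].

    items: iterable of (date, topic) pairs, one pair per news article
           (a hypothetical layout chosen for this sketch).
    Returns (days, indicator, flow) where
      indicator[i] is 1 if at least one article on the topic was
                   published on days[i], otherwise 0;
      flow[i]      is the number of such articles (news volume).
    """
    counts = Counter(d for d, t in items if t == topic)
    n_days = (end - start).days + 1
    days = [start + timedelta(days=i) for i in range(n_days)]
    flow = [counts.get(d, 0) for d in days]
    indicator = [1 if c > 0 else 0 for c in flow]
    return days, indicator, flow


# Made-up sample items, for illustration only.
items = [
    (date(2010, 3, 1), "earnings"),
    (date(2010, 3, 1), "earnings"),
    (date(2010, 3, 2), "merger"),
    (date(2010, 3, 4), "earnings"),
]

days, indicator, flow = daily_news_series(
    items, "earnings", date(2010, 3, 1), date(2010, 3, 4)
)
print(indicator)  # [1, 0, 0, 1]
print(flow)       # [2, 0, 0, 1]
```

The indicator discards volume information that the flow series keeps; sentiment-based or market-response scores would need a text-scoring step that is not sketched here.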
It is important to have access to historical data for effective model development and
backtesting. Commercial news data vendors normally provide large historical archives
for this purpose. Section 1.A (the appendix on p. 25) summarizes the historical news
data for global equities provided by RavenPack and Thomson Reuters NewsScope. The
summary draws on essential information
taken from the RavenPack News Analytics—Dow Jones Edition (RavenPack, 2010)
and Thomson Reuters NewsScope Sentiment Engine (Thomson Reuters, 2009).


1.2.2 Pre-analysis of news data


Collecting, cleaning and analysing news data is challenging. Major news providers
collect and translate headlines and text from a wide range of worldwide sources. For
example, the Factiva database provided by Dow Jones holds data from 400 sources,
ranging from electronic newswires to newspapers and magazines.
We note there are differences in the volume of news data available for different
companies. Larger companies (with more liquid stock) tend to have higher news
coverage/news flow. Moniz, Brar, and Davis (2009) observe that the top quintile
accounts for 40% of all news articles and the bottom quintile for only 5%. Cahan,
Jussa, and Luo (2009) also find that news coverage is higher for larger-cap companies (see
Figure 1.2).
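One way to check a concentration of coverage of this kind on one's own data is sketched below in Python. It assumes companies are ranked by market capitalization before being split into quintiles; that ranking choice, the function name coverage_share_by_quintile and the sample numbers are assumptions made for illustration and are not taken from the cited studies.

```python
def coverage_share_by_quintile(companies):
    """Share of all news articles falling in each market-cap quintile.

    companies: list of (market_cap, n_articles) pairs
               (a hypothetical layout chosen for this sketch).
    Returns five shares, ordered from the smallest-cap quintile to the
    largest-cap quintile. When the number of companies is not a multiple
    of five, the quintiles differ in size by at most one company.
    """
    if not companies:
        raise ValueError("need at least one company")
    ranked = sorted(companies, key=lambda c: c[0])  # ascending market cap
    n = len(ranked)
    total_articles = sum(articles for _, articles in ranked)
    if total_articles == 0:
        raise ValueError("no news articles in the sample")
    totals = [0] * 5
    for rank, (_, articles) in enumerate(ranked):
        quintile = rank * 5 // n  # integer in 0..4
        totals[quintile] += articles
    return [t / total_articles for t in totals]


# Made-up numbers: ten companies as (market cap, article count).
sample = [(1, 1), (2, 1), (3, 2), (4, 2), (5, 3),
          (6, 3), (7, 4), (8, 4), (9, 10), (10, 10)]
shares = coverage_share_by_quintile(sample)
print([round(s, 2) for s in shares])  # [0.05, 0.1, 0.15, 0.2, 0.5]
```

With real data, the company list would come from a vendor archive joined to market capitalization figures as of a chosen date.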
Classification of news items is important. Major newswire providers tag incoming news
stories. A reporter entering a story into the news system will often manually tag it with




Figure 1.2. Number of news items vs. log market capitalization (taken from Cahan, Jussa, and
Luo, 2009).
