The Wiley Finance Series : Handbook of News Analytics in Finance


6.3 News data structure and statistics


We are using data from the Thomson Reuters NewsScope Sentiment Engine (RNSE),
developed with Infonics (RNSE, 2008). These data offer:


• Global scope, to examine both US and international markets.
• Broad coverage, currently over 7,000 US stocks, more than adequate for our test
sample of the contemporaneous S&P 1500 stocks over the period.
• Rich metadata: sentiment, relevance to a stock, topic codes, and links to previous
related stories.
• Up-to-date 6-year history (2003–2009).^1
• Real-time availability.


Thomson Reuters also furnished accurate, synchronized pricing data, with Reuters
Instrument Code (RIC) security identifiers matching the news and price data. We are
very grateful to the Thomson Reuters Data Team for assembling such a clean
product.
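Because both datasets carry RIC identifiers, linking a news item to its price reduces to a key join on (RIC, date). The following is a minimal sketch in plain Python; the tuple layouts, the function name `join_news_to_prices`, and the sample RICs and prices are hypothetical, not the actual Thomson Reuters schema.

```python
from datetime import date

# Hypothetical record layouts (not the vendor schema):
#   news:   (ric, day, story_id)
#   prices: (ric, day, close)
def join_news_to_prices(news, prices):
    """Attach the same-day closing price to each news item, keyed by (RIC, date)."""
    close_by_key = {(ric, day): close for ric, day, close in prices}
    joined = []
    for ric, day, story_id in news:
        key = (ric, day)
        if key in close_by_key:  # keep only items that have a matching price record
            joined.append((ric, day, story_id, close_by_key[key]))
    return joined

# Illustrative, made-up inputs
news = [
    ("AAA.N", date(2008, 1, 2), "s1"),
    ("BBB.O", date(2008, 1, 2), "s2"),
    ("CCC.N", date(2008, 1, 2), "s3"),
]
prices = [
    ("AAA.N", date(2008, 1, 2), 10.0),
    ("BBB.O", date(2008, 1, 2), 20.0),
]
joined = join_news_to_prices(news, prices)
```

Items without a matching price record (here `CCC.N`) are dropped. A real pipeline would also have to decide how to align intraday news timestamps with trading days, which this sketch ignores.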


6.3.1 Sample news data


Figure 6.2 shows a sample of RNSE data.
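As a rough illustration of the kind of record involved, the sketch below models one news item carrying the metadata listed earlier (sentiment, relevance, topic codes, links to previous stories). The class `NewsItem`, its field names, and the assumption that the three sentiment probabilities sum to 1 are illustrative choices, not the RNSE field layout.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class NewsItem:
    """Hypothetical RNSE-style record; field names are illustrative, not the vendor schema."""
    ric: str              # Reuters Instrument Code of the stock the item refers to
    timestamp: datetime   # publication time of the item
    relevance: float      # applicability to this stock, 0.0-1.0 (i.e. 0-100%)
    p_positive: float     # probability the item is positive in tone
    p_negative: float     # probability the item is negative in tone
    p_neutral: float      # probability the item is neutral in tone
    topic_codes: list = field(default_factory=list)
    linked_ids: list = field(default_factory=list)  # links to previous related stories
    is_alert: bool = False

    def __post_init__(self):
        if not 0.0 <= self.relevance <= 1.0:
            raise ValueError("relevance must lie in [0, 1]")
        # Assumption: the three tone probabilities form a distribution.
        total = self.p_positive + self.p_negative + self.p_neutral
        if abs(total - 1.0) > 1e-6:
            raise ValueError("sentiment probabilities must sum to 1")

    def net_sentiment(self) -> float:
        """Positive minus negative probability, in [-1, 1]; one common summary, assumed here."""
        return self.p_positive - self.p_negative

    def link_count(self) -> int:
        """Number of links to earlier stories; fewer links suggests a more novel item."""
        return len(self.linked_ids)
```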


6.3.2 Descriptive news statistics and trends


Figures 6.3–6.6 reveal a number of noteworthy trends, showing remarkable growth in
the scope, depth, and volume of news. RIC refers to Reuters Instrument Codes, a
valuable resource for linking news to prices.
Figures 6.3–6.6 show only the numbers of news events. Figure 6.7 gives an intuitively
satisfying picture of the overall sentiment of the news in recent years.
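Counts of news events and an overall sentiment series of the kind plotted in these figures can be sketched as a per-year aggregation. Here "overall sentiment" is taken, as an assumption for illustration, to be the mean of positive minus negative probability; the tuple layout and the function name `yearly_summary` are hypothetical, and this is not necessarily how the figures were built.

```python
from collections import defaultdict

def yearly_summary(items):
    """Count items, distinct RICs, and mean net sentiment per calendar year.

    `items` is an iterable of hypothetical (year, ric, p_pos, p_neg) tuples.
    Returns {year: (item_count, distinct_ric_count, mean_net_sentiment)},
    ordered by year.
    """
    counts = defaultdict(int)
    rics = defaultdict(set)
    net_sum = defaultdict(float)
    for year, ric, p_pos, p_neg in items:
        counts[year] += 1
        rics[year].add(ric)
        net_sum[year] += p_pos - p_neg  # assumed net-sentiment measure
    return {
        year: (counts[year], len(rics[year]), net_sum[year] / counts[year])
        for year in sorted(counts)
    }
```

Tracking the distinct-RIC count alongside the item count separates growth in coverage (more stocks) from growth in depth (more items per stock).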


6.4 Improving news analytics with aggregation


6.4.1 Event studies


We used event studies as a means of systematically screening for interesting relationships
between stock returns and events defined by news analytics built from RNSE data. We were
able to set and vary thresholds (both absolute and relative) based on:


• News intensity: number of news items in a period.
• Relevance: applicability of the items to a particular stock (0–100%).
• Sentiment scores: probability that each item is positive, negative, or neutral in
tone.
• Novelty and type of items: alerts, number of links to previous items, etc.
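Event definitions of this kind can be sketched as a filter over per-stock daily aggregates. In the sketch below, the function `flag_events`, the tuple layout, and every default threshold are hypothetical; the relative intensity threshold is taken, as an assumption, to be a multiple of the stock's trailing average daily news count.

```python
from statistics import mean

def flag_events(daily, min_relevance=0.8, abs_intensity=5, rel_intensity=3.0,
                sentiment_cut=0.3, lookback=20):
    """Flag positive/negative news-event days for one stock.

    `daily` is a chronological list of hypothetical per-day aggregates
    (n_items, mean_relevance, mean_net_sentiment). All thresholds are
    illustrative defaults, not values used in the chapter's studies.
    Returns a list of (day_index, +1 or -1).
    """
    events = []
    for i, (n_items, relevance, sentiment) in enumerate(daily):
        if relevance < min_relevance:
            continue  # skip days whose news is only loosely tied to the stock
        history = [d[0] for d in daily[max(0, i - lookback):i]]
        baseline = mean(history) if history else 0.0
        # Absolute threshold on the raw count, or relative threshold vs. trailing mean
        busy = n_items >= abs_intensity or (
            baseline > 0 and n_items >= rel_intensity * baseline)
        if busy and abs(sentiment) >= sentiment_cut:
            events.append((i, 1 if sentiment > 0 else -1))
    return events
```

Varying `min_relevance`, `sentiment_cut`, and the two intensity thresholds yields different event sets, which is the kind of threshold screening described above.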


The event studies shown here cover 2003–2008, with a universe of stocks based on
the contemporaneous S&P 1500 over this period. Industry classifications are
based on Thomson Reuters Business Classification (TRBC) sectors.
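Once events are defined, an event study averages the returns that follow them. The sketch below computes mean cumulative excess returns at several horizons out to 60 days, the longest interval examined in these studies. The function `mean_car`, the use of a benchmark such as a TRBC sector average, and the choice to start each window on the day after the event are assumptions of this sketch, not details taken from the chapter.

```python
import math

def mean_car(stock_returns, benchmark_returns, event_days, horizons=(1, 5, 20, 60)):
    """Average cumulative excess return after events, at several horizons.

    stock_returns / benchmark_returns: aligned lists of daily simple returns.
    Excess return = stock minus benchmark (e.g. a sector average; assumed here).
    Each window covers days event+1 .. event+h; events too close to the end of
    the sample to cover a horizon are dropped for that horizon.
    Returns {horizon: (mean_car, n_events)}.
    """
    excess = [s - b for s, b in zip(stock_returns, benchmark_returns)]
    out = {}
    for h in horizons:
        cars = [sum(excess[d + 1:d + 1 + h]) for d in event_days if d + h < len(excess)]
        out[h] = (sum(cars) / len(cars) if cars else math.nan, len(cars))
    return out
```

In practice one would run this separately for each event type (for example, positive and negative sentiment events) and pool the results across the stock universe.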
These studies use a daily timescale, with return intervals extending out to 60 days.
Signals on this scale have a "slower alpha", presumably due to the time it


Relating news analytics to stock returns 153

^1 At the time of preparation of this chapter.
