“earnings” have even stronger predictive ability in forecasting earnings surprises). The
authors find that negative words in firm-specific news stories predict slightly lower
returns on the following trading day.
8.1.2 Guided tour
We begin in Sections 8.2 and 8.3 by providing an introduction to the issues and
challenges raised when cleaning and analysing news flow datasets to determine the
informational content of particular news items. Section 8.4 then considers whether there
is a hierarchy to news citations to determine which news flow matters the most for stock
returns. In Section 8.5 we combine news flow with a database of detailed analyst
revisions. We identify clusters of analyst revisions and examine whether earnings
expectations change following certain news flows and, if news does lead revisions, how
investors can exploit this effect. Section 8.6 draws this analysis together to show how
investors can exploit news flow datasets, either by trading directly on news flow or by
combining the dataset with earnings momentum factors.
8.2 Aspects of news flow datasets
Here we consider the implications of overreaction and underreaction to news,
ask whether there is a “hierarchy” to information, and consider which news items are
deemed most important. The majority of our quantitative research focuses on
companies’ reported balance sheets or P&L data and sell-side analysts’ estimates. It is only
recently that we have been able to go beyond this to understand the motivations behind
corporates’ and fundamental analysts’ decisions by looking at higher frequency news
flow datasets. Over the past few years several data vendors have started to collect and
translate headlines and text from sources worldwide, ranging from electronic newswires
to newspapers and magazines. News items are categorized, tagged, and uploaded so that
they can be downloaded, at the latest, by the close of business on the day of the news
release. Many news vendors provide low-latency data feeds and analyse the sentiment of
stories within milliseconds of the news release.
We begin by considering the issues surrounding cleaning news data to ensure the
collection of both timely and relevant information, distinguishing between news types,
identifying mixed and stand-alone events, and deciphering informational content.
We highlight five key issues specific to analysing news flow.
8.2.1 Timeliness of news
The first challenge is to define an information event. How do we distinguish “new”
news from what has already been reported? We look beyond earnings announcements
and consider a variety of news types, regarding news as the release of new
information to the market. We restrict our analysis to news that reaches our data
vendor within a couple of hours of its publication, and we focus on semi-official
sources of information.
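The timeliness restriction can be illustrated with a minimal sketch. It is not the filter used in this study: it assumes that ‘‘a couple of hours’’ means a two-hour cutoff, that each news item carries both a publication timestamp and a vendor receipt timestamp, and that sources are tagged with simple labels. The names NewsItem, MAX_LATENCY, SEMI_OFFICIAL_SOURCES, is_timely, and filter_news are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: "a couple of hours" is read as a two-hour cutoff.
MAX_LATENCY = timedelta(hours=2)

# Assumption: illustrative source labels, not a vendor's actual taxonomy.
SEMI_OFFICIAL_SOURCES = {"newswire", "exchange_statement", "company_press_release"}


@dataclass(frozen=True)
class NewsItem:
    headline: str
    source_type: str
    published_at: datetime  # when the source published the item
    received_at: datetime   # when the data vendor received the item


def is_timely(item, max_latency=MAX_LATENCY):
    """True if the vendor received the item within max_latency of publication."""
    latency = item.received_at - item.published_at
    # A negative latency points to a timestamp error, so it is rejected too.
    return timedelta(0) <= latency <= max_latency


def filter_news(items):
    """Keep only timely items that come from the assumed semi-official sources."""
    return [
        item for item in items
        if item.source_type in SEMI_OFFICIAL_SOURCES and is_timely(item)
    ]
```

Under these assumptions, a newswire story received 30 minutes after publication is kept, whereas one received five hours later, or one from an unlisted source type, is dropped. In practice the cutoff and the list of sources would be set to match the vendor's own metadata.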
To ensure that news flow is both timely and relevant, we restrict our collection process to
the key newswires, stock exchange statements, press releases from company websites,