As posed by the editors and answered by Jeff Catlin
This brief chapter poses practical questions on text analysis to Jeff Catlin, CEO of
Lexalytics, Inc., a text analytics and sentiment software company based in Boston,
MA, that provides solutions primarily to the finance, enterprise search, and reputation
management industries. Among its many capabilities, Lexalytics technology powers
the Thomson Reuters News Analytics system. For more information on Lexalytics’
capabilities, see the “Directory of news analytics service providers” under
Thomson Reuters (p. 344).
So, Jeff, for those looking to analyze text, what are the biggest challenges they
will face in working with this largely unstructured content?
To be honest, people who work on text processing and search understand that getting
control of the content is often a much bigger hurdle than building out an application.
Not all content is created equal. Consider content sources like Twitter and a
Reuters newsfeed: it’s pretty hard to imagine that those will ever be handled with
identical approaches. Twitter content is badly formed, with little if any capitalization,
punctuation, or grammar, while something like a Reuters feed is well formed but more
verbose, so very different approaches must be used to process these two distinct types
of content. As you narrow the focus to content that is specific to financial services, the
problems change slightly.
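To make the contrast concrete, here is a minimal sketch of source-specific preprocessing. It is an illustration only, not a description of how Lexalytics handles either source: the function names, the source labels "twitter" and "newswire", and the regular expressions are all assumptions chosen for the example.

```python
import re


def normalize_tweet(text):
    """Clean up a badly formed short message (hypothetical rules)."""
    text = re.sub(r"https?://\S+", " ", text)    # drop URLs
    text = re.sub(r"[@#](\w+)", r"\1", text)     # keep the word, drop the @ or #
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)   # cap repeated characters at two
    return re.sub(r"\s+", " ", text).strip().lower()


def split_news_sentences(text):
    """Naively split well-formed prose on end punctuation before a capital letter."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text)
    return [p.strip() for p in parts if p.strip()]


def preprocess(text, source):
    """Route text to a source-specific pipeline (source labels are assumed)."""
    if source == "twitter":
        return [normalize_tweet(text)]
    if source == "newswire":
        return split_news_sentences(text)
    raise ValueError(f"unknown source: {source!r}")
```

The tweet path spends its effort repairing the text, while the newswire path can rely on punctuation and capitalization to find sentence boundaries, which is the asymmetry described above.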
There are both advantages and drawbacks to processing financially oriented news
content. First, financial news content tends to be well written, with solid grammar
and punctuation, which helps significantly in processing the text to measure
sentiment. Compared with content streams like Twitter, which seldom have punctuation
and often feature some creative grammar, financial news content is clean and easy to
work with. The downside of processing this text is that financial news reporters try very
hard to be as impartial or muted in their writing as possible, so that they don’t cause
undue exuberance or panic as a result of their reporting. This can make it a bit harder
to measure the sentiment of the news. Essentially, you have to turn up the gain on the
engine to be as sensitive to emotion as possible, so that muted signals are detected.
We’ve done this in our build of the Thomson Reuters News Analytics (TRNA) system,
and the results in measuring sentiment have been exceptional.
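One way to picture “turning up the gain” is a toy lexicon scorer whose raw output is multiplied by a gain factor before being squashed into a fixed range. This is a sketch under stated assumptions, not the TRNA or Lexalytics method: the lexicon, its weights, the `gain` parameter, and the use of `tanh` are all hypothetical choices made for illustration.

```python
import math

# Hypothetical sentiment weights, invented for this example.
LEXICON = {
    "surge": 0.6, "gain": 0.4, "improve": 0.3,
    "concern": -0.3, "decline": -0.4, "plunge": -0.7,
}


def raw_score(text):
    """Sum of lexicon weights divided by token count (0.0 for empty text)."""
    tokens = [t.strip(".,;:!?\"'()").lower() for t in text.split()]
    if not tokens:
        return 0.0
    return sum(LEXICON.get(t, 0.0) for t in tokens) / len(tokens)


def amplified_score(text, gain=1.0):
    """Scale the raw score by `gain`, then squash into (-1, 1) with tanh."""
    return math.tanh(gain * raw_score(text))


headline = "Analysts voiced concern over a decline in quarterly revenue"
low = amplified_score(headline, gain=1.0)    # muted wording gives a weak signal
high = amplified_score(headline, gain=10.0)  # same text, more sensitive engine
```

With a higher gain, the same muted headline produces a score further from zero while staying within the bounded range. The trade-off in a sketch like this is that noise is amplified along with the signal, so the gain would need to be tuned against labeled data.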
The Handbook of News Analytics in Finance Edited by L. Mitra and G. Mitra
© 2011 John Wiley & Sons