Another important consideration, especially in a financial services environment, is having
a fit-for-purpose system. In the financial sector, you don’t have the luxury of simply
re-booting a server or a process if it falls over in the middle of a critical news or trading
day. The systems have to stay up all the time and deliver their results consistently and in
a timely manner. Lexalytics worked closely with Thomson Reuters on the design and
development of TRNA to ensure that the system architecture was fault-tolerant and
fully resilient, so that sentiment and other valuable text characteristics could be measured
and turned into signals in a matter of milliseconds with no downtime.
What is company-aliasing and why should I care?
Aliasing, which some people call the “Also Known As” problem, is simply the task of
making sure that all of the different names for a company are recognized as references to
that company. For example, everyone knows that IBM is also known as Big Blue, so it
shouldn’t be that difficult to build an alias file that we can use to roll up all the references
to IBM under a common name. Unfortunately, there are tens of thousands of entities that
need to be aliased, with frequent name changes due to mergers, delistings, etc. Working
with Thomson Reuters on the development of TRNA meant that we didn’t have to solve
the company-aliasing problem ourselves. Thomson Reuters maintains one of the best alias
lists in the world, and it’s tightly integrated with their real-time newsfeed. Named entity
detection was therefore a non-issue in the build-out of TRNA, which gave us a huge
advantage: we were able to sidestep a very difficult problem and focus our energy on
finding a sentiment signal in the news content.
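To make the roll-up concrete, here is a minimal, hypothetical sketch of the data structure involved. The alias table, the lookup function, and the example mentions are illustrative assumptions, not the Thomson Reuters alias list or the TRNA implementation.

```python
# Minimal sketch of alias roll-up: every known name variant maps to one
# canonical identifier. The table below is purely illustrative; a production
# alias list, like the one Thomson Reuters maintains, covers tens of
# thousands of entities and changes constantly with mergers and delistings.

ALIASES = {
    "ibm": "IBM",
    "big blue": "IBM",
    "international business machines": "IBM",
}

def canonical_name(mention):
    """Return the canonical entity for a raw mention, or None if unknown."""
    return ALIASES.get(mention.strip().lower())

print(canonical_name("Big Blue"))                         # IBM
print(canonical_name("International Business Machines"))  # IBM
print(canonical_name("HAL"))                              # None (not in the table)
```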
There are a few competitors who claim to have sophisticated sentiment technologies
that assign sentiment to the article as a whole. You’ve chosen to measure and attribute sentiment
to individual entities within the article. Does it really matter? Why is this such an
important distinction?
Sentiment can be misleading if it is only scored at the article level. Many stories report
on two or more companies, so measuring sentiment for the story as a whole isn’t
particularly useful unless all the companies are in the same industry and are being reported
on with the same opinion. It’s not hard to imagine that many stories don’t fit this
description; most compare and contrast companies within a sector, so article-level
sentiment is at best uninteresting, and at worst, misleading. The key to finding and
reporting a useful signal is entity-level sentiment.
There are several keys to doing entity-level sentiment well, but the first step is to understand
why entity-level sentiment, as opposed to document-level sentiment, is so important. Consider a
financial news story reporting quarterly results for Apple that notes the iPhone is
continuing to gain market share against competitors like HTC, who are clearly
struggling. It’s easy to see that the sentiment depends entirely on who you are in that
document: if you’re HTC, things aren’t so good, but if you’re Apple, the world’s a
happy place. Applying sentiment to the overall document would have little or no
value. It is the entities contained within the report that matter, and entity-level
sentiment becomes critical in analyzing that data.
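As a toy illustration of the Apple/HTC point (a deliberately simplistic sketch, assuming a tiny word-counting scorer and naive string matching for entity mentions, not the TRNA methodology): the document-level score nets out to neutral, while the per-entity scores diverge sharply.

```python
import re

# Toy illustration of document-level vs. entity-level sentiment for the
# Apple/HTC example above. The sentences, word lists, and scoring are
# illustrative assumptions only.

POSITIVE = {"gain", "gaining", "strong"}
NEGATIVE = {"struggling", "losing", "weak"}

def score(text):
    """Count positive words minus negative words in one sentence."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

story = [
    "Apple's iPhone is gaining market share this quarter.",
    "Competitors like HTC are clearly struggling.",
]

# Document-level: the positive and negative sentences cancel out.
print("document:", sum(score(s) for s in story))                 # document: 0

# Entity-level: score only the sentences that mention each company.
for entity in ("Apple", "HTC"):
    print(entity, sum(score(s) for s in story if entity in s))   # Apple 1, HTC -1
```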
There are a number of important technical issues to solve when measuring entity-level
sentiment. Among them are the disambiguation of words and the assignment of