The Wiley Finance Series : Handbook of News Analytics in Finance


sentiment-bearing terms to the correct entity. For disambiguation, consider the word
‘‘fine’’. It can be used in a variety of ways: ‘‘things are fine’’; ‘‘I just paid a fine’’; ‘‘fine
grains of sand’’. Clearly, understanding the context of the word in a particular piece of
text is vital to measuring the term’s effect in a given sentence. Additionally, it’s
important to parse the grammar of a sentence correctly so that sentiment phrases are
attached to the appropriate entity. Beyond identifying nouns, pronouns, adjectives,
and verbs, the analysis must also cover the relationships between entities. For example,
recognizing that ‘‘John Smith, CEO’’ and the ‘‘He’’ in ‘‘He announced his resignation’’
refer to the same person in a report helps significantly when assigning entity-level
sentiment.
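
To make the ‘‘fine’’ example concrete, here is a minimal sketch of context-based disambiguation. It is a toy illustration only, not a description of how any production system works: the sense labels, the cue-word lists in SENSE_CUES, and the function name disambiguate are all assumptions introduced for this example.

```python
import re

# Hypothetical cue words for each sense of "fine" (illustrative, not exhaustive).
SENSE_CUES = {
    "positive": {"are", "is", "things", "doing", "looks"},      # "things are fine"
    "penalty": {"paid", "pay", "levied", "imposed", "court"},   # "I just paid a fine"
    "texture": {"grains", "sand", "powder", "mesh"},            # "fine grains of sand"
}


def disambiguate(sentence, target="fine", window=3):
    """Guess the sense of `target` from cue words within `window` tokens of it.

    Returns a key of SENSE_CUES, "unknown" when no cue word is nearby,
    or None when the target word does not occur in the sentence.
    """
    tokens = re.findall(r"[a-z']+", sentence.lower())
    if target not in tokens:
        return None
    i = tokens.index(target)  # first occurrence only, to keep the sketch short
    context = set(tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window])
    # Score each sense by how many of its cue words appear in the context.
    scores = {sense: len(cues & context) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


if __name__ == "__main__":
    for text in ("Things are fine", "I just paid a fine", "Fine grains of sand"):
        print(text, "->", disambiguate(text))
```

With these cue lists the three phrases resolve to "positive", "penalty", and "texture" respectively. A real system would rely on part-of-speech tagging and far richer context than a hand-written word list, but the sketch shows why the surrounding words, and not the term alone, determine its sentiment.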


How do you deal with the quantity vs. quality of sources?


The volume of information available today is larger than it’s ever been, but most of the
content is of unknown quality. Not only is professionally produced news content of
better technical quality (good punctuation and grammar) than most other content, it is
also more trustworthy. It’s far less likely that professional news content will ‘‘have an
axe to grind’’, so the sentiment measures we derive from the content are more accurate
and honest than those we’d obtain if we added social media sources to the mix. For other
sources that are ‘‘professional’’ but likely to be consistently biased, such as company press
releases, there are a number of post-processing techniques that you can apply
to remove the bias. For example, one can measure how far a given IBM release deviates
from the average sentiment of IBM’s own releases, rather than comparing IBM’s release
against one from Dell. In addition, an
end-user might weight the signal from different sources according to their circulation,
page views, or number of households reached. The unparalleled host of metadata we
produce, spanning over 60 fields, enables clients to slice and dice the data as they see fit for the
signal and intelligence that matter most to them.
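
The two post-processing ideas in this answer, measuring each release against its own source’s average and weighting sources by reach, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the (source, score) record layout, and the reach figures are hypothetical and do not correspond to any actual product field.

```python
from collections import defaultdict
from statistics import mean


def deviations_from_own_average(records):
    """Subtract each source's own average sentiment from its scores.

    `records` is a list of (source, sentiment_score) pairs. A consistently
    upbeat source (e.g. a company's press releases) is thereby judged against
    its own baseline rather than against another company's releases.
    """
    by_source = defaultdict(list)
    for source, score in records:
        by_source[source].append(score)
    baseline = {source: mean(scores) for source, scores in by_source.items()}
    return [(source, score - baseline[source]) for source, score in records]


def reach_weighted_signal(scored_items, reach):
    """Average (source, score) pairs, weighting each by its source's reach.

    `reach` maps source -> circulation, page views, or households reached
    (hypothetical figures supplied by the end-user).
    """
    total = sum(reach[source] for source, _ in scored_items)
    if total == 0:
        raise ValueError("total reach must be positive")
    return sum(reach[source] * score for source, score in scored_items) / total


if __name__ == "__main__":
    releases = [("IBM", 0.6), ("IBM", 0.8), ("Dell", 0.1), ("Dell", 0.3)]
    for source, deviation in deviations_from_own_average(releases):
        print(f"{source}: {deviation:+.2f}")
    print(reach_weighted_signal([("A", 1.0), ("B", -1.0)], {"A": 300, "B": 100}))
```

Subtracting a per-source mean is only one possible normalization; dividing by the source’s standard deviation as well would also account for sources that swing more widely than others.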


What advice would you have for someone who wants to develop this
capability themselves?


Anyone wishing to build these sorts of capabilities on their own should not
underestimate the amount of time and resources required to develop a fit-for-purpose
system: one that is scalable enough to handle the large amounts of content that may need to be
processed; one that is fast enough to handle content at the speed required for
algorithmic trading; one that is fault-tolerant and fully resilient for 100% uptime; one that
incorporates comprehensive aliasing capabilities across tens of thousands of companies;
and one that can be maintained without a huge staff. A development team
should be prepared to set aside at least five years to build a system capable of matching
the sentiment signal we’ve achieved on financial content. Alternatively, we’d recommend
using a system like ours and focusing your skills and expertise on better interpreting the
robust output to fit your trading and investment needs.


Question and answers with Lexalytics