The Wiley Finance Series : Handbook of News Analytics in Finance

gives the centrality score of all nodes, and highlights which nodes have the most
influence.
In Das and Sisk (2005), we computed the centrality scores for all stocks in a network
graph where the connection strengths were based on the number of common message
posters each pair of stocks had. We found that stocks such as IBM, AOL, Motorola, and
AMD were central, whereas stocks such as American Express, Abbott Labs, and Bristol
Myers were not. Central stocks are more likely to be indicative of the way other stocks
may react, since they influence others more than vice versa; hence, they may be leading
indicators of stock market movements. Computing centrality in various news domains is
useful for getting a sense of which sources of news may be better tracked than others.
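
As an illustration, the following R sketch computes centrality scores from a small adjacency matrix. It assumes eigenvector centrality (the principal eigenvector of the adjacency matrix), which is one common definition and may differ in detail from the measure used in Das and Sisk (2005). The stock labels S1 to S4 and the connection strengths are hypothetical.

```r
# Hypothetical symmetric adjacency matrix: entry [i, j] is the
# connection strength between stocks i and j (made-up numbers).
stocks <- c("S1", "S2", "S3", "S4")
adjmat <- matrix(c(0, 5, 4, 3,
                   5, 0, 1, 0,
                   4, 1, 0, 0,
                   3, 0, 0, 0),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(stocks, stocks))

# Assumption: eigenvector centrality. For a symmetric matrix, eigen()
# returns eigenvalues in decreasing order, so column 1 is the
# principal eigenvector.
ev <- eigen(adjmat, symmetric = TRUE)$vectors[, 1]

# The sign of an eigenvector is arbitrary; take absolute values and
# rescale so that the most central node scores 1.
centrality <- abs(ev) / max(abs(ev))
names(centrality) <- stocks

print(round(sort(centrality, decreasing = TRUE), 3))
```

In this toy network S1, which is strongly connected to every other stock, receives the highest score, and S4, which is connected only to S1, receives the lowest.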


2.3.14 Communities


News traffic may be analyzed to determine communities. Given a network graph’s
adjacency matrix, communities are easy to detect using any one of several well-known
algorithms. An excellent review of these algorithms is provided by Fortunato (2010).
A widely used library for graph analysis and community detection is igraph. This
may be accessed at http://igraph.sourceforge.net/. A sample of the ease of
use of the igraph library using R is as follows:


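#LOAD THE IGRAPH LIBRARY
library(igraph)


#CREATE GRAPH FROM ADJACENCY MATRIX
#(adjmat IS A SYMMETRIC MATRIX OF CONNECTION STRENGTHS)
g = graph.adjacency(adjmat,mode="undirected",weighted=TRUE,
                    diag=FALSE)


#DETECT COMMUNITIES
#(vc_list_connected IS NOT DEFINED IN THIS LISTING; IT IS ASSUMED
# TO HAVE BEEN CREATED EARLIER)
wtc = walktrap.community(g)
comms = community.to.membership(g,wtc$merges,
                                steps=length(vc_list_connected)/4)
print(comms)


#DETECT CLUSTERS
clus = clusters(g)
print(clus)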


The sequence of commands first creates the network graph from the adjacency
matrix (adjmat). It then executes the “walktrap” community detection algorithm to
find the communities, which are then printed out. The igraph package also allows for
finding clusters as needed.
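
The listing takes adjmat as given. One way such a matrix might be assembled is sketched below, assuming that connection strengths count the message posters two stocks have in common, as in the Das and Sisk (2005) network described earlier. The data frame posts, its column names, and the stock and poster labels are all hypothetical.

```r
# Hypothetical raw data: one row per (stock board, poster) message.
posts <- data.frame(
  stock  = c("S1", "S1", "S1", "S2", "S2", "S3", "S3", "S4"),
  poster = c("p1", "p2", "p3", "p1", "p2", "p2", "p4", "p4"),
  stringsAsFactors = FALSE
)

stocks  <- sort(unique(posts$stock))
posters <- sort(unique(posts$poster))

# Incidence matrix: 1 if the poster wrote at least once about the stock.
inc <- matrix(0, nrow = length(stocks), ncol = length(posters),
              dimnames = list(stocks, posters))
inc[cbind(posts$stock, posts$poster)] <- 1

# Entry [i, j] of inc %*% t(inc) counts posters common to stocks i and j.
adjmat <- inc %*% t(inc)

# A stock's overlap with itself is not a connection, so clear the diagonal.
diag(adjmat) <- 0

print(adjmat)
```

Because the incidence matrix is binary, a poster who writes many messages about the same stock is still counted only once for that stock.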
A community is a cluster of nodes that have many connections between members of
the community but few connections outside the community. There are many algorithms
that exploit this working definition of a community. For instance, the walktrap
algorithm detects communities using random walks on the network.
A random walk tends to be trapped in a community because of the number of links
between community nodes relative to links across communities. By keeping track of
regions of the network where the random walk is trapped, this algorithm is able to detect
communities. See the paper by the creators of the algorithm, Pons and Latapy (2006).
This is a very recent paper, and the algorithm resulted in a large performance
improvement over existing algorithms.
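
The trapping effect can be seen on a toy example. The R sketch below is an illustration only and is not the igraph implementation: it builds a hypothetical graph of two tightly linked groups joined by a single edge, computes the exact distribution of a short random walk, and compares walk distributions with a degree-weighted distance in the spirit of the one used by Pons and Latapy (2006). The graph, the walk length of 3 steps, and the helper walk_dist are assumptions made for the example.

```r
# Hypothetical graph: two 4-node cliques (nodes 1-4 and 5-8)
# joined by a single bridge edge between nodes 4 and 5.
n <- 8
adjmat <- matrix(0, n, n)
adjmat[1:4, 1:4] <- 1
adjmat[5:8, 5:8] <- 1
diag(adjmat) <- 0
adjmat[4, 5] <- 1
adjmat[5, 4] <- 1

# Transition matrix of the random walk: row i of adjmat divided by
# the degree of node i.
deg <- rowSums(adjmat)
P <- adjmat / deg

# Distribution of the walk after a few steps (steps = 3 is an
# arbitrary choice for illustration).
steps <- 3
Pt <- diag(n)
for (s in seq_len(steps)) Pt <- Pt %*% P

# Probability that a walk started at node 1 is still inside its own
# group (nodes 1-4) after 3 steps; roughly 0.88 for this toy graph.
own_mass <- sum(Pt[1, 1:4])

# Degree-weighted distance between the walk distributions of two nodes.
walk_dist <- function(i, j) sqrt(sum((Pt[i, ] - Pt[j, ])^2 / deg))

d_within <- walk_dist(1, 2)  # two nodes in the same group
d_across <- walk_dist(1, 8)  # two nodes in different groups

print(c(own_mass = own_mass, d_within = d_within, d_across = d_across))
```

Nodes in the same group see the network in nearly the same way after a short walk, so their distance is small, while nodes in different groups are far apart. Merging nodes that are close under such a distance is one way to recover the two groups.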


News analytics: Framework, techniques, and metrics