Social Media Mining: An Introduction

(Axel Boer) #1

P1: WQS Trim: 6.125in×9.25in Top: 0.5in Gutter: 0.75in
CUUS2079-06 CUUS2079-Zafarani 978 1 107 01885 3 January 13, 2014 17:15


176 Community Analysis

clustering. Hierarchical clustering algorithms are usually variants of single
link, average link, or complete link algorithms [Jain and Dubes, 1999].
In hierarchical clustering, COBWEB [Fisher, 1987] and CHAMELEON
[Karypis et al., 1999] are two well-known algorithms.
In group-based community detection, latent space models [Handcock
et al., 2007;Hoff et al., 2002] are also very popular, but are not discussed in
this chapter. In addition to the topics discussed in this chapter, community
detection can also be performed for networks with multiple types of inter-
action (edges) [Tang and Liu, 2009;Tang et al., 2012]. We also restricted
our discussion to community detection algorithms that use graph infor-
mation. One can also perform community detection based on the content
that individuals share on social media. For instance, using tagging relations
(i.e., individuals who shared the same tag) [Wang et al., 2010], instead
of connections between users, one can discover overlapping communities,
which provides a natural summarization of the interests of the identified
communities.
In network evolution analysis, network segmentation is discussed in
[Kumar et al., 2010]. Segment-based clustering [Sun et al., 2007] is another
method not covered in this chapter.
NMI was first introduced in [Strehl et al., 2002] and in terms of clus-
tering quality measures, the Davies-Bouldin [Davies and Bouldin, 1979]
measure, Rand index [Rand, 1971], C-index [Dunn, 1974], Silhouette index
[Rousseeuw, 1987], and Goodman-Kruskal index [Goodman and Kruskal,
1954 ] can be used.

6.6 Exercises


  1. Provide an example to illustrate how community detection can be sub-
    jective.


Community Detection


  1. Given a complete graphKn, how many nodes will the clique percolation
    method generate for the clique graph for valuek? How many edges
    will it generate?

  2. Find allk-cliques,k-clubs, andk-clans in a complete graph of size 4.

  3. For a complete graph of sizen,isitm-connected? What possible values
    canmtake?

  4. Why is the smallest eigenvector meaningless when using an unnormal-
    ized laplacian matrix?

Free download pdf