Social Media Mining: An Introduction

(Axel Boer) #1

P1: WQS Trim: 6.125in×9.25in Top: 0.5in Gutter: 0.75in
CUUS2079-06 CUUS2079-Zafarani 978 1 107 01885 3 January 13, 2014 17:15


6.1 Community Detection 149

Figure 6.5. Maximalk-plexes fork= 1 ,2, and 3.

will be pruned, which will reduce the computation drastically. This pruning
works for both directed and undirected graphs.
Even with pruning, there are intrinsic properties with cliques that make
them a less desirable means for finding communities. Cliques are rarely
observed in the real world. For instance, consider a clique of 1,000 nodes.
This subgraph has^999 × 21000 =499,500 edges. A single edge removal from
this many edges results in a subgraph that is no longer a clique. That
represents less than 0.0002% of the edges, which makes finding cliques an
unlikely and challenging task.
In practice, to overcome this challenge, we can either relax the clique
structure or use cliques as a seed or core of a community.

Relaxing cliques.A well-known clique relaxation that comes from sociol-
ogyisthek-plex concept. In a clique of sizek, all nodes have the degree k-PLEX
ofk−1; however, in ak-plex, all nodes have a minimum degree that is not
necessarilyk−1 (as opposed to cliques of sizek). For a set of verticesV,
the structure is called ak-plexifwehave

dv≥|V|−k,∀v∈V, (6.1)

wheredvis the degree ofvin the induced subgraph (i.e., the number of
nodes from the setVthat are connected tov).
Clearly, a clique of sizekis a 1-plex. Askgets larger in ak-plex, the
structure gets increasingly relaxed, because we can remove more edges
from the clique structure. Finding the maximumk-plex in a graph still
tends to be NP-hard, but in practice, finding it is relatively easy due to
smaller search space. Figure6.5shows maximalk-plexes fork=1, 2, and


  1. Ak-plex is maximal if it is not contained in a larger (i.e., with more
    nodes).


Using cliques as a seed of a community.When using cliques as a seed
or core of a community, we assume communities are formed from a set of
cliques (small or large) in addition to edges that connect these cliques. A
well-known algorithm in this area is the clique percolation method (CPM)

CLIQUE
PERCOLATION
METHOD
(CPM)
Free download pdf