Social Media Mining: An Introduction

(Axel Boer) #1

P1: WQS Trim: 6.125in×9.25in Top: 0.5in Gutter: 0.75in
CUUS2079-04 CUUS2079-Zafarani 978 1 107 01885 3 January 13, 2014 16:54


86 Network Models

using notationcorkin the literature. Since we frequently usekto denote
degree values, we usecto denote the expected degree of a random graph,

c=(n−1)p, (4.4)

or equivalently,

p=

c
n− 1

. (4.5)


Proposition 4.2.The expected number of edges in G(n,p)is

(n
2

)


p.

Proof.Following the same line of argument, because edges are selected
independently and we have a maximum of

(n
2

)


edges, the expected number
of edges is

(n
2

)


p.

Proposition 4.3. In a graph generated by G(n,p)model, the probability
of observing m edges is

P(|E|=m)=

((n
2

)


m

)


pm(1−p)(

n 2 )−m
, (4.6)

which is a binomial distribution.

Proof. medges are selected from the

(n
2

)


possible edges. These edges are
formed with probabilitypm, and other edges are not formed (to guarantee
the existence of onlymedges) with probability (1−p)(

n 2 )−m
.

Given these basic propositions, we next analyze how random graphs
evolve as we add edges to them.

4.2.1 Evolution of Random Graphs
In random graphs, when nodes form connections, after some time a large
fraction of nodes get connected (i.e., there is a path between any pair of
them). This large fraction forms aconnected component, commonly called
GIANT thelargest connected component or thegiant component. We can tune the
COMPONENT behavior of the random graph model by selecting the appropriatepvalue.
InG(n,p), whenp=0, the size of the largest connected component is 0
(no two pairs are connected), and whenp=1, the size isn(all pairs are
connected). Table4.3provides the size of the largest connected component
(slcin the figure) for random graphs with 10 nodes and differentpvalues.
Free download pdf