Personalized_Medicine_A_New_Medical_and_Social_Challenge

(Barré) #1

form ofmust-linkandcannot linkconstraints, into the clustering procedure.^96 These
constraints indicate whether two data objects should be in the same cluster (must
link) or not (cannot link).
The Laplacian matrix of a molecular network is a mathematical representation
suitable for incorporating prior knowledge (constraints given in the network form)
into the clustering procedure: it encourages interacting molecules to belong to the
same cluster and noninteracting molecules to belong to different clusters (see
Sect. 3 for a detailed explanation).
In addition to molecular connectivity given in the form of Laplacian matrix, it is
possible to incorporate additional topological constraints into the clustering proce-
dure. Namely, a recent study shows that adding similarity between nodes’local
topology in molecular networks as constraints leads to more informative clusters.^97
This local topology is measured bygraphlet degree vectors(GDV). A GDV vector
is a 73-dimensional vector that describes the local topology (i.e., connectivity) of a
node in a network (see Fig.2(A)). A graphlet is a small connected induced subgraph
of a larger network. An induced subgraph on a node set is obtained by taking all the
nodes in that set and all the edges between these nodes that are present in the larger
network. In this way, it is possible to construct 30 unique graphlets containing 2 to
5 nodes. By taking into account the symmetries between nodes in a graphlet, there
are 73 automorphism orbits. For a particular node in a network, a generalization of
its degree, which is the number of edges that it touches, is the GDV. The coordi-
nates in the GDV vector represent the number of times the node touches each of the
73 orbits (see Fig.2(A), bottom panel). That is, for a noden, itsi-th coordinate of its
GDV vector denotes the number of times nodentouches orbit i. The GDV
represents the local structural properties of a node, and it can be used to compare
nodes based on their local topology within the network. The nodes in the network
with the similar local topology often do not interact. For example, nodes A and B in
Fig.2(B)are not directly connected, but they have the same local topology (i.e., the
same GDV vectors). Therefore, the information on topological similarity comple-
ments the information given by Laplacian matrix. The topological similarity,
defined as the distance between GDVs of two nodes,^98 can be used in the
constrained clustering, where, in addition to interacting proteins (genes) in the
molecular network, genes (proteins) with similar local topology should also be in
the same cluster.


generate clusters, whereas semi-supervised clustering algorithms use prior information about the
data to improve the clustering results. For instance, when performing clustering of genes, the
supervision is generally given as pairwise constraints that guide the clustering process; such
constrains are naturally given as molecular networks, that is, two connected genes in a molecular
network are considered to belong to the same cluster.


(^96) Zhu et al. ( 2005 ).
(^97) Gligorijevic ́et al. ( 2014 ).
(^98) Milenkovic ́and Pržulj ( 2008 ).
156 V. Gligorijevic ́and N. Pržulj

Free download pdf