Personalized Medicine: A New Medical and Social Challenge

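$$
\min_{G_1 \ge 0,\; G_2 \ge 0} J = \left\| R_{12} - G_1 S_{12} G_2^T \right\|_F^2 + \mathrm{tr}\left( G_1^T \Theta_1 G_1 \right) + \mathrm{tr}\left( G_2^T \Theta_2 G_2 \right) \tag{11}
$$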

Entries of the constraint matrices are positive for dissimilar data objects, acting as
penalties on the objective function, $J$, and negative for similar data objects, acting as
rewards that reduce the value of the objective function. In many examples, prior
knowledge about genes (proteins) is given in the form of biological networks.
Therefore, the easiest way to incorporate this knowledge is through graph
Laplacians (see Sect. 3.2). By replacing the $\Theta_i$ matrices with the graph Laplacian
matrices, $L_i$, the objective function for the penalized NMTF can be written in the
following way:


$$
\min_{G_1 \ge 0,\; G_2 \ge 0} J = \left\| R_{12} - G_1 S_{12} G_2^T \right\|_F^2 + \mathrm{tr}\left( G_1^T L_1 G_1 \right) + \mathrm{tr}\left( G_2^T L_2 G_2 \right) \tag{12}
$$

where tr denotes the trace of a matrix and matrices $L_1$ (e.g., gene-gene networks)
and $L_2$ (e.g., GO DAG) are the Laplacian matrices of the first and the second data
set, respectively.
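
As a rough illustration of Eq. (12), the following Python sketch evaluates the penalized objective for given factors. It assumes the unnormalized Laplacian $L = D - A$ built from a symmetric adjacency matrix (Sect. 3.2 may define it differently), and the function names `graph_laplacian` and `penalized_nmtf_objective`, as well as the toy path-graph networks, are hypothetical choices made only for this example. It computes the value of $J$ and does not perform the minimization.

```python
import numpy as np


def graph_laplacian(adjacency):
    """Unnormalized Laplacian L = D - A of a symmetric adjacency matrix (assumed form)."""
    adjacency = np.asarray(adjacency, dtype=float)
    degree = np.diag(adjacency.sum(axis=1))
    return degree - adjacency


def penalized_nmtf_objective(R12, G1, S12, G2, L1, L2):
    """Value of J in Eq. (12): squared Frobenius fit plus two trace penalties."""
    residual = R12 - G1 @ S12 @ G2.T
    fit = np.linalg.norm(residual, "fro") ** 2
    penalty_1 = np.trace(G1.T @ L1 @ G1)
    penalty_2 = np.trace(G2.T @ L2 @ G2)
    return fit + penalty_1 + penalty_2


# Toy data (hypothetical sizes): 6 objects of type 1, 5 of type 2, 2 clusters each.
rng = np.random.default_rng(0)
n1, n2, k1, k2 = 6, 5, 2, 2
G1 = rng.random((n1, k1))
G2 = rng.random((n2, k2))
S12 = rng.random((k1, k2))
R12 = G1 @ S12 @ G2.T  # exact factorization, so the fit term is numerically zero

# Toy prior networks: simple path graphs over each object type.
A1 = np.diag(np.ones(n1 - 1), 1)
A1 = A1 + A1.T
A2 = np.diag(np.ones(n2 - 1), 1)
A2 = A2 + A2.T
L1, L2 = graph_laplacian(A1), graph_laplacian(A2)

J = penalized_nmtf_objective(R12, G1, S12, G2, L1, L2)
print(J)
```

With this Laplacian, each trace term equals half the sum over object pairs of the edge weight times the squared distance between their rows of $G$, so objects connected in the prior network are pushed toward similar cluster indicators.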
The natural generalization of this problem to data sets containing data
objects of more than two data types is through block representation of data
matrices.^123 This formulation represents the basis of data integration using matrix


Fig. 4 Coclustering data (genes and GO terms) using nonnegative matrix tri-factorization
(NMTF). Matrix $R_{12} \in \mathbb{R}^{n_1 \times n_2}$, relating objects of type 1 (genes) and type 2 (GO terms), is decomposed
into three matrix factors, $G_1 \in \mathbb{R}_+^{n_1 \times k_1}$, $G_2 \in \mathbb{R}_+^{n_2 \times k_2}$, and $S_{12} \in \mathbb{R}^{k_1 \times k_2}$, where $k_1 \ll n_1$ and $k_2 \ll n_2$.
Entries in matrix $G_1$ are used to assign genes to clusters, while entries in matrix $G_2$ are used to
assign GO terms to clusters. For example, row $i$ of matrix $G_2$ denotes a GO term, while column
$j$ denotes a cluster index (note that $G_2$ is transposed in the figure). The cluster membership of a GO
term is determined in the following way: GO term $i$ belongs to cluster $j$ if the maximum value in
column $j$ of matrix $G_2$ corresponds to row $i$. Maximum values in $G_2$ are denoted by a darker blue
color. Therefore, GO terms with maximum values in $G_2$ that are located in the same column will
belong to the same cluster (here, GO terms belonging to the same cluster are shown in the same
color). Based on the GO term cluster association, unobserved relations between GO terms are
inferred
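
The cluster assignment described in the caption can be sketched in Python as follows. This is a minimal example under an assumption about the intended rule: each GO term (row of $G_2$) is assigned to the column holding the largest entry of its row, which is the common hard-assignment convention and matches the statement that terms whose maxima fall in the same column share a cluster. The matrix values and the helper name `assign_clusters` are hypothetical.

```python
import numpy as np


def assign_clusters(G):
    """Hard cluster labels: index of the largest entry in each row (assumed rule)."""
    return np.argmax(G, axis=1)


# Hypothetical cluster indicator matrix: 4 GO terms (rows), 2 clusters (columns).
G2 = np.array([
    [0.90, 0.10],
    [0.20, 0.70],
    [0.80, 0.30],
    [0.05, 0.60],
])

labels = assign_clusters(G2)

# GO terms sharing a label are candidates for an inferred (unobserved) relation.
co_clustered = labels[:, None] == labels[None, :]
print(labels)
print(co_clustered)
```

Here the first and third GO terms fall in one cluster and the second and fourth in the other, so `co_clustered` marks those pairs as candidates for inferred relations.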


^123 Wang et al. (2008).
Computational Methods for Integration of Biological Data 167
