Personalized_Medicine_A_New_Medical_and_Social_Challenge


performance and concluded that the genetic interaction network contributes the
most to the predictive performance of the model, despite being the smallest.
A similar study was done to reconstruct GO and predict new GO term
associations and gene annotations in Saccharomyces cerevisiae.^126 The authors used
matrix factorization-based data integration to combine data on gene annotations
(the gene-GO term relation matrix), GO semantic structure (the term-term con-
straint matrix), and four types of molecular networks (PPI, genetic interaction, gene
coexpression and the integrated functional linkage network, YeastNet) integrated as
constraints (see Fig. 4 for a schematic illustration of the procedure). These con-
straints guide the clustering procedure forcing two interacting genes (proteins) to
belong to the same cluster. From the clusters of GO terms, they were able to infer
new associations between GO terms that are not present in GO. To further improve
the accuracy of their predictions, the authors extended the clustering procedure by
also using as constraints pairs of noninteracting genes that have significant
topological similarity (measured by GDV similarity; see Sect. 3.2 for a detailed
explanation). This was the first study to explore the effect of integrating GDV
similarity as an additional topological constraint. The authors reported high
accuracy when evaluating the obtained GO term associations against GO.
Namely, they reported that 96 % of their reconstructed GO term associations
overlap with the known GO term associations. This study indicated that all of GO
could be reconstructed solely from the topology of molecular networks.
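The idea of using network links as constraints that pull interacting genes into the same cluster can be illustrated with a minimal sketch. The code below is not the authors' method: it swaps in plain graph-regularized nonnegative matrix factorization (two factors, a graph Laplacian penalty, multiplicative updates) as a simpler stand-in. The function name, the toy matrices, and the parameters lam, k and n_iter are assumptions made for illustration only.

```python
import numpy as np

def graph_regularized_nmf(R, A, k, lam=0.5, n_iter=300, seed=0, eps=1e-9):
    """Approximate R (genes x GO terms) as G @ H.T with G, H >= 0.

    A is a symmetric 0/1 gene-gene adjacency matrix. The penalty
    lam * trace(G.T @ L @ G), with L = D - A the graph Laplacian,
    pulls the cluster indicators of interacting genes together.
    """
    R = np.asarray(R, dtype=float)
    A = np.asarray(A, dtype=float)
    rng = np.random.default_rng(seed)
    G = rng.random((R.shape[0], k)) + eps   # gene cluster indicators
    H = rng.random((R.shape[1], k)) + eps   # GO term cluster indicators
    D = np.diag(A.sum(axis=1))              # degree matrix
    L = D - A                               # graph Laplacian

    def objective():
        fit = np.linalg.norm(R - G @ H.T) ** 2
        return fit + lam * np.trace(G.T @ L @ G)

    history = [objective()]
    for _ in range(n_iter):
        # Multiplicative updates keep both factors nonnegative.
        G *= (R @ H + lam * (A @ G)) / (G @ (H.T @ H) + lam * (D @ G) + eps)
        H *= (R.T @ G) / (H @ (G.T @ G) + eps)
        history.append(objective())
    return G, H, history

# Hypothetical toy data: 6 genes, 4 GO terms, two groups of genes.
R = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 0, 1]])
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (3, 4), (4, 5)]:  # interactions within each group
    A[i, j] = A[j, i] = 1

G, H, history = graph_regularized_nmf(R, A, k=2)
labels = G.argmax(axis=1)   # cluster assignment of each gene
scores = G @ H.T            # reconstructed gene-term association scores
print(labels)
print(np.round(scores, 2))
```

In this sketch, a high reconstructed score at a position where R is zero would be read as a candidate new annotation. The factorization described in the text additionally uses the GO term-term constraint matrix and several networks at once, which this stand-in omits.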
The matrix completion property of the nonnegative matrix factorization technique
has demonstrated great potential in the prediction of new gene-disease
relations.^127 However, most previous matrix completion techniques are
transductive, that is, they rely on a set of genes already linked to the query disease
and fail to make predictions for a new disease with no known gene associations.
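The transductive limitation can be seen in a small sketch. It assumes plain nonnegative matrix factorization with multiplicative updates as a stand-in for the matrix completion techniques mentioned above (it is not the method of the cited study), and the function name and toy gene-disease matrix are hypothetical. A disease with no known genes is an all-zero column, and the update drives its latent factor to zero, so every gene receives the same score of zero for it.

```python
import numpy as np

def nmf_complete(X, k, n_iter=200, seed=0, eps=1e-9):
    """Low-rank completion of X (genes x diseases) as W @ H.T, W, H >= 0."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k)) + eps   # gene factors
    H = rng.random((X.shape[1], k)) + eps   # disease factors
    for _ in range(n_iter):
        W *= (X @ H) / (W @ (H.T @ H) + eps)
        # For an all-zero column of X, the numerator row (X.T @ W) is zero,
        # so the corresponding row of H collapses to zero.
        H *= (X.T @ W) / (H @ (W.T @ W) + eps)
    return W @ H.T

# Hypothetical toy data: 5 genes x 3 diseases.
# The third disease is "new": it has no known gene associations.
X = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 1, 0],
              [0, 1, 0],
              [1, 0, 0]])

scores = nmf_complete(X, k=2)
print(np.round(scores, 3))
```

The column of scores for the new disease carries no ranking information, which is the failure that feature-based (inductive) formulations are meant to address.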
In other words, these methods can only predict new candidate genes for query
diseases with previously known gene associations. To overcome this limitation, a
recent study proposes a new matrix completion method that is inductive, that is, it
can be applied to predict new candidate genes for diseases with previously
unknown gene associations.^128 The authors propose a framework that integrates
multiple types of data (functional annotations, pathways and ontologies, sequence
data, protein interaction data, etc.) to construct the feature vectors of genes and
diseases (see Natarajan and Dhillon (2014)). These features are incorporated into the
inductive matrix completion approach to learn existing and predict new gene-disease
associations. In particular, the authors report strong performance of their approach on
gene-disease associations from the OMIM database. Moreover, they demonstrate a clear
advantage of their approach over network-based approaches for disease gene
prioritization. Namely, network-based methods cannot be used for prioritization
of genes that are not connected to any other gene in the network, while inductive


(^126) Gligorijević et al. (2014).
(^127) Hwang et al. (2012).
(^128) Natarajan and Dhillon (2014).
170 V. Gligorijević and N. Pržulj
