Personalized_Medicine_A_New_Medical_and_Social_Challenge

(Barré) #1

2.4 Prioritization of Disease Genes


Prioritization of disease-causing genes is a problem of great importance in human
health and medical care. It deals with identifying the genes involved in a
specific disease and with providing a better understanding of gene function. Currently,
a small number of data repositories store information on
diseases and their related genes (for a review, see Sect. 3.1). However, many
diseases usually have only a small number of associated genes, and many related
genes have not yet been uncovered. Experimental methods for establishing causal
links between genes and diseases are expensive and time consuming.^33 Therefore,
computational methods that prioritize genes prior to experimental testing
can drastically reduce the cost of that testing. Most of the computational
methods have similar output: they assign a score to a gene, which is a measure of
the likelihood for that gene to be involved in a particular disease. These scores can
further be used to rank genes and narrow down the potential set of disease-causing
genes to be tested experimentally.
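
The scoring and ranking step described above can be sketched in a few lines. This is a minimal illustration, not any particular published method: the function name `prioritize`, the `top_k` cutoff, and the gene names and scores are all hypothetical and were invented for this example.

```python
def prioritize(scores, top_k=3):
    """Rank candidate genes by descending score and keep the top_k.

    scores: dict mapping a gene name to its disease-association score
            (higher means more likely to be involved in the disease).
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [gene for gene, _ in ranked[:top_k]]


# Hypothetical scores, invented for illustration only.
example_scores = {"GENE_A": 0.12, "GENE_B": 0.87, "GENE_C": 0.45, "GENE_D": 0.03}

# Only the shortlisted genes would be passed on to experimental testing.
shortlist = prioritize(example_scores, top_k=2)
print(shortlist)
```

The cutoff `top_k` stands in for the experimental budget: the fewer genes that can be tested, the more the quality of the ranking matters.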
Most computational methods for gene prioritization are network based. They are
motivated by the fact that most genes involved in clinically similar diseases are
located in the same part of the PPI network.^34 Other approaches are based on
information flow. They explore topological closeness between proteins in the PPI
network in terms of cumulative strength of all possible paths between candidate
proteins and proteins involved in clinically similar diseases. These approaches
include random walk with restart^35 and network propagation.^36 They start with
initial association scores between proteins and diseases clinically similar to the
disease of interest. Next, they propagate this information across the PPI network to
compute new scores for all proteins in the network. These scores are then used to
rank proteins and to determine a prioritized set of proteins for the disease of interest.
It has been shown that methods that take a multiplicity of paths between proteins
into account have a higher accuracy than methods that only consider direct
interactions.^37
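
To make the information-flow idea concrete, the following sketch implements one common formulation of a random walk with restart on a toy network. It is an illustration under stated assumptions rather than the exact procedure of the cited studies: the five-protein network, the restart probability of 0.3, the convergence tolerance, and the function name `random_walk_with_restart` are all choices made for this example. Network propagation typically uses a different normalization of the edge weights and is not shown here.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-9, max_iter=1000):
    """Score every protein by its proximity to the seed proteins.

    adjacency: dict mapping each protein to the set of its interaction
               partners (undirected, so each edge is listed in both directions).
    seeds:     non-empty set of proteins already associated with diseases
               clinically similar to the disease of interest.
    restart:   probability of jumping back to a seed at each step
               (0.3 is an assumed value, not taken from the text).
    """
    nodes = sorted(adjacency)
    # Initial scores: probability mass spread evenly over the seeds.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart term, then one step of the walk from every node.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            neighbours = adjacency[u]
            if not neighbours:
                continue  # an isolated node has nowhere to send its mass
            share = (1.0 - restart) * p[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Toy PPI network, invented for illustration only.
ppi = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}

scores = random_walk_with_restart(ppi, seeds={"A"})
candidates = [protein for protein in ppi if protein != "A"]
ranking = sorted(candidates, key=lambda protein: scores[protein], reverse=True)
print(ranking)
```

In this toy network, B and C are both direct neighbours of the seed A, so a method that counts only direct interactions could not separate them. The walk scores C higher because it receives flow along more paths, which is the kind of distinction that multi-path methods are able to make.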
Most of these methods do not exploit the PPI network to its full potential.
Moreover, very few of them combine different molecular networks and other
types of data to prioritize candidate disease genes. For example, a recent study
integrates disease phenotypic similarity with omics data (PPI network, gene
sequence similarity, gene expression patterns, gene ontology annotations, and
gene pathway membership) into a framework called BRIDGE to prioritize candidate
genes.^38 This framework is able to correctly rank 60% of known disease genes


(^33) Bromberg (2013).
(^34) Goh et al. (2007).
(^35) Köhler et al. (2008).
(^36) Vanunu et al. (2010).
(^37) Bebek et al. (2012).
(^38) Chen et al. (2013).
Computational Methods for Integration of Biological Data 145
