Computational Systems Biology Methods and Protocols.7z

6.1 STRING

STRING (Search Tool for the Retrieval of Interacting Genes/Proteins)
[70] is, strictly speaking, a protein functional association network
rather than a physical interaction network. Because it stores networks
for most organisms, covers most proteins, and contains the largest
number of functional associations, it has been widely used. In
addition, each interaction in STRING carries a weight, called the
confidence score. In the downloadable files the score is scaled from
0 to 1000, and the commonly used cutoffs range from 150 (low
confidence) to 900 (highest confidence). A higher score indicates that
the two proteins are more likely to be truly associated. The evidence
sources of STRING include genomic context, high-throughput
experiments, conserved co-expression, and previous knowledge, such as
databases or the literature. Because it combines such diverse sources
of both direct physical and indirect functional linkages between
proteins, STRING is one of the most comprehensive network databases.
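
As a rough illustration of how the confidence score is typically used, the sketch below keeps only the associations whose score reaches a chosen cutoff. The space-separated layout with the columns protein1, protein2, and combined_score is an assumption modeled on a links-style file, the protein names and scores are made up, and the cutoff of 700 is only one possible choice.

```python
import csv
import io

# Toy data in an assumed space-separated layout
# (protein1, protein2, combined_score).
# Protein names and scores are made up for illustration.
SAMPLE = """protein1 protein2 combined_score
P1 P2 950
P1 P3 180
P2 P4 720
P3 P4 410
"""


def filter_edges(handle, min_score):
    """Return (protein1, protein2, score) tuples with score >= min_score."""
    reader = csv.DictReader(handle, delimiter=" ")
    edges = []
    for row in reader:
        score = int(row["combined_score"])
        if score >= min_score:
            edges.append((row["protein1"], row["protein2"], score))
    return edges


# Keep only associations at or above the chosen cutoff (an assumption).
high_confidence = filter_edges(io.StringIO(SAMPLE), min_score=700)
for p1, p2, score in high_confidence:
    print(p1, p2, score)
```

Raising the cutoff trades coverage for reliability: fewer edges survive, but each surviving edge is better supported.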


6.2 KEGG

KEGG (Kyoto Encyclopedia of Genes and Genomes) [71] stores
high-quality, manually curated pathways. The regulations in KEGG
have not only directions but also effects, such as activation or
inhibition. Because of this high quality, biologists use KEGG to
generate hypotheses about candidate genes and try to place them as
upstream regulators or downstream targets of a known pathway. If
their findings are verified, these genes may be included in the KEGG
pathway. Although the pathways stored in KEGG keep evolving and
their number is growing, KEGG covers only a very small fraction of
genes and their regulations. This limits its use for finding novel
disease mechanisms. Advanced network analyses, such as shortest path
analysis and random walk with restart (RWR), are difficult to apply
to the KEGG network because its regulations are so sparse.
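
To make the effect of direction and sparseness concrete, here is a minimal sketch of a breadth-first shortest path search on a small directed, signed network. The genes A to E and their edges are hypothetical and not taken from KEGG, and combining effects by multiplying signs along the path is a simplifying assumption.

```python
from collections import deque

# Toy directed, signed edges (hypothetical, not taken from KEGG):
# (regulator, target, effect)
EDGES = [
    ("A", "B", "activation"),
    ("B", "C", "inhibition"),
    ("D", "E", "activation"),
]


def build_graph(edges):
    """Map each node to a list of (target, effect) pairs."""
    graph = {}
    for src, dst, effect in edges:
        graph.setdefault(src, []).append((dst, effect))
        graph.setdefault(dst, [])
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns a node list, or None if unreachable."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt, _effect in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def net_effect(graph, path):
    """Combine edge effects along a path; each inhibition flips the sign."""
    sign = 1
    for src, dst in zip(path, path[1:]):
        effect = next(e for n, e in graph[src] if n == dst)
        if effect == "inhibition":
            sign = -sign
    return "activation" if sign > 0 else "inhibition"


graph = build_graph(EDGES)
path_ac = shortest_path(graph, "A", "C")  # follows A -> B -> C
path_ae = shortest_path(graph, "A", "E")  # no path: separate component
path_ca = shortest_path(graph, "C", "A")  # no path: edges are directed
print(path_ac, net_effect(graph, path_ac))
print(path_ae, path_ca)
```

In a sparse directed network many gene pairs have no connecting path at all, which is why path-based and random-walk methods return little on such a network.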


6.3 ConsensusPathDB


The coverage and quality of a network are difficult to balance. Many
efforts have been made to find the best trade-off between these
conflicting goals. For example, ConsensusPathDB [72] collects 12
pathway databases and finds the consensus interactions. This
certainly increases the coverage, but such ensemble approaches still
require a lot of computation and may introduce new errors. How to
build such networks remains an open question, and more efforts, such
as high-throughput interaction screening technologies, are needed to
generate genome-wide networks for different tissues and diseases. A
complete and accurate dynamic, condition-specific network is the
ultimate goal of network studies.
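
The idea of consensus interactions can be sketched as counting how many sources report each interaction and keeping those with enough support. This is only an illustration of the general ensemble idea, not the procedure ConsensusPathDB itself uses; the source names db1 to db3, the interactions, and the support threshold of 2 are all hypothetical.

```python
from collections import Counter

# Hypothetical interaction lists from three sources (toy data).
SOURCES = {
    "db1": [("A", "B"), ("B", "C")],
    "db2": [("B", "A"), ("C", "D")],
    "db3": [("A", "B"), ("D", "C"), ("E", "F")],
}


def count_support(sources):
    """Count how many sources report each undirected interaction."""
    support = Counter()
    for edges in sources.values():
        # Ignore direction and count each pair once per source.
        unique_pairs = {frozenset(edge) for edge in edges}
        support.update(unique_pairs)
    return support


def consensus(sources, min_support=2):
    """Keep interactions reported by at least min_support sources."""
    support = count_support(sources)
    return {
        tuple(sorted(pair)): n
        for pair, n in support.items()
        if n >= min_support
    }


support = count_support(SOURCES)
kept = consensus(SOURCES, min_support=2)
print("interactions in the union:", len(support))
print("consensus interactions:", kept)
```

The union of all sources gives the largest coverage, while a higher support threshold keeps fewer but better corroborated interactions, which is the trade-off described above.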

7 Conclusions


Networks offer an effective way to integrate complex omics big data
and decipher the underlying mechanisms of many multigene diseases,
such as cancers and diabetes. Here, we introduced the popular
methods and software for network reconstruction and analysis. With
these tools, regulatory pathways can be characterized, key driver
genes or hub genes can be identified, and novel disease genes can be
inferred. Overall, the methods in this chapter are powerful tools
for studying complex diseases and biological processes.

150 Guangyong Zheng and Tao Huang
