Personalized_Medicine_A_New_Medical_and_Social_Challenge


biological knowledge. For this purpose, we give a detailed description of the three most
commonly used state-of-the-art computational techniques for data integration
(Sect. 4).


2 Biological Problems


Here, we give a detailed description of the biological problems of interest. We focus
on each biological problem separately and briefly review previous computational
methods used to solve it.


2.1 Protein Function Prediction


Protein function prediction (also referred to as protein functional annotation) is one
of the most challenging problems in bioinformatics, the importance of which has
recently increased due to progress in high-throughput techniques that have generated
large amounts of biological data. Experimental techniques for functional
characterization of proteins generate information at a much slower rate than the rate at which
new sequences become available. To close the gap between sequence availability and
experimental characterization, various computational methods and tools for automatic
protein annotation have been devised. Methods for protein function prediction aim to assign
known molecular functions to poorly studied or newly identified proteins. By
protein function we mean a broad range of functions, such as gene regulation and the
catalysis of biochemical reactions (enzymes). To classify various aspects of
protein function, special dictionaries of well-defined terms (ontologies) have been
developed. One commonly used ontology is the Gene Ontology (GO), which
provides a hierarchical classification of protein functions in three main domains:
biological process, molecular function, and cellular component.^19 The hierarchical
classification of GO can be represented as a directed acyclic graph (DAG) (see
Sect. 3.1 for a detailed explanation), where nodes represent protein function terms
(GO terms) and edges represent parent-child relationships between terms
(relationships from general to specific GO terms).
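The DAG view of an ontology can be illustrated with a minimal Python sketch. The term names below are toy placeholders chosen for illustration and are not an excerpt of the real Gene Ontology; the helper names `invert` and `ancestors` are likewise assumptions of this sketch. It stores edges from general to specific terms and collects all more general terms of a given term.

```python
from collections import deque

# Toy edges from a general (parent) term to its specific (child) terms.
# Term names are illustrative placeholders, not real GO content.
CHILDREN = {
    "molecular_function": ["catalytic_activity", "binding"],
    "catalytic_activity": ["kinase_activity"],
    "binding": ["nucleotide_binding"],
    "kinase_activity": ["toy_atp_dependent_kinase"],
    "nucleotide_binding": ["toy_atp_dependent_kinase"],
}


def invert(children):
    """Build a child -> parents map from a parent -> children map."""
    parents = {}
    for parent, kids in children.items():
        parents.setdefault(parent, [])
        for kid in kids:
            parents.setdefault(kid, []).append(parent)
    return parents


def ancestors(term, parents):
    """Return the set of all more general terms reachable from `term`."""
    seen = set()
    queue = deque(parents.get(term, []))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(parents.get(current, []))
    return seen


PARENTS = invert(CHILDREN)

if __name__ == "__main__":
    # List every more general term of the most specific toy term.
    print(sorted(ancestors("toy_atp_dependent_kinase", PARENTS)))
```

In this toy graph the most specific term has two parents, which is why the structure is a DAG rather than a tree: a term may be reached from the root along more than one path.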
Before the advent of high-throughput techniques and interaction data, sequence
comparison had been the leading method for protein function prediction.^20 The
method transfers annotation from well-studied proteins to partially
characterized (or unknown) proteins if their sequences are similar. More recent methods
are based on large-scale protein interaction networks. These methods infer the functions
of proteins from the functions of their directly interacting partners in the


(^19) Ashburner et al. (2000).
(^20) Whisstock and Lesk (2003).
Computational Methods for Integration of Biological Data 141
