Topology in Molecular Biology

128 A.E. Kister et al.


There are both local and global points of view regarding the relationship
between the linear sequence of amino acids and the resulting three-dimensional
structure of a protein. The former viewpoint postulates that just a few
critical residues, some 10–20% of the sequence, play the decisive role in
determining the characteristics of a fold, while the latter considers all
residues in the sequence as crucial [5, 6].
The most commonly used methods of global sequence comparison (BLAST and
FASTA [7–9]) match new sequences (queries) against all the sequences in a
database (targets) and report each query-target pair that represents a
statistically significant match. At present, some of the most powerful
approaches for protein classification are based on hidden Markov models [4–6].
However, it was shown that, as the sequence identity of related proteins
falls below 30%, the chance of their relationship being detected by these
methods becomes increasingly small. Thus, although the methods described above
have been very successful for protein classification, they all become less
reliable as more distant, less homologous proteins are considered.
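The 30% figure refers to pairwise sequence identity. As a minimal sketch of
how such a value can be computed, the Python function below counts identical
residues over the aligned columns of two sequences. It assumes the sequences
are already aligned and use "-" for gaps, and it takes the number of columns
where both sequences have a residue as the denominator; other definitions of
percent identity exist, and the function names and the default threshold
argument are illustrative choices, not part of BLAST, FASTA, or any method
cited above.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where both sequences have a residue.

    Assumes the two strings come from an existing pairwise alignment and
    therefore have equal length. Gap-containing columns are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0   # columns with a residue in both sequences
    identical = 0  # compared columns where the residues match
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a.upper() == b.upper():
            identical += 1

    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


def below_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 30.0) -> bool:
    """True if the pair falls below the given identity threshold (in %)."""
    return percent_identity(aligned_a, aligned_b) < threshold
```

A pair for which below_identity_threshold returns True lies in the range
where, according to the text, detection by the global methods becomes
increasingly unlikely.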
The local model received considerable support when Chothia and Lesk showed
that rather different amino acid sequences share the same fold, i.e., the
same major secondary structures in the same arrangement and with the same
chain topology [10]. In our recent study with Chothia and Lesk, we discussed
why structure changes more slowly than sequence in protein evolution [11]. For
related proteins, structural similarities arise in the course of their
evolution from a common ancestor, while for proteins with very low homology,
fold similarity may be due to physical and chemical factors that favor certain
arrangements of secondary structure units and chain topology.
A considerable step forward in our understanding of how the amino acid
sequence of a protein dictates its three-dimensional structure was the
division of the residues in the sequence into those that form the hydrophobic
interior and those that form a sufficiently hydrophilic surface of the
protein. In our work we showed that residues of the hydrophobic interior make
the major contribution to the stability of a protein [12]. Following George
Orwell (Animal Farm), it can be concluded that not all residues are equally
significant in their contribution to protein folding. Thus, the search for the
key, conserved residues, i.e., residues that are “more equal” than other
residues in a protein, is the essential step in solving the problem of the
relationship between an amino acid sequence and the geometric structure of a
protein.
In this work, we suggest a new method of protein classification based on the
ideas of the local model. The main novelty of the method lies in the
identification of the key residues, or sequence determinants. The sequence
determinants serve as a basis for the development of computer algorithms for
protein classification and structure/function prediction of genomic and amino
acid sequences. A direct corollary of the approach is that the complexity of
protein sequence search algorithms and 3D structure predictions can be
dramatically reduced.
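To illustrate why working with key residues could reduce the cost of a
search, the following Python sketch tests a sequence against a small set of
sequence determinants instead of comparing it residue by residue with every
database entry. It is not the authors' algorithm: the positions, the allowed
residues, the name EXAMPLE_DETERMINANTS, and the min_fraction parameter are
all hypothetical, and the sketch assumes the query has already been placed in
a common alignment frame so that positions are comparable.

```python
from typing import Dict, Set

# Hypothetical determinants: 0-based position in an already aligned sequence,
# mapped to the residues accepted at that position. The values are invented
# for illustration and do not describe any real protein family.
EXAMPLE_DETERMINANTS: Dict[int, Set[str]] = {
    2: {"C"},
    5: {"W"},
    9: {"L", "I", "V"},
}


def matches_determinants(aligned_seq: str,
                         determinants: Dict[int, Set[str]],
                         min_fraction: float = 1.0) -> bool:
    """Check whether enough key positions carry an accepted residue.

    Only the key positions are inspected, so the work per sequence grows
    with the number of determinants rather than with the sequence length.
    """
    if not determinants:
        raise ValueError("at least one determinant is required")

    hits = sum(
        1
        for pos, allowed in determinants.items()
        if pos < len(aligned_seq) and aligned_seq[pos].upper() in allowed
    )
    return hits / len(determinants) >= min_fraction
```

For example, matches_determinants("AKCDEWGHTL", EXAMPLE_DETERMINANTS)
inspects three positions only, regardless of how long the sequence is.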
