PROTEINS 39
though much is known. Predicting protein folding is an enormous challenge.
Most proteins contain dozens or hundreds of amino acids, so there is an
astronomical number of ways in which these might be arranged into a compact,
folded structure. While only a tiny fraction of the possible folds (perhaps 1000
to 10,000) are found in natural proteins, the challenge is to deduce the best
fit of a particular protein sequence to one of these folds. This is called the
protein-threading problem. Traditionally, the problem is tackled by assuming
that each amino acid prefers to be surrounded by others of a specific kind, and
then looking for the best compromise between the needs of all the amino acids.
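The threading idea can be illustrated with a toy scoring scheme. The sketch below is a simplification under stated assumptions, not a real threading program: residues are reduced to a hypothetical two-letter alphabet (H for hydrophobic, P for polar), each candidate fold is represented only by a list of residue pairs in contact, the pair energies in PAIR_ENERGY are invented for illustration, and the sequence is laid onto each fold position by position, without the gapped alignment that practical threading methods perform. The fold with the lowest total energy is taken as the best fit.

```python
# Hypothetical pairwise contact energies in a two-letter alphabet:
# H = hydrophobic, P = polar. Lower (more negative) = more favorable.
PAIR_ENERGY = {
    ("H", "H"): -1.0,  # hydrophobic residues in contact are rewarded
    ("H", "P"): 0.0,
    ("P", "H"): 0.0,
    ("P", "P"): 0.0,
}

HYDROPHOBIC = set("AVLIMFWC")  # assumed hydrophobic residues (one-letter codes)


def hp_class(residue: str) -> str:
    """Map a one-letter amino acid code to 'H' or 'P'."""
    return "H" if residue in HYDROPHOBIC else "P"


def thread_energy(sequence: str, contacts: list[tuple[int, int]]) -> float:
    """Score `sequence` placed position by position onto a fold's contact list."""
    classes = [hp_class(r) for r in sequence]
    return sum(PAIR_ENERGY[(classes[i], classes[j])] for i, j in contacts)


def best_fold(sequence: str, folds: dict[str, list[tuple[int, int]]]) -> tuple[str, float]:
    """Return (name, energy) of the lowest-energy fold in the library."""
    scores = {name: thread_energy(sequence, contacts) for name, contacts in folds.items()}
    name = min(scores, key=scores.get)
    return name, scores[name]


# A tiny hypothetical fold library: each fold is a list of residue pairs in contact.
folds = {
    "fold_A": [(0, 3), (0, 4), (3, 5)],
    "fold_B": [(0, 1), (2, 5), (1, 4)],
}
print(best_fold("LKEVIA", folds))  # ('fold_A', -3.0)
```

Here all the contacts of fold_A bring hydrophobic residues of LKEVIA together, so it scores lowest. Realistic potentials distinguish all 20 amino acids and must also search over alignments of sequence to fold.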
Success with this approach depends on how well we know what the amino
acids prefer. Instead of trying to deduce these preferences from physical and
chemical principles, Jayanth Banavar of Pennsylvania State University and
colleagues use a set of known protein structures to train a computer program
to recognize the preferences of each amino acid. Once trained, the program, a
neural network, can predict unknown structures. These researchers have shown
that the learning-based method is more successful than one based on a priori
assumptions about amino acid preferences. The neural network correctly
predicted the structures of 190 out of 213 test proteins, whereas the conventional
approach identified only 137 structures correctly.^5 Websites with more
information include http://folding.stanford.edu/science.html and
http://predictioncenter.org/ (the Protein Structure Prediction Center at the Genome
Center, University of California, Davis). Recent review articles include those of reference 6.
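The difference between assuming preferences and learning them from known structures can be sketched with a simple linear learner. This is not the neural network used by Banavar and colleagues; it is a perceptron-style scheme, chosen here for brevity, that adjusts one energy weight per contact type (HH, HP, PP in an assumed hydrophobic/polar alphabet) until every known ("native") structure in a small, hypothetical training set scores lower than its decoy folds by a fixed margin. The sequences, contact lists, and function names are illustrative only.

```python
HYDROPHOBIC = set("AVLIMFWC")  # assumed two-letter (H/P) reduction of the amino acids


def contact_counts(sequence, contacts):
    """Count HH, HP and PP contacts when `sequence` is placed on a contact list."""
    counts = [0, 0, 0]
    for i, j in contacts:
        n_h = (sequence[i] in HYDROPHOBIC) + (sequence[j] in HYDROPHOBIC)
        counts[2 - n_h] += 1  # 2 H -> index 0 (HH), 1 H -> 1 (HP), 0 H -> 2 (PP)
    return counts


def energy(weights, counts):
    """Linear energy: one learned weight per contact type."""
    return sum(w * c for w, c in zip(weights, counts))


def train(examples, epochs=100, lr=0.1, margin=1.0):
    """Perceptron-style learning: push each native below each decoy by `margin`.

    `examples` is a list of (sequence, native_contacts, [decoy_contacts, ...]).
    """
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        mistakes = 0
        for seq, native, decoys in examples:
            n = contact_counts(seq, native)
            for decoy in decoys:
                d = contact_counts(seq, decoy)
                if energy(weights, n) > energy(weights, d) - margin:
                    # Lower the native's energy relative to this decoy
                    weights = [w - lr * (a - b) for w, a, b in zip(weights, n, d)]
                    mistakes += 1
        if mistakes == 0:  # every native already beats every decoy
            break
    return weights


def predict_fold(weights, sequence, folds):
    """Pick the fold whose contacts give `sequence` the lowest learned energy."""
    return min(folds, key=lambda name: energy(weights, contact_counts(sequence, folds[name])))


# Hypothetical training set: two "known structures", each with decoy folds.
examples = [
    ("LKEVIA", [(0, 3), (0, 4), (3, 5)], [[(0, 1), (2, 5), (1, 4)]]),
    ("DLKVEF", [(1, 3), (3, 5), (0, 2)], [[(0, 1), (2, 3), (4, 5)], [(0, 4), (1, 2), (2, 4)]]),
]
weights = train(examples)
print(weights)  # the HH weight ends up the most negative

# Predict the fold of a sequence that was not in the training set.
folds = {"compact": [(0, 2), (1, 2), (3, 5)], "extended": [(0, 3), (1, 4), (2, 5)]}
print(predict_fold(weights, "VLIKDE", folds))  # compact
```

The learned weights, rather than hand-chosen ones, then rank candidate folds for a new sequence. For reference, the counts reported above correspond to success rates of about 89% (190/213) for the learning-based method and about 64% (137/213) for the conventional one.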
The amino acid composition of a protein can be determined by cleaving all
peptide bonds and identifying the constituent amino acids. The sequence of a
given protein is determined by using various methods to cleave only selected
peptide bonds and then assembling the information to deduce the amino acid
sequence. Proteins with multiple subunits are usually broken down into
individual subunits (denatured with heat or chemical reagents) before
composition and sequencing analyses are carried out. Characterization of proteins on
the basis of size and/or charge can be accomplished by a number of methods.
These methods are described in Section 2.2.3.
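The assembly step can be sketched as overlap matching between two digests with different specificities. The example assumes that the amino acid sequence of every fragment is already known, uses the trypsin rule (cleave after Lys or Arg, but not before Pro) and the cyanogen bromide rule (cleave after Met, ignoring the chemical modification of the Met residue), and merges fragments greedily by longest overlap. The protein sequence and helper names are hypothetical, and greedy merging is a simplification that can mis-order fragments when overlaps are short or ambiguous.

```python
def cleave_after(seq, sites, blocked_by=""):
    """Cut after any residue in `sites`, unless the next residue is in `blocked_by`."""
    fragments, start = [], 0
    for i in range(len(seq) - 1):
        if seq[i] in sites and seq[i + 1] not in blocked_by:
            fragments.append(seq[start:i + 1])
            start = i + 1
    fragments.append(seq[start:])
    return fragments


def overlap(a, b):
    """Length of the longest proper suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def assemble(fragments):
    """Greedily merge overlapping fragments from two or more digests."""
    # Fragments wholly contained in a longer one carry no ordering information.
    contigs = []
    for f in sorted(set(fragments), key=len, reverse=True):
        if not any(f in c for c in contigs):
            contigs.append(f)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j and overlap(a, b) > best_k:
                    best_k, best_i, best_j = overlap(a, b), i, j
        if best_k == 0:
            break  # nothing left overlaps; the order cannot be deduced
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


protein = "GLKAMPRSVMEKP"  # hypothetical sequence, used only to generate fragments
tryptic = cleave_after(protein, "KR", blocked_by="P")  # trypsin: after K/R, not before P
cnbr = cleave_after(protein, "M")  # cyanogen bromide: after Met
print(tryptic)  # ['GLK', 'AMPR', 'SVMEKP']
print(cnbr)  # ['GLKAM', 'PRSVM', 'EKP']
print(assemble(tryptic + cnbr))  # ['GLKAMPRSVMEKP']
```

Neither digest alone fixes the order of its own fragments; the overlaps between the tryptic and cyanogen bromide pieces do.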
2.2.3 Protein Sequencing and Proteomics
Protein chemists have followed a basic strategy in sequencing proteins, as will
be described in the following section. Denaturation of multi-subunit proteins
through heat, changes in pH, or chemical reagents (urea, organic solvents,
acids, or bases) first produces the constituent subunits. Disulfide bonds are
broken by selective reduction or oxidation. Amino acid composition is
determined by breaking the peptide bonds through exhaustive enzymatic
degradation or by hydrolysis with strong acid (6 N HCl) or strong base. Separation by
chromatography is followed by identification and quantification of the
individual amino acids by producing colored or fluorescent products whose
measured intensity is proportional to the concentration of each amino acid.
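Because the measured signal is proportional to concentration, quantification reduces to dividing each reading by a response factor. This sketch assumes a single-point calibration with a standard mixture in which every amino acid is present at the same known concentration (10.0 µM here), and a separate response factor for each amino acid, since color yield can differ from one amino acid to another. All readings and function names are hypothetical.

```python
def response_factors(standard_intensity, standard_conc):
    """Signal per unit concentration for each amino acid, from a standard mixture."""
    return {aa: standard_intensity[aa] / standard_conc for aa in standard_intensity}


def concentrations(sample_intensity, factors):
    """Invert the linear response: concentration = intensity / response factor."""
    return {aa: sample_intensity[aa] / factors[aa] for aa in sample_intensity}


def residue_ratios(conc):
    """Approximate integer ratios, normalized to the least abundant amino acid."""
    smallest = min(conc.values())
    return {aa: round(c / smallest) for aa, c in conc.items()}


# Hypothetical readings: a standard with every amino acid at 10.0 uM, then a hydrolysate.
standard = {"Gly": 50.0, "Ala": 40.0, "Leu": 45.0, "Lys": 60.0}
sample = {"Gly": 75.0, "Ala": 40.0, "Leu": 90.0, "Lys": 30.0}

factors = response_factors(standard, 10.0)
conc = concentrations(sample, factors)  # in uM, same units as the standard
print(conc)  # {'Gly': 15.0, 'Ala': 10.0, 'Leu': 20.0, 'Lys': 5.0}
print(residue_ratios(conc))  # {'Gly': 3, 'Ala': 2, 'Leu': 4, 'Lys': 1}
```

Dividing by the least abundant amino acid gives approximate integer ratios; converting these to residues per molecule would additionally require knowing the amount of protein hydrolyzed.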