Catalyzing Inquiry at the Interface of Computing and Biology


Beyond studies of protein structure is the problem of describing a solvent environment (such as
water) and its influence on a protein’s conformational behavior. The importance of hydration in protein
stability and folding is widely accepted. Models are needed that incorporate the effects of solvents on
protein three-dimensional structure.


4.4.10 Protein Identification and Quantification from Mass Spectrometry


A second important problem in proteomics is protein identification and quantification. That is,
given a particular biological sample, what specific proteins are present and in what quantities? This
problem is at the heart of studying protein–protein interactions at proteomic scale, mapping various
organelles, and generating quantitative protein profiles from diverse species. Making inferences about
protein identification and abundance in biological samples is often challenging, because cellular
proteomes are highly complex and because the proteome generally involves many proteins at relatively
low abundances. Thus, highly sensitive analytical techniques are necessary.
Today, techniques based on mass spectrometry increasingly fill this need. The mass spectrometer
works on a biological sample in ionized gaseous form. A mass analyzer measures the mass-to-charge
ratio (m/z) of the ionized analytes, and a detector measures the number of ions at each m/z value. In
the simplest case, a procedure known as peptide mass fingerprinting (PMF) is used. PMF is based on the
fact that a protein is composed of multiple peptide groups, and identification of the complete set of
peptides will, with high probability, characterize the protein in question. After the protein is
enzymatically broken up into its constituent peptides, the mass spectrometer is used to identify the
individual peptides, each of which has a known mass. The premise of PMF is that only a very few (one in
the ideal case) proteins will correspond to any particular set of peptides, and protein identification is
effected by finding the best fit of the observed peptide masses to the calculated masses derived from,
say, a sequence database. Of course, the “best fit” is an algorithmic issue, and a variety of approaches
have been taken to determine the most appropriate algorithms.
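
The matching step in PMF can be illustrated with a minimal sketch. It assumes a simplified trypsin rule (cleave after K or R unless the next residue is P, with no missed cleavages), uncharged monoisotopic peptide masses, and a "best fit" defined as nothing more than the number of observed masses that fall within a tolerance of a calculated mass. The function names, the tolerance, and the two toy sequences are hypothetical and chosen only for illustration; practical tools use statistically grounded scores.

```python
import re

# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the terminal H and OH


def tryptic_digest(sequence):
    """Split after K or R unless followed by P (assumed simplified rule)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified, uncharged peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def pmf_score(observed_masses, sequence, tol=0.01):
    """Count observed masses that match any calculated peptide mass."""
    calculated = [peptide_mass(p) for p in tryptic_digest(sequence)]
    return sum(
        any(abs(m - c) <= tol for c in calculated) for m in observed_masses
    )


def identify(observed_masses, database, tol=0.01):
    """Return the name of the database entry with the highest match count."""
    return max(
        database, key=lambda name: pmf_score(observed_masses, database[name], tol)
    )


# Toy database (hypothetical sequences, not real proteins).
database = {"prot_a": "AGKSVRTLK", "prot_b": "GGKPLRAAK"}
observed = [peptide_mass(p) for p in tryptic_digest(database["prot_a"])]
print(identify(observed, database))
```

A simple match count of this kind ignores protein length and the chance of random matches, which is one reason the choice of scoring algorithm matters.
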
The applicability of PMF is limited when samples are complex (that is, when they involve large
numbers of proteins at low abundances). The reason is that only a small fraction of the constituent
peptides are typically ionized, and those that are observed are usually from the dominant proteins in
the mixture. Thus, for complex samples, multiple (tandem) stages of mass spectrometry may be
necessary. In a typical procedure, peptides from a database are scored on the likelihood of their
generating the observed tandem mass spectrum, and the top-scoring peptide is chosen. This computational
approach has shown great success and has contributed to the industrialization of proteomics.
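
The database-scoring procedure can be sketched as follows. A shared-peak count is swapped in here for the likelihood score described above: each candidate peptide is scored by how many of its predicted singly charged b and y fragment ions appear in the observed spectrum within a tolerance. The function names, the tolerance, and the candidate list are assumptions made for illustration, and a practical search would also restrict candidates by precursor mass.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_mz(peptide):
    """Predicted m/z of singly charged b and y ions for every backbone cut."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = []
    for i in range(1, len(masses)):
        ions.append(sum(masses[:i]) + PROTON)          # b ion (N-terminal part)
        ions.append(sum(masses[i:]) + WATER + PROTON)  # y ion (C-terminal part)
    return ions


def shared_peak_count(spectrum_mz, peptide, tol=0.02):
    """Number of predicted fragment ions found in the observed peak list."""
    return sum(
        any(abs(peak - ion) <= tol for peak in spectrum_mz)
        for ion in fragment_mz(peptide)
    )


def best_peptide(spectrum_mz, candidates, tol=0.02):
    """Return the top-scoring candidate peptide for one spectrum."""
    return max(candidates, key=lambda p: shared_peak_count(spectrum_mz, p, tol))


# Toy example: an idealized spectrum built from one candidate's own fragments.
spectrum = fragment_mz("SVR")
print(best_peptide(spectrum, ["TLK", "SVR", "AGK"]))
```

The sketch treats every predicted fragment as equally likely to be observed, which real spectra do not bear out.
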
However, much remains to be done. First, the generation of the spectrum is a stochastic process
governed by the peptide composition and the mass spectrometer. By mining data to understand the resulting
fragmentation propensities, scoring and identification can be further improved. Second, if the peptide is
not in the database, de novo or homology-based methods must be developed for identification. Third, many
proteins are post-translationally modified, with the modifications changing the mass composition.
Enumeration and scoring of all modifications leads to a combinatorial explosion that must be addressed
using novel computational techniques. It is fair to say that computation will play an important role in
the success of mass spectrometry as the tool of choice for proteomics.
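
The scale of the combinatorial explosion can be made concrete with a small sketch. It assumes only two variable modifications, phosphorylation of S, T, or Y (+79.96633 Da) and oxidation of M (+15.99491 Da), each either present or absent at every eligible residue. The dictionary and function names are hypothetical.

```python
from itertools import product

# Assumed variable modifications: residue -> possible mass shifts in daltons.
VARIABLE_MODS = {
    "S": [79.96633],  # phosphorylation
    "T": [79.96633],  # phosphorylation
    "Y": [79.96633],  # phosphorylation
    "M": [15.99491],  # oxidation
}


def count_modified_forms(peptide, mods=VARIABLE_MODS):
    """Number of distinct modification patterns, including the unmodified one."""
    count = 1
    for aa in peptide:
        count *= 1 + len(mods.get(aa, []))
    return count


def mass_shifts(peptide, mods=VARIABLE_MODS):
    """Total mass shift of every modification pattern (explicit enumeration)."""
    options = [[0.0] + mods.get(aa, []) for aa in peptide]
    return [sum(choice) for choice in product(*options)]


print(count_modified_forms("AGK"))     # no modifiable residues
print(count_modified_forms("STY"))     # three modifiable residues
print(count_modified_forms("S" * 20))  # twenty modifiable residues
```

With one optional modification per site, the count doubles with each modifiable residue, so twenty such residues already give 2^20 = 1,048,576 forms for a single peptide. Explicit enumeration, as in `mass_shifts`, is therefore practical only for short peptides and small modification sets.
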
Mass spectrometry is also coming into its own for protein expression studies. The major problem here
is that the intensity of a peak depends not only on the peptide abundance but also on the physicochemical
properties of the peptide. This makes it difficult to measure expression levels directly. However,
relative abundance can be measured using the proven technique of stable-isotope dilution. This method
makes use of the facts that pairs of chemically identical analytes of different stable-isotope composition
can be differentiated in a mass spectrometer owing to their mass difference, and that the ratio of signal
intensities for such analyte pairs accurately indicates the abundance ratio for the two analytes.
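
The ratio calculation can be sketched in a few lines. The sketch assumes a centroided peak list of (m/z, intensity) pairs, a known mass difference between the heavy and light forms, and a known charge state. The function name, the tolerance, and the example values are hypothetical.

```python
def isotope_pair_ratios(peaks, mass_shift, charge=1, tol=0.01):
    """Find light/heavy peak pairs and return (light m/z, heavy/light ratio).

    peaks      -- list of (mz, intensity) tuples
    mass_shift -- mass difference in daltons between heavy and light analytes
    charge     -- assumed charge state; the m/z spacing is mass_shift / charge
    """
    delta = mass_shift / charge
    ratios = []
    for light_mz, light_intensity in peaks:
        for heavy_mz, heavy_intensity in peaks:
            is_pair = abs(heavy_mz - light_mz - delta) <= tol
            if is_pair and light_intensity > 0:
                ratios.append((light_mz, heavy_intensity / light_intensity))
    return ratios


# Toy peak list: a doubly charged pair 8 Da apart, plus one unpaired peak.
peaks = [(500.0, 1000.0), (504.0, 2000.0), (600.0, 300.0)]
print(isotope_pair_ratios(peaks, mass_shift=8.0, charge=2))
```

Because both members of a pair are chemically identical, the intensity ratio is taken directly as the abundance ratio; no correction for ionization efficiency is attempted in this sketch.
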
This approach shows great promise. However, computational methods are needed to correlate data
across different experiments. If the data were produced using liquid chromatography coupled with
