Catalyzing Inquiry at the Interface of Computing and Biology


These comments should not be taken to mean that the abstraction of DNA into a digital string is cost-
free. Although digital coding of DNA is central to the mechanisms of heredity, the nucleotide sequence
cannot capture nondigital effects that also play important roles in protein synthesis and function.
Proteins do not necessarily bind only to one specific sequence; the overall proportions of AT versus CG in
a region affect its rate of transcription; and the state of methylation of a region of DNA is an important
mechanism for the epigenetic control of gene expression (and can indeed be inherited just as the digital
code can be inherited).^87 There are also numerous posttranslational modifications of proteins by processes
such as acetylation, glycosylation, and phosphorylation, which by definition are not inherent in the
genetic sequence.^88 The digital abstraction also cannot accommodate protein dynamics or kinetics.
Because these nondigital properties can have important effects, ignoring them puts a limit on how far the
digital abstraction can support research related to gene finding and transcription regulation.
Last, DNA is often compared to a computer program that drives the functional behavior of a cell.
Although this analogy has some merit, it is not altogether accurate. Because DNA specifies which
proteins the cell must assemble, it is at least one step removed from the actual behavior of a cell, since
it is the proteins, not the DNA, that determine (or at least have a great influence on) cell behavior.


4.4.2 Proteins as Labeled Graphs


A significant problem in molecular biology is the challenge of identifying meaningful substructural
similarities among proteins. Although proteins, like DNA, are composed of strings made from a
sequence of a comparatively small selection of types of component molecules, unlike DNA, proteins can
exist in a huge variety of three-dimensional shapes. Such shapes can include helices, sheets, and other
forms generally referred to as secondary or tertiary structure.
Since the structural details of a protein largely determine its functions and characteristics,
determining a protein’s overall shape and identifying meaningful structural details is a critical element of protein
studies. Similar structure may imply similar functionality or receptivity to certain enzymes or other
molecules that operate on specific molecular geometry. However, even for proteins whose
three-dimensional shape has been experimentally determined through X-ray crystallography or nuclear magnetic
resonance, finding similarities can be difficult due to the extremely complex geometries and large
amount of data.
A rich and mature area of algorithm research involves the study of graphs, abstract representations
of networks of relationships. A graph consists of a set of nodes and a set of connections between nodes
called “edges.” In different types of graphs, edges may be one-way (a “directed graph”) or two-way
(an “undirected graph”), and edges may also carry “weights” representing the distance or cost of the connection.
For example, a graph might represent cities as nodes and the highways that connect them as edges
weighted by the distance between the pair of cities.
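
The graph vocabulary above (nodes, directed or undirected edges, weights) can be made concrete with a small sketch. The class name `WeightedGraph`, the placeholder city names, and the distances below are illustrative assumptions, not taken from the report; the adjacency-dictionary layout is one common choice among several.

```python
class WeightedGraph:
    """Minimal graph: a set of nodes plus weighted edges between them."""

    def __init__(self, directed=False):
        self.directed = directed
        self.adj = {}  # node -> {neighbor: weight}

    def add_edge(self, u, v, weight=1.0):
        # Register both endpoints as nodes, then store the edge u -> v.
        self.adj.setdefault(u, {})[v] = weight
        self.adj.setdefault(v, {})
        if not self.directed:
            # An undirected edge is stored in both directions.
            self.adj[v][u] = weight

    def nodes(self):
        return sorted(self.adj)

    def neighbors(self, u):
        return self.adj[u]


# Hypothetical cities and distances, for illustration only.
roads = WeightedGraph(directed=False)
roads.add_edge("CityA", "CityB", 120.0)
roads.add_edge("CityB", "CityC", 75.0)

# The same idea with one-way connections (a directed graph).
one_way = WeightedGraph(directed=True)
one_way.add_edge("CityA", "CityB", 120.0)

print(roads.nodes())
print(roads.neighbors("CityB"))
print(one_way.neighbors("CityB"))
```

In the undirected case each highway is reachable from either endpoint, while in the directed case the edge from `CityA` to `CityB` does not imply a return edge.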
Graph theory has been applied profitably to the problem of identifying structural similarities among
proteins.^89 In this approach, a graph represents a protein, with each node representing a single amino
acid residue and labeled with the type of residue, and edges representing either peptide bonds or close
spatial proximity. Recent work in this area has combined graph theory, data mining, and
information-theoretic techniques to efficiently identify such similarities.^90
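
As a rough illustration of the representation just described, the sketch below builds a labeled graph from a chain of residues: one node per residue, labeled with its type, a "peptide" edge between consecutive residues, and a "proximity" edge between nonconsecutive residues that lie within a distance cutoff. The function name `build_protein_graph`, the cutoff value, and the toy coordinates are assumptions made for illustration; they are not drawn from the cited work and are not physically realistic.

```python
import math
from itertools import combinations


def build_protein_graph(residues, cutoff=6.0):
    """Build a labeled graph from residues given in chain order.

    residues: list of (residue_type, (x, y, z)) tuples.
    Returns (labels, edges), where labels[i] is the residue type of node i
    and edges maps a pair (i, j) with i < j to "peptide" or "proximity".
    """
    labels = [res_type for res_type, _ in residues]
    edges = {}

    # Consecutive residues in the chain are joined by a peptide bond.
    for i in range(len(residues) - 1):
        edges[(i, i + 1)] = "peptide"

    # Any other pair closer than the (assumed) cutoff gets a proximity edge.
    for i, j in combinations(range(len(residues)), 2):
        if (i, j) in edges:
            continue
        if math.dist(residues[i][1], residues[j][1]) <= cutoff:
            edges[(i, j)] = "proximity"

    return labels, edges


# Toy four-residue chain with made-up coordinates.
toy_chain = [
    ("GLY", (0.0, 0.0, 0.0)),
    ("ALA", (3.8, 0.0, 0.0)),
    ("SER", (7.6, 0.0, 0.0)),
    ("LEU", (3.8, 3.0, 0.0)),
]

labels, edges = build_protein_graph(toy_chain)
for (i, j), kind in sorted(edges.items()):
    print(f"{labels[i]}{i} - {labels[j]}{j}: {kind}")
```

Once proteins are encoded this way, the search for shared substructure becomes a search for common subgraphs whose node labels and edge types match, which is where general graph algorithms can be brought to bear.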


(^87) For more on the influence of DNA methylation on genetic regulation, see R. Jaenisch and A. Bird, “Epigenetic Regulation of
Gene Expression: How the Genome Integrates Intrinsic and Environmental Signals,” Nature Genetics 33 (Suppl):245-254, 2003.
(^88) Indeed, some work even suggests that DNA methylation and histone acetylation may be connected. See J.R. Dobosy and E.U.
Selker, “Emerging Connections Between DNA Methylation and Histone Acetylation,” Cellular and Molecular Life Sciences
58(5-6):721-727, 2001.
(^89) E.M. Mitchell, P.J. Artymiuk, D.W. Rice, and P. Willett, “Use of Techniques Derived from Graph Theory to Compare
Secondary Structure Motifs in Proteins,” Journal of Molecular Biology 212(1):151-166, 1989.
(^90) J. Huan, W. Wang, A. Washington, J. Prins, R. Shah, and A. Tropsha, “Accurate Classification of Protein Structural Families
Using Coherent Subgraph Analysis,” Pacific Symposium on Biocomputing 2004:411-422, 2004.
