Catalyzing Inquiry at the Interface of Computing and Biology

364 CATALYZING INQUIRY

Box 10.5 Some Examples of Oversimplified and/or Misleading Computational and Mathematical Models in Biology

• The Turing reaction-diffusion theory for pattern formation in developmental biology: first suggested by Turing
in 1952 and largely dormant until the mid-1970s, this theory, based on an activator-inhibitor system, became a
focus of partial differential equations research. Initially, attempts were made to show that diffusion and
reaction of the activator-inhibitor type are responsible for the development of real structures in real embryos
(stripes or spots, positions of limbs and digits, etc.). However, later work has shown that the biological
solution to the pattern formation problem is inelegant and “kludgy,” with many “redundant” or “inefficient” parts.^1
• A senior computer scientist faced the issue of how one might infer the structure of a genetic regulatory
network from data on the presence or absence of transcription factors. In a cell, a set of genes interacts to
produce a protein, and the transcription factors (themselves proteins) influence the rate at which that protein
is produced. His initial model of this network was a Boolean circuit, in which the presence or absence of
certain factors led to the production of the protein. A typical experimental procedure in a biology lab to probe
the nature of this circuit is to inhibit the production of some transcription factor and observe whether or not
the protein is produced. The analogous action in the Boolean circuit would be cutting a wire in that circuit.
However, this simple analogy failed to model the actual behavior of the biological system because, in many
cases, the inhibition of one transcription factor results in another set of proteins that do the same job. Thus,
the notion of a simple perturbation experiment as analogous to snipping a wire in a logic circuit is obvious to
computer scientists but turns out not to be particularly relevant to this phenomenon.
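
The mismatch described above can be illustrated with a toy example. The following is a minimal sketch, not the model the computer scientist actually used: the two-factor network, the names `tf_a` and `tf_b`, and the rule that a backup protein appears whenever `tf_b` is inhibited are all hypothetical assumptions chosen for illustration.

```python
# Toy illustration only: the network, factor names, and compensation rule
# are hypothetical and do not come from the text.

def naive_circuit(tf_a: bool, tf_b: bool) -> bool:
    """Naive Boolean model: the protein is made only if both factors are present."""
    return tf_a and tf_b


def compensated_circuit(tf_a: bool, tf_b: bool) -> bool:
    """Assumed biology: when tf_b is missing, a backup protein does the same job."""
    backup = not tf_b  # the backup appears only when tf_b is inhibited
    return tf_a and (tf_b or backup)


def knockout_effects(circuit, n_inputs: int) -> list:
    """Inhibit each factor in turn (all others present) and record the output."""
    results = []
    for i in range(n_inputs):
        inputs = [True] * n_inputs
        inputs[i] = False  # the "cut wire" for factor i
        results.append(circuit(*inputs))
    return results


naive = knockout_effects(naive_circuit, 2)
compensated = knockout_effects(compensated_circuit, 2)
print("naive model:      ", naive)
print("with compensation:", compensated)
```

In the naive model every knockout switches the protein off, so each wire appears essential. With compensation, inhibiting `tf_b` leaves the output unchanged, and an experimenter reading the result as a cut wire would wrongly conclude that `tf_b` plays no role.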

• The problem of genome sequence assembly involves piecing together a large number of short sequences
(fragments) into the correct master sequence. The initial computer science formulation of this problem was to
find the shortest sequence that would contain each of a given set of sequences as a consecutive piece. But this
formulation of the problem was completely wrong for two reasons. First, the available information on the
fragments is sometimes erroneous; that is, the data might indicate that a fragment has a certain base at a
given location when in reality it has a different base at that location. Second, DNA molecules have a great
deal of repeated structure (i.e., the same sequence is typically found multiple times). Thus, the shortest
sequence is not biologically plausible because that repeated structure is ignored.
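
The second objection can be checked on a small made-up example. The sketch below assumes an invented 16-base "genome" with a threefold repeat and error-free fragments of length 4; the sequences and the helper names `fragments_of` and `contains_all` are hypothetical. It does not compute a shortest superstring; it only shows that a sequence shorter than the true one already contains every fragment, so the shortest-sequence criterion cannot recover the original.

```python
# Toy illustration only: the sequences below are invented, and the fragments
# are assumed to be error-free, which real sequencing data are not.

def fragments_of(sequence: str, k: int) -> set:
    """Return every length-k substring (fragment) of the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def contains_all(candidate: str, fragments: set) -> bool:
    """Check that each fragment appears as a consecutive piece of the candidate."""
    return all(fragment in candidate for fragment in fragments)


true_genome = "GG" + "ATGC" * 3 + "TT"  # three copies of the repeat unit
reads = fragments_of(true_genome, 4)

collapsed = "GG" + "ATGC" * 2 + "TT"    # one repeat copy silently dropped
collapsed_ok = contains_all(collapsed, reads)

print(len(true_genome), len(collapsed), collapsed_ok)
```

Because the collapsed sequence is shorter and still consistent with every fragment, an assembler that minimizes length would prefer it over the true sequence, which is the sense in which the repeated structure is ignored.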

• Amino acids are represented by codons (i.e., triplets of nucleotide bases). Because there are 4 nucleotides,
the number of possible codons is 4^3, or 64. But for a long time, only 20 amino acids were known to occur in
nature. It turns out that by assuming that the codons overlap each other and requiring that the coding be
unambiguous, only 20 codons are possible. Because of this match, a natural assumption was that an overlapping
code was operative in DNA coding. However, experimental data dispelled this notion, indicating instead that
multiple codons can represent the same amino acid and further that the codons do not overlap.
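
The counting in this example is easy to reproduce. The sketch below enumerates the 4^3 triplets and pairs them with the standard genetic code, which is supplied here as a lookup string in the conventional T, C, A, G ordering; that table is well-established biology but is not given in the text, and the variable names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order, one letter per codon ('*' marks a stop).
# This table is supplied for illustration; it does not appear in the text.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Enumerate all triplets; the first base varies slowest, matching the table order.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
codon_table = dict(zip(codons, AMINO_ACIDS))

n_codons = len(codons)
distinct_amino_acids = {aa for aa in codon_table.values() if aa != "*"}
leucine_codons = sorted(c for c, aa in codon_table.items() if aa == "L")

print(n_codons, len(distinct_amino_acids), leucine_codons)
```

The table maps 64 codons onto 20 amino acids plus stop signals, so several codons share one amino acid (leucine, `L`, has six). This is the redundancy that the experimental data revealed.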

(^1) See, for example, G. von Dassow, E. Meir, E.M. Munro, and G.M. Odell, “The Segment Polarity Network Is a Robust Developmental
Module,” Nature 406(6792):188-192, 2000. At the same time, the reaction-diffusion approach appears to have nontrivial utility in
explaining other biological phenomena, such as certain aspects of microtubule organization (C. Papaseit, N. Pochon, and J. Tabony,
“Microtubule Self-organization Is Gravity-dependent,” Proceedings of the National Academy of Sciences 97(15):8364-8368, 2000).
models, at least as represented by mathematics-based theory and computational models. For example,
theoretical biology has a very different status within biology and has often been a poor stepchild to
mainstream biology. Results from theoretical biology are often irrelevant to specific biological systems
such as a particular species, and even the simplest biological organism is so complex as to render
virtually impossible a theoretical analysis based on first principles. Indeed, most biologists have a long-
ingrained suspicion of theoretical models that they regard as vastly oversimplified (i.e., almost all of
them) and are skeptical of any purported insights that emerge from such models. (Box 10.5 provides
some examples of misleading computational and mathematical models of biological phenomena.)
