Catalyzing Inquiry at the Interface of Computing and Biology

328 CATALYZING INQUIRY

biological data. This fact—that an understanding of biological systems depends on so many different
kinds of biological data, operating at so many different scales, and in such volume—suggests the
possibility that biological information and/or biological complexity might be notions with some formal
quantitative meaning.
How much information does a given biological system have? How should biological complexity be
conceptualized? Can we quantify or measure the amount of information or the degree of complexity
resident in, say, a cell, or perhaps even more challengingly, in an organelle, an ecosystem, or a species?
In what sense is an organism more complex than a cell or an ecosystem more complex than an indi-
vidual organism? Establishing an intellectually rigorous methodology through which such information
could be measured, capturing not only the raw scale of information needed to describe the constituent
elements of a system but also its complexity, could be a powerful tool for answering questions about the
nature of evolution, for quantifying the effects of aging and disease, and for evaluating the health of
ecologies or other complex systems.
Developing such a theory of biological information and complexity will be extraordinarily challenging,
however. First, complexity and information span many orders of magnitude in size and time, as well as
the vast range of organisms on Earth, and it is not at all clear that a single measure or approach could
be appropriate for all scales or creatures. Second, progress toward such a theory has been made in fields
traditionally separate from biology, including physics and computer science. Transferring knowledge
between biology and these fields, and collaborating across them, is difficult at the best of times, and
doubly challenging when the research is at an early stage. Finally, such a theory may prove to be the
basis of a new organizing principle for biology, which may require a significant reorientation of both
practicing biologists and biological theory.
Some building blocks for such a theory may already be available. These include information theory,
formulated by Claude Shannon in the mid-20th century for analyzing the performance of noisy
communication channels; an extension of information theory, developed over the last few decades by
theoretical physicists, that defines information in thermodynamic terms of energy and entropy; the body
of computational complexity theory, starting from Turing’s model of computation and extending it to
include classes of complexity based on the relative difficulty of families of problems; and complexity
theory (once called “chaos theory”), an interdisciplinary effort by physicists, mathematicians, and
biologists to describe how apparently complex behavior can arise from the interaction of large numbers
of very simple components.
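
As a rough illustration of the first of these building blocks, the sketch below computes the empirical
Shannon entropy of a symbol sequence in bits per symbol. The function name shannon_entropy and the two
example strings are hypothetical and chosen only for illustration; they are not taken from the report.
The calculation looks only at symbol frequencies, so it ignores ordering, structure, and biological
function. It should be read as a minimal example of the Shannon measure, not as a measure of biological
complexity.

```python
import math
from collections import Counter


def shannon_entropy(sequence: str) -> float:
    """Return the empirical Shannon entropy of a sequence, in bits per symbol.

    Assumption: each symbol is treated as an independent draw from the
    observed frequency distribution; order and context are ignored.
    """
    if not sequence:
        return 0.0
    total = len(sequence)
    counts = Counter(sequence)
    # H = sum over symbols of p * log2(1 / p), where p = count / total
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical example strings over a four-letter alphabet.
uniform = "ACGT" * 25          # all four symbols equally frequent
skewed = "A" * 97 + "CGT"      # one symbol dominates

print(shannon_entropy(uniform))  # 2.0 bits per symbol, the maximum for four symbols
print(shannon_entropy(skewed))   # much lower, since the sequence is highly predictable
```

Note that the uniform string scores the maximum entropy even though it is a trivially repeating pattern,
which shows why a frequency-based measure alone cannot capture the kind of complexity discussed here.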
Measuring or even defining the complexity of a biological system—indeed, of any complex, dy-
namic system—has proven to be a difficult problem. Traditional measures of complexity that have been
developed to analyze and describe the products of human technological engineering are difficult to
apply or inappropriate for describing biological systems. For example, although both biological systems
and engineered systems often have degrees of redundancy (i.e., multiple instances of the same “compo-
nent” that serve the same function for purposes of reliability), biological systems also show many other
systems-level design behaviors that are rarely if ever found in engineered systems. Indeed, many such
behaviors would be considered poor design. For example, “degeneracy” in biological systems refers to
the property of having different systems produce the same activity. Similarly, in most biological
systems, many different components contribute to global properties, a design that, if included in a
human-engineered system, would make it very difficult to understand.
Other attempts at measuring biological complexity include enumerating various macroscopic prop-
erties of an organism, such as the number of distinct parts, number of distinct cell types, number of
biological functions performed, and so forth. In practice this can be difficult (what is considered a
“distinct” part?) or inconclusive (is an organism with more cell types necessarily more complex?).
More conveniently, the entire DNA sequence of an organism’s genome can be analyzed. Since DNA
plays a major role in determining the structure and functions of an organism, one approach is to
consider the information content of the DNA string. Of course, biological knowledge is nowhere close
to actually being able to infer the totality of an organism merely from a DNA sequence, but the argu-
