COMPUTATIONAL MODELING AND SIMULATION AS ENABLERS FOR BIOLOGICAL DISCOVERY 199
(^116) E.V. Koonin, N.D. Fedorova, J.D. Jackson, A.R. Jacobs, D.M. Krylov, K.S. Makarova, R. Mazumder, et al., “A Comprehen-
sive Evolutionary Classification of Proteins Encoded in Complete Eukaryotic Genomes,” Genome Biology 5(2):R7, 2004. (Cited in
Roth et al., “The Adaptive Evolution Database,” 2005.)
(^117) R. Rossnes, “Phylogenetic Reconstruction of Ancestral Character States for Gene Expression and mRNA Splicing Data,”
M.Sc. thesis, University of Bergen, Norway, 2004. (Cited in Roth et al., 2005.)
(^118) See, for example, G.F. Joyce, “The Antiquity of RNA-based Evolution,” Nature 418(6894):214-221, 2002.
(^119) M. Eigen, “Selforganization of Matter and the Evolution of Biological Macromolecules,” Naturwissenschaften 58(10):465-523,
1971.
(^120) P. Szabó, I. Scheuring, T. Czaran, and E. Szathmary, “In Silico Simulations Reveal That Replicators with Limited Dispersal
Evolve Towards Higher Efficiency and Fidelity,” Nature 420(6913):340-343, 2002. A very helpful commentary on this article can
be found in G.F. Joyce, “Molecular Evolution: Booting Up Life,” Nature 420(6913):278-279, 2002. The discussion in Section
5.4.8.2.4 is based largely on this article.
(^121) W.K. Johnston, P.J. Unrau, M.S. Lawrence, M.E. Glasner, and D.P. Bartel, “RNA-catalyzed RNA Polymerization: Accurate
and General RNA-Templated Primer Extension,” Science 292(5520):1319-1325, 2001.
The TAED framework is expandable to incorporate other genomic-scale information in a phyloge-
netic context. This is important because coding sequence evolution (e.g., as reflected in the Ka/Ks ratio)
is only one part of the molecular evolution of genomes driving phenotypic divergence. Changes in gene
content^116 and phylogenetic reconstructions of changes in gene expression and alternative splicing
data^117 can indicate where other significant lineage-specific changes have occurred. Altogether, phylo-
genetic indexing of genomic data presents a powerful approach to understanding the evolution of
function in genomes.
5.4.8.2.4 The Emergence of Complex Genomes
How did life get started on Earth? Today, life is based on
DNA genomes and protein enzymes. However, biological evidence exists to suggest that in a previous
era, life was based on RNA, in the sense that genetic information was contained in RNA sequences and
phenotypes were expressed as catalytic properties of RNA.^118
An interesting and profound issue is therefore to understand the transition from the RNA to the
DNA world, one element of which is the fact that DNA genomes are complex structures. In 1971, Eigen
found an explicit relationship between the size of a stable genome and the error rate inherent in its
replication, specifically that the size of the genome was inversely proportional to the per-nucleotide
replication error rate.^119 Thus, for a genome of length L to be reasonably stable over successive genera-
tions, the maximum tolerable error rate in replication could be no more than 1/L per nucleotide.
However, more precise replication mechanisms tend to be more complex. Given that the replication
mechanism must itself be represented in the genome, the puzzle is that a precise replication mecha-
nism is needed to maintain a complex genome, but a complex genome is required to encode such a
mechanism.
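The 1/L bound can be motivated by a short calculation. What follows is a standard simplification rather than Eigen's full treatment. The symbols are introduced here for illustration only: μ is the per-nucleotide error rate, Q is the probability that a genome is copied without error, and σ is the replication advantage of the intact genome over its mutants. Errors are assumed to occur independently at each position.

```latex
\begin{align*}
Q &= (1-\mu)^{L} \approx e^{-\mu L}
  && \text{probability that all $L$ nucleotides are copied correctly} \\
\sigma Q &> 1
  && \text{error-free copies must outgrow the mutant background} \\
L &< \frac{\ln \sigma}{\mu} \sim \frac{1}{\mu}
  && \text{when the advantage is modest, so that $\ln \sigma = O(1)$}
\end{align*}
```

As an illustrative number, an error rate of μ = 0.01 gives Q ≈ 0.37 at L = 100, and Q falls off exponentially for longer genomes.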
The only possible answer to this puzzle is that complex genomes evolved from simpler ones. Szabó
et al. investigated this possibility through computer simulations.^120 They constructed a population of
digital genomes subject to evolutionary forces and found that under a certain set of circumstances, both
genome size and replication fidelity increased with the run time of the simulation. However, such
behavior was dependent on the existence of a sufficient amount of spatial isolation of the evolving
population. In the absence of separation (i.e., in the limit of very rapid diffusion of genomes across the
two-dimensional surface to which they were confined), genome complexity and replication fidelity
were both limited. When diffusion was slow (i.e., the characteristic time constant of diffusion was greater
than the time scale of replication), both complexity and fidelity increased.
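The general shape of such a simulation (local replication on a two-dimensional lattice, with a tunable amount of mixing) can be sketched in a few lines. This is a toy written for illustration, not the model of Szabó et al.: the function name simulate, its parameters, the single activity trait, and the assumed trade-off between being a good enzyme and a good template are all hypothetical, and the sketch does not track genome size or replication fidelity.

```python
import random

def simulate(size=20, steps=200, diffusion=0.0, mutation=0.05, seed=0):
    """Toy lattice of replicators; returns the mean replicase activity.

    Each site stores one trait, `activity` in [0, 1]: how well the molecule
    copies a neighbouring template. Higher activity is ASSUMED to make the
    molecule a worse template itself (a hypothetical trade-off).
    """
    rng = random.Random(seed)
    # Start from a fully occupied grid of weak replicases.
    grid = [[0.2 for _ in range(size)] for _ in range(size)]

    def neighbours(r, c):
        # Four nearest neighbours with wrap-around (toroidal) boundaries.
        return [((r - 1) % size, c), ((r + 1) % size, c),
                (r, (c - 1) % size), (r, (c + 1) % size)]

    for _ in range(steps * size * size):
        # Pick a random site and try to overwrite it with a local copy.
        r, c = rng.randrange(size), rng.randrange(size)
        (tr, tc), (er, ec) = rng.sample(neighbours(r, c), 2)
        template, enzyme = grid[tr][tc], grid[er][ec]
        # Copying succeeds more often with an active enzyme and a
        # low-activity (hence, by assumption, better) template.
        if rng.random() < enzyme * (1.0 - 0.5 * template):
            child = template + rng.gauss(0.0, mutation)
            grid[r][c] = min(1.0, max(0.0, child))
        # Mixing: with probability `diffusion`, swap this site with a
        # randomly chosen site anywhere on the lattice.
        if rng.random() < diffusion:
            r2, c2 = rng.randrange(size), rng.randrange(size)
            grid[r][c], grid[r2][c2] = grid[r2][c2], grid[r][c]

    return sum(map(sum, grid)) / (size * size)
```

Running simulate with diffusion=0.0 and with diffusion=1.0 over several seeds is one way to explore how mixing changes the evolved trait. The toy is not calibrated, so it should not be expected to reproduce the published results.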
In addition, Johnston et al. have synthesized in the laboratory a catalytic RNA molecule that con-
tains about 200 nucleotides and synthesizes RNA molecules of up to 14 nucleotides, with an error rate
of about 3 percent per residue.^121 This laboratory demonstration, coupled with the computational
finding described above, suggests that a small RNA genome that operates as an RNA replicase with