Catalyzing Inquiry at the Interface of Computing and Biology

58 CATALYZING INQUIRY

These examples are drawn largely from the area of cell biology. The reason is not that these are the
only good examples of computational tools, but rather that a great deal of the activity in the field has
been the direct result of trying to make sense out of the genomic sequences that have been collected to
date. As noted in Chapter 2, the Human Genome Project—completed in draft in 2000—is arguably the
first large-scale project of 21st century biology in which the need for powerful information technology
was manifestly obvious. Since then, computational tools for the analysis of genomic data, and by
extension data associated with the cell, have proliferated wildly; thus, a large number of examples are
available from this domain.

4.2 Tools for Data Integration(^2)

As noted in Chapter 3, data integration is perhaps the most critical problem facing researchers as
they approach biology in the 21st century.

Box 4.1 Tool Challenges for Computer Science

Data Representation

Next-generation genome annotation system with accuracy equal to or exceeding the best human
predictions

Mechanism for multimodal representation of data

Analysis Tools

Scalable methods of comparing many genomes

Tools and analyses to determine how molecular complexes work within the cell

Techniques for inferring and analyzing regulatory and signaling networks

Tools to extract patterns in mass spectrometry datasets

Tools for semantic interoperability

Visualization

Tools to display networks and clusters at many levels of detail

Approaches for interpreting data streams and comparing high-throughput data with simulation output

Standards

Good software-engineering practices and standard definitions (e.g., a common component architecture)

Standard ontology and data-exchange format for encoding complex types of annotation

Databases

Large repository for microbial and ecological literature relevant to the “Genomes to Life” effort

Big relational database derived by automatic generation of semantic metadata from the biological literature

Databases that support automated versioning and identification of data provenance

Long-term support of public sequence databases

SOURCE: U.S. Department of Energy, Report on the Computer Science Workshop for the Genomes to Life Program, Gaithersburg, MD, March 6-7, 2002; available at http://DOEGenomesToLife.org/compbio/.

(^2) Sections 4.2.1, 4.2.4, 4.2.6, and 4.2.8 embed excerpts from S.Y. Chung and J.C. Wooley, “Challenges Faced in the Integration of
Biological Information,” in Bioinformatics: Managing Scientific Data, Z. Lacroix and T. Critchlow, eds., Morgan Kaufmann, San
Francisco, CA, 2003. (Hereafter cited as Chung and Wooley, 2003.)
