Catalyzing Inquiry at the Interface of Computing and Biology


7 CYBERINFRASTRUCTURE AND DATA ACQUISITION


7.1 CYBERINFRASTRUCTURE FOR 21ST CENTURY BIOLOGY

Twenty-first century biology seeks to integrate scientific understanding at multiple levels of
biological abstraction, and it is holistic in the sense that it seeks an integrated understanding of
biological systems through studying the set of interactions between components. Because such an
enormous, data-intensive effort is inherently distributed over multiple laboratories and investigators,
an infrastructure is needed that facilitates the integration of experimental data, enables
collaboration, and promotes communication among the various actors involved.


7.1.1 What Is Cyberinfrastructure?


Cyberinfrastructure for science and engineering is a term coined by the National Science
Foundation (NSF) to refer to distributed computer, information, and communication technologies and the
associated organizational facilities to support modern scientific and engineering research conducted on
a global scale. As articulated by the Atkins panel,^1 the technology substrate of cyberinfrastructure
involves the following:



  • High-end general-purpose computing centers that provide supercomputing capabilities to the
    community at large. In the biological context, such capabilities might be used to undertake, for example,
    calculations to determine the three-dimensional structure of proteins given their genetic sequence. In
    some cases, these computing capabilities could be provided by local clusters of computers; in other
    cases, special-purpose hardware; and in still others, computing capabilities on demand from a
    computing grid environment.

  • Data repositories that are well curated and that store and make available to all researchers large
    volumes and many types of biological data, both in raw form and as associated derived products. Such
    repositories must store data, of course, but they must also organize, manage, and document these


(^1) “Revolutionizing Science and Engineering Through Cyberinfrastructure: Report of the NSF Blue-Ribbon Advisory Panel on
Cyberinfrastructure,” 2003, available at http://www.communitytechnology.org/nsf_ci_report/report.pdf.
