Catalyzing Inquiry at the Interface of Computing and Biology

  • EXECUTIVE SUMMARY

  • 1 INTRODUCTION

    • 1.1 Excitement at the Interface of Computing and Biology,

    • 1.2 Perspectives on the BioComp Interface,

      • 1.2.1 From the Biology Side,

      • 1.2.2 From the Computing Side,

      • 1.2.3 The Role of Organization and Culture,



    • 1.3 Imagine What’s Next,

    • 1.4 Some Relevant History in Building the Interface,

      • 1.4.1 The Human Genome Project,

      • 1.4.2 The Computing-to-Biology Interface,

      • 1.4.3 The Biology-to-Computing Interface,



    • 1.5 Background, Organization, and Approach of This Report,



  • 2 21st CENTURY BIOLOGY

    • 2.1 What Kind of Science?,

      • 2.1.1 The Roots of Biological Culture,

      • 2.1.2 Molecular Biology and the Biochemical Basis of Life,

      • 2.1.3 Biological Components and Processes in Context, and Biological Complexity,



    • 2.2 Toward a Biology of the 21st Century,

    • 2.3 Roles for Computing and Information Technology in Biology,

      • 2.3.1 Biology as an Information Science,

      • 2.3.2 Computational Tools,

      • 2.3.3 Computational Models,

      • 2.3.4 A Computational Perspective on Biology,

      • 2.3.5 Cyberinfrastructure and Data Acquisition,



    • 2.4 Challenges to Biological Epistemology,



  • 3 ON THE NATURE OF BIOLOGICAL DATA

    • 3.1 Data Heterogeneity,

    • 3.2 Data in High Volume,

    • 3.3 Data Accuracy and Consistency,

    • 3.4 Data Organization,

    • 3.5 Data Sharing,

    • 3.6 Data Integration,

    • 3.7 Data Curation and Provenance,



  • 4 COMPUTATIONAL TOOLS

    • 4.1 The Role of Computational Tools,

    • 4.2 Tools for Data Integration,

      • 4.2.1 Desiderata,

      • 4.2.2 Data Standards,

      • 4.2.3 Data Normalization,

      • 4.2.4 Data Warehousing,

      • 4.2.5 Data Federation,

      • 4.2.6 Data Mediators/Middleware,

      • 4.2.7 Databases as Models,

      • 4.2.8 Ontologies,

        • 4.2.8.1 Ontologies for Common Terminology and Descriptions,

        • 4.2.8.2 Ontologies for Automated Reasoning,



      • 4.2.9 Annotations and Metadata,

      • 4.2.10 A Case Study: The Cell Centered Database,

      • 4.2.11 A Case Study: Ecological and Evolutionary Databases,



    • 4.3 Data Presentation,

      • 4.3.1 Graphical Interfaces,

      • 4.3.2 Tangible Physical Interfaces,

      • 4.3.3 Automated Literature Searching,



    • 4.4 Algorithms for Operating on Biological Data,

      • 4.4.1 Preliminaries: DNA Sequence as a Digital String,

      • 4.4.2 Proteins as Labeled Graphs,

      • 4.4.3 Algorithms and Voluminous Datasets,

      • 4.4.4 Gene Recognition,

      • 4.4.5 Sequence Alignment and Evolutionary Relationships,

      • 4.4.6 Mapping Genetic Variation Within a Species,

      • 4.4.7 Analysis of Gene Expression Data,

      • 4.4.8 Data Mining and Discovery,

        • 4.4.8.1 The First Known Biological Discovery from Mining Databases,

        • 4.4.8.2 A Contemporary Example: Protein Family Classification and Data Integration for Functional Analysis of Proteins,





      • 4.4.9 Determination of Three-dimensional Protein Structure,

      • 4.4.10 Protein Identification and Quantification from Mass Spectrometry,

      • 4.4.11 Pharmacological Screening of Potential Drug Compounds,

      • 4.4.12 Algorithms Related to Imaging,

        • 4.4.12.1 Image Rendering,

        • 4.4.12.2 Image Segmentation,

        • 4.4.12.3 Image Registration,

        • 4.4.12.4 Image Classification,



    • 4.5 Developing Computational Tools,





  • 5 COMPUTATIONAL MODELING AND SIMULATION AS ENABLERS FOR BIOLOGICAL DISCOVERY

    • 5.1 On Models in Biology,

    • 5.2 Why Biological Models Can Be Useful,

      • 5.2.1 Models Provide a Coherent Framework for Interpreting Data,

      • 5.2.2 Models Highlight Basic Concepts of Wide Applicability,

      • 5.2.3 Models Uncover New Phenomena or Concepts to Explore,

      • 5.2.4 Models Identify Key Factors or Components of a System,

      • 5.2.5 Models Can Link Levels of Detail (Individual to Population),

      • 5.2.6 Models Enable the Formalization of Intuitive Understandings,

      • 5.2.7 Models Can Be Used as a Tool for Helping to Screen Unpromising Hypotheses,

      • 5.2.8 Models Inform Experimental Design,

      • 5.2.9 Models Can Predict Variables Inaccessible to Measurement,

      • 5.2.10 Models Can Link What Is Known to What Is Yet Unknown,

      • 5.2.11 Models Can Be Used to Generate Accurate Quantitative Predictions,

      • 5.2.12 Models Expand the Range of Questions That Can Meaningfully Be Asked,



    • 5.3 Types of Models,

      • 5.3.1 From Qualitative Model to Computational Simulation,

      • 5.3.2 Hybrid Models,

      • 5.3.3 Multiscale Models,

      • 5.3.4 Model Comparison and Evaluation,



    • 5.4 Modeling and Simulation in Action,

      • 5.4.1 Molecular and Structural Biology,

        • 5.4.1.1 Predicting Complex Protein Structures,

        • 5.4.1.2 A Method to Discern a Functional Class of Proteins,

        • 5.4.1.3 Molecular Docking,

        • 5.4.1.4 Computational Analysis and Recognition of Functional and Structural Sites in Protein Structures,





      • 5.4.2 Cell Biology and Physiology,

        • 5.4.2.1 Cellular Modeling and Simulation Efforts,

        • 5.4.2.2 Cell Cycle Regulation,

        • 5.4.2.3 A Computational Model to Determine the Effects of SNPs in Human Pathophysiology of Red Blood Cells,



        • 5.4.2.4 Spatial Inhomogeneities in Cellular Development,

          • 5.4.2.4.1 Unraveling the Physical Basis of Microtubule Structure and Stability,

          • 5.4.2.4.2 The Movement of Listeria Bacteria,

          • 5.4.2.4.3 Morphological Control of Spatiotemporal Patterns of Intracellular Signaling,







      • 5.4.3 Genetic Regulation,

        • 5.4.3.1 Cis-regulation of Transcription Activity as Process Control Computing,

        • 5.4.3.2 Genetic Regulatory Networks as Finite-state Automata,

        • 5.4.3.3 Genetic Regulation as Circuits,

        • 5.4.3.4 Combinatorial Synthesis of Genetic Networks,

        • 5.4.3.5 Identifying Systems Responses by Combining Experimental Data with Biological Network Information,





      • 5.4.4 Organ Physiology,

        • 5.4.4.1 Multiscale Physiological Modeling,

        • 5.4.4.2 Hematology (Leukemia),

        • 5.4.4.3 Immunology,

        • 5.4.4.4 The Heart,



      • 5.4.5 Neuroscience,

        • 5.4.5.1 The Broad Landscape of Computational Neuroscience,

        • 5.4.5.2 Large-scale Neural Modeling,

        • 5.4.5.3 Muscular Control,

        • 5.4.5.4 Synaptic Transmission,

        • 5.4.5.5 Neuropsychiatry,



      • 5.4.6 Virology,

      • 5.4.7 Epidemiology,

      • 5.4.8 Evolution and Ecology,

        • 5.4.8.1 Commonalities Between Evolution and Ecology,

        • 5.4.8.2 Examples from Evolution,

          • 5.4.8.2.1 Reconstruction of the Saccharomyces Phylogenetic Tree,

          • 5.4.8.2.2 Modeling of Myxomatosis Evolution in Australia,

          • 5.4.8.2.3 The Evolution of Proteins,

          • 5.4.8.2.4 The Emergence of Complex Genomes,



        • 5.4.8.3 Examples from Ecology,

          • 5.4.8.3.1 Impact of Spatial Distribution in Ecosystems,

          • 5.4.8.3.2 Forest Dynamics,







    • 5.5 Technical Challenges Related to Modeling,



  • 6 A COMPUTATIONAL AND ENGINEERING VIEW OF BIOLOGY

    • 6.1 Biological Information Processing,

    • 6.2 An Engineering Perspective on Biological Organisms,

      • 6.2.1 Biological Organisms as Engineered Entities,

      • 6.2.2 Biology as Reverse Engineering,

      • 6.2.3 Modularity in Biological Entities,

      • 6.2.4 Robustness in Biological Entities,

      • 6.2.5 Noise in Biological Phenomena,



    • 6.3 A Computational Metaphor for Biology,



  • 7 CYBERINFRASTRUCTURE AND DATA ACQUISITION

    • 7.1 Cyberinfrastructure for 21st Century Biology,

      • 7.1.1 What Is Cyberinfrastructure?

      • 7.1.2 Why Is Cyberinfrastructure Relevant?

      • 7.1.3 The Role of High-performance Computing,

      • 7.1.4 The Role of Networking,

      • 7.1.5 An Example of Using Cyberinfrastructure for Neuroscience Research,



    • 7.2 Data Acquisition and Laboratory Automation,

      • 7.2.1 Today’s Technologies for Data Acquisition,

      • 7.2.2 Examples of Future Technologies,

      • 7.2.3 Future Challenges,





  • 8 BIOLOGICAL INSPIRATION FOR COMPUTING

    • 8.1 The Impact of Biology on Computing,

      • 8.1.1 Biology and Computing: Promise and Skepticism,

      • 8.1.2 The Meaning of Biological Inspiration,

      • 8.1.3 Multiple Roles: Biology for Computing Insight,





    • 8.2 Examples of Biology as a Source of Principles for Computing,

      • 8.2.1 Swarm Intelligence and Particle Swarm Optimization,

      • 8.2.2 Robotics 1: The Subsumption Architecture,

      • 8.2.3 Robotics 2: Bacterium-inspired Chemotaxis in Robots,

      • 8.2.4 Self-Healing Systems,

      • 8.2.5 Immunology and Computer Security,

        • 8.2.5.1 Why Immunology Might Be Relevant,

        • 8.2.5.2 Some Possible Applications of Immunology-based Computer Security,

        • 8.2.5.3 Immunological Design Principles for Computer Security,

        • 8.2.5.4 An Example: Immunology and Intruder Detection,

        • 8.2.5.5 Interesting Questions and Challenges,

          • 8.2.5.5.1 Definition of Self,

          • 8.2.5.5.2 More Immunological Mechanisms,



        • 8.2.5.6 Some Possible Difficulties with an Immunological Approach,



      • 8.2.6 Amorphous Computing,



    • 8.3 Biology as Implementer of Mechanisms for Computing,

      • 8.3.1 Evolutionary Computation,

        • 8.3.1.1 What Is Evolutionary Computation?

        • 8.3.1.2 Suitability of Problems for Evolutionary Computation,

        • 8.3.1.3 Correctness of a Solution,

        • 8.3.1.4 Solution Representation,

        • 8.3.1.5 Selection of Primitives,

        • 8.3.1.6 More Evolutionary Mechanisms,

          • 8.3.1.6.1 Coevolution,

          • 8.3.1.6.2 Development,



        • 8.3.1.7 Behavior of Evolutionary Processes,



      • 8.3.2 Robotics 3: Energy and Compliance Management,

      • 8.3.3 Neuroscience and Computing,

        • 8.3.3.1 Neuroscience and Architecture in Broad Strokes,

        • 8.3.3.2 Neural Networks,

        • 8.3.3.3 Neurally Inspired Sensors,



      • 8.3.4 Ant Algorithms,

        • 8.3.4.1 Ant Colony Optimization,

        • 8.3.4.2 Other Ant Algorithms,





    • 8.4 Biology as Physical Substrate for Computing,

      • 8.4.1 Biomolecular Computing,

        • 8.4.1.1 Description,

        • 8.4.1.2 Potential Application Domains,

        • 8.4.1.3 Challenges,

        • 8.4.1.4 Future Directions,



      • 8.4.2 Synthetic Biology,

        • 8.4.2.1 An Engineering Approach to Building Living Systems,

        • 8.4.2.2 Cellular Logic Gates,

        • 8.4.2.3 Broader Views of Synthetic Biology,

        • 8.4.2.4 Applications,

        • 8.4.2.5 Challenges,



      • 8.4.3 Nanofabrication and DNA Self-Assembly,

        • 8.4.3.1 Rationale,

        • 8.4.3.2 Applications,

        • 8.4.3.3 Prospects,

        • 8.4.3.4 Hybrid Systems,





  • 9 ILLUSTRATIVE PROBLEM DOMAINS AT THE INTERFACE OF COMPUTING AND BIOLOGY

    • 9.1 Why Problem-focused Research?

    • 9.2 Cellular and Organismal Modeling,

    • 9.3 A Synthetic Cell with Physical Form,

    • 9.4 Neural Information Processing and Neural Prosthetics,

    • 9.5 Evolutionary Biology,

    • 9.6 Computational Ecology,

    • 9.7 Genome-enabled Individualized Medicine,

      • 9.7.1 Disease Susceptibility,

      • 9.7.2 Drug Response and Pharmacogenomics,

      • 9.7.3 Nutritional Genomics,



    • 9.8 A Digital Human on Which a Surgeon Can Operate Virtually,

    • 9.9 Computational Theories of Self-assembly and Self-modification,

    • 9.10 A Theory of Biological Information and Complexity,



  • 10 CULTURE AND RESEARCH INFRASTRUCTURE

    • 10.1 Setting the Context,

    • 10.2 Organizations and Institutions,

      • 10.2.1 The Nature of the Community,

      • 10.2.2 Education and Training,

        • 10.2.2.1 General Considerations,

        • 10.2.2.2 Undergraduate Programs,

        • 10.2.2.3 The BIO2010 Report,

          • 10.2.2.3.1 Engineering,

          • 10.2.2.3.2 Quantitative Training,

          • 10.2.2.3.3 Computer Science,



        • 10.2.2.4 Graduate Programs,

        • 10.2.2.5 Postdoctoral Programs,

          • 10.2.2.5.1 The Sloan/DOE Postdoctoral Awards for Computational Molecular Biology,

          • 10.2.2.5.2 The Burroughs-Wellcome Career Awards at the Scientific Interface,

          • 10.2.2.5.3 Keck Center for Computational and Structural Biology: The Research Training Program,



        • 10.2.2.6 Faculty Retraining in Midcareer,



      • 10.2.3 Academic Organizations,

      • 10.2.4 Industry,

        • 10.2.4.1 Major IT Corporations,

        • 10.2.4.2 Major Life Science Corporations,

        • 10.2.4.3 Start-up and Smaller Companies,



      • 10.2.5 Funding and Support,

        • 10.2.5.1 General Considerations,

          • 10.2.5.1.1 The Role of Funding Institutions,

          • 10.2.5.1.2 The Review Process,



        • 10.2.5.2 Federal Support,

          • 10.2.5.2.1 National Institutes of Health,

          • 10.2.5.2.2 National Science Foundation,

          • 10.2.5.2.3 Department of Energy,

          • 10.2.5.2.4 Defense Advanced Research Projects Agency,







    • 10.3 Barriers,

      • 10.3.1 Differences in Intellectual Style,

        • 10.3.1.1 Historical Origins and Intellectual Traditions,

        • 10.3.1.2 Different Approaches to Education and Training,

        • 10.3.1.3 The Role of Theory,

        • 10.3.1.4 Data and Experimentation,

        • 10.3.1.5 A Caricature of Intellectual Differences,



      • 10.3.2 Differences in Culture,

        • 10.3.2.1 The Nature of the Research Enterprise,

        • 10.3.2.2 Publication Venue,

        • 10.3.2.3 Organization of Human Resources,

        • 10.3.2.4 Devaluing the Contributions of the Other,

        • 10.3.2.5 Attitudinal Issues,



      • 10.3.3 Barriers in Academia,

        • 10.3.3.1 Academic Disciplines and Departmental Structure,

        • 10.3.3.2 Structure of Educational Programs,

        • 10.3.3.3 Coordination Costs,

        • 10.3.3.4 Risks of Retraining and Conversion,

        • 10.3.3.5 Rapid But Uneven Changes in Biology,

        • 10.3.3.6 Funding Risk,

        • 10.3.3.7 Local Cyberinfrastructure,



      • 10.3.4 Barriers in Commerce and Business,

        • 10.3.4.1 Importance Assigned to Short-term Payoffs,

        • 10.3.4.2 Reduced Workforces,

        • 10.3.4.3 Proprietary Systems,

        • 10.3.4.4 Cultural Differences Between Industry and Academia,



      • 10.3.5 Issues Related to Funding Policies and Review Mechanisms,

        • 10.3.5.1 Scope of Supported Work,

        • 10.3.5.2 Scale of Supported Work,

        • 10.3.5.3 The Review Process,



      • 10.3.6 Issues Related to Intellectual Property and Publication Credit,





  • 11 CONCLUSIONS AND RECOMMENDATIONS

    • 11.1 Disciplinary Perspectives,

      • 11.1.1 The Biology-Computing Interface,

      • 11.1.2 Other Emerging Fields at the BioComp Interface,



    • 11.2 Moving Forward,

      • 11.2.1 Building a New Community,

      • 11.2.2 Core Principles for Practitioners,

      • 11.2.3 Core Principles for Research Institutions,



    • 11.3 The Special Significance of Educational Innovation at the BioComp Interface,

      • 11.3.1 Content,

      • 11.3.2 Mechanisms,



    • 11.4 Recommendations for Research Funding Agencies,

      • 11.4.1 Core Principles for Funding Agencies,

      • 11.4.2 National Institutes of Health,

      • 11.4.3 National Science Foundation,

      • 11.4.4 Department of Energy,

      • 11.4.5 Defense Advanced Research Projects Agency,



    • 11.5 Conclusions Regarding Industry,

    • 11.6 Closing Thoughts,



  • APPENDIXES

    • A The Secrets of Life: A Mathematician’s Introduction to Molecular Biology

    • B Challenge Problems in Bioinformatics and Computational Biology from Other Reports

    • C Biographies of Committee Members and Staff

    • D Workshop Participants

  • What Is CSTB?

Free download pdf