Catalyzing Inquiry at the Interface of Computing and Biology

70 CATALYZING INQUIRY

Box 4.3 Biological Data and the Semantic Web

The Semantic Web seeks to create a universal medium for the exchange of machine-understandable data of all types, including biological data. Using Semantic Web technology, programs can share and process data even when they have been designed totally independently. The Semantic Web involves a Resource Description Framework (RDF), an RDF Schema language, and the Web Ontology Language (OWL). RDF and OWL are Semantic Web standards that provide a framework for asset management, enterprise integration, and the sharing and reuse of data on the Web. Furthermore, a standardized query language for RDF enables the “joining” of decentralized collections of RDF data. The underlying technology foundation of these languages is that of URIs, XML, and XML namespaces.
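The idea of "joining" decentralized collections can be illustrated with a minimal sketch. It assumes statements are held as plain (subject, predicate, object) tuples rather than in a real RDF store, and every example.org URI and predicate name below is hypothetical. The point is only that two collections built independently can be combined because they name the same resource with the same identifier.

```python
# Toy illustration only: triples as plain tuples, not a real RDF store.
# All example.org URIs and predicate names are hypothetical.

EX = "http://example.org/"

# Collection published by lab A: protein-protein binding facts.
lab_a = {
    (EX + "protein/P1", EX + "vocab/bindsTo", EX + "protein/P2"),
    (EX + "protein/P3", EX + "vocab/bindsTo", EX + "protein/P4"),
}

# Collection published independently by lab B: pathway membership.
lab_b = {
    (EX + "protein/P2", EX + "vocab/memberOf", EX + "pathway/MAPK"),
    (EX + "protein/P4", EX + "vocab/memberOf", EX + "pathway/Wnt"),
}


def join_on_shared_node(left, right, left_pred, right_pred):
    """Pair each left triple with right triples whose subject is its object."""
    # Index the right-hand collection by subject for the chosen predicate.
    by_subject = {}
    for s, p, o in right:
        if p == right_pred:
            by_subject.setdefault(s, []).append(o)
    # Follow each left-hand object into the index.
    results = []
    for s, p, o in left:
        if p == left_pred:
            for target in by_subject.get(o, []):
                results.append((s, o, target))
    return sorted(results)


rows = join_on_shared_node(
    lab_a, lab_b, EX + "vocab/bindsTo", EX + "vocab/memberOf"
)
for protein, partner, pathway in rows:
    print(protein, "binds", partner, "which is in", pathway)
```

A query language for RDF expresses this kind of match declaratively; the hand-written loop here only shows why shared identifiers make the join possible.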

Within the life sciences, the notion of a life sciences identifier (LSID) is intended to provide a straightforward approach to naming and identifying data resources stored in multiple, distributed data stores in a manner that overcomes the limitations of naming schemes in use today. LSIDs are persistent, location-independent resource identifiers for uniquely naming biologically significant resources, including but not limited to individual genes or proteins, or data objects that encode information about them.
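To make the structure of such an identifier concrete, the sketch below parses an LSID under the assumption that it follows the URN layout described in the OMG LSID specification, urn:lsid:authority:namespace:object with an optional trailing revision. The `LSID` type, the `parse_lsid` helper, and the example identifier are hypothetical names introduced for illustration.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str           # organization that issued the identifier
    namespace: str           # collection within that authority
    object_id: str           # the named object itself
    revision: Optional[str]  # optional version of the object


def parse_lsid(text: str) -> LSID:
    """Split an LSID of the assumed form urn:lsid:authority:namespace:object[:revision]."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated fields: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty component in {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(authority, namespace, object_id, revision)


# Hypothetical identifier; example.org is not a real LSID authority.
lsid = parse_lsid("urn:lsid:example.org:proteins:P12345:2")
print(lsid)
```

Nothing in the identifier says where the data lives, which is what makes it location-independent: resolving the name to a server is a separate step.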

The life sciences pose a particular challenge for data integration because the semantics of biological knowledge are constantly changing. For example, it may be known that two proteins bind to each other, but this fact could be represented at the cellular level, the tissue level, or the molecular level, depending on the context in which the fact is important.

The Semantic Web is intended to allow for evolutionary change in the relevant ontologies as new science emerges without the need for consensus. For example, if Researcher A states (and encodes using Semantic Web technology) a relationship between a protein and a signaling cascade with which Researcher B disagrees, Researcher B can instruct his or her computer to ignore (perhaps temporarily) the relationship encoded by Researcher A in favor (perhaps) of a relationship that is defined only locally.
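The Researcher A and Researcher B scenario can be sketched as a filter over statements that record who asserted them. This is a simplified illustration, not how any particular Semantic Web toolkit works: the fourth tuple field, the source labels, and the protein and cascade names are all hypothetical.

```python
# Each statement carries its source, so a consumer can choose whose claims to use.
# Names and relationships are hypothetical.

statements = [
    ("ProteinX", "participatesIn", "CascadeY", "researcher_a"),
    ("ProteinX", "participatesIn", "CascadeZ", "researcher_b_local"),
    ("ProteinX", "bindsTo", "ProteinW", "researcher_c"),
]


def visible_statements(all_statements, ignored_sources):
    """Return (subject, predicate, object) triples from sources not being ignored."""
    return [
        (s, p, o)
        for s, p, o, source in all_statements
        if source not in ignored_sources
    ]


# Researcher B's view: A's claim is set aside but remains in the shared data.
view_b = visible_statements(statements, ignored_sources={"researcher_a"})
print(view_b)
```

Because the disputed statement is filtered rather than deleted, Researcher B can restore it later by changing the set of ignored sources, and no agreement with Researcher A is needed in the meantime.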

An initiative coordinated by the World Wide Web Consortium seeks to explore how Semantic Web technologies can be used to reduce the barriers and costs associated with effective data integration, analysis, and collaboration in the life sciences research community, to enable disease understanding, and to accelerate the development of therapies. A meeting in October 2004 on the Semantic Web and the life sciences concluded that work was needed in two high-priority areas.

In the area of ontology development, collaborative efforts were felt to be required to define core vocabularies that can bridge data and ontologies developed by individual communities of practice. These vocabularies would address provenance and context (e.g., identifying data sources, authors, publication names, and collection conditions), terms for cross-references in publication and other reporting of experimental results, navigation, versioning, and geospatial/temporal quantifiers.

With respect to LSIDs, the problem of sparse implementation was regarded as central, and participants believed that work should focus on how to implement LSIDs in a manner that leverages existing Web resource resolution mechanisms such as HTTP servers.
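One way to picture resolution over ordinary Web infrastructure is to embed the identifier in an HTTP URL served by a resolver. The sketch below is only an assumption about how such a mapping might look; the resolver address, the URL layout, and the `lsid_to_http_url` helper are hypothetical and do not describe any deployed service.

```python
from urllib.parse import quote


def lsid_to_http_url(lsid: str, resolver_base: str) -> str:
    """Build an HTTP URL that a (hypothetical) resolver could dereference."""
    if not lsid.lower().startswith("urn:lsid:"):
        raise ValueError(f"not an LSID: {lsid!r}")
    # Percent-encode anything unsafe in a URL path, but keep the colons readable.
    return resolver_base.rstrip("/") + "/" + quote(lsid, safe=":")


# Hypothetical resolver and identifier, for illustration only.
url = lsid_to_http_url(
    "urn:lsid:example.org:proteins:P12345",
    "http://resolver.example.org/lsid",
)
print(url)
```

The identifier stays the same if the data moves; only the resolver's answer changes, so existing HTTP servers can do the lookup without new protocol machinery.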

SOURCES: The Semantic Web Activity Statement, available at http://www.w3.org/2001/sw/Activity; Life Sciences Identifiers RFP Response, OMG Document lifesci/2003-12-02, January 12, 2004, available at http://www.omg.org/docs/lifesci/03-12-02.doc#_Toc61702471; John Wilbanks, Science Commons, Massachusetts Institute of Technology, personal communication, April 4, 2005.
