Catalyzing Inquiry at the Interface of Computing and Biology


Box 4.3
Biological Data and the Semantic Web

The Semantic Web seeks to create a universal medium for the exchange of machine-understandable
data of all types, including biological data. Using Semantic Web technology, programs can share and
process data even when they have been designed totally independently. The Semantic Web involves the
Resource Description Framework (RDF), an RDF Schema language, and the Web Ontology Language
(OWL). RDF and OWL are Semantic Web standards that provide a framework for asset management,
enterprise integration, and the sharing and reuse of data on the Web. Furthermore, a standardized query
language for RDF enables the “joining” of decentralized collections of RDF data. The underlying
technology foundation of these languages is that of URLs, XML, and XML namespaces.
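
The idea of "joining" decentralized collections can be illustrated with a minimal sketch in plain Python. It
does not use a real RDF library or query language; it only assumes that each collection is a set of
(subject, predicate, object) triples and that both collections name the same resource with the same
identifier. Every URI, predicate, and function name below is hypothetical.

```python
# Two independently maintained collections of RDF-style triples,
# written as (subject, predicate, object). All URIs are hypothetical.
EX = "http://example.org/"

lab_a = {
    (EX + "protein/P1", EX + "vocab/bindsTo", EX + "protein/P2"),
    (EX + "protein/P1", EX + "vocab/name", "Protein One"),
}
lab_b = {
    (EX + "protein/P2", EX + "vocab/participatesIn", EX + "pathway/W1"),
    (EX + "protein/P2", EX + "vocab/name", "Protein Two"),
}


def join_on_shared_resource(left, right, left_pred, right_pred):
    """Return sorted (s, shared, o) tuples such that `left` contains
    (s, left_pred, shared) and `right` contains (shared, right_pred, o)."""
    results = []
    for s, p, shared in left:
        if p != left_pred:
            continue
        for s2, p2, o in right:
            if s2 == shared and p2 == right_pred:
                results.append((s, shared, o))
    return sorted(results)


# Which pathways involve a binding partner of a protein described by lab A?
pathways = join_on_shared_resource(
    lab_a, lab_b, EX + "vocab/bindsTo", EX + "vocab/participatesIn"
)
```

The join succeeds only because both collections use the identical identifier for the second protein, which is
why shared, stable naming matters as much as the data format.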

Within the life sciences, the notion of a life sciences identifier (LSID) is intended to provide a
straightforward approach to naming and identifying data resources stored in multiple, distributed data stores in
a manner that overcomes the limitations of naming schemes in use today. LSIDs are persistent,
location-independent resource identifiers for uniquely naming biologically significant resources, including but
not limited to individual genes or proteins, or data objects that encode information about them.
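
As a rough illustration of what such an identifier looks like to a program, the sketch below parses an LSID
into its parts. The layout it assumes, urn:lsid:authority:namespace:object with an optional trailing
revision, comes from the OMG LSID specification as commonly described and is not spelled out in this box;
the example identifier and the helper names are hypothetical.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str           # organization that issued the identifier
    namespace: str           # collection within that authority
    object_id: str           # the named resource itself
    revision: Optional[str]  # optional version of the resource


def parse_lsid(text: str) -> LSID:
    """Split an LSID string into its components (assumed layout, see above)."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated fields: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty component in LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(authority, namespace, object_id, revision)


# Hypothetical identifier; example.org is not a real LSID authority.
example = parse_lsid("urn:lsid:example.org:proteins:P12345:2")
```

Nothing in the parsed fields says where the data are stored, which is the sense in which the identifier is
location-independent.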

The life sciences pose a particular challenge for data integration because the semantics of biological
knowledge are constantly changing. For example, it may be known that two proteins bind to each other,
but this fact could be represented at the cellular level, the tissue level, or the molecular level,
depending on the context in which that fact was important.

The Semantic Web is intended to allow for evolutionary change in the relevant ontologies as new
science emerges without the need for consensus. For example, if Researcher A states (and encodes
using Semantic Web technology) a relationship between a protein and a signaling cascade with
which Researcher B disagrees, Researcher B can instruct his or her computer to ignore (perhaps
temporarily) the relationship encoded by Researcher A in favor (perhaps) of a relationship that is
defined only locally.
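
One way to picture this scenario is to record, alongside each statement, who asserted it, so that a
consumer can filter by source. The sketch below is only an illustration under that assumption and is not
how any particular Semantic Web toolkit works; the researcher labels, protein and cascade names, and the
working_view helper are all hypothetical.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    source: str     # who asserted the relationship
    subject: str
    predicate: str
    obj: str


# Statements published by Researcher A (hypothetical content).
shared = [
    Statement("researcherA", "proteinX", "activates", "cascadeY"),
    Statement("researcherA", "proteinX", "bindsTo", "proteinZ"),
]

# Researcher B's locally defined alternative.
local = [Statement("researcherB", "proteinX", "inhibits", "cascadeY")]


def working_view(shared, local, ignored):
    """Combine shared and local statements, dropping any shared statement
    whose (source, subject, obj) triple appears in `ignored`."""
    kept = [s for s in shared if (s.source, s.subject, s.obj) not in ignored]
    return kept + list(local)


# B ignores only A's claim linking proteinX to cascadeY; the rest of A's
# data stays usable, and the ignore set can be emptied later.
view_b = working_view(shared, local, {("researcherA", "proteinX", "cascadeY")})
```

Because the disputed statement is filtered rather than deleted, Researcher A's data remain intact and the
decision is easy to reverse.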

An initiative coordinated by the World Wide Web Consortium seeks to explore how Semantic Web
technologies can be used to reduce the barriers and costs associated with effective data integration,
analysis, and collaboration in the life sciences research community, to enable disease understanding,
and to accelerate the development of therapies. A meeting in October 2004 on the Semantic Web and
the life sciences concluded that work was needed in two high-priority areas.


  • In the area of ontology development, collaborative efforts were felt to be required to define core
    vocabularies that can bridge data and ontologies developed by individual communities of practice. These
    vocabularies would address provenance and context (e.g., identifying data sources, authors,
    publication names, and collection conditions), terms for cross-references in publication and other reporting of
    experimental results, navigation, versioning, and geospatial/temporal quantifiers.

  • With respect to LSIDs, the problem of sparse implementation was regarded as central, and
    participants believed that work should focus on how to implement LSIDs in a manner that leverages existing
    Web resource resolution mechanisms such as HTTP servers.
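
To make the second point concrete, one simple possibility is to expose each LSID through an ordinary Web
address by appending it to the base URL of a resolver. The sketch below shows only that mapping; it is an
assumption for illustration, not the resolution mechanism defined for LSIDs or one endorsed at the meeting,
and the resolver address, the example identifier, and the function name are hypothetical.

```python
from urllib.parse import quote

# Hypothetical resolver; no such service is implied to exist.
RESOLVER_BASE = "http://resolver.example.org/lsid/"


def lsid_to_http_url(lsid: str, base: str = RESOLVER_BASE) -> str:
    """Map an LSID to a URL that an ordinary HTTP server could answer."""
    if not lsid.lower().startswith("urn:lsid:"):
        raise ValueError(f"not an LSID: {lsid!r}")
    # Keep the colons readable; percent-encode anything unsafe in a URL path.
    return base + quote(lsid, safe=":")


url = lsid_to_http_url("urn:lsid:example.org:proteins:P12345")
```

The identifier itself stays unchanged; only the resolver address would need to change if the data moved.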


SOURCES: The Semantic Web Activity Statement, available at http://www.w3.org/2001/sw/Activity; Life Sciences Identifiers RFP
Response, OMG Document lifesci/2003-12-02, January 12, 2004, available at
http://www.omg.org/docs/lifesci/03-12-02.doc#_Toc61702471; John Wilbanks, Science Commons, Massachusetts Institute of
Technology, personal communication, April 4, 2005.