

92 5 Survey of Ontologies in Bioinformatics


tions. MetaMap can also be used to construct a list of ranked concepts
by applying the MetaMap indexing ranking function to each UMLS Metathesaurus
(META) concept. The UMLS Knowledge Source Server (UMLSKS), at
umlsks.nlm.nih.gov, is a web server that provides access to the knowledge
sources and other related resources made available by developers using the UMLS.
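The idea of building a ranked concept list by applying one ranking function to every candidate concept can be sketched as follows. This does not reproduce MetaMap's actual ranking function. It substitutes a hypothetical coverage score (the fraction of the phrase a match covers), and the `Candidate` type, its fields, and the toy concepts are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Candidate:
    """A candidate concept matched against one phrase of input text (toy data)."""
    name: str
    matched_words: int  # words of the phrase covered by this concept's match
    phrase_words: int   # total words in the phrase


def coverage_score(c: Candidate) -> float:
    """Hypothetical stand-in for MetaMap's ranking function:
    the fraction of the phrase that the concept's match covers."""
    return c.matched_words / c.phrase_words


def rank_concepts(candidates: Iterable[Candidate],
                  score: Callable[[Candidate], float]) -> List[Candidate]:
    """Apply the scoring function to every candidate and return them best-first.
    Python's sort is stable, so tied candidates keep their input order."""
    return sorted(candidates, key=score, reverse=True)


# Toy candidates for the phrase "heart attack".
candidates = [
    Candidate("Heart", 1, 2),
    Candidate("Myocardial infarction", 2, 2),  # toy: matched via a synonym
    Candidate("Attack", 1, 2),
]
ranked = rank_concepts(candidates, coverage_score)
print([c.name for c in ranked])  # ['Myocardial infarction', 'Heart', 'Attack']
```

Passing the score function as a parameter keeps the ranking step separate from the scoring heuristic, so a more faithful scoring function could be dropped in without changing `rank_concepts`.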
The UMLS is a rich source of knowledge in the biomedical domain. It is used
for research and development in a range of applications, including natural
language processing (Baclawski et al. 2000; McCray et al. 2001). UMLS browsers
are discussed in section 6.3. Search engines based on the UMLS use it either
as a source of keywords or as a means of generating knowledge representations.
An example of the keyword approach is the Medical World Search at
www.mwsearch.com, a search engine for medical information on selected
medical sites. An example of the knowledge representation approach is the
Semantic Knowledge Indexing Platform (SKIP), shown in section 6.6.
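The keyword approach can be illustrated with a minimal query-expansion sketch: a query term is mapped to the concept whose names include it, and all of that concept's synonyms are then used as search keywords. The synonym table, documents, and function names are hypothetical. A real system would load concept names from the UMLS Metathesaurus, and this is not how Medical World Search itself is implemented.

```python
from typing import Dict, List, Set

# Toy, hand-made synonym table standing in for UMLS concept names
# (hypothetical data, not loaded from the Metathesaurus).
SYNONYMS: Dict[str, Set[str]] = {
    "myocardial infarction": {"myocardial infarction", "heart attack"},
    "hypertension": {"hypertension", "high blood pressure"},
}


def expand_query(query: str) -> Set[str]:
    """Return the query plus every name of any concept that includes it."""
    q = query.lower().strip()
    terms = {q}
    for names in SYNONYMS.values():
        if q in names:
            terms |= names
    return terms


def search(query: str, documents: Dict[str, str]) -> List[str]:
    """Return sorted ids of documents containing any expanded keyword
    (case-insensitive substring match)."""
    terms = expand_query(query)
    return sorted(doc_id for doc_id, text in documents.items()
                  if any(t in text.lower() for t in terms))


docs = {
    "d1": "Patient admitted after a heart attack.",
    "d2": "Long-term management of myocardial infarction.",
    "d3": "Treating high blood pressure.",
}
print(search("heart attack", docs))  # ['d1', 'd2']
```

Expansion lets a query for "heart attack" also retrieve a document that only says "myocardial infarction", which is the main benefit of using a controlled vocabulary as a keyword source.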

5.1.2 The Gene Ontology


The most prominent ontology for bioinformatics is GO. GO is produced by
the GO Consortium, which seeks to provide a structured, controlled vocabulary
for the description of gene product function, process, and location (GO
2003, 2004). The GO Consortium was initially a collaboration among the
Mouse Genome Database, FlyBase, and Saccharomyces Genome Database
efforts. It has since grown to 16 members. GO is now part of the UMLS, and the
GO Consortium is a member of the Open Biological Ontologies consortium,
which is discussed in the next section.
A description of a gene product using the GO terminology is called an
annotation. One important use of GO is the prediction of gene function based
on patterns of annotation. For example, if annotations for two attributes tend
to occur together in a database, then a gene that has one attribute is likely
to have the other as well (King et al. 2003). In this way, functional predictions
can be made by applying prior knowledge to infer the function of a novel
entity (either a gene or a protein).
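The co-occurrence idea can be sketched with a simple association rule: for each ordered pair of terms (A, B), estimate the confidence P(B | A) as the fraction of genes annotated with A that are also annotated with B, and suggest B for any gene that has A but lacks B when the confidence is high. This is a swapped-in illustrative technique, not necessarily the method of King et al. (2003). The term names, thresholds, and function names are hypothetical.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, Set, Tuple


def cooccurrence_rules(annotations: Dict[str, Set[str]],
                       min_support: int = 2,
                       min_confidence: float = 0.8) -> Dict[Tuple[str, str], float]:
    """Derive rules A -> B with confidence P(B | A) from annotation sets.

    annotations maps a gene id to the set of terms annotated to it.
    A rule is kept if A and B co-occur in at least min_support genes
    and P(B | A) is at least min_confidence.
    """
    term_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for terms in annotations.values():
        term_counts.update(terms)
        # Ordered pairs, so A -> B and B -> A are counted separately.
        pair_counts.update(permutations(sorted(terms), 2))
    rules = {}
    for (a, b), n_ab in pair_counts.items():
        confidence = n_ab / term_counts[a]
        if n_ab >= min_support and confidence >= min_confidence:
            rules[(a, b)] = confidence
    return rules


def predict(gene_terms: Set[str], rules: Dict[Tuple[str, str], float]) -> Set[str]:
    """Suggest terms implied by some rule but not yet annotated to the gene."""
    return {b for (a, b) in rules if a in gene_terms and b not in gene_terms}


# Toy annotation database with placeholder term names.
annotations = {
    "g1": {"term_A", "term_B"},
    "g2": {"term_A", "term_B", "term_C"},
    "g3": {"term_A", "term_B"},
    "g4": {"term_C"},
}
rules = cooccurrence_rules(annotations)
print(predict({"term_A"}, rules))  # {'term_B'}
```

Here term_A and term_B always occur together, so a new gene annotated only with term_A is predicted to have term_B as well. term_C co-occurs with the others only once, so no rule involving it passes the support threshold.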
GO consists of three distinct ontologies, each of which serves as an organizing
principle for describing gene products. The intention is that each gene
product should be annotated by classifying it three times, once within each
ontology (Fraser and Marcotte 2004). The three GO ontologies are: