
140 6 Information Retrieval



  • A query is evaluated by computing the inner product of the query vector
    with each document vector and then sorting. The documents are arranged
    (ranked) in decreasing order of their inner products.

  • The vector space model is a geometric interpretation of the corpus which
    can be used to classify documents by looking for clusters of documents
    that are near one another.

  • In spite of its elegance and geometric appeal, the vector space model
    depends on many assumptions and has a number of limitations.
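
The ranking step described in the first point above can be illustrated with a minimal sketch. The vocabulary, document names, and term weights below are invented for illustration, and the helper names are hypothetical. A real engine would also weight and normalize the vectors.

```python
# Hypothetical toy corpus: every document is a term-weight vector over the
# same vocabulary. Names and weights are made up for illustration only.
VOCABULARY = ["gene", "protein", "disease", "therapy"]

DOCUMENTS = {
    "doc_a": [2.0, 1.0, 0.0, 0.0],
    "doc_b": [0.0, 1.0, 3.0, 1.0],
    "doc_c": [1.0, 0.0, 1.0, 2.0],
}


def inner_product(u, v):
    """Return the inner (dot) product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_documents(query, documents):
    """Score each document against the query and sort, best match first."""
    scores = {name: inner_product(query, vec) for name, vec in documents.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# A query that mentions "disease" and "therapy" once each.
QUERY = [0.0, 0.0, 1.0, 1.0]
RANKING = rank_documents(QUERY, DOCUMENTS)

for name, score in RANKING:
    print(name, score)
```

Here doc_b ranks first because it shares the most weight with the query terms, and doc_a ranks last because it shares none.
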


6.3 Using Ontologies for Formulating Queries


Ontologies can address the shortcomings of traditional information retrieval
in many ways. In this section, we look at how an ontology can serve as a
context and a source of terminology for formulating queries, which are then
given to an ordinary vector space retrieval engine.

The simplest way to browse an ontology is to use its hierarchical structure
as a means of organizing the concepts. One first presents the top-level
concepts, then the next level, and so on. This is the same approach used to
organize directory structures (file system browsing) and XML documents.
(See section 1.4, especially table 1.1.) Ontologies that can be browsed in this
way are relatively small and simple, consisting of just a taxonomy.
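
Level-by-level browsing of a small taxonomy can be sketched as follows. The concept names and the dictionary representation are assumptions made for this example, and the helper `levels` is hypothetical. The sketch also assumes the taxonomy is a tree with no cycles.

```python
# Hypothetical toy taxonomy: each concept maps to its direct subconcepts.
# Assumed to be a tree (no cycles, one parent per concept).
TAXONOMY = {
    "Entity": ["Organism", "Molecule"],
    "Organism": ["Bacterium", "Eukaryote"],
    "Molecule": ["Protein"],
    "Bacterium": [],
    "Eukaryote": [],
    "Protein": [],
}


def levels(taxonomy, root):
    """Return the concepts grouped by depth, top level first."""
    result = []
    frontier = [root]
    while frontier:
        result.append(frontier)
        # The next level consists of all children of the current level.
        frontier = [child
                    for concept in frontier
                    for child in taxonomy.get(concept, [])]
    return result


LEVELS = levels(TAXONOMY, "Entity")

# Present the top-level concept first, then each deeper level, indented.
for depth, concepts in enumerate(LEVELS):
    print("  " * depth + ", ".join(concepts))
```

This approach only stays usable while each level is short enough to read at a glance, which is why it suits small taxonomies.
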

However, biomedical ontologies can be very large repositories of terminology
which require their own information retrieval systems. Consider, for
example, the Unified Medical Language System (UMLS), which was introduced
in subsection 5.1.1. With 4.5 million terms, the UMLS is much too
large to be browsed in any casual manner, and a variety of tools have been
introduced to assist one in this task. Some tools are designed for general
ontologies, while others are specialized for specific biological or medical
ontologies.

The Medical Subject Headings (MeSH) browser is possibly the best known
biomedical terminology browser. It is available at www.nlm.nih.gov/mesh.
The MeSH browser is specialized for MeSH, the controlled vocabulary
thesaurus of the National Library of Medicine (NLM). MeSH consists of
sets of terms naming descriptors in a hierarchical structure that permits
searching at various levels of specificity.
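
One way to picture searching at various levels of specificity is sketched below. It assumes each descriptor carries a dotted code giving its position in the hierarchy. The codes, names, and the helper `search_at` are invented for illustration and are not actual MeSH data or part of any MeSH tool.

```python
# Hypothetical descriptors keyed by a dotted position code.
# These codes and names are illustrative only, not real MeSH entries.
DESCRIPTORS = {
    "A": "Diseases",
    "A.1": "Neoplasms",
    "A.1.1": "Carcinoma",
    "A.1.2": "Sarcoma",
    "A.2": "Infections",
}


def search_at(code, descriptors):
    """Return the descriptor at `code` and every descriptor beneath it."""
    prefix = code + "."
    return sorted(name
                  for c, name in descriptors.items()
                  if c == code or c.startswith(prefix))


# A broad search matches the whole subtree; a narrow one matches less.
BROAD = search_at("A.1", DESCRIPTORS)
NARROW = search_at("A.1.1", DESCRIPTORS)
print(BROAD)
print(NARROW)
```

Searching at a higher level of the hierarchy returns everything classified under it, while searching at a leaf returns only that descriptor.
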

The MeSH hierarchy has 11 levels. There are currently 22,568 descriptors
in MeSH and more than 139,000 headings called supplementary concept