142 6 Information Retrieval
- One can use more specific concepts when more general concepts find too
much information. This is known as “narrowing” the query.
- One can use concepts that are related in ways that are nonhierarchical.
For example, a nucleolus is part of the nucleus of a cell. Following this
part-of relationship is a query modification that neither broadens nor
narrows the query.
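The three kinds of query modification described above can be illustrated with a toy ontology. This is a minimal sketch, not an interface to the UMLS or any real ontology: the terms, the relation names `is_a` and `part_of`, and the helper functions `broaden`, `narrow`, `relate`, and `modify_query` are all hypothetical, and the small hierarchy is simplified for the example.

```python
# Hypothetical toy ontology as (subject, relation, object) triples.
# "is_a" is treated as the hierarchical relation; every other
# relation (here only "part_of") is treated as nonhierarchical.
TRIPLES = [
    ("cell nucleus", "is_a", "organelle"),
    ("mitochondrion", "is_a", "organelle"),
    ("organelle", "is_a", "cellular component"),
    ("nucleolus", "part_of", "cell nucleus"),
]


def build_index(triples):
    """Index the triples by direction and kind of relation."""
    parents, children, related = {}, {}, {}
    for subj, rel, obj in triples:
        if rel == "is_a":
            parents.setdefault(subj, set()).add(obj)   # more general terms
            children.setdefault(obj, set()).add(subj)  # more specific terms
        else:
            # Nonhierarchical links are followed in both directions.
            related.setdefault(subj, set()).add(obj)
            related.setdefault(obj, set()).add(subj)
    return parents, children, related


PARENTS, CHILDREN, RELATED = build_index(TRIPLES)


def broaden(term):
    """More general concepts (follow is_a upward)."""
    return sorted(PARENTS.get(term, ()))


def narrow(term):
    """More specific concepts (follow is_a downward)."""
    return sorted(CHILDREN.get(term, ()))


def relate(term):
    """Concepts linked by a nonhierarchical relation."""
    return sorted(RELATED.get(term, ()))


def modify_query(query_terms, term, how):
    """Return alternative queries with `term` replaced by each neighbor."""
    step = {"broaden": broaden, "narrow": narrow, "relate": relate}[how]
    return [[alt if t == term else t for t in query_terms] for alt in step(term)]


if __name__ == "__main__":
    print(narrow("organelle"))
    print(modify_query(["nucleolus", "protein"], "nucleolus", "relate"))
```

Treating only `is_a` as hierarchical is a simplifying assumption; large biomedical ontologies define many relation types, and deciding which of them count as broadening or narrowing is a design choice for the retrieval tool.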
Summary
- Ontologies are an important source of terminology that can be used to
formulate queries.
- Biological and medical ontologies can be so large and complex that
specialized browsing and retrieval tools are necessary.
- Several browsers are now available for the UMLS.
- One can use ontologies as a means of query modification when a query
does not return satisfactory results.
6.4 Organizing by Citation
The popularity of the World Wide Web has led to many new search
techniques that attempt to exploit its structure. One such technique was
developed by Kleinberg (Kleinberg 1998; Chakrabarti et al. 1998; Gibson et al.
1998). A variation on this technique has since been implemented with
considerable success by Google (Page and Brin 2004).
The Kleinberg algorithm exploits the graph structure formed by documents
that refer to other documents. In the terminology of directed graphs, the
documents are nodes, and each reference from one document to another is
represented by a directed edge from the node of the referring document to
the node of the document being referenced.
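A citation graph of this kind can be stored as adjacency sets. The sketch below assumes each document's references are already known as a mapping from a document id to the ids it cites; the ids and the function name `build_citation_graph` are hypothetical. Each directed edge is recorded both as an out-link of the referring document and as an in-link of the referenced one.

```python
def build_citation_graph(references):
    """Build a directed citation graph.

    `references` maps each document id to the ids it refers to.
    Returns (out_links, in_links), where out_links[d] holds the documents
    d refers to (edges d -> e) and in_links[e] holds the documents that
    refer to e. Every document mentioned anywhere becomes a node.
    """
    nodes = set(references)
    for cited in references.values():
        nodes.update(cited)
    out_links = {d: set() for d in nodes}
    in_links = {d: set() for d in nodes}
    for doc, cited in references.items():
        for target in cited:
            out_links[doc].add(target)  # directed edge: referring -> referenced
            in_links[target].add(doc)
    return out_links, in_links


if __name__ == "__main__":
    refs = {"A": ["B", "C"], "B": ["C"], "D": ["A"]}
    out_links, in_links = build_citation_graph(refs)
    edges = sorted((s, t) for s in out_links for t in out_links[s])
    print(edges)
    print(sorted(in_links["C"]))
```

Keeping both directions doubles the storage, but it lets an algorithm find the documents a given document cites and the documents that cite it without scanning the whole graph.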
Like any search algorithm, the Kleinberg algorithm begins with a query.
This query is processed using a form of vector space retrieval to obtain a
collection of candidate documents. Unlike search techniques based solely on
the vector space model, this approach does not require precise or careful
term weighting, because the final ranking of the retrieved documents uses a
technique very different from the one used by vector space retrieval
engines. After the initial set of candidates is obtained, it is expanded
somewhat by including documents that refer to or are referenced