130 6 Information Retrieval


or words, individually or severally. The query is then given to a search
intermediary. In the past the intermediary was a person who specialized in
searching. It is more common today for the intermediary to be a computer
system. Such systems are called information retrieval systems or online
search engines. The search intermediary tries to match the topics,
categories, and words from the query with information objects in the corpus.
The intermediary responds with a set of information objects that, it is
hoped, satisfy the searcher’s needs.
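As a rough illustration of the matching step, the following sketch treats the corpus as a mapping from object identifiers to text and returns every object that shares at least one word with the query. The function name match_query, the toy corpus, and the "any shared word" rule are assumptions made for this example; real search engines use far more elaborate matching.

```python
def match_query(query_words, corpus):
    """Return ids of information objects sharing at least one word with the query."""
    wanted = {w.lower() for w in query_words}
    hits = []
    for obj_id, text in corpus.items():
        words = set(text.lower().split())
        if wanted & words:  # at least one query word occurs in the object
            hits.append(obj_id)
    return hits


# Hypothetical three-object corpus, for illustration only.
corpus = {
    "doc1": "protein folding and structure prediction",
    "doc2": "gene expression in yeast",
    "doc3": "structure of the yeast genome",
}

result = match_query(["yeast", "genome"], corpus)
print(result)
```

The result is only a candidate set: nothing in this matching rule guarantees that a returned object actually satisfies the searcher's needs.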
Queries are certainly not the only way to find information in a corpus.
Another very commonly used technique is to follow citations or references
within the documents in the corpus. This technique is called browsing. Online
browsing tools are now ubiquitous. Such a tool allows a searcher to follow
references contained in information objects by simply clicking on a word or
picture within the information object. In the standard model for information
retrieval, a sharp distinction is made between searching using queries and
searching using references.
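Browsing can be pictured as walking a graph whose nodes are information objects and whose edges are references. The sketch below is one possible way to model this, assuming the references are available as a dictionary from each object to the objects it cites; the names browse, references, and max_steps are hypothetical and chosen only for this example.

```python
from collections import deque


def browse(start, references, max_steps=2):
    """Collect objects reachable from start by following at most max_steps references."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        obj, depth = queue.popleft()
        if depth == max_steps:
            continue  # do not follow references beyond the step limit
        for target in references.get(obj, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return seen


# Hypothetical reference structure: A cites B and C, B cites D, D cites E.
references = {"A": ["B", "C"], "B": ["D"], "C": [], "D": ["E"]}

reachable = browse("A", references, max_steps=2)
print(sorted(reachable))
```

No query is involved here: the searcher starts from a known object and relies entirely on the references it contains.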
In the standard model, the quality of a search is measured using two
numbers (Saracevic 1975). The first number represents how thorough the search
was. It is the fraction of the total number of relevant information objects that
are presented to the searcher. This fraction is called by various names, such as
the sensitivity, coverage, or recall. If the coverage is less than 100%, then
some relevant information objects have been missed. The second number
represents how careful the search was. It is the fraction of the objects
presented to the searcher that are judged to be relevant. This number is called
the precision or selectivity. If the precision is less than 100%, then some
irrelevant objects were presented to the searcher.
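The two measures can be computed directly from the set of objects presented and the set of objects judged relevant. The following is a minimal sketch under the assumption that both sets are non-empty; the function names and the sample identifiers are invented for illustration.

```python
def coverage(presented, relevant):
    """Fraction of the relevant objects that were presented (recall, sensitivity)."""
    return len(presented & relevant) / len(relevant)


def precision(presented, relevant):
    """Fraction of the presented objects that are relevant (selectivity)."""
    return len(presented & relevant) / len(presented)


# Hypothetical judgments: four relevant objects, three presented.
relevant = {"d1", "d2", "d3", "d4"}
presented = {"d1", "d2", "d5"}

print(coverage(presented, relevant))   # 2 of the 4 relevant objects were found
print(precision(presented, relevant))  # 2 of the 3 presented objects are relevant
```

In this example two relevant objects were missed, so the coverage is 50%, and one irrelevant object was presented, so the precision is two-thirds.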
Of course, one can always increase the coverage by adding many more
information objects to those already presented, thereby ruining the selectivity.
Clearly, one would like to balance the coverage and selectivity so as to
achieve a search that is as careful and thorough as possible. In this chapter,
a variety of search techniques and services are introduced, and the role that
ontologies can play is described.
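The trade-off can be made concrete with a small numeric sketch. It assumes, purely for illustration, that the search engine produces a ranked list and presents the first k objects; the ranking, the relevance judgments, and the function name precision_and_coverage_at are all hypothetical.

```python
def precision_and_coverage_at(k, ranked, relevant):
    """Precision and coverage when only the first k ranked objects are presented."""
    presented = set(ranked[:k])
    hits = len(presented & relevant)
    return hits / k, hits / len(relevant)


# Hypothetical ranking of an eight-object corpus with four relevant objects.
ranked = ["d1", "d2", "d5", "d3", "d6", "d7", "d8", "d4"]
relevant = {"d1", "d2", "d3", "d4"}

for k in (2, 4, 8):
    p, c = precision_and_coverage_at(k, ranked, relevant)
    print(f"k={k}: precision={p:.2f}, coverage={c:.2f}")
```

Presenting all eight objects reaches full coverage, but half of what the searcher sees is then irrelevant; presenting only two gives perfect precision while missing half of the relevant objects.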
The queries considered above are expressed using topics, categories, or
words, or a combination of these. The assumption is that the query is an
imprecise and incomplete specification, and the search engine will make an
effort to retrieve documents that are likely, but not guaranteed, to be relevant.
Alternatively, one could use precise queries. Such queries must be expressed
in a formal language with precise semantics. The best-known example of
such a query language is SQL, which is used for retrieving information from