
150 6 Information Retrieval


of every natural language statement. However, it is possible to extract
knowledge representations that are good enough to be used for information
retrieval. Some commercial systems already extract knowledge representations
from biomedical text and index them for rapid retrieval.
Once the knowledge representations have been extracted, there are two
approaches to querying the documents:


  1. Use a precise and unambiguous query language. For relational databases
    one can use the SQL language, and such languages are now being devel-
    oped for XML.

  2. Query the documents with natural language. This means that NLP tech-
    niques must be used to extract the knowledge representation of the query.
    The query knowledge representation can then be matched against the
    knowledge representations of the documents.


Although both approaches support inferencing, they differ in many ways.
In the first approach the query uses a specialized query language that has
little resemblance to natural language. There is now a standard query
language for XML documents, and chapter 8 discusses it in some detail. The
advantage is that one can be confident that the query will return exactly
the items that are relevant to the query, no more and no fewer. However, if
the knowledge representations of the documents are inaccurate, then the
query results will also be inaccurate. Since NLP techniques are still not
perfect, one cannot expect query results to be perfect either. The
disadvantage of specialized query languages is that one must learn how to
program in the language, and this can require a significant amount of effort.
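
To make the trade-off of the first approach concrete, the following is a minimal sketch, assuming that each extracted knowledge representation has been reduced to a (subject, relation, object) fact stored in a relational table. The table layout, the sample facts, and the helper names `build_index` and `docs_where` are hypothetical and chosen only for illustration. The SQL query returns exactly the rows that match it, so a fact that was extracted inaccurately (here a misspelled subject) is silently missed.

```python
import sqlite3

# Hypothetical extracted facts: (doc_id, subject, relation, object).
FACTS = [
    ("doc1", "aspirin", "inhibits", "COX-1"),
    ("doc2", "aspirin", "inhibits", "COX-2"),
    ("doc3", "asprin", "inhibits", "COX-2"),  # simulated extraction error
    ("doc4", "ibuprofen", "inhibits", "COX-2"),
]


def build_index(facts):
    """Load the facts into an in-memory relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fact (doc_id TEXT, subject TEXT, relation TEXT, object TEXT)"
    )
    conn.executemany("INSERT INTO fact VALUES (?, ?, ?, ?)", facts)
    return conn


def docs_where(conn, subject, relation, obj):
    """Return the ids of documents containing exactly the given fact."""
    rows = conn.execute(
        "SELECT doc_id FROM fact "
        "WHERE subject = ? AND relation = ? AND object = ? "
        "ORDER BY doc_id",
        (subject, relation, obj),
    )
    return [row[0] for row in rows]


conn = build_index(FACTS)
exact_hits = docs_where(conn, "aspirin", "inhibits", "COX-2")
print(exact_hits)
```

In this sketch only doc2 is returned. The query is precise, but doc3 is lost because its stored representation does not match the query term for term.
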
The second approach allows a person to query the document corpus by
using natural language queries. Each query is converted to a knowledge
representation in the same way as the documents. The retrieval system
answers the query by looking for matching knowledge representations in
the corpus. The match can be complete or partial. This approach is less
brittle than the specialized query language approach, and so is better
suited to knowledge representations that are somewhat inaccurate. However,
the fact that it is less precise can be a disadvantage when the knowledge
representations are known to be good and one would like to extract precise
information from the corpus.
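
Complete and partial matching can be illustrated with a minimal sketch, assuming that the NLP step has already turned both the query and the documents into (subject, relation, object) facts. The scoring rule (fraction of query slots matched), the threshold, the sample corpus, and the function names `match_score` and `rank` are assumptions made for illustration, not the method of any particular system.

```python
def match_score(query, fact):
    """Fraction of the specified query slots that the fact matches (0.0 to 1.0)."""
    specified = [(q, f) for q, f in zip(query, fact) if q is not None]
    if not specified:
        return 0.0
    hits = sum(1 for q, f in specified if q.lower() == f.lower())
    return hits / len(specified)


def rank(query, corpus, threshold=0.5):
    """Rank documents by their best-matching fact, keeping scores >= threshold."""
    scored = []
    for doc_id, facts in corpus.items():
        best = max((match_score(query, fact) for fact in facts), default=0.0)
        if best >= threshold:
            scored.append((doc_id, best))
    # Highest score first; ties are broken by document id.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical corpus of extracted facts, keyed by document id.
CORPUS = {
    "doc1": [("aspirin", "inhibits", "COX-1")],
    "doc2": [("aspirin", "inhibits", "COX-2")],
    "doc3": [("asprin", "inhibits", "COX-2")],  # simulated extraction error
    "doc4": [("ibuprofen", "treats", "fever")],
}

# Assumed result of extracting the query "Which documents say that
# aspirin inhibits COX-2?" into the same representation.
QUERY = ("aspirin", "inhibits", "COX-2")

ranking = rank(QUERY, CORPUS)
for doc_id, score in ranking:
    print(doc_id, round(score, 2))
```

Here doc2 is a complete match, while doc1 and doc3 are partial matches that still surface despite differing in one slot. This is the sense in which the approach is less brittle, and also less precise: doc1 is returned even though it states a different fact.
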
One example of the second approach is the Semantic Knowledge Indexing
Platform (SKIP). A demonstration is available online at www.semantxls.