
6.2 Vector Space Retrieval

It is difficult to map the inflected forms of an English word to a single concept because inflection is highly irregular and ambiguous.

The vector space model treats the document as just a collection of
unconnected and unrelated terms. There is no meaning beyond the terms
themselves.

The model as commonly used presumes that the terms are statistically
independent, both in the collection as a whole and in each document. The
vector space model in general allows for terms that are correlated, but it is
computationally difficult even to find correlations between pairs of terms,
let alone sets of three or more terms, so very few retrieval engines attempt
to find or to make use of such correlations.
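
To give a rough sense of why pairwise correlations are expensive, the following sketch counts how often two terms occur in the same document. Co-occurrence counting is used here only as a simple stand-in for a correlation measure, and the function name `pair_cooccurrence` and the sample data are hypothetical. The number of candidate pairs grows quadratically with the vocabulary size, and the number of triples grows cubically.

```python
from collections import Counter
from itertools import combinations
from math import comb

def pair_cooccurrence(docs):
    """Count, for every pair of distinct terms, the documents containing both."""
    counts = Counter()
    for doc in docs:
        # Each unordered pair of distinct terms in the document is counted once.
        for pair in combinations(sorted(set(doc)), 2):
            counts[pair] += 1
    return counts

# Hypothetical toy collection of tokenized documents.
docs = [["a", "b", "c"], ["a", "b"]]
counts = pair_cooccurrence(docs)

# Candidate pairs and triples for an assumed vocabulary of 100,000 terms.
vocabulary_size = 100_000
candidate_pairs = comb(vocabulary_size, 2)
candidate_triples = comb(vocabulary_size, 3)
```

Even this simple count must consider every pair of terms within each document, which suggests why sets of three or more terms are rarely examined.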

By focusing exclusively on terms, the model cannot take advantage of
document structure. Webpages and XML documents have a hierarchical
structure whose elements are tagged. XML document elements are especially
meaningful, but none of this meaning is expressible in the vector space model.

By treating documents as independent entities, the vector space model
cannot take advantage of interdocument links such as the citations that
occur in scientific research papers and the hypertext links that occur in
webpages.

Some systems attempt to alleviate these problems by adding dependencies
between terms, such as how close the terms are to each other in the document.
However, these improvements do not address the fundamental weaknesses
of the vector space model.
Ontologies can be useful tools for dealing with these deficiencies, and
some of the techniques are introduced in the next section.
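
As an illustration of the term-proximity dependency mentioned above, the following minimal sketch finds the smallest distance, in token positions, between two distinct terms in a tokenized document. The function name `min_term_distance` is hypothetical, and how a retrieval system would fold such a distance into its ranking is not specified here.

```python
def min_term_distance(tokens, term_a, term_b):
    """Return the smallest positional gap between term_a and term_b, or None.

    Assumes term_a and term_b are different terms.
    """
    best = None
    last_a = last_b = None  # most recent positions of each term
    for i, tok in enumerate(tokens):
        if tok == term_a:
            last_a = i
            if last_b is not None and (best is None or i - last_b < best):
                best = i - last_b
        if tok == term_b:
            last_b = i
            if last_a is not None and (best is None or i - last_a < best):
                best = i - last_a
    return best

# Hypothetical sample text.
tokens = "the gene controls expression of the gene product".split()
gap_expression = min_term_distance(tokens, "gene", "expression")
gap_product = min_term_distance(tokens, "gene", "product")
gap_missing = min_term_distance(tokens, "gene", "protein")
```

A smaller gap could be taken as weak evidence that the two terms are related in that document, which is the kind of dependency such systems add on top of the term weights.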

Summary

Words have different degrees of selectivity.

In the vector space model each document and query is represented by a
vector where each component of the vector is the term weight for a word
that can occur in a document.

The most common term weight is the TFIDF weight, which is the product
of the number of times that the word occurs in the document and the
logarithm of the inverse of the fraction of documents that have the word.
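
The summary points above can be sketched in a few lines of code. This is a minimal illustration, assuming the common form of the weight in which the term frequency is multiplied by log(N / n), where N is the number of documents and n is the number of documents containing the word. The function name `tfidf_vectors` and the sample documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, vectors) for a list of tokenized documents."""
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each word.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    vocab = sorted(df)
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw count of each word in this document
        # One component per vocabulary word: tf * log(N / df).
        vectors.append([tf[w] * math.log(n_docs / df[w]) for w in vocab])
    return vocab, vectors

# Hypothetical toy collection.
docs = [
    "gene expression in yeast".split(),
    "gene regulation in yeast".split(),
    "protein folding in yeast".split(),
]
vocab, vectors = tfidf_vectors(docs)
```

In this toy collection the words "in" and "yeast" occur in every document, so their weight is zero, while a word found in only one document, such as "expression", receives the largest weight. This reflects the point that words have different degrees of selectivity.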
