such topics can be handled by an information retrieval system. Unlike ordi-
nary terms, concept combinations are necessarily dependent on the concepts
that were combined (both semantically and statistically). As we noted in
section 6.2, terms are considered to be statistically independent by default.
No such assumption will be valid for concept combinations. Fortunately, the
vector space model can incorporate such dependencies.
To see how this is done, consider the term “flu vaccine.” In the usual vector
space model, this is just two independent terms, “flu” and “vaccine.” These
two terms represent two of the factors in the ratio
Pr(D|Relevant)/Pr(D|Irrelevant)
from section 6.2 which determines the degree of relevance of a document D
to a query. If we presume that “flu vaccine” is a concept combination which
has been indexed, then the two factors for “flu” and “vaccine” should be
replaced by the single factor for “flu vaccine.” In other words, the concept
combination is a new term which supersedes the terms that were combined.
However, this is done only when both the document and the query use the
concept combination. If only one of them has the combination, then the in-
dividual terms must still be used to measure relevance.
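The substitution rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of any particular retrieval system: the likelihood ratios are invented numbers, the names LIKELIHOOD_RATIO, COMBINATIONS, effective_terms, and log_relevance are hypothetical, and we assume that a document or query indexed with a concept combination also keeps the individual terms, so that the fallback to "flu" and "vaccine" is possible.

```python
import math

# Hypothetical per-term factors Pr(term|Relevant)/Pr(term|Irrelevant).
# The values are invented purely for illustration.
LIKELIHOOD_RATIO = {
    "flu": 4.0,
    "vaccine": 3.0,
    "flu vaccine": 20.0,
}

# Indexed concept combinations and the terms each one supersedes.
COMBINATIONS = {"flu vaccine": ("flu", "vaccine")}


def effective_terms(doc_terms, query_terms):
    """Return the shared terms that contribute a factor to the ratio."""
    shared = set(doc_terms) & set(query_terms)
    for combo, parts in COMBINATIONS.items():
        if combo in shared:
            # Both document and query use the combination, so its single
            # factor replaces the factors of the individual terms.
            shared -= set(parts)
    return shared


def log_relevance(doc_terms, query_terms):
    """Sum the log factors; terms without a known ratio count as neutral."""
    terms = effective_terms(doc_terms, query_terms)
    return sum(math.log(LIKELIHOOD_RATIO.get(t, 1.0)) for t in terms)


# A document indexed with the combination (individual terms retained).
doc = {"flu", "vaccine", "flu vaccine"}

# The query also uses the combination: only the single factor is used.
print(effective_terms(doc, {"flu", "vaccine", "flu vaccine"}))

# The query has only the individual terms: they are still used.
print(effective_terms(doc, {"flu", "vaccine"}))
```

Working with logarithms turns the product of factors into a sum, which is a common convenience rather than a requirement of the model.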
Although the vector space model can be adapted to deal with concept
combinations, it still suffers from the deficiencies already enumerated in sec-
tion 6.2. Techniques that deal more directly with the meaning of the docu-
ments and queries are considered in section 6.6 and in chapter 8.
Summary
- Concepts can be combined in many ways which are much deeper than
just the juxtaposition of the words used.
- The vector space model can be extended to deal with concept combinations,
but it is still subject to deficiencies because it does not deal with the
meaning of words.
6.6 Retrieval of Knowledge Representations
Information retrieval systems, including Google, generally do not make use
of the meaning of the information in the document. As a result, searches
will necessarily be hit-or-miss activities. Sometimes one will get lucky, other