untitled

(ff) #1

144 6 Information Retrieval


In the case of graphs, the entries in a principal eigenvector measure the rel-
ative importance of the corresponding node with respect to the links. Each
of the two matrices has a different interpretation. The matrix ATAistheau-
thority matrix. The principal eigenvector ranks the documents according to
how much they are referred to by other documents. In this case the principal
eigenvector is (0, 1, 0.618). Document D is not referred to by any other doc-
ument in this set, so it is no surprise that it is not an authority. Document E
is referred to by two other documents, and document F is referred to by just
one other document. Thus E is more of an authority than F.
It is interesting to compare the Kleinberg algorithm with what one would
obtain using simple citation counts, as is often done in the research literature.
Since E has twice as many citation counts, one would expect that E would be
twice as authoritative as F. However, the principal eigenvector adjusts the
authoritativeness of each citation so that the authority weights are consis-
tent. In effect, the algorithm is implicitly assigning a level of “quality” to
the citations. In other words, being cited by a more authoritative document
counts more than being cited by a less authoritative source.
The matrix AATis thehub matrix.Ahuborcentral sourceis a document
that refers to a large number of other documents in the same set. In the re-
search literature, a survey article in a field would be a hub, and it might not
be an authority. This would be the case shortly after the survey article has
been published and before it has been cited by other articles. The princi-
pal eigenvector ranks the documents according to how much of a hub it is
for the particular query. In the example, the principal eigenvector is (1, 0,
0.618). Since document E does not refer to any other documents in this set, it
is not a hub. Document D is the main hub, since it refers to two other doc-
uments, while document F is much less of a hub since it only refers to one
other document.
One interesting and useful feature of the Kleinberg algorithm is that it is
possible for a document to have considerable importance either as an au-
thority or a hub even when the document does not match any of the terms
in the query. As a result, the Kleinberg algorithm improves both selectivity
and coverage of retrieval. It improves selectivity by improving the ordering
of documents so that the most relevant documents are more likely to occur at
the beginning of the list. It improves coverage by retrieving documents that
do not match the query but which are cited by documents that do match.
However, the Kleinberg algorithm does have its weaknesses. Because it
focuses on the principal eigenvector, the other eigenvectors are ignored even
when they may represent the actual focus of interest of the researcher. This
Free download pdf