untitled

144 6 Information Retrieval

In the case of graphs, the entries in a principal eigenvector measure the rel- ative importance of the corresponding node with respect to the links. Each of the two matrices has a different interpretation. The matrix ATAistheau- thority matrix. The principal eigenvector ranks the documents according to how much they are referred to by other documents. In this case the principal eigenvector is (0, 1, 0.618). Document D is not referred to by any other document in this set, so it is no surprise that it is not an authority. Document E is referred to by two other documents, and document F is referred to by just one other document. Thus E is more of an authority than F. It is interesting to compare the Kleinberg algorithm with what one would obtain using simple citation counts, as is often done in the research literature. Since E has twice as many citation counts, one would expect that E would be twice as authoritative as F. However, the principal eigenvector adjusts the authoritativeness of each citation so that the authority weights are consis- tent. In effect, the algorithm is implicitly assigning a level of “quality” to the citations. In other words, being cited by a more authoritative document counts more than being cited by a less authoritative source. The matrix AATis thehub matrix.Ahuborcentral sourceis a document that refers to a large number of other documents in the same set. In the research literature, a survey article in a field would be a hub, and it might not be an authority. This would be the case shortly after the survey article has been published and before it has been cited by other articles. The principal eigenvector ranks the documents according to how much of a hub it is for the particular query. In the example, the principal eigenvector is (1, 0, 0.618). Since document E does not refer to any other documents in this set, it is not a hub. Document D is the main hub, since it refers to two other documents, while document F is much less of a hub since it only refers to one other document. One interesting and useful feature of the Kleinberg algorithm is that it is possible for a document to have considerable importance either as an authority or a hub even when the document does not match any of the terms in the query. As a result, the Kleinberg algorithm improves both selectivity and coverage of retrieval. It improves selectivity by improving the ordering of documents so that the most relevant documents are more likely to occur at the beginning of the list. It improves coverage by retrieving documents that do not match the query but which are cited by documents that do match. However, the Kleinberg algorithm does have its weaknesses. Because it focuses on the principal eigenvector, the other eigenvectors are ignored even when they may represent the actual focus of interest of the researcher. This

untitled

Get our desktop app

Company

Features

Documentation

Resources