6.4 Organizing by Citation 143
by the initial candidates. Many other refinements can also be employed to
improve the set of candidates.
The next step is unique to the Kleinberg algorithm. A matrix is computed
in which the entries represent references of one document to another. For
example, suppose that one has three documents called D, E, and F, such that
D refers to E and F, while F refers to E. None of the documents refer to D and
document E does not refer to any other document in this set. The matrix is
given as follows:
D E F
D 0 1 1
E 0 0 0
F 0 1 0
For example, the 1 in the first row and second column of the matrix indi-
cates that document D refers to document E. Note that the rows and columns
have been labeled for ease in understanding the meaning of the entries. This
matrix is called theadjacency matrixof the graph. It is usually designated by
the letter A. We now compute the matrix products ATA and AAT, where the
superscriptTmeans that the matrix has been transposed. The following are
these two matrices:
D E F
D 0 0 0
E 0 2 1
F 0 1 1
D E F
D 2 0 1
E 0 0 0
F 1 0 1
The original matrix A will not be symmetric in general, but both of the
products will be. In fact, both matrices are positive semidefinite. In other
words, the eigenvalues will be nonnegative. The largest eigenvalue is called
the principal eigenvalue, and its eigenvectors are called the principal eigen-
vectors. While it is difficult in general to compute eigenvalues and eigen-
vectors of large matrices, it is relatively easy to find a principal eigenvector.
The space of principal eigenvectors is called the principal component. Prin-
cipal components analysis (PCA) is a commonly used statistical technique
for accounting for the variance in data.