4 Identification of Disease Gene
One of the key applications of network analysis is to identify novel
disease genes based on known disease genes on the network. There
are many different methods that have been proposed. Many of
them were based on the principle called guilt by association
[43]. The basic assumption is that the genes have similar functions
with their neighbors on the network. It is reasonable in most
scenarios. Based on this idea, the interaction neighbors of known
disease genes are very likely to be also disease genes. Actually, the
regulatory modules on network confirm the guilt-by-association
principle [44]. In practice, guilt-by-association-based neighbor
counting [45] is widely used. But the disadvantage of guilt-by-
association methods is very obvious when the number of reported
disease genes is too small and they locate far from each other on
network. At that time, the guilt-by-association methods will not be
able to identify possible novel disease genes. Therefore, two new
methods are introduced in the following sections.
4.1 Random Walk
with Restart (RWR)
Random walk with restart (RWR) [46–51] algorithm simulates a
walker who starts from the nodes of reported disease genes and
moves to its randomly chosen neighbors on the network at each
step [48]. After many steps of walks, the procedures will be steady.
Based on the final probability of the walker’s walks to each node on
the network, the highly possible candidate disease genes are
identified.
It works as follows:
For a gene regulatory network G¼(V, E) comprised of a set of
genes V and a set of interactions E, we represent it by annn
adjacency matrix A, wherenis the number of genes. The entry at
rowiand columnjis set to 1 if gene i interacts with gene j ;
otherwise it is set to 0.
First, adjacency matrixAis column-wise normalized as follows
[52–54]:
A½i;j^0 ¼
A½i;j
Pn
k¼ 1 A½k;j
ð 11 Þ
Then, in each step, the state probabilitiesPtþ 1 at timetþ1 are
calculated as
Ptþ 1 ¼ðÞ 1 r A^0 PtþrP 0 ð 12 Þ
wherePtis state probabilities at timet,ris the restart probability,A
0
is the normalized adjacency matrix, andP 0 is the initial state prob-
abilities which is a column vector with 1/mfor the mknown
disease genes and to 0 for other genes on the gene regulatory
network.
146 Guangyong Zheng and Tao Huang