Computational Systems Biology Methods and Protocols

The CMI measurement between genes X and Y, given gene Z as a
condition, is defined as follows.

$$
\mathrm{CMI}(X; Y \mid Z) = \sum_{x \in X,\, y \in Y,\, z \in Z} p(x, y, z) \log \frac{p(x, y \mid z)}{p(x \mid z)\, p(y \mid z)} \tag{4}
$$

where p(x, y, z) is the joint probability distribution of the gene triple (X, Y, Z),
while p(x|z), p(y|z), and p(x, y|z) are the conditional probabilities of
gene X, gene Y, and the gene pair (X, Y), respectively, given gene Z as a condition.
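
As a rough illustration of the CMI definition, the sketch below evaluates it on a small discrete joint probability table. It assumes that expression values have already been discretized and that the joint distribution is available as a NumPy array indexed as p_xyz[x, y, z]; the function name, the two toy tables, and the use of the natural logarithm are illustrative choices, not part of the original text.

```python
import numpy as np


def conditional_mutual_information(p_xyz):
    """CMI(X;Y|Z) in nats for a joint probability table p_xyz[x, y, z]."""
    p_xyz = np.asarray(p_xyz, dtype=float)
    p_z = p_xyz.sum(axis=(0, 1))  # p(z)
    p_xz = p_xyz.sum(axis=1)      # p(x, z), indexed [x, z]
    p_yz = p_xyz.sum(axis=0)      # p(y, z), indexed [y, z]
    cmi = 0.0
    for x, y, z in np.ndindex(p_xyz.shape):
        p = p_xyz[x, y, z]
        if p > 0:  # zero-probability cells contribute nothing to the sum
            # p(x,y|z) / (p(x|z) p(y|z)) = p(x,y,z) p(z) / (p(x,z) p(y,z))
            cmi += p * np.log(p * p_z[z] / (p_xz[x, z] * p_yz[y, z]))
    return cmi


# Toy table 1 (assumed): X and Y are identical fair coins, Z is an
# independent fair coin, so X and Y stay dependent after conditioning on Z.
direct = np.zeros((2, 2, 2))
for x in (0, 1):
    for z in (0, 1):
        direct[x, x, z] = 0.25

# Toy table 2 (assumed): X and Y are both exact copies of a fair coin Z,
# so their association is fully explained by Z.
indirect = np.zeros((2, 2, 2))
indirect[0, 0, 0] = indirect[1, 1, 1] = 0.5

cmi_direct = conditional_mutual_information(direct)      # log 2
cmi_indirect = conditional_mutual_information(indirect)  # 0
```

In the first table the CMI equals log 2, while in the second it is zero because conditioning on Z removes all dependence between X and Y.
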
According to Eq. 4, the CMI measurement can inspect whether
there is a direct correlation between genes X and Y and thus
enhance the accuracy of relationship detection for gene pairs.
However, when the expression pattern of gene X or Y is strongly similar
to that of gene Z, the performance of the CMI measurement decreases
dramatically. A new measurement, partial mutual information
(PMI), has therefore been proposed to refine the CMI measurement [15]; it is
defined as follows.

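$$
\mathrm{PMI}(X; Y \mid Z) = \sum_{x \in X,\, y \in Y,\, z \in Z} p(x, y, z) \log \frac{p(x, y \mid z)}{p^{*}(x \mid z)\, p^{*}(y \mid z)},
$$

$$
p^{*}(x \mid z) = \sum_{y \in Y} p(x \mid z, y)\, p(y), \qquad p^{*}(y \mid z) = \sum_{x \in X} p(y \mid z, x)\, p(x) \tag{5}
$$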

where p(x, y, z), p(x|z), p(y|z), and p(x, y|z) have the same definitions
as in the CMI measurement. Numerical studies of simulated
and realistic data demonstrate that PMI performs better
than the CMI measurement in relationship
detection.
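
The PMI definition can be sketched in the same way on a discrete joint probability table p_xyz[x, y, z]. This is a minimal illustration rather than the implementation from [15]: the function name and toy tables are hypothetical, the natural logarithm is assumed, and conditionals that are undefined because their conditioning event has zero probability are set to zero, which is an assumption of this sketch.

```python
import numpy as np


def partial_mutual_information(p_xyz):
    """PMI(X;Y|Z) in nats for a joint probability table p_xyz[x, y, z]."""
    p = np.asarray(p_xyz, dtype=float)
    p_x = p.sum(axis=(1, 2))  # p(x)
    p_y = p.sum(axis=(0, 2))  # p(y)
    p_z = p.sum(axis=(0, 1))  # p(z)
    p_xz = p.sum(axis=1)      # p(x, z), indexed [x, z]
    p_yz = p.sum(axis=0)      # p(y, z), indexed [y, z]

    # p(x|z,y) = p(x,y,z) / p(y,z) and p(y|z,x) = p(x,y,z) / p(x,z).
    # Assumption: where the denominator is zero the conditional is set to 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        p_x_given_zy = np.where(p_yz[None, :, :] > 0, p / p_yz[None, :, :], 0.0)
        p_y_given_zx = np.where(p_xz[:, None, :] > 0, p / p_xz[:, None, :], 0.0)

    # p*(x|z) = sum_y p(x|z,y) p(y), indexed [x, z]
    p_star_x = np.einsum("xyz,y->xz", p_x_given_zy, p_y)
    # p*(y|z) = sum_x p(y|z,x) p(x), indexed [y, z]
    p_star_y = np.einsum("xyz,x->yz", p_y_given_zx, p_x)

    pmi = 0.0
    for x, y, z in np.ndindex(p.shape):
        if p[x, y, z] > 0:
            p_xy_given_z = p[x, y, z] / p_z[z]
            pmi += p[x, y, z] * np.log(
                p_xy_given_z / (p_star_x[x, z] * p_star_y[y, z])
            )
    return pmi


# Toy table 1 (assumed): X and Y are identical fair coins, Z is independent.
coupled = np.zeros((2, 2, 2))
for x in (0, 1):
    for z in (0, 1):
        coupled[x, x, z] = 0.25

# Toy table 2 (assumed): X, Y, and Z are three independent fair coins.
independent = np.full((2, 2, 2), 0.125)

pmi_coupled = partial_mutual_information(coupled)          # log 2
pmi_independent = partial_mutual_information(independent)  # 0
```

For these two tables, p*(x|z) and p*(y|z) coincide with p(x|z) and p(y|z), so PMI gives the same values as CMI would; the two measures differ when X or Y is strongly associated with Z.
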

2.1.2 Probabilistic Graphical Models


A gene regulatory network is represented as a graph model G = <V,
E>, where V stands for genes and E denotes links between genes
[2, 16]. Let n be the number of observations in the experiment and m the
total number of genes; the expression data (D) can then be
represented as an n × m matrix (D = (d_1, d_2, ..., d_m)). The problem
of GRN reconstruction is then equivalent to inferring an optimal model
(G) from the data matrix (D). In the following subsections, we will
introduce the Bayesian network model and the Gaussian graphical
model for the network inference problem.
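
To make the notation concrete, the following sketch shows one possible in-memory layout for the graph model and the expression matrix. The gene names, the edge list, and the random expression values are placeholders invented for illustration; only the shapes and roles of G, V, E, and D follow the text.

```python
import numpy as np

# V: the genes (hypothetical names)
genes = ["g1", "g2", "g3", "g4"]
# E: links between genes (hypothetical edges)
edges = {("g1", "g2"), ("g1", "g3"), ("g3", "g4")}

n = 6           # number of observations
m = len(genes)  # total number of genes

# D: n x m expression matrix; synthetic values stand in for real measurements
rng = np.random.default_rng(0)
D = rng.normal(size=(n, m))

# Column j of D is the expression profile d_j of gene j
profiles = {gene: D[:, j] for j, gene in enumerate(genes)}

# One common encoding of E: an m x m adjacency matrix with A[i, j] = 1
# when there is a link from gene i to gene j
index = {gene: j for j, gene in enumerate(genes)}
A = np.zeros((m, m), dtype=int)
for source, target in edges:
    A[index[source], index[target]] = 1
```

Network inference then amounts to estimating a matrix such as A from D alone.
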
A Bayesian network model is a directed acyclic graph (DAG),
where an edge from gene X to gene Y indicates a regulation from
X to Y [17]. In other words, gene X is a parent node, and
gene Y is a target node of gene X. For this model, the probability
distribution of a network is generally factored in terms of the
conditional distributions of each node variable given its parents.
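
A small worked example may help to see this factorization in action. The sketch assumes a hypothetical three-gene DAG (X regulates Y and W) with binary on/off states and made-up conditional probability tables; real expression profiles are continuous or need discretization first, so this only illustrates the product-of-conditionals structure.

```python
import itertools

# Hypothetical DAG: X -> Y and X -> W (each node mapped to its parents)
parents = {"X": (), "Y": ("X",), "W": ("X",)}

# Made-up tables: probability that a node is "on" (state 1) given the
# states of its parents, keyed by the tuple of parent states
p_on = {
    "X": {(): 0.3},
    "Y": {(0,): 0.1, (1,): 0.8},
    "W": {(0,): 0.5, (1,): 0.2},
}


def joint_probability(state):
    """Probability of a full assignment as a product of node conditionals."""
    prob = 1.0
    for node, node_parents in parents.items():
        parent_state = tuple(state[p] for p in node_parents)
        p1 = p_on[node][parent_state]
        prob *= p1 if state[node] == 1 else 1.0 - p1
    return prob


# P(X=1, Y=1, W=0) = P(X=1) * P(Y=1 | X=1) * P(W=0 | X=1) = 0.3 * 0.8 * 0.8
example = joint_probability({"X": 1, "Y": 1, "W": 0})

# The factored joint distribution sums to 1 over all assignments
total = sum(
    joint_probability(dict(zip(parents, bits)))
    for bits in itertools.product((0, 1), repeat=len(parents))
)
```

Because each factor involves only a node and its parents, the number of parameters grows with the size of the parent sets rather than with the full joint table.
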

$$
P(D) = \prod_{j=1}^{m} p\bigl(d_j \mid \mathrm{Pa}(d_j)\bigr) \tag{6}
$$
where d_j represents the expression profile of gene j and Pa(d_j) are the parent
nodes of gene j. For the GRN inference problem, this is done by

140 Guangyong Zheng and Tao Huang
