Computational Systems Biology Methods and Protocols.7z

The CMI between genes X and Y, given gene Z as a condition, is defined as follows.

$$\mathrm{CMI}(X;Y \mid Z) = \sum_{x \in X,\, y \in Y,\, z \in Z} p(x,y,z) \log \frac{p(x,y \mid z)}{p(x \mid z)\, p(y \mid z)} \tag{4}$$

where p(x,y,z) is the joint probability distribution of the gene triple (X, Y, Z), while p(x|z), p(y|z), and p(x,y|z) are the conditional probabilities of gene X, gene Y, and the gene pair (X, Y) given gene Z as a condition. According to Eq. 4, the CMI measurement can test whether there is a direct correlation between genes X and Y, and thus improves the accuracy of relationship detection for gene pairs. However, when the expression pattern of gene X or Y is strongly similar to that of gene Z, the performance of the CMI measurement decreases dramatically. A new measurement, partial mutual information (PMI), was therefore proposed to refine the CMI measurement [15]; it is defined as follows.

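$$\mathrm{PMI}(X;Y \mid Z) = \sum_{x \in X,\, y \in Y,\, z \in Z} p(x,y,z) \log \frac{p(x,y \mid z)}{p^{*}(x \mid z)\, p^{*}(y \mid z)}$$

$$p^{*}(x \mid z) = \sum_{y \in Y} p(x \mid z, y)\, p(y), \qquad p^{*}(y \mid z) = \sum_{x \in X} p(y \mid z, x)\, p(x) \tag{5}$$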

where p(x,y,z), p(x|z), p(y|z), and p(x,y|z) are defined as in the CMI measurement. Numerical studies on simulated and real data demonstrate that PMI achieves higher performance than the CMI measurement in relationship detection.
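To make the two definitions concrete, the following minimal sketch evaluates Eq. 4 and Eq. 5 on a discrete joint probability table. It is an illustration under stated assumptions, not the implementation used in [15]: the function names `cmi` and `pmi` are hypothetical, the expression values are assumed to be already discretized into a table `p_xyz[x, y, z]` that sums to 1, the natural logarithm is used because the text does not fix a base, and conditionals with a zero-probability condition are treated as 0.

```python
import numpy as np


def cmi(p_xyz):
    """Conditional mutual information CMI(X;Y|Z) of Eq. 4.

    p_xyz is a joint probability table indexed as p_xyz[x, y, z].
    """
    p_xyz = np.asarray(p_xyz, dtype=float)
    p_z = p_xyz.sum(axis=(0, 1))   # p(z)
    p_xz = p_xyz.sum(axis=1)       # p(x, z), shape (X, Z)
    p_yz = p_xyz.sum(axis=0)       # p(y, z), shape (Y, Z)
    total = 0.0
    for x, y, z in np.ndindex(p_xyz.shape):
        p = p_xyz[x, y, z]
        if p > 0:  # cells with zero probability contribute nothing
            # p(x,y|z) / (p(x|z) p(y|z)) = p(x,y,z) p(z) / (p(x,z) p(y,z))
            total += p * np.log(p * p_z[z] / (p_xz[x, z] * p_yz[y, z]))
    return total


def pmi(p_xyz):
    """Partial mutual information PMI(X;Y|Z) of Eq. 5."""
    p_xyz = np.asarray(p_xyz, dtype=float)
    p_x = p_xyz.sum(axis=(1, 2))   # p(x)
    p_y = p_xyz.sum(axis=(0, 2))   # p(y)
    p_z = p_xyz.sum(axis=(0, 1))   # p(z)
    p_xz = p_xyz.sum(axis=1)       # p(x, z)
    p_yz = p_xyz.sum(axis=0)       # p(y, z)
    with np.errstate(divide="ignore", invalid="ignore"):
        # p(x|z,y) = p(x,y,z) / p(y,z); assumed 0 where p(y,z) = 0
        p_x_given_zy = np.where(p_yz[None, :, :] > 0,
                                p_xyz / p_yz[None, :, :], 0.0)
        # p(y|z,x) = p(x,y,z) / p(x,z); assumed 0 where p(x,z) = 0
        p_y_given_zx = np.where(p_xz[:, None, :] > 0,
                                p_xyz / p_xz[:, None, :], 0.0)
    # p*(x|z) = sum_y p(x|z,y) p(y), shape (X, Z)
    px_star = np.einsum("xyz,y->xz", p_x_given_zy, p_y)
    # p*(y|z) = sum_x p(y|z,x) p(x), shape (Y, Z)
    py_star = np.einsum("xyz,x->yz", p_y_given_zx, p_x)
    total = 0.0
    for x, y, z in np.ndindex(p_xyz.shape):
        p = p_xyz[x, y, z]
        if p > 0:
            p_xy_given_z = p / p_z[z]
            total += p * np.log(p_xy_given_z / (px_star[x, z] * py_star[y, z]))
    return total


# Toy example (made-up numbers): X and Y are conditionally independent given Z,
# so p(x,y,z) = p(x|z) p(y|z) p(z) and both measures should be close to zero.
p_z_demo = np.array([0.3, 0.7])
p_x_given_z = np.array([[0.2, 0.6], [0.8, 0.4]])   # columns sum to 1
p_y_given_z = np.array([[0.5, 0.1], [0.5, 0.9]])   # columns sum to 1
p_demo = np.einsum("xz,yz,z->xyz", p_x_given_z, p_y_given_z, p_z_demo)
print("CMI:", cmi(p_demo))
print("PMI:", pmi(p_demo))
```

In practice the joint table would have to be estimated from the expression data, for example by binning; that step is outside this sketch and strongly affects the resulting values.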

2.1.2 Probabilistic Graphical Models

A gene regulatory network is represented as a graph model G = <V, E>, where V stands for genes and E denotes links between genes [2, 16]. Assuming n is the number of experimental observations and m is the total number of genes, the expression data (D) can be represented as an n × m matrix (D = (d_1, d_2, ..., d_m)). The problem of GRN reconstruction is then equivalent to inferring an optimal model (G) from the data matrix (D). In the following subsections, we introduce the Bayesian network model and the Gaussian graphical model for the network inference problem. A Bayesian network model is a directed acyclic graph (DAG), where an edge from gene X to gene Y indicates regulation of Y by X [17]. In other words, gene X is a parent node, and gene Y is a target node of gene X. For this model, the probability distribution of a network is generally factored in terms of the conditional distributions of each node variable given its parents.

$$P(D) = \prod_{j=1}^{m} p\left(d_j \mid \mathrm{Pa}(d_j)\right) \tag{6}$$

where d_j represents the expression profile of gene j and Pa(d_j) denotes the parent nodes of gene j. For the GRN inference problem, this is done by

140 Guangyong Zheng and Tao Huang
