
variable enables the compact factorization of the joint probability distribution
(JPD):


P(X_1, X_2, X_3, X_4, X_5) = P(X_1)\, P(X_2 \mid X_1)\, P(X_3 \mid X_1)\, P(X_4 \mid X_1)\, P(X_5 \mid X_3, X_4)
In the general case:

P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid Pa(X_i))
where Pa(X_i) denotes the parents of X_i. The sparse structure of a BN provides a great
advantage because it drastically reduces the number of parameters necessary to
specify a unique probability distribution over the variables describing the
data. In this example, the possible number of parameters, which is 2^5 − 1 = 31, is
reduced to 11 (Fig. 3).
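
As a minimal sketch (in Python, with hypothetical names not taken from the chapter), the parameter counts for this five-node example can be verified directly: the full joint over five binary variables needs 2^5 − 1 = 31 free parameters, while the factorization above needs only one parameter per configuration of each node's parents.

# Minimal sketch: parameter counts for binary variables in the example BN.
# A node with k binary parents needs 2**k free parameters
# (one Bernoulli parameter per configuration of its parents).
parents = {
    "X1": [],            # root node
    "X2": ["X1"],
    "X3": ["X1"],
    "X4": ["X1"],
    "X5": ["X3", "X4"],
}

n = len(parents)
full_joint = 2 ** n - 1                                   # 2**5 - 1 = 31
bn_params = sum(2 ** len(pa) for pa in parents.values())  # 1 + 2 + 2 + 2 + 4 = 11

print(f"Full joint distribution: {full_joint} parameters")
print(f"Factorized BN:           {bn_params} parameters")

The reduction from 31 to 11 parameters in this example is exactly the effect of the sparse parent structure.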
To construct a BN from data, one needs to learn its structure (structural
learning) and its conditional probability distributions (parameter learning).^106
The methods for structural learning are based on Monte Carlo sampling. They start
with an initial configuration (often with no or full connectivity) and then gradually
improve estimates by iteratively adding, reversing, or deleting edges. The most
commonly used method of this type is the K2 greedy search algorithm.^107 The second
step of BN construction consists of determining the parameters, i.e., the conditional
probability distributions, or CPTs. For a continuous BN model, one may assume,
for example, that the observed data for a variable can be approximated by a normal
distribution based on some prior knowledge, K, and the dependency structure. This
assumed distribution is called a prior distribution, p(θ|K), for a model θ. The aim of
parameter learning is to determine this distribution by maximizing the posterior
distribution, p(θ|D,K), which describes how well the model fits the data, D. The
posterior distribution can be computed using Bayes' theorem (see Yu et al. (2011)):
P(\theta \mid D, K) = \frac{P(\theta \mid K)\, P(D \mid \theta, K)}{P(D \mid K)} \qquad (2)
where p(D|θ,K) is the likelihood of θ. The most commonly used algorithm to
maximize the posterior distribution is the expectation-maximization (EM) algorithm.^108
In the case of a biological system, prior knowledge represents entity-specific data
sources, such as PPI networks, metabolic networks, signaling pathways, literature
data, etc. The prior knowledge can be incorporated in both steps of the learning
process: in structural learning, it can be used to constrain the addition of edges; in
parameter learning, it can be used to update the prior distribution. This approach
represents the basis of biological data integration using BNs.
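
To make the posterior update in Eq. (2) concrete, the following sketch performs parameter learning for a single binary BN node with a conjugate Beta prior. It is only a toy illustration of the Bayesian update, not the EM algorithm cited above, and the prior pseudo-counts standing in for the knowledge K are hypothetical.

# Toy illustration of Eq. (2): posterior ∝ prior × likelihood,
# for one Bernoulli parameter theta (probability that a binary node equals 1)
# with a conjugate Beta prior encoding the prior knowledge K.
from dataclasses import dataclass

@dataclass
class BetaPrior:
    alpha: float  # pseudo-counts for value 1, from prior knowledge K
    beta: float   # pseudo-counts for value 0

def posterior(prior: BetaPrior, data: list[int]) -> BetaPrior:
    """Update the Beta prior with observed 0/1 data D.

    Because the Beta distribution is conjugate to the Bernoulli likelihood,
    the posterior p(theta | D, K) is again Beta, with the observed counts added.
    """
    ones = sum(data)
    zeros = len(data) - ones
    return BetaPrior(prior.alpha + ones, prior.beta + zeros)

# Hypothetical prior knowledge K: the node is expected to be active about half the time.
prior = BetaPrior(alpha=2.0, beta=2.0)

# Observed data D for this node across samples.
data = [1, 1, 0, 1, 1, 0, 1, 1]

post = posterior(prior, data)
theta_map = (post.alpha - 1) / (post.alpha + post.beta - 2)  # posterior mode (MAP)
print(f"Posterior Beta({post.alpha}, {post.beta}), MAP estimate of theta = {theta_map:.3f}")

In this toy setting, prior knowledge enters only through the pseudo-counts; weighting them more heavily (e.g., to reflect literature evidence or a PPI-derived expectation) pulls the MAP estimate toward the prior, which mirrors how the prior distribution is updated during parameter learning.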
^106 Yu et al. (2011).
^107 Cooper and Herskovits (1992).
^108 Dempster et al. (1977).