Personalized Medicine: A New Medical and Social Challenge


When the construction of the BN is finished and the JPD is known, one can make
inferences from the data, i.e., predict unobserved events based on the observed
evidence. Specifically, having the JPD in factorized form, we can answer the
question: what is the probability of $X_i$ given that we know $X_j$? For example, in
the case of the BN represented in Fig. 3, we may ask: what is the probability of a bad
clinical outcome, given that gene 1 is activated (on)? Following the chain rule, we
can write the following:

$$P(X_5 = \text{bad} \mid X_1 = \text{on}) = \frac{P(X_1 = \text{on},\, X_5 = \text{bad})}{P(X_1 = \text{on})}.$$

The numerator can be calculated by applying the rule of marginalization,^109 i.e.,
by summing the JPD over free variables:


$$P(X_1 = \text{on},\, X_5 = \text{bad}) = \sum_{X_2, X_3, X_4 \in \{\text{on},\, \text{off}\}} P(X_1 = \text{on},\, X_2,\, X_3,\, X_4,\, X_5 = \text{bad})$$

We leave the reader to finish the computation of this summation by inserting the
numerical values of conditional probabilities from Fig. 3 into this equation.
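
The computation can be sketched in a few lines of Python. Since the conditional probability tables of Fig. 3 are not reproduced here, the network structure (X1 -> X2, X1 -> X3, (X2, X3) -> X4, X4 -> X5) and every number below are invented for illustration only; the names `joint` and `marginal` are likewise hypothetical. The sketch builds the JPD from its factorized form and obtains the numerator and the denominator by summing over the free variables.

```python
from itertools import product

# ASSUMPTION: hypothetical network, NOT the one in Fig. 3.
# Structure: X1 -> X2, X1 -> X3, (X2, X3) -> X4, X4 -> X5.
# All probabilities are invented for illustration.
P_X1_ON = 0.3                                   # P(X1 = on)
P_X2_ON = {"on": 0.8, "off": 0.1}               # P(X2 = on | X1)
P_X3_ON = {"on": 0.6, "off": 0.2}               # P(X3 = on | X1)
P_X4_ON = {("on", "on"): 0.9, ("on", "off"): 0.5,
           ("off", "on"): 0.4, ("off", "off"): 0.05}  # P(X4 = on | X2, X3)
P_X5_BAD = {"on": 0.7, "off": 0.1}              # P(X5 = bad | X4)

DOMAINS = {"x1": ("on", "off"), "x2": ("on", "off"), "x3": ("on", "off"),
           "x4": ("on", "off"), "x5": ("bad", "good")}


def bern(p_first, value, first):
    """Probability of `value` for a binary variable whose state `first` has probability p_first."""
    return p_first if value == first else 1.0 - p_first


def joint(x1, x2, x3, x4, x5):
    """JPD in factorized form: product of each node's probability given its parents."""
    return (bern(P_X1_ON, x1, "on")
            * bern(P_X2_ON[x1], x2, "on")
            * bern(P_X3_ON[x1], x3, "on")
            * bern(P_X4_ON[(x2, x3)], x4, "on")
            * bern(P_X5_BAD[x4], x5, "bad"))


def marginal(**evidence):
    """Sum the JPD over every variable that is not fixed by the evidence."""
    free = [name for name in DOMAINS if name not in evidence]
    total = 0.0
    for values in product(*(DOMAINS[name] for name in free)):
        assignment = dict(evidence, **dict(zip(free, values)))
        total += joint(**assignment)
    return total


numerator = marginal(x1="on", x5="bad")   # P(X1 = on, X5 = bad)
denominator = marginal(x1="on")           # P(X1 = on)
posterior = numerator / denominator       # P(X5 = bad | X1 = on)
print(f"P(X5 = bad | X1 = on) = {posterior:.4f}")
```

Brute-force enumeration like this grows exponentially with the number of free variables, so it is only practical for small networks such as this toy example.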
Evaluation measures are used to assess the performance of the model and to
infer the correct outcome. Typically, the whole data set is divided into two subsets:
the training set and the test set. The samples from the training set are used to build
the model. Next, the model is used to make new predictions, and the predicted
outcomes are compared with the known information in the test set. From this we
can construct different metrics to assess the performance of our model. The most
widely used ones are the receiver operating characteristic (ROC) curve and the area
under the ROC curve.^110
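
As a minimal sketch of this evaluation step, the following Python code computes the points of a ROC curve and the area under it from test-set labels and model scores. The labels and scores are made-up values standing in for the output of a trained model, and the function names `roc_points` and `area_under_curve` are illustrative, not taken from any library.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points obtained by
    lowering the decision threshold through the sorted scores."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# ASSUMPTION: hypothetical test set; 1 = positive outcome, 0 = negative outcome.
test_labels = [1, 1, 0, 1, 0, 0]
test_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]   # model scores for the same samples

curve = roc_points(test_labels, test_scores)
auc = area_under_curve(curve)
print(f"AUC = {auc:.4f}")
```

An area of 1 corresponds to a model that ranks every positive sample above every negative one, while 0.5 corresponds to random ranking.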
Many previous studies used Bayesian networks to integrate heterogeneous
biological data and to infer new associations between biological entities. For
example, Troyanskaya et al. (2003) developed a framework called MAGIC (Multisource
Association of Genes by Integration of Clusters), which uses Bayesian reasoning to
integrate PPI and genetic interactions, along with expression and transcription data.
They applied their framework to protein function prediction in Saccharomyces
cerevisiae. Using Gene Ontology as a benchmark, they demonstrated that data
integration improved the accuracy of the gene groupings compared with the same
analysis done on expression data alone. Another study constructed a functional
linkage network (FLN) from PPI and gene expression data.^111 FLN was further
used, along with protein motif information, protein localization data, and mutant
phenotype data, incorporated into a Bayesian framework, to predict new gene
functions. This framework allowed the efficient computation of the posterior
probability of each gene having a particular function, given different types of
biological data as prior information.
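
To illustrate what computing such a posterior involves, here is a deliberately simplified Python sketch. It is not the model used in the studies cited above: it assumes that the evidence types are conditionally independent given the gene's function (a naive Bayes simplification), and the prior, the likelihood values, and the evidence names are all invented for illustration.

```python
def posterior_has_function(prior, likelihoods, evidence):
    """Posterior probability that a gene has a function, via Bayes' rule.

    prior        -- P(F = 1), probability before seeing any evidence
    likelihoods  -- name -> (P(e = 1 | F = 1), P(e = 1 | F = 0))
    evidence     -- name -> 1 if the evidence is observed, 0 otherwise
    ASSUMPTION: evidence types are conditionally independent given F.
    """
    p_yes = prior          # running value of P(F = 1) * prod P(e_i | F = 1)
    p_no = 1.0 - prior     # running value of P(F = 0) * prod P(e_i | F = 0)
    for name, observed in evidence.items():
        like_yes, like_no = likelihoods[name]
        p_yes *= like_yes if observed else 1.0 - like_yes
        p_no *= like_no if observed else 1.0 - like_no
    return p_yes / (p_yes + p_no)


# ASSUMPTION: all numbers and evidence names below are hypothetical.
likelihoods = {
    "fln_neighbor_has_function": (0.6, 0.1),
    "has_motif": (0.4, 0.05),
    "expected_localization": (0.7, 0.3),
}
evidence = {
    "fln_neighbor_has_function": 1,
    "has_motif": 1,
    "expected_localization": 0,
}

posterior = posterior_has_function(0.05, likelihoods, evidence)
print(f"P(function | evidence) = {posterior:.4f}")
```

Each additional data type simply multiplies in one more likelihood term, which is why this kind of posterior is cheap to evaluate for every gene and function pair.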
Many other studies constructed FLN from various biological data. For instance,
Lee et al. (2004) used a Bayesian statistics approach to construct a single, coherent


(^109) Ben-Gal (2007).
(^110) Fawcett (2006).
(^111) Nariai et al. (2007).
160 V. Gligorijević and N. Pržulj
