Personalized Medicine: A New Medical and Social Challenge


Specifically, in the study by Schadt et al. (2012), the goal was to determine
whether a particular individual with a known SNP genotype belongs to the study
cohort (C), using only the expression data of genes associated with that
individual's eQTLs. To solve this problem, gene expression and genotype data of
an independent training cohort (T) were used to train the
Bayesian classifier. Specifically, the cis eQTLs^114 of individual $i$ were used to
construct his/her expression trait vector $e_i$ and SNP genotype vector $g_i$ as
follows: the expression value of gene $j$ and the genotype of SNP $j$ (together
forming cis eQTL $j$) were represented as the $j$-th coordinates of $e_i$ and $g_i$,
respectively. For the $N$ individuals in the training cohort T, the following vector
sets were created: $\{e_1, \ldots, e_N\}$ and $\{g_1, \ldots, g_N\}$. The normal density
function $\Phi(e_i \mid g_i)$ was used to model the expression trait vector $e_i$ of an
individual $i$ in the training cohort (T) having a given genotype $g_i$. Using only
the expression data of the individuals in the study cohort (C), the aim was to
predict their genotypes, that is, to find, for each individual $i$ in C, the
probability of a given genotype $g_i$ given his/her expression trait vector $e_i$:
$P_C(g_i \mid e_i)$. Using Bayes' theorem (see Eq. 2), this posterior probability can
be computed as

$$P_C(g_i \mid e_i) \propto P(g_i)\,\Phi(e_i \mid g_i),$$

where $P(g_i)$ is the prior probability of genotype $g_i$. Using the values of the
posterior probability, the authors were able to correctly match individuals with
their corresponding genotypes. Specifically, they correctly identified 99% of the
individuals in the study cohort (C) using only expression data from the same tissue,
and 90% using expression data from different tissues.
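The inference step described above can be illustrated with a minimal Python sketch. It is not the implementation of Schadt et al. (2012); it assumes (i) genotypes coded as 0/1/2 minor-allele counts, (ii) one univariate normal density per cis eQTL and genotype, fitted on the training cohort T, (iii) priors $P(g)$ estimated as genotype frequencies in T, and (iv) independence across eQTLs, so that a known genotype vector is matched to the individual in C with the highest summed log posterior. All function names, parameters, and the toy data are hypothetical.

```python
import math
from collections import defaultdict

# Assumed genotype coding: number of minor alleles (0, 1, 2) at each cis-eQTL SNP.
GENOTYPES = (0, 1, 2)


def log_normal_pdf(x, mean, var):
    """Log of the univariate normal density N(x; mean, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def fit_model(train_expr, train_geno, min_var=1e-3):
    """For each cis eQTL j, estimate P(g) and a normal density Phi(e_j | g)
    from the training cohort T.

    train_expr[i][j]: expression of gene j in training individual i
    train_geno[i][j]: genotype of SNP j in training individual i
    Returns one dict per eQTL: {g: (prior, mean, variance)}.
    """
    model = []
    for j in range(len(train_expr[0])):
        by_genotype = defaultdict(list)
        for e, g in zip(train_expr, train_geno):
            by_genotype[g[j]].append(e[j])
        n = sum(len(v) for v in by_genotype.values())
        params = {}
        for g, vals in by_genotype.items():
            mean = sum(vals) / len(vals)
            # Hypothetical variance floor so a tight genotype class
            # does not yield a degenerate density.
            var = max(sum((v - mean) ** 2 for v in vals) / len(vals), min_var)
            params[g] = (len(vals) / n, mean, var)
        model.append(params)
    return model


def log_posterior(model, j, e):
    """log P_C(g | e) for eQTL j, using P_C(g | e) ∝ P(g) * Phi(e | g),
    normalized over the genotypes seen in T (log-sum-exp for stability)."""
    logs = {g: math.log(p) + log_normal_pdf(e, m, v)
            for g, (p, m, v) in model[j].items()}
    top = max(logs.values())
    log_z = top + math.log(sum(math.exp(l - top) for l in logs.values()))
    return {g: l - log_z for g, l in logs.items()}


def predict_genotypes(model, expr_vector):
    """Most probable genotype at each eQTL given one expression trait vector."""
    predicted = []
    for j, e in enumerate(expr_vector):
        lp = log_posterior(model, j, e)
        predicted.append(max(lp, key=lp.get))
    return predicted


def match_score(model, expr_vector, genotype_vector, floor=-30.0):
    """Sum over eQTLs of log P_C(g_j | e_j) for a known genotype vector.
    The floor (hypothetical) is used for genotypes never seen in T."""
    return sum(log_posterior(model, j, e).get(g, floor)
               for j, (e, g) in enumerate(zip(expr_vector, genotype_vector)))


def best_match(model, study_expr, genotype_vector):
    """Index of the study-cohort individual whose expression best
    explains the given genotype vector."""
    scores = [match_score(model, e, genotype_vector) for e in study_expr]
    return max(range(len(scores)), key=scores.__getitem__)


# Synthetic toy data (hypothetical): 6 training individuals, 2 cis eQTLs,
# with expression roughly tracking genotype dosage.
train_geno = [[0, 2], [1, 1], [2, 0], [0, 0], [1, 2], [2, 1]]
train_expr = [[0.1, 2.0], [1.0, 0.9], [2.1, 0.1], [-0.1, 0.0], [0.9, 2.1], [1.9, 1.1]]
study_expr = [[0.05, 1.95], [1.95, 0.05], [1.0, 1.0]]

model = fit_model(train_expr, train_geno)
print(best_match(model, study_expr, [2, 0]))  # → 1
print(predict_genotypes(model, [1.0, 1.0]))   # → [1, 1]
```

Working in log space avoids numerical underflow when many eQTLs are combined; the variance floor `min_var` and the log-probability `floor` for unseen genotypes are arbitrary illustrative choices.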
This study has a large impact on the development of personalized medicine, since it
suggests that a patient's RNA expression data alone could be used to derive his/her
genotypic information with high certainty.


4.2 Kernel-Based Data Integration


Kernel-based (KB) data integration is a statistical framework for integration of
heterogeneous data views of the same set of biological entities (genes, proteins,
etc.). It is based on kernel-based statistical learning methods that are originally
designed for solving classification and clustering problems in computer science.^115
The main advantage of this method over the BN data integration approach is that KB
data integration can handle all kinds of data (e.g., symbolic data, strings, etc.)
by embedding them into feature spaces. This significantly increases the range of
data types that can be integrated.
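As an illustration of how heterogeneous views can be embedded and combined, the following Python sketch builds one kernel matrix per view for the same three genes: a Gaussian (RBF) kernel on numeric profiles and a k-mer spectrum kernel on sequences (a string kernel), then merges them by a fixed weighted sum. This is a generic example, not the specific method of Yu et al. (2011); the views, the parameters `gamma` and `k`, the equal weights, and the toy data are all assumptions. In practice the weights may instead be learned, for example by multiple kernel learning.

```python
import math
from collections import Counter


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel for numeric vectors, e.g. expression profiles."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def spectrum_kernel(s, t, k=2):
    """k-mer spectrum kernel for strings, e.g. protein sequences: the inner
    product of the k-mer count vectors of s and t (an explicit feature space)."""
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return sum(n * ct[kmer] for kmer, n in cs.items())


def kernel_matrix(items, kernel):
    """Pairwise kernel values for one data view."""
    return [[kernel(a, b) for b in items] for a in items]


def normalize(K):
    """Cosine-normalize a kernel matrix so every view has K[i][i] = 1."""
    n = len(K)
    return [[K[i][j] / math.sqrt(K[i][i] * K[j][j]) for j in range(n)]
            for i in range(n)]


def combine(kernels, weights):
    """Weighted sum of kernel matrices; with non-negative weights the
    result is again a valid (positive semidefinite) kernel."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for K, w in zip(kernels, weights)) for j in range(n)]
            for i in range(n)]


# Two hypothetical views of the same three genes.
expr = [[1.0, 2.0], [1.1, 2.1], [5.0, 0.0]]   # numeric view
seqs = ["MKVLA", "MKVLG", "PPQRS"]            # string view

K = combine([normalize(kernel_matrix(expr, rbf_kernel)),
             normalize(kernel_matrix(seqs, spectrum_kernel))],
            [0.5, 0.5])
print(round(K[0][1], 3))  # → 0.865
```

Cosine normalization puts both views on the same scale before summing; because a non-negative combination of positive semidefinite kernels is itself positive semidefinite, the fused matrix can be passed to any kernel method (e.g., an SVM or kernel clustering).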


(^114) A cis eQTL is an eQTL that is located near the expressed gene.
(^115) Yu et al. (2011).
V. Gligorijević and N. Pržulj
