Personalized Medicine: A New Medical and Social Challenge


complexity. In contrast, kernel matrices are large and their storage can place
serious demands on memory resources.
Nonnegative matrix factorization (NMF) is a technique widely used in
machine learning for dimensionality reduction and clustering problems. It approximates
a high-dimensional matrix, $R \in \mathbb{R}^{n_1 \times n_2}$, as a product of two low-rank, nonnegative matrix
factors, $S \in \mathbb{R}^{n_1 \times k}$ and $G \in \mathbb{R}^{n_2 \times k}$:

$$ R \approx S G^T \qquad (8) $$

where the choice of $k \ll \min\{n_1, n_2\}$ provides dimensionality reduction. NMF was
initially introduced as an improvement over singular value decomposition (SVD) and
principal component analysis (PCA) because its nonnegative matrix factors provide an
easier interpretation of the low-dimensional data representation.^121 It has been shown
that NMF is equivalent to the k-means clustering problem, where the entries of the $S$ matrix
represent cluster centroids, while the entries of the $G$ matrix represent cluster membership
indicators, and the parameter $k$ represents the number of clusters.^122 This formalism was
further extended by Wang et al. (2008) to co-cluster heterogeneous data by defining
nonnegative matrix tri-factorization. For example, the simplest co-clustering problem
involves only two types of objects (e.g., genes and GO terms), in which the matrix
$R_{12} \in \mathbb{R}^{n_1 \times n_2}$ of relationships between these two data types (e.g., gene annotations) is
decomposed into three low-rank matrix factors (see Fig. 4 for an illustration):


$$ R_{12} \approx G_1 S_{12} G_2^T \qquad (9) $$

where the nonnegative matrices $G_1 \in \mathbb{R}_+^{n_1 \times k_1}$ and $G_2 \in \mathbb{R}_+^{n_2 \times k_2}$ are the cluster
indicator matrices of the first (e.g., genes) and the second (e.g., GO terms) data set, and
$S_{12} \in \mathbb{R}^{k_1 \times k_2}$ is a low-dimensional compression of $R_{12}$. To obtain the cluster
indicator matrices, from which we can make inferences, we need to solve the
optimization problem that minimizes the distance, $J$, between $R_{12}$ and $G_1 S_{12} G_2^T$:


$$ \min_{G_1 \ge 0,\; G_2 \ge 0} \; J = \left\| R_{12} - G_1 S_{12} G_2^T \right\|_F^2 \qquad (10) $$
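A common way to minimize objectives of the form in Eq. (10) is with multiplicative update rules, which preserve nonnegativity of the factors at every step. The following NumPy sketch illustrates this heuristic for the tri-factorization $R_{12} \approx G_1 S_{12} G_2^T$; it is an illustrative implementation under standard multiplicative-update assumptions, not the exact algorithm of Wang et al. (2008), and the function name `nmtf` and its parameters are our own:

```python
import numpy as np

def nmtf(R, k1, k2, n_iter=200, eps=1e-9, seed=0):
    """Illustrative nonnegative matrix tri-factorization R ~ G1 @ S @ G2.T
    via multiplicative updates (a sketch, not the published algorithm)."""
    rng = np.random.default_rng(seed)
    n1, n2 = R.shape
    # Random nonnegative initialization of the three factors.
    G1 = rng.random((n1, k1))
    G2 = rng.random((n2, k2))
    S = rng.random((k1, k2))
    for _ in range(n_iter):
        # Each factor is scaled elementwise by the ratio of the gradient's
        # negative and positive parts; eps guards against division by zero.
        G1 *= (R @ G2 @ S.T) / (G1 @ S @ G2.T @ G2 @ S.T + eps)
        G2 *= (R.T @ G1 @ S) / (G2 @ S.T @ G1.T @ G1 @ S + eps)
        S *= (G1.T @ R @ G2) / (G1.T @ G1 @ S @ G2.T @ G2 + eps)
    return G1, S, G2
```

Because the updates are purely multiplicative, factors initialized nonnegative stay nonnegative, and cluster assignments can then be read off from the largest entry in each row of $G_1$ and $G_2$.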

This optimization problem allows the incorporation of additional prior information
on data items, such as a biological network or any other similarity network informa-
tion. This information can be incorporated in the form of intra-type constraints,
whose violation incurs penalties on the objective function, $J$, thereby guiding
the clustering process. Such prior information is often represented via
constraint matrices that relate objects in the first, $\Theta_1 \in \mathbb{R}^{n_1 \times n_1}$, and in the second,
$\Theta_2 \in \mathbb{R}^{n_2 \times n_2}$, data set. These matrices can be incorporated into the previous objective
function in the following way:


^121 Lee and Seung (1999).
^122 Ding et al. (2006).
166 V. Gligorijević and N. Pržulj
