the molecular mechanism of diseases [7, 8]. Identification of dys-
regulated genes, i.e., differentially expressed (DE) genes, could also
help identify potential drug targets. Determination of DE genes
would require special statistical analysis to avoid or correct biases
due to violation of statistical assumptions (e.g., sample sizes and
multiple testing). For example, the moderated t-statistic [9] can
reduce bias due to small sample sizes by drawing on many genes
when estimating the variance.
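The two corrections mentioned above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of [9]: the prior degrees of freedom and prior variance (`prior_df`, `prior_var`) are passed in as hypothetical inputs, whereas an empirical Bayes method would estimate them from all genes on the array. The function names are our own, and the Benjamini-Hochberg adjustment is shown as one common way to handle multiple testing.

```python
from math import sqrt
from statistics import mean, variance


def moderated_t(case, control, prior_df, prior_var):
    """Moderated t-statistic for one gene, given an ASSUMED prior (d0, s0^2)."""
    n1, n2 = len(case), len(control)
    df = n1 + n2 - 2
    # Pooled within-group sample variance for this gene.
    pooled_var = ((n1 - 1) * variance(case) + (n2 - 1) * variance(control)) / df
    # Shrink the gene-wise variance toward the prior variance.
    post_var = (prior_df * prior_var + df * pooled_var) / (prior_df + df)
    t = (mean(case) - mean(control)) / sqrt(post_var * (1 / n1 + 1 / n2))
    # The statistic is referred to a t-distribution with d0 + d degrees of freedom.
    return t, prior_df + df


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

With `prior_df = 0` the function reduces to the ordinary pooled two-sample t-statistic; a larger `prior_df` pulls each gene's variance more strongly toward `prior_var`, which stabilizes the statistic when only a few samples per group are available.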
Building a molecular interaction network from a list of DE
genes would be a daunting task due to the sheer volume of data.
A molecular interaction network is usually characterized by its
community structure [10]. The molecules (represented as nodes
in a network), e.g., genes or proteins, found in a module could be
enriched for their possible biological functions [11]. The molecules
involved in the mechanism of a disease are likely interacting with
one another in modules [12]. In such a module, the molecular
interactions are significantly correlated, suggesting strong
associations of the DE genes with the disease.
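To make the notion of community structure concrete, the sketch below scores how well a proposed grouping of nodes into modules fits a network, using Newman's modularity Q for an undirected, unweighted graph. It is an illustration only: the node names and edges are placeholders rather than real molecular interactions, the function name is our own, and the text does not state which module detection method is used.

```python
from collections import defaultdict


def modularity(edges, communities):
    """Newman modularity Q of a partition of an undirected, unweighted network."""
    m = len(edges)
    # Map each node to the index of the community it belongs to.
    label = {node: idx for idx, group in enumerate(communities) for node in group}
    internal = defaultdict(int)    # edges with both ends inside a community
    degree_sum = defaultdict(int)  # total degree of the nodes in a community
    for u, v in edges:
        degree_sum[label[u]] += 1
        degree_sum[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    # Q = sum over communities of (fraction of internal edges
    #     minus the fraction expected if edges were placed at random).
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Placeholder network: two triangles joined by a single edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_modules = modularity(edges, [{"a", "b", "c"}, {"d", "e", "f"}])
one_module = modularity(edges, [{"a", "b", "c", "d", "e", "f"}])
```

Here the two-module partition scores higher than lumping all nodes together, which is the sense in which densely connected groups of molecules stand out as modules.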
To illustrate how this method works, we take type 2 diabetes
mellitus (T2DM) as an example. More than 380 million people
worldwide have been diagnosed with T2DM [13]. The prevalence is
increasing, and the disease is expected to affect more than 550 million people by
- More than 4 million T2DM patients died in 2011
[13]. T2DM patients suffer from insulin resistance and β-cell
dysfunction, resulting in hyperglycemia [14, 15]. T2DM causes severe
vascular complications, including atherosclerosis and diabetic
nephropathy [16]. It accounts for 5–10% of total health care
expenditure in many countries [17]. Insulin is synthesized by pancreatic
β-cells and released in response to elevation of blood glucose level
[18]. There is a strong genetic predisposition associated with
T2DM, i.e., are expected to have T2DM and the risk would surge
to 40% or 70% depending on whether one parent or both parents
have diabetes [19]. Genetic studies [20–22] found many genes
related to β-cell function, including TCF7L2 for increasing insulin
secretion and PPARG for insulin sensitivity. However, the genetics of
T2DM has not been fully elucidated. DE genes are thus crucial to
understanding the pathogenic mechanism and identifying potential
drug targets of T2DM. The following sections describe a method
to identify potential drug targets by building molecular interaction
networks from microarray data.
2 Materials
A microarray dataset, which comprises gene expression information
obtained from patients and healthy participants, is selected to iden-
tify potential drug targets. Microarray dataset GSE25724 is down-
loaded from Gene Expression Omnibus (GEO), contributed by