Computational Drug Discovery and Design


3 Methods


3.1 Quality Control

All downloaded .CEL files are saved in a single folder (e.g., a folder
named after the GEO accession number, GSE25724). The raw data in the
.CEL files are read with the affy package in R (Fig. 2).
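The reading step can be sketched in R as follows. This is a minimal sketch, not the code shown in Fig. 2: it assumes the affy package from Bioconductor is installed and that the .CEL files sit in a folder named GSE25724 under the current working directory. The variable names (cel_dir, cel_files, raw) are illustrative.

```r
library(affy)  # Bioconductor package that provides ReadAffy()

# Assumed location of the downloaded .CEL files (adjust to your own path)
cel_dir <- "GSE25724"

# List the .CEL files found in that folder
cel_files <- list.files(cel_dir, pattern = "\\.CEL$", ignore.case = TRUE)
print(cel_files)

# Read every .CEL file in the folder into a single AffyBatch object
# that holds the raw probe-level intensities
raw <- ReadAffy(celfile.path = cel_dir)
print(sampleNames(raw))
```

Setting the working directory to the folder with setwd() and then calling ReadAffy() with no arguments should give the same result.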
The intensities of all arrays provide some information about the
quality of the arrays. The package RColorBrewer is used to provide
color-coded quality control plots. Arrays shown in the same color
belong to the same group, i.e., healthy or T2DM. The labels on the
x-axis are drawn vertically and at 50% of the default size by setting
las = 3 and cex.axis = 0.5, respectively. The box plots show the
relative medians among all arrays (Fig. 3).
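The color coding and axis settings can be tried out on simulated data before applying them to real arrays. The sketch below is illustrative only: the intensity matrix, array names, and group labels are made up, and the "Set1" palette is an arbitrary choice. With a real AffyBatch object, the same col, las, and cex.axis arguments can be passed to boxplot() directly.

```r
library(RColorBrewer)

# Simulated log2 intensities standing in for real arrays (illustration only);
# each column plays the role of one array
set.seed(1)
array_names <- paste0("array", 1:6, ".CEL")   # hypothetical file names
log_int <- matrix(rnorm(6 * 500, mean = 8, sd = 1.5), ncol = 6,
                  dimnames = list(NULL, array_names))

# Hypothetical group labels, one per array
group <- factor(c("healthy", "healthy", "healthy", "T2DM", "T2DM", "T2DM"))

# One palette color per group; brewer.pal() requires at least 3 colors
pal  <- brewer.pal(3, "Set1")
cols <- pal[as.integer(group)]

# las = 3 draws the axis labels vertically; cex.axis = 0.5 halves their size
boxplot(log_int, col = cols, las = 3, cex.axis = 0.5,
        ylab = "log2 intensity")
legend("topright", legend = levels(group),
       fill = pal[seq_along(levels(group))])
```

An array whose box sits clearly above or below the others is a candidate for closer inspection in the later quality control steps.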
The quality control of these files is conducted with the simpleaffy
package in R. The parameter "usemid = T" specifies that the 3′ to
mid ratios for β-actin and GAPDH are used instead of the 3′ to 5′
ratios for β-actin and GAPDH (see Note 1).
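A possible form of this step is sketched below. It assumes that the affy and simpleaffy packages are both installed and that the .CEL files are in a folder named GSE25724; the object names (raw, qc_stats) are illustrative. The accessor functions avbg(), sfs(), and percent.present() are used as documented for simpleaffy QCStats objects and should be checked against the installed version.

```r
library(affy)
library(simpleaffy)

# Assumed folder of .CEL files (adjust to your own path)
raw <- ReadAffy(celfile.path = "GSE25724")

# Compute the quality control metrics for the raw arrays
qc_stats <- qc(raw)

# Draw the quality control plot; usemid = TRUE requests 3' to mid ratios
# for beta-actin and GAPDH instead of 3' to 5' ratios
plot(qc_stats, usemid = TRUE)

# Inspect individual metrics numerically
print(avbg(qc_stats))             # average background per array
print(sfs(qc_stats))              # scale factor per array
print(percent.present(qc_stats))  # percentage of genes called present
```

Reading the numeric values alongside the plot makes it easier to see how far a suspect array departs from the rest.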
The median intensity of GSM631755.CEL is higher than that of the
other arrays. The quality control plot reports several quality control
metrics, including average background, number of genes called present,
scale factor, and 3′ to mid ratios for β-actin and GAPDH (Fig. 4).
The direction of the scale factor for GSM631755.CEL differs markedly
from the others, and its value is close to the boundary. The average
background for GSM631755.CEL is also higher than the others,
suggesting that a greater signal is detected from the array.
GSM631755.CEL is therefore removed from the data set.
The remaining arrays are saved into a new folder (e.g., the folder
can be named cleanerrawdata). Again, we need to set the working
directory, read the arrays, assess the intensities, and conduct the
quality control (Fig. 5a).
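Setting aside the excluded array can be done by hand or scripted. The following is a minimal base R sketch; the helper copy_clean_cels() and its argument names are hypothetical, while the folder and file names are the ones mentioned in the text.

```r
# Copy all .CEL files except the excluded ones into a new folder.
# Returns the names of the files that were actually copied.
copy_clean_cels <- function(src_dir, dest_dir, exclude) {
  cels <- list.files(src_dir, pattern = "\\.CEL$", ignore.case = TRUE)
  keep <- setdiff(cels, exclude)
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  copied <- file.copy(file.path(src_dir, keep), file.path(dest_dir, keep))
  keep[copied]
}

# Example call with the names used in the text
# (runs only where the source folder exists)
if (dir.exists("GSE25724")) {
  kept <- copy_clean_cels("GSE25724", "cleanerrawdata",
                          exclude = "GSM631755.CEL")
  print(kept)
}
```

Copying rather than moving the files leaves the original download untouched, so the first round of quality control can be reproduced later.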
The medians of intensities are similar across all 12 arrays
(Fig. 5b). The average background, number of genes called present,
and scale factor are similar across all 12 arrays (Fig. 5c). The
hybridization control gene is called present in each array, indicating
good quality of hybridization. The high ratios of 3′ to mid for
β-actin and GAPDH from diabetic samples indicate unsatisfactory


Fig. 2 Access to .CEL files. (a) R code for reading .CEL files. (b) A list of .CEL files available in the folder


182 Sze Chung Yuen et al.
