Computational Systems Biology Methods and Protocols

Feature reduction is often performed using two criteria. The
first is to remove genes with non-zero read counts in fewer than
three cells. The second is to retain only highly variable genes. The
commonly used R package Seurat selects variable genes based on
each gene's average expression and dispersion [16]. In our
previous studies, we introduced another method, which calculated
Pearson correlation coefficients (PCCs) between normalized
expression values and library sizes and used genes with PCCs less
than 0.6 for cluster analysis. In addition, we found that feature
reduction had little effect on the results of cluster analysis,
particularly for data containing fewer than 1000 samples. Nevertheless,
using variable genes still improves the cluster analysis.
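The detection filter and the PCC criterion described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the published implementation. The expression matrix is assumed to be genes x cells. Normalization is assumed to be counts per million (CPM), because the text does not name a method. All function names are hypothetical. `dispersion_hvg` is only a simplified rendering of the mean/dispersion idea behind Seurat's variable-gene selection. Its bin count and z-score cutoff are illustrative and do not reproduce Seurat's code or defaults.

```python
import numpy as np

def filter_low_detection_genes(counts, min_cells=3):
    """Boolean mask of genes with non-zero counts in at least `min_cells` cells.

    counts: genes x cells array of raw read counts.
    """
    return (counts > 0).sum(axis=1) >= min_cells

def pcc_with_library_size(norm, lib_size):
    """Pearson correlation between each gene's normalized expression
    (one row of `norm`) and the per-cell library sizes."""
    x = norm - norm.mean(axis=1, keepdims=True)
    y = lib_size - lib_size.mean()
    with np.errstate(invalid="ignore", divide="ignore"):
        # Constant rows give NaN; NaN < cutoff is False, so such genes are dropped.
        return (x @ y) / np.sqrt((x ** 2).sum(axis=1) * (y ** 2).sum())

def select_features(counts, min_cells=3, pcc_cutoff=0.6):
    """Return (detected, kept): original row indices of genes passing the
    detection filter, and the subset whose PCC with library size is < cutoff."""
    detected = np.flatnonzero(filter_low_detection_genes(counts, min_cells))
    lib_size = counts.sum(axis=0).astype(float)  # assumes every cell has reads
    norm = counts[detected] / lib_size * 1e6     # assumed CPM normalization
    r = pcc_with_library_size(norm, lib_size)
    return detected, detected[r < pcc_cutoff]

def dispersion_hvg(norm, n_bins=20, z_cutoff=1.0):
    """Simplified mean/dispersion selection of highly variable genes,
    loosely modeled on the Seurat idea (not Seurat's implementation).
    Assumes every row of `norm` has a positive mean."""
    mean = norm.mean(axis=1)
    log_disp = np.log(norm.var(axis=1, ddof=1) / mean + 1e-12)
    log_mean = np.log1p(mean)
    # Bin genes by mean expression, then z-score the dispersion within each bin.
    edges = np.linspace(log_mean.min(), log_mean.max(), n_bins + 1)
    bins = np.digitize(log_mean, edges[1:-1])
    z = np.zeros_like(log_disp)
    for b in np.unique(bins):
        idx = bins == b
        sd = log_disp[idx].std()
        if sd > 0:
            z[idx] = (log_disp[idx] - log_disp[idx].mean()) / sd
    return z > z_cutoff
```

Calling `detected, kept = select_features(counts)` on a raw count matrix returns the genes detected in at least three cells, and among them the genes whose normalized expression correlates with library size below 0.6. Z-scoring within mean-expression bins keeps highly expressed genes from dominating the variable-gene list just because their raw dispersion is larger.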
To overcome the extensive technical noise, principal component
(PC) reduction can be used to further remove noise from the
scRNA-seq data. Principal component analysis (PCA) is a commonly
used dimension reduction method that produces PCs from the gene
expression matrix [17]. The R package Seurat clusters cells based
on their PCA scores, with each PC essentially representing a
“metagene” that combines information across a correlated gene set
[16]. Determining how many PCs to include (PC reduction) is
therefore an important step. In our previous studies, PC reduction
made clusters more clearly separated. PC reduction has now been
integrated into many software tools and R packages (e.g., Rtsne).
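PCA and PC reduction can be sketched with NumPy's SVD as follows. This is an illustrative sketch under stated assumptions. The input is assumed to be an already normalized (for example, log-transformed) genes x cells matrix. The function names are hypothetical. Choosing the number of PCs from a cumulative explained-variance threshold is just one simple heuristic assumed here for illustration. It is not the criterion used by Seurat or by the studies cited above, and inspecting an elbow plot of explained variance is another common choice.

```python
import numpy as np

def pca_scores(expr, n_pcs=10):
    """Project cells onto the top `n_pcs` principal components via SVD.

    expr: genes x cells matrix of normalized expression values.
    Returns (scores, explained): scores is cells x n_pcs; explained holds the
    fraction of total variance captured by each of those PCs.
    """
    x = expr.T - expr.T.mean(axis=0)  # cells are observations; center each gene
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    explained = s ** 2 / (s ** 2).sum()
    scores = u[:, :n_pcs] * s[:n_pcs]  # equals x @ V_k, the PCA scores
    return scores, explained[:n_pcs]

def choose_n_pcs(expr, var_threshold=0.8, max_pcs=50):
    """Hypothetical heuristic: smallest number of PCs whose cumulative
    explained variance reaches `var_threshold` (capped at `max_pcs`)."""
    k = min(max_pcs, min(expr.shape))
    _, explained = pca_scores(expr, n_pcs=k)
    cumulative = np.cumsum(explained)
    return int(min(np.searchsorted(cumulative, var_threshold) + 1, k))
```

A typical call would be `k = choose_n_pcs(expr)` followed by `scores, _ = pca_scores(expr, n_pcs=k)`. The cells x k `scores` matrix then replaces the full gene matrix as input to clustering or t-SNE. Dropping the low-variance trailing PCs is what removes much of the technical noise.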

Fig. 4 Fundamental problems in scRNA-seq data analysis. Normalization,
sample reduction, feature reduction, and cluster analysis are fundamental
problems in scRNA-seq data analysis. DE analysis denotes differential
expression analysis, the most typical downstream analysis.

320 Shan Gao
