Computational Systems Biology Methods and Protocols.7z

Feature reduction is often performed using two criteria. The first is to remove genes with non-zero read counts in fewer than three cells. The second is to retain only highly variable genes. The commonly used R package Seurat selects variable genes by calculating the average expression and dispersion of each gene [16]. In our previous studies, we introduced another method, which calculates Pearson correlation coefficients (PCCs) between normalized expression values and library sizes and uses genes with PCCs less than 0.6 for cluster analysis. In addition, we found that feature reduction had little effect on the results of cluster analysis, particularly for data containing fewer than 1000 samples. However, using variable genes improves the cluster analysis.

Principal component (PC) reduction can be used to further remove the extensive technical noise in scRNA-seq data. Principal component analysis (PCA) is a commonly used dimension reduction method that produces PCs from the gene expression matrix [17]. The R package Seurat clusters cells based on their PCA scores, with each PC essentially representing a “metagene” that combines information across a correlated gene set [16]. Determining how many PCs to include (PC reduction) is therefore an important step. Based on our previous studies, PC reduction makes clusters more clearly separated. Currently, PC reduction has been integrated into many software tools and R packages (e.g., Rt-SNE).
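The two gene-filtering criteria described above (expression in at least three cells, and a PCC below 0.6 between normalized expression and library size) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names `pearson` and `filter_genes`, the toy count table, the use of counts per million as the normalization, and the treatment of constant genes (PCC set to 0) are all assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.
    Assumption: returns 0.0 when either sequence is constant (PCC undefined)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def filter_genes(counts, min_cells=3, max_pcc=0.6):
    """counts maps gene name -> list of raw read counts, one entry per cell.
    Returns the names of genes passing both criteria."""
    # Library size of each cell = total read count over all genes.
    lib = [sum(col) for col in zip(*counts.values())]
    kept = []
    for gene, row in counts.items():
        # Criterion 1: drop genes with non-zero counts in fewer than min_cells cells.
        if sum(1 for c in row if c > 0) < min_cells:
            continue
        # Criterion 2: keep genes whose normalized expression has a PCC with
        # library size below max_pcc. Counts per million is an assumed normalization.
        norm = [c / s * 1e6 if s else 0.0 for c, s in zip(row, lib)]
        if pearson(norm, lib) < max_pcc:
            kept.append(gene)
    return kept

# Hypothetical toy data: four genes measured in four cells.
counts = {
    "rare": [3, 0, 0, 0],        # detected in one cell only
    "up":   [1, 20, 60, 120],    # rises faster than library size
    "down": [50, 40, 30, 20],    # falls as library size grows
    "bulk": [46, 140, 210, 260],
}
kept = filter_genes(counts)
print(kept)
```

In this toy table, "rare" fails the first criterion and "up" fails the second, while "down" passes both. Pure Python keeps the sketch dependency-free; for real expression matrices an array library would be the more practical choice.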

Fig. 4 Fundamental problems in scRNA-seq data analysis. Normalization, sample reduction, feature reduction, and cluster analysis are fundamental problems in scRNA-seq data analysis. DE analysis denotes differential expression analysis, the most typical of the downstream analyses

320 Shan Gao
