A few researchers have started to consider removing some kinds of
unwanted biological variation from scRNA-seq data. For example,
cell-to-cell heterogeneity in gene expression can be caused by
differences in cell-cycle stage. A recent study introduced a
latent-variable model based on Gaussian processes to account for
variation caused by differences in cell-cycle stage [2].
Data normalization is essential because it determines the validity of
downstream analyses. Currently, the commonly used methods all
normalize a raw gene expression matrix by multiplying each column by
a factor, producing a normalized gene expression matrix (Fig. 2a).
This factor is called the normalization factor, or
Fig. 2 The commonly used normalization methods. (a) A raw gene expression matrix can be transformed into a
normalized gene expression matrix by multiplying each column by a factor f_j. Each column represents
the expression levels of all genes in a cell, and each row represents the expression levels of a gene across
all samples. (b) N_j represents the library size of the jth sample, and Q75 denotes the third quartile
(75th percentile) of all gene expression proportions in the jth sample. Formulas are given for the library size
method, RLE, upper quartile, and DESeq, but not for TMM. TMM, RLE, and upper quartile have been used to calculate
CPM (counts per million) in the Bioconductor package edgeR [8] for the R environment.
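The following Python/NumPy code is a minimal sketch of the column-scaling scheme in Fig. 2a, together with three of the factors named in the caption. It rests on several assumptions and is not the edgeR or DESeq implementation:

- Counts sit in a genes x cells array.
- The function names are hypothetical.
- The upper-quartile factor is taken as 1/(N_j * Q75), which amounts to dividing by the 75th percentile of the cell's raw counts.
- The RLE factor is the inverse of the median-of-ratios size factor.
- edgeR additionally rescales its normalization factors to a geometric mean of one, which this sketch omits.

```python
import numpy as np


def scale_columns(counts, factors):
    """Multiply column j of a genes x cells matrix by factors[j] (Fig. 2a)."""
    counts = np.asarray(counts, dtype=float)
    factors = np.asarray(factors, dtype=float)
    # A length-n factor vector broadcast over the rows scales each cell's column.
    return counts * factors[np.newaxis, :]


def library_size_factors(counts, scale=1e6):
    """f_j = scale / N_j, with N_j the total count of cell j (CPM when scale=1e6)."""
    lib_size = np.asarray(counts, dtype=float).sum(axis=0)
    return scale / lib_size  # cells with N_j == 0 must be filtered out beforehand


def upper_quartile_factors(counts):
    """f_j = 1 / (N_j * Q75_j), Q75_j = 75th percentile of proportions in cell j."""
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=0)
    keep = counts.sum(axis=1) > 0  # drop genes that are zero in every cell
    proportions = counts[keep] / lib_size
    q75 = np.quantile(proportions, 0.75, axis=0)
    # N_j * Q75_j equals the 75th percentile of the retained raw counts of cell j.
    return 1.0 / (lib_size * q75)


def rle_factors(counts):
    """Inverse median-of-ratios size factors (RLE / DESeq style): f_j = 1 / s_j."""
    counts = np.asarray(counts, dtype=float)
    # Only genes positive in every cell have a finite log geometric mean;
    # in sparse scRNA-seq data few genes may survive this filter.
    positive = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[positive])
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    size_factors = np.exp(np.median(log_counts - log_geo_mean, axis=0))
    return 1.0 / size_factors


# Toy example: cell 2 is sequenced twice as deeply as cell 1 but is otherwise identical.
counts = np.array([[10, 20], [20, 40], [30, 60], [40, 80]])
normalized = {f.__name__: scale_columns(counts, f(counts))
              for f in (library_size_factors, upper_quartile_factors, rle_factors)}
```

In this toy matrix, the second cell differs from the first only by a factor of two in sequencing depth. Each of the three factor choices therefore yields two identical normalized columns.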
Data Analysis in Single-Cell Transcriptome Sequencing 315