Computational Systems Biology Methods and Protocols.7z

A few researchers have started to consider removing certain kinds of biological variation that are not of interest from scRNA-seq data. For example, cell-to-cell heterogeneity in gene expression can be caused by differences in cell-cycle stage. A recent study introduced a latent-variable model based on Gaussian processes to account for variation caused by such cell-cycle stage differences [2]. Data normalization is essential because it determines the validity of downstream analyses. Currently, all the methods normalize a raw gene expression matrix by multiplying each of its columns by a factor, producing a normalized gene expression matrix (Fig. 2a). This factor is called the normalization factor, or

Fig. 2 The commonly used normalization methods. (a) A raw gene expression matrix can be transformed into a
normalized gene expression matrix by multiplying each column by a factor f_j. Each column represents
the expression levels of all genes in one cell, and each row represents the expression levels of one gene across
all samples. (b) N_j represents the library size of the jth sample. Q75 denotes the third quartile
(75%) of all the gene expression proportions in the jth sample. The library size, RLE, upper quartile,
and DESeq methods are described; TMM is not. TMM, RLE, and upper quartile have been used to calculate CPM
(counts per million) in the Bioconductor package edgeR [8] for the R environment
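
The column-wise scaling described in the caption can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch, not the chapter's or edgeR's implementation: the exact formulas of Fig. 2b are not reproduced in the text, so the forms f_j = 10^6 / N_j (library size, giving CPM) and f_j = 1 / (N_j * Q75_j) (a simple upper-quartile variant) are assumptions, and the function names are hypothetical.

```python
import numpy as np


def normalize_columns(counts, factors):
    """Multiply column j of a genes x cells matrix by factors[j]."""
    counts = np.asarray(counts, dtype=float)
    factors = np.asarray(factors, dtype=float)
    if counts.shape[1] != factors.shape[0]:
        raise ValueError("need exactly one factor per column (cell)")
    # Broadcasting applies factors[j] to every gene in column j.
    return counts * factors[np.newaxis, :]


def library_size_factors(counts, scale=1e6):
    """Assumed form: f_j = scale / N_j, where N_j is the column sum."""
    lib_sizes = np.asarray(counts, dtype=float).sum(axis=0)
    if np.any(lib_sizes == 0):
        raise ValueError("a cell with zero total counts cannot be scaled")
    return scale / lib_sizes


def upper_quartile_factors(counts):
    """Assumed form: f_j = 1 / (N_j * Q75_j).

    Q75_j is the third quartile of the expression proportions
    counts[:, j] / N_j in cell j.
    """
    counts = np.asarray(counts, dtype=float)
    lib_sizes = counts.sum(axis=0)
    if np.any(lib_sizes == 0):
        raise ValueError("a cell with zero total counts cannot be scaled")
    proportions = counts / lib_sizes[np.newaxis, :]
    q75 = np.percentile(proportions, 75, axis=0)
    if np.any(q75 == 0):
        # Sparse single-cell data can have a zero third quartile.
        raise ValueError("third quartile is zero for at least one cell")
    return 1.0 / (lib_sizes * q75)


# Toy matrix: 3 genes (rows) x 2 cells (columns).
raw = np.array([[10, 0],
                [30, 20],
                [60, 80]])

cpm = normalize_columns(raw, library_size_factors(raw))
uq = normalize_columns(raw, upper_quartile_factors(raw))
print(cpm)
print(uq)
```

Under these assumptions, every column of `cpm` sums to one million, and every column of `uq` has a third quartile of 1. The explicit check for a zero quartile matters in practice because single-cell count matrices contain many zeros.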

Data Analysis in Single-Cell Transcriptome Sequencing 315

