Computational Systems Biology Methods and Protocols.7z

estimates. This applies to any downstream analysis but is particularly important when comparing expression levels between cells or when assessing the variability of individual genes. Because of the typically low capture efficiency of current scRNA-seq protocols, even moderately expressed genes are frequently undetected. Consequently, methods that accurately estimate the extent of this technical variability are crucial for distinguishing genuine gene expression changes from experimental artefacts. The use of spike-ins as control genes is appealing, since the same amount of ERCC (or other) spike-in is added to each cell in the experiment. In principle, all the variability observed for these genes is due to technical noise, whereas endogenous genes are affected by both technical noise and biological variability. Technical noise can therefore be removed by fitting a model to the spike-ins and "subtracting" it from the endogenous genes. Several methods are based on this premise (e.g., BASiCS [130, 131], scLVM [79], RUVg [132]), each using a different noise model and a different fitting procedure. Alternatively, one can identify genes that exhibit significant variation beyond technical noise (e.g., distance to median, highly variable genes). However, there are issues with the use of spike-ins for normalization (particularly ERCCs, which are derived from bacterial sequences), including that their variability can, for various reasons, actually be higher than that of endogenous genes. Given these issues, better results can often be obtained by using endogenous genes instead. Where a large number of endogenous genes do not, on average, vary systematically between cells, and where technical effects are expected to affect a large number of genes (a very common and reasonable assumption), such methods (e.g., the RUVs method [132]) can perform well.
Almost all analysis approaches take technical noise into account; for details of the most common strategies, readers are referred to refs. [88, 130].
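The idea of fitting a noise model to the spike-ins and comparing endogenous genes against it can be illustrated with a minimal sketch. It is not the procedure used by BASiCS, scLVM, or RUVg; it assumes a simple technical-noise curve of the form CV² = a1/mean + a0, fitted by ordinary least squares, and it runs on simulated counts rather than real data. The function names (`fit_technical_noise`, `variability_ratio`) and all simulation parameters are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_technical_noise(spike_counts):
    """Fit CV^2 = a1 / mean + a0 to spike-in counts (rows: spike-ins, columns: cells).

    Assumption: ordinary least squares is used here for simplicity;
    published methods use more elaborate noise models and fitting procedures.
    """
    mean = spike_counts.mean(axis=1)
    var = spike_counts.var(axis=1, ddof=1)
    keep = mean > 0                      # spike-ins never detected carry no information
    cv2 = var[keep] / mean[keep] ** 2
    design = np.column_stack([1.0 / mean[keep], np.ones(keep.sum())])
    coef, *_ = np.linalg.lstsq(design, cv2, rcond=None)
    a1, a0 = coef
    return a1, a0

def variability_ratio(gene_counts, a1, a0):
    """Observed CV^2 of each gene divided by the fitted technical CV^2 at its mean.

    Values well above 1 suggest variability beyond technical noise.
    Genes with zero mean are returned as NaN.
    """
    mean = gene_counts.mean(axis=1)
    var = gene_counts.var(axis=1, ddof=1)
    ratio = np.full(mean.shape, np.nan)
    ok = mean > 0
    technical = a1 / mean[ok] + a0
    ratio[ok] = (var[ok] / mean[ok] ** 2) / technical
    return ratio

# Simulated example (all numbers are illustrative assumptions).
rng = np.random.default_rng(0)
n_cells = 200

# Spike-ins: pure Poisson sampling noise across a range of abundances.
spike_means = np.geomspace(2, 2000, 40)
spikes = rng.poisson(spike_means[:, None], size=(40, n_cells))

# 50 "stable" genes: technical (Poisson) noise only, mean 5.
stable = rng.poisson(5.0, size=(50, n_cells))

# 50 "variable" genes: the true expression level differs between cells
# (gamma-distributed, mean 5, biological CV^2 = 0.5) before Poisson sampling.
rates = rng.gamma(shape=2.0, scale=2.5, size=(50, n_cells))
variable = rng.poisson(rates)

genes = np.vstack([stable, variable])
a1, a0 = fit_technical_noise(spikes)
ratio = variability_ratio(genes, a1, a0)

print("fitted a1, a0:", a1, a0)
print("median ratio, stable genes:  ", np.median(ratio[:50]))
print("median ratio, variable genes:", np.median(ratio[50:]))
```

In this simulation the stable genes should sit close to the fitted technical curve, while the genes with added cell-to-cell variation should lie clearly above it. On real data the counts would first need to be normalized, and an unweighted least-squares fit is sensitive to noisy low-abundance spike-ins, so the sketch should be read as an illustration of the principle rather than a recommended analysis.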

2.4 Getting Biological Insights into Single-Cell RNA Sequencing

In this section, we review applications of scRNA-seq to specific biological questions that bulk RNA-seq may not be able to answer. Thus far, single-cell RNA sequencing has shown great effectiveness in unraveling complex cell populations, reconstructing developmental trajectories, modeling transcriptional dynamics, and so on. In all the following analyses, we assume that the input data is a matrix of gene expression or transcript counts that has been normalized and cleared of technical variability using any of the approaches described above.

2.4.1 Accounting for Heterogeneity: Cell Identity and Cellular State

Solid tissues of humans and other eukaryotes comprise several different types of cells. These different cell types have distinct transcriptomic profiles. Although there are a lot of computational

Applications of Single-Cell Sequencing for Multiomics 355
