Computational Systems Biology: Methods and Protocols

counts can be collapsed by summing the number of unique barcodes associated with all reads mapped to a given gene. When performing this analysis, care must be taken to account for sequencing errors in the UMIs, which might result in the appearance of artificial molecules. To overcome this, error correction of the barcodes and/or removal of singleton barcodes may be required
[34]. Although scRNA-seq data can, in principle, be used to quantify the expression of individual exons or to resolve isoform abundance, such analyses are currently challenging owing to the large proportion of technical noise and biases compared with traditional RNA-seq protocols. To date, only two reports have claimed that their approaches hold promise for isoform quantification using scRNA-seq data sets. The first is SingleSplice [123], which
uses a statistical model, a hurdle model, to detect genes whose isoform usage shows biological variation significantly exceeding technical noise in a population of single cells. It circumvents the pitfalls of low coverage and 3′ bias, which hamper the assembly and quantification of full-length isoforms, by considering only the partial regions that undergo alternative splicing, called alternative splicing modules (ASMs). Importantly, it requires spike-ins to model the
technical variability. Most recently, an algorithm called Census [124] was developed to convert relative RNA-seq expression levels into relative transcript counts without the need for experimental spike-in controls. It can also handle splicing patterns across single cells. All of these approaches aim to circumvent or reduce the effects of factors specific to single-cell sequencing.
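The UMI collapsing and error-correction step described at the start of this paragraph can be illustrated with a minimal Python sketch. It assumes a hypothetical input of (gene, UMI) pairs for the reads of one cell, and it uses a simplified heuristic chosen here for illustration: a UMI that differs by one base from a strictly more abundant UMI of the same gene is treated as a sequencing error of that UMI. This is not necessarily the procedure used in [34]; the function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def correct_umis(umi_counts: Counter) -> Counter:
    """Fold UMIs that look like one-base sequencing errors into their parent.

    Simplified heuristic (assumption, for illustration only): UMIs are visited
    from most to least abundant, and a UMI that is one mismatch away from an
    already-kept UMI with a strictly higher read count is merged into it.
    """
    corrected: Counter = Counter()
    for umi, n in sorted(umi_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        parent: Optional[str] = next(
            (p for p in list(corrected)
             if umi_counts[p] > n and hamming(p, umi) == 1),
            None,
        )
        corrected[parent if parent is not None else umi] += n
    return corrected


def count_molecules(reads: Iterable[Tuple[str, str]],
                    drop_singletons: bool = False) -> Dict[str, int]:
    """Collapse one cell's (gene, UMI) read assignments into molecule counts."""
    per_gene: Dict[str, Counter] = defaultdict(Counter)
    for gene, umi in reads:
        per_gene[gene][umi] += 1
    molecules: Dict[str, int] = {}
    for gene, umi_counts in per_gene.items():
        corrected = correct_umis(umi_counts)
        if drop_singletons:
            # Discard molecules supported by a single read after correction.
            corrected = Counter({u: c for u, c in corrected.items() if c > 1})
        molecules[gene] = len(corrected)
    return molecules


# Hypothetical reads from one cell: AACT is one base away from the more
# abundant AACG and is treated as an error; TTTT is seen only once.
example_reads = (
    [("GeneA", "AACG")] * 3
    + [("GeneA", "AACT")]
    + [("GeneA", "GGTC")] * 2
    + [("GeneB", "TTTT")]
)
print(count_molecules(example_reads))                        # {'GeneA': 2, 'GeneB': 1}
print(count_molecules(example_reads, drop_singletons=True))  # {'GeneA': 2, 'GeneB': 0}
```

Without the correction step, GeneA would be counted as three molecules instead of two. Dedicated tools such as UMI-tools offer several, more carefully designed deduplication methods; the heuristic above only conveys the general idea.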

2.1.3 Quality Control

Quality control (QC) is needed for both the raw reads and the library size, also called cell size (Fig. 3). As in bulk RNA-seq experiments, FastQC or Kraken [125] can be used for QC of the
scRNA-seq raw reads. The resulting data can be visualized using the Integrative Genomics Viewer (IGV) [126, 127]. These steps help to identify potential sample mix-ups and external contamination, or to determine whether there was a problem with the sequencing itself as opposed to the single-cell capture and amplification.
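FastQC is a stand-alone tool, and the following Python sketch is not its implementation; it only illustrates the idea behind one kind of raw-read check, the per-position base quality. It assumes uncompressed FASTQ input with exactly four lines per record and Phred+33 quality encoding; the threshold of 20 and the function names are hypothetical choices for illustration.

```python
import io
from typing import Iterator, List, TextIO


def read_qualities(handle: TextIO) -> Iterator[str]:
    """Yield the quality line of each four-line FASTQ record."""
    while True:
        header = handle.readline()
        if not header:
            return
        handle.readline()  # sequence line (unused here)
        handle.readline()  # '+' separator line
        yield handle.readline().rstrip("\n")


def mean_quality_per_position(handle: TextIO, offset: int = 33) -> List[float]:
    """Mean Phred score at each read position (Phred+33 encoding assumed)."""
    totals: List[int] = []
    counts: List[int] = []
    for qual in read_qualities(handle):
        for i, ch in enumerate(qual):
            if i == len(totals):  # first read long enough to reach position i
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


def low_quality_positions(means: List[float], threshold: float = 20.0) -> List[int]:
    """1-based read positions whose mean quality falls below the threshold."""
    return [i + 1 for i, m in enumerate(means) if m < threshold]


# Two hypothetical reads whose base calls degrade toward the end of the read.
example_fastq = io.StringIO(
    "@read1\nACGT\n+\nII5#\n"
    "@read2\nACGA\n+\nII##\n"
)
means = mean_quality_per_position(example_fastq)
print(means)                          # [40.0, 40.0, 11.0, 2.0]
print(low_quality_positions(means))   # [3, 4]
```

A run of low-quality positions across many reads points to a sequencing problem rather than to the capture or amplification of individual cells, which is the distinction this QC step is meant to support.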
After alignment and initial read counts are obtained, quality control needs to be applied to identify poor-quality libraries from individual cells. This step is especially important for scRNA-seq data because all current single-cell library protocols display a very low capture rate and high amplification bias. The problem is more serious for primary tissue samples, as the process of extracting a tissue and then isolating individual cells can affect the quality of the RNA collected. Three metrics are commonly used at this step before proceeding to downstream analyses.
The first metric is the same as that used for bulk RNA-seq data: the fraction of reads that map back to the genome of the organism of interest (i.e., the rate of mapped reads), which can be


Applications of Single-Cell Sequencing for Multiomics 351