Computational Systems Biology Methods and Protocols.7z

modifications, chromosome conformation, and replication dynamics. As with scRNA-seq and scDNA-seq, the main computational challenges in single-cell epigenomics arise from technical variability and low capture efficiency.

4.1 Controlling for Technical Noise

Sequencing of single-cell epigenomic libraries is typically saturated, such that a limited set of distinct DNA molecules is sequenced over many reads. The first step in analyzing data from such experiments is therefore to infer the set of epigenetic features (or molecules) that were successfully extracted from each cell. Because aggressive amplification is required to transform picograms of source material into sequencing-ready libraries, multiple possible sources of noise must be considered when making such initial inferences. For example, sequencing errors, possible PCR effects, and even mispairing of paired-end reads on the sequencer's flow cells [70] can modify or splice erroneous cell- or well-specific barcodes and thereby distribute reads from one molecule among several cells. Even if such events are rare when their rate is evaluated per sequence read, their effect can be considerable when weighed against the limited set of true-positive molecules for each cell [40, 70]. Negative control wells or labels must be used, and the quality of single-cell libraries must be analyzed routinely to ensure that individual measurements can be interpreted as coming from a single-cell source. Strategies for accounting for technical noise have been discussed in detail with respect to scRNA-seq, where synthetic spike-ins and UMIs are introduced during reverse transcription. Single-cell epigenomic methods would likely benefit from similar strategies.
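
The inference step described above can be illustrated with a minimal sketch. It assumes that reads arrive as (barcode, chromosome, position) tuples and that stray, mis-barcoded reads are mostly supported by a single read. The function name collapse_reads, the tuple layout, and the min_reads threshold are hypothetical choices made for illustration; they are not taken from any published pipeline.

```python
from collections import Counter, defaultdict


def collapse_reads(reads, min_reads=2):
    """Collapse saturated reads into distinct molecules per barcode.

    reads: iterable of (barcode, chrom, pos) tuples (assumed layout).
    A molecule is kept only if at least `min_reads` reads support it.
    This threshold is an illustrative heuristic, not a validated cutoff.
    """
    support = Counter(reads)  # reads per (barcode, chrom, pos)
    molecules = defaultdict(set)
    for (barcode, chrom, pos), n_reads in support.items():
        if n_reads >= min_reads:
            molecules[barcode].add((chrom, pos))
    return dict(molecules)


# Toy input (hypothetical values, for illustration only).
reads = [
    ("cellA", "chr1", 100), ("cellA", "chr1", 100), ("cellA", "chr1", 100),
    ("cellA", "chr2", 500), ("cellA", "chr2", 500),
    ("cellB", "chr1", 100),  # lone read: possibly a barcode swap from cellA
    ("cellB", "chr3", 900), ("cellB", "chr3", 900),
    ("empty", "chr1", 100),  # read observed in a negative control well
]

filtered = collapse_reads(reads, min_reads=2)

# Distinct molecules per well before filtering, so that negative
# control wells can be compared against real cells.
unfiltered_counts = {
    barcode: len(mols)
    for barcode, mols in collapse_reads(reads, min_reads=1).items()
}

print(filtered)
print(unfiltered_counts)
```

In this toy input the negative control well retains no molecules after filtering, while the single unsupported read in cellB is discarded. In practice the threshold would need to be calibrated against the noise level observed in the negative controls.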

4.2 Accounting for Missing Data

Properly filtered single-cell epigenomic profiles must be analyzed under the assumption that they are missing a substantial amount of data. Epigenomic data (or epi-alleles) are present in essentially single copies within each cell, such that any imperfection in the single-cell epigenomic recovery process results in loss of information. With all current technologies, it is therefore not realistic to narrow down the analysis to epigenomic features that are represented in all (or even most) sampled cells. This is because the probability of detecting a feature simultaneously in every cell, assuming it is missing with probability $P$ per cell, decreases exponentially as $(1 - P)^n$ with the size $n$ of the data set. For example, with $P = 0.5$ and $n = 10$ cells, this probability is already $0.5^{10} \approx 0.001$. Instead, different strategies for imputation of missing values, or analyses relying on more coarse-grained features, should be used [54, 190, 191]. Imputation can be possible when the epigenomic structure is highly constrained, such that partial observations are sufficient to determine the complete epigenomic state. This may be the case in analyses of large-scale chromosome conformations or replication landscapes [70] but not when observing DNA methylation or local chromatin structure. Alternatively, it is possible to form groups of epigenetic loci (based

Applications of Single-Cell Sequencing for Multiomics 363
