modifications, chromosome conformation, and replication dynamics.
As with scRNA-seq and scDNA-seq, the main computational
challenges in single-cell epigenomics arise from technical
variability and low capture efficiency.
4.1 Controlling for Technical Noise
Sequencing of single-cell epigenomic libraries is typically saturated,
such that a limited set of distinct DNA molecules is sequenced over
many reads. The first step in analyzing data from such experiments
is therefore the inference of the set of epigenetic features
(or molecules) that were successfully extracted from each of the
cells. As aggressive amplification is required to transform picograms
of source material into sequencing-ready libraries, there are multi-
ple possible sources of noise that must be considered when making
such initial inferences. For example, sequencing errors, possible
PCR artifacts, and even mispairing of paired-end reads on the
sequencer’s flow cell [70] can alter cell- or well-specific barcodes
or splice in erroneous ones, and thereby distribute reads from one
molecule among several cells. Even if such events are rare on a
per-read basis, their effect can be considerable
when weighted against a limited set of true-positive molecules for
each cell [40, 70]. Negative control wells or labels must be used,
and the quality of single-cell libraries must be analyzed routinely to
ensure individual measurements can be interpreted as coming from
a single-cell source. Strategies for accounting for technical
noise have been discussed in detail for scRNA-seq,
where synthetic spike-ins and UMIs are introduced during reverse
transcription. Single-cell epigenomic methods would likely benefit
from similar strategies.
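
The inference step described above, collapsing saturated reads into a set of distinct molecules per cell while guarding against barcode misassignment, can be sketched as follows. This is only one possible heuristic and is not taken from the cited work. The function name, the `molecule_key` representation, and the thresholds `min_reads` and `min_fraction` are all assumptions made for illustration. The sketch further assumes that identical fragment keys seen under several barcodes in one library are more likely to reflect swapped barcodes than independent captures, which will not hold for every assay.

```python
from collections import Counter, defaultdict


def assign_molecules(reads, min_reads=2, min_fraction=0.8):
    """Collapse reads into distinct molecules and assign each to one cell.

    reads: iterable of (cell_barcode, molecule_key) pairs. molecule_key is a
    hypothetical hashable identifier of a distinct DNA fragment, for example
    a tuple of chromosome, fragment start, and fragment end.
    min_reads, min_fraction: illustrative thresholds, not published values.
    """
    # Count how many reads support each molecule under each cell barcode.
    support = defaultdict(Counter)
    for cell, molecule in reads:
        support[molecule][cell] += 1

    molecules_per_cell = defaultdict(set)
    for molecule, per_cell in support.items():
        total = sum(per_cell.values())
        top_cell, top_count = per_cell.most_common(1)[0]
        # Keep the molecule only if it has enough reads and a single cell
        # barcode accounts for most of them; otherwise discard it as noise.
        if top_count >= min_reads and top_count / total >= min_fraction:
            molecules_per_cell[top_cell].add(molecule)
    return dict(molecules_per_cell)
```

Applying the same function to reads from negative control wells gives a rough empirical baseline: a cell whose retained molecule count is not clearly above that of the controls is a candidate for exclusion.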
4.2 Accounting for Missing Data
Properly filtered single-cell epigenomic profiles must be analyzed
while assuming they are missing a substantial amount of data.
Epigenomic data (or epi-alleles) are present in essentially single
copies within each cell, such that any imperfection in the single-
cell epigenomic recovery process results in loss of information.
With all current technologies, it is therefore not realistic to narrow
down the analysis to epigenomic features that are represented in all
(or even most) sampled cells. This is because the probability of
perfect simultaneous detection of the feature, assuming it is missing
with probability $P$ per cell, scales down exponentially as $(1 - P)^n$
with the size of the data set of $n$ cells. Instead, different strategies
for imputation of missing values or analysis relying on more coarse-
grained features should be used [54, 190, 191]. Imputation can be
possible when the epigenomic structure is highly constrained, such
that partial observations are sufficient to determine the complete
epigenomic state. This may be the case in analyses of large-scale
chromosome conformations or replication landscapes [70] but not
when observing DNA methylation or local chromatin structure.
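
As a rough numerical illustration of the argument above, the following sketch evaluates the probability $(1 - P)^n$ that a feature is detected in all $n$ cells, and shows how pooling loci into a coarser feature changes it. The function names and the example values of $P$, $n$, and $k$ are illustrative assumptions. The pooled case additionally assumes that the $k$ loci in a group drop out independently, which real data may violate.

```python
def prob_detected_in_all_cells(p_missing, n_cells):
    """Probability that a feature missing with probability p_missing per cell
    is nevertheless observed in every one of n_cells cells."""
    return (1.0 - p_missing) ** n_cells


def prob_group_detected_in_all_cells(p_missing, group_size, n_cells):
    """Same quantity for a coarse-grained feature made of group_size loci,
    counted as observed in a cell if at least one of its loci is detected.
    Assumes independent dropout across loci (an illustrative simplification)."""
    p_group_missing = p_missing ** group_size
    return (1.0 - p_group_missing) ** n_cells


if __name__ == "__main__":
    # Illustrative values only: 10% per-cell dropout, groups of 5 loci.
    for n in (10, 100, 1000):
        single = prob_detected_in_all_cells(0.1, n)
        pooled = prob_group_detected_in_all_cells(0.1, 5, n)
        print(f"n={n}: single locus {single:.3g}, pooled group {pooled:.3g}")
```

Under these assumptions, even a modest per-cell dropout rate makes complete coverage of a single locus vanishingly unlikely as the data set grows, whereas a pooled feature remains observable in nearly all cells.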
Alternatively, it is possible to form groups of epigenetic loci (based