modeled as a binomial process [168]. It is a soft clustering approach
that provides the probability that a cell comes from each of the
clones rather than simply subdividing the phylogenetic tree generated
by distance-based clustering. Parameters in the model, such as the
probability of a particular single cell originating from a specific
clone, as well as the false-negative rate, can be estimated for each
candidate number of clones using an expectation–maximization (EM)
algorithm [169]. The challenge of determining the
number of clones is then reduced to selecting the statistical model
that best describes the observed single-cell data using Bayesian or
Akaike information criteria [170]. A hybrid approach also exists:
an initial estimate of the number of clones is obtained from
distance-based hierarchical clustering, which speeds up the
convergence of the computationally intensive model-based
methods [171].
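The model-based procedure described above can be illustrated with a minimal Python sketch. It is not the implementation used in the cited studies: it assumes binary mutation calls, fits a plain Bernoulli mixture with EM, and compares candidate clone numbers by the Bayesian information criterion. The function names, the seeding scheme, and the choice to absorb the false-negative rate into the per-site call probability theta (rather than estimating it as a separate parameter) are all assumptions made for illustration.

```python
import math


def em_bernoulli_mixture(cells, k, n_iter=100, eps=1e-6, tol=1e-8):
    """Fit a k-component Bernoulli mixture to binary mutation calls with EM.

    cells: list of equal-length 0/1 lists (1 = mutation called at that site).
    Returns (weights, theta, resp, loglik). resp[i][c] is the posterior
    probability that cell i belongs to clone c; resp and loglik come from
    the final E-step. theta[c][j] is the probability of calling site j in
    clone c, so dropout (false negatives) lowers it below 1 (assumption).
    """
    n, m = len(cells), len(cells[0])
    # Deterministic start (assumption): seed each clone from an evenly spaced cell.
    theta = [[0.8 if cells[c * n // k][j] else 0.2 for j in range(m)]
             for c in range(k)]
    weights = [1.0 / k] * k
    loglik = -math.inf
    for _ in range(n_iter):
        # E-step: soft clone membership of every cell, computed in log space.
        resp, new_loglik = [], 0.0
        for x in cells:
            logp = [math.log(weights[c]) + sum(
                        math.log(theta[c][j] if x[j] else 1.0 - theta[c][j])
                        for j in range(m))
                    for c in range(k)]
            top = max(logp)
            norm = top + math.log(sum(math.exp(v - top) for v in logp))
            resp.append([math.exp(v - norm) for v in logp])
            new_loglik += norm
        # M-step: re-estimate clone weights and per-site call probabilities.
        for c in range(k):
            total = sum(r[c] for r in resp)
            weights[c] = max(total / n, eps)
            for j in range(m):
                hit = sum(r[c] * x[j] for r, x in zip(resp, cells))
                theta[c][j] = min(max(hit / max(total, eps), eps), 1.0 - eps)
        converged = abs(new_loglik - loglik) < tol
        loglik = new_loglik
        if converged:
            break
    return weights, theta, resp, loglik


def bic(loglik, k, n_cells, n_sites):
    """Bayesian information criterion; lower values indicate a better model."""
    n_params = k * n_sites + (k - 1)  # call probabilities plus free weights
    return -2.0 * loglik + n_params * math.log(n_cells)


def select_clone_number(cells, candidates=(1, 2, 3, 4)):
    """Fit one mixture per candidate clone number and return the BIC minimizer."""
    scores = {}
    for k in candidates:
        loglik = em_bernoulli_mixture(cells, k)[3]
        scores[k] = bic(loglik, k, len(cells), len(cells[0]))
    return min(scores, key=scores.get), scores
```

Each row of resp is the soft assignment of one cell, and select_clone_number turns the choice of clone number into a model comparison. In the hybrid approach, the candidate range or the starting values would instead be taken from a prior hierarchical clustering.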
After estimating the number of clones in a sample and deter-
mining which clone each cell belongs to, a consensus clonal muta-
tion profile can be established. Bentley et al. have done this using
mutation frequency cutoff values that exceed the false-negative rate
[172], although more rigorous statistical methods could be developed.
Once the clonal genotypes are established, the relationships
between clones can be inferred (Fig. 4b). There are a number
of algorithms used in evolutionary biology that can be applied to
establish clonal structures [173], such as those based on maximum
parsimony, maximum likelihood, or distance-based methods such
as unweighted pair group method with arithmetic mean, neighbor
joining, and minimum evolution algorithms [173, 174]. Finally,
the clonal structures can be visualized as a fish plot [175] or an
evolutionary tree (Fig. 4c), both of which have been used in many
publications [167, 176–189].
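The two steps described in this paragraph, calling a consensus profile per clone and relating the clones to one another, can be sketched in Python as follows. This is an illustrative sketch only, not the procedure of the cited works: it assumes binary mutation calls, hard clone labels, a user-chosen frequency cutoff (hypothetical parameter cutoff, to be set above the false-negative rate), Hamming distance between profiles, and UPGMA as the distance-based method. It recovers the tree topology only and ignores branch lengths.

```python
from itertools import combinations


def consensus_profile(cells, labels, clone, cutoff):
    """Call a site mutated in a clone when the fraction of the clone's
    cells carrying the mutation exceeds cutoff (hypothetical parameter)."""
    members = [x for x, lab in zip(cells, labels) if lab == clone]
    return [int(sum(site) / len(members) > cutoff) for site in zip(*members)]


def hamming(a, b):
    """Number of sites at which two binary profiles differ."""
    return sum(u != v for u, v in zip(a, b))


def upgma(profiles):
    """Build a UPGMA topology from a dict of clone name -> 0/1 profile.

    Returns nested tuples, e.g. ("C", ("A", "B")). Clone names must not
    collide with the internal labels "node0", "node1", ...
    """
    trees = {name: name for name in profiles}   # cluster id -> subtree
    sizes = {name: 1 for name in profiles}      # cluster id -> leaf count
    dist = {frozenset(p): float(hamming(profiles[p[0]], profiles[p[1]]))
            for p in combinations(profiles, 2)}
    step = 0
    while len(trees) > 1:
        # Merge the closest pair of clusters.
        a, b = sorted(min(dist, key=dist.get))
        merged = f"node{step}"
        step += 1
        # Distance to the merged cluster is the size-weighted average.
        for c in trees:
            if c not in (a, b):
                da = dist.pop(frozenset((a, c)))
                db = dist.pop(frozenset((b, c)))
                dist[frozenset((merged, c))] = (
                    (sizes[a] * da + sizes[b] * db) / (sizes[a] + sizes[b]))
        del dist[frozenset((a, b))]
        trees[merged] = (trees.pop(a), trees.pop(b))
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(trees.values()))
```

For example, profiles A = 110000, B = 111000, and C = 000111 group A with B before joining C, because A and B differ at a single site. Maximum parsimony or maximum likelihood methods would replace upgma in this sketch while leaving the consensus step unchanged.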
4 Computational View of Single-Cell Epigenomics
Epigenomics is the study of chemical modifications of genomic
DNA and of associated chromatin proteins, such as histones, that
do not change the DNA sequence itself. It aims to dissect
the functions of these epigenetic properties in
cellular memory, identity, and tissue-specific gene expression.
While current techniques in the field characterize the
average epigenomic features of large cell populations in bulk samples,
the increasing interest in the epigenetic heterogeneity within
complex tissues is driving the development of single-cell epigenomics.
In this section, we briefly introduce the computational
methods for, and challenges of, analyzing single-cell epigenomic data,
including DNA methylation, chromatin accessibility, histone
362 Yungang Xu and Xiaobo Zhou