| Category | Method | Description | Input | Type | Interface | Platform | Language | Ref. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Subpopulation detection | RaceID | Uses k-means applied to a similarity matrix of Pearson’s correlation coefficients for all pairs of cells; the number of clusters is chosen using the gap statistic. Outlier cells are those that cannot be explained by a background model that accounts for technical and biological noise. In a second step, rare subpopulations can be identified and outlier cells may be merged to an outlier cluster; new cluster centers are then computed and each cell is assigned to the most highly correlated cluster center | Requires a reduced set of genes. The authors consider genes with a minimum of five transcripts in at least one cell | Package | Command line | Unix/Linux, Mac OS, Windows | R | [^96] |
| | BackSPIN | Iteratively splits a two-way sorted (by both genes and cells) expression matrix into two clusters containing independent cells and genes, for a maximum number of splits. The algorithm has a stopping condition to avoid splitting data that are very homogeneous | Requires a reduced set of genes and the maximum number of splits allowed. The authors recommend selecting the top 5000 genes that have the largest residuals after fitting a simple noise model | Package | Command line | Unix/Linux | Python | [^97] |
| | ZIFA | Models dropout rate as a function of expression in a factor analysis (linear dimension reduction) framework | Requires normalized, log-transformed estimates of gene expression (zeros are not transformed) | Package | Command line | Unix/Linux | Python | [^98] |
(continued)
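
As a rough illustration of the RaceID entry above, the following Python sketch filters genes to those with at least five transcripts in at least one cell, builds the cell-by-cell Pearson correlation matrix, and runs k-means on its rows. It is not the RaceID package (which is written in R): the function names, the toy matrix, and the fixed `k` are assumptions made for illustration, and the gap statistic, background noise model, and outlier handling are omitted.

```python
import numpy as np


def filter_genes(counts, min_transcripts=5):
    """Keep genes (rows) with >= min_transcripts in at least one cell (column)."""
    counts = np.asarray(counts, dtype=float)
    keep = counts.max(axis=1) >= min_transcripts
    return counts[keep], keep


def kmeans(points, k, n_iter=50):
    """Plain Lloyd's k-means with a deterministic farthest-point start."""
    points = np.asarray(points, dtype=float)
    centers = [points[0]]
    for _ in range(1, k):
        # Squared distance of every point to its nearest chosen center.
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def cluster_cells(counts, k, min_transcripts=5):
    """Filter genes, then cluster cells on their Pearson similarity profiles."""
    filtered, _ = filter_genes(counts, min_transcripts)
    # np.corrcoef treats rows as variables, so pass cells as rows.
    similarity = np.corrcoef(filtered.T)
    return kmeans(similarity, k)


# Hypothetical toy data: 5 genes (rows) x 6 cells (columns).
example_counts = np.array([
    [10, 12,  9,  0,  1,  0],
    [ 8,  9, 11,  1,  0,  0],
    [ 0,  1,  0,  9, 10, 12],
    [ 1,  0,  0, 11,  8,  9],
    [ 2,  1,  3,  2,  1,  2],  # never reaches 5 transcripts, so it is dropped
])

if __name__ == "__main__":
    print(cluster_cells(example_counts, k=2))
```

On this toy matrix the first three cells and the last three cells should fall into different clusters. In practice `k` would be chosen from the data (RaceID uses the gap statistic), and a cell with constant expression across the retained genes would produce undefined correlations that need to be handled before clustering.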