Subpopulation detection

| Method | Description | Notes | Availability | Interface | Platform | Language | Ref. |
|---|---|---|---|---|---|---|---|
| RaceID | Uses k-means applied to a similarity matrix of Pearson’s correlation coefficients for all pairs of cells; the number of clusters is chosen using the gap statistic. Outlier cells are those that cannot be explained by a background model that accounts for technical and biological noise. In a second step, rare subpopulations can be identified and outlier cells may be merged into an outlier cluster; new cluster centers are then computed and each cell is assigned to the most highly correlated cluster center. | Requires a reduced set of genes. The authors consider genes with a minimum of five transcripts in at least one cell. | Package | Command line | Unix/Linux, Mac OS, Windows | R | [^96] |
| BackSPIN | Iteratively splits a two-way sorted (by both genes and cells) expression matrix into two clusters containing independent cells and genes, for a maximum number of splits. The algorithm has a stopping condition to avoid splitting data that are very homogeneous. | Requires a reduced set of genes and the maximum number of splits allowed. The authors recommend selecting the top 5000 genes that have the largest residuals after fitting a simple noise model. | Package | Command line | Unix/Linux | Python | [^97] |
| ZIFA | Models dropout rate as a function of expression in a factor analysis (linear dimension reduction) framework. | Requires normalized, log-transformed estimates of gene expression (zeros are not transformed). | Package | Command line | Unix/Linux | Python | [^98] |
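To make the RaceID entry more concrete, the following is a minimal Python sketch of three of the steps it describes: keeping genes with at least five transcripts in at least one cell, building a cell-by-cell Pearson similarity matrix, and assigning each cell to the most highly correlated cluster center. It is not the RaceID package (which is written in R) and it omits the k-means step, the gap statistic, and the outlier background model. The function names, the toy counts, and the two hand-picked centers are assumptions made for illustration only.

```python
from math import sqrt


def filter_genes(cells, min_count=5):
    """Return indices of genes with >= min_count transcripts in at least one cell."""
    n_genes = len(cells[0])
    return [g for g in range(n_genes) if max(cell[g] for cell in cells) >= min_count]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # constant profile: correlation is undefined, treated as 0 here
    return cov / (sd_x * sd_y)


def similarity_matrix(cells):
    """Pearson correlation for all pairs of cells (rows are cells)."""
    return [[pearson(a, b) for b in cells] for a in cells]


def assign_to_centers(cells, centers):
    """Assign each cell to the index of its most highly correlated center."""
    return [
        max(range(len(centers)), key=lambda k: pearson(cell, centers[k]))
        for cell in cells
    ]


# Toy count matrix (hypothetical): 4 cells x 5 genes.
counts = [
    [10, 9, 1, 0, 2],
    [12, 11, 0, 1, 1],
    [1, 0, 9, 10, 0],
    [0, 2, 11, 12, 3],
]

keep = filter_genes(counts)                      # the last gene never reaches 5 transcripts
reduced = [[cell[g] for g in keep] for cell in counts]
sim = similarity_matrix(reduced)

# Hand-picked centers stand in for the centers a clustering step would produce.
centers = [[11, 10, 0.5, 0.5], [0.5, 1, 10, 11]]
labels = assign_to_centers(reduced, centers)
```

In this toy example the first two cells and the last two cells form two groups with strongly positive within-group and strongly negative between-group correlations, so the assignment step separates them cleanly. Real data would need the clustering and outlier steps that the sketch leaves out.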
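The ZIFA entry can be illustrated in the same spirit. The sketch below is not the ZIFA package or its API. It only shows (1) a log transform that leaves zeros at zero, here taken to be log2(1 + x), and (2) a dropout probability that decreases with expression. The specific curve exp(-decay * x^2) and the value of `decay` are assumptions chosen for illustration; the table itself says only that dropout rate is modeled as a function of expression.

```python
import math


def log_transform(normalized):
    """Apply log2(1 + x) to every entry; a zero count stays exactly zero."""
    return [[math.log2(1.0 + value) for value in row] for row in normalized]


def dropout_probability(log_expression, decay):
    """Assumed dropout curve: probability of a zero decays with squared expression."""
    return math.exp(-decay * log_expression * log_expression)


def expected_zero_fraction(log_expression_row, decay):
    """Average dropout probability across the genes of one cell."""
    probs = [dropout_probability(x, decay) for x in log_expression_row]
    return sum(probs) / len(probs)


# Toy normalized expression (hypothetical): 2 cells x 3 genes.
normalized = [
    [0, 3, 15],
    [1, 7, 0],
]
log_expr = log_transform(normalized)

decay = 0.1  # hypothetical decay rate; in a fitted model this would be estimated
p_low = dropout_probability(2.0, decay)   # moderately expressed gene
p_high = dropout_probability(4.0, decay)  # highly expressed gene
```

Under this assumed curve, a gene with zero latent expression is always observed as zero, and the dropout probability falls as expression rises, which is the qualitative behavior the table attributes to the model.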