Computational Systems Biology Methods and Protocols.7z


Subpopulation detection

RaceID

Uses k-means applied to a similarity matrix of Pearson’s correlation coefficients for all pairs of cells; the number of clusters is chosen using the gap statistic. Outlier cells are those that cannot be explained by a background model that accounts for technical and biological noise. In a second step, rare subpopulations can be identified and outlier cells may be merged to an outlier cluster; new cluster centers are then computed and each cell is assigned to the most highly correlated cluster center.

Requires a reduced set of genes. The authors consider genes with a minimum of five transcripts in at least one cell.

Package; command line; Unix/Linux, Mac OS, Windows; R [96]
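
The core of the RaceID entry (Pearson similarity matrix, k-means, reassignment to the most highly correlated cluster center) can be illustrated with a minimal sketch. This is not the RaceID package itself: the function names (`pearson`, `kmeans`, `cluster_cells`) and the toy data are hypothetical, the number of clusters is fixed by hand rather than chosen with the gap statistic, the farthest-first initialisation is an assumption made for reproducibility, and the outlier model is omitted entirely.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def column_means(rows):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def kmeans(rows, k, n_iter=100):
    """Lloyd's k-means with farthest-first initialisation (an assumption
    made here for reproducibility, not taken from RaceID)."""
    centers = [list(rows[0])]
    while len(centers) < k:
        # Next center: the row farthest from all centers chosen so far.
        farthest = max(rows, key=lambda r: min(sq_dist(r, c) for c in centers))
        centers.append(list(farthest))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(r, centers[c])) for r in rows]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = column_means(members)
    return labels


def cluster_cells(cells, k):
    """Cluster cells (expression vectors) following the three steps below."""
    # 1. Similarity matrix of Pearson correlations for all pairs of cells.
    sim = [[pearson(a, b) for b in cells] for a in cells]
    # 2. k-means on the rows of the similarity matrix (k fixed by the caller).
    labels = kmeans(sim, k)
    # 3. Recompute cluster centers in expression space, then assign each
    #    cell to the center it correlates with most strongly.
    centers = {
        c: column_means([cell for cell, lab in zip(cells, labels) if lab == c])
        for c in set(labels)
    }
    return [max(centers, key=lambda c: pearson(cell, centers[c])) for cell in cells]


# Toy data: six cells, five genes, two clearly different expression patterns.
cells = [
    [10, 9, 1, 0, 1], [9, 10, 0, 1, 2], [11, 8, 1, 1, 0],
    [0, 1, 10, 9, 8], [1, 0, 9, 10, 9], [1, 2, 8, 11, 10],
]
labels = cluster_cells(cells, k=2)
print(labels)
```

On this toy input the first three cells and the last three cells end up in separate clusters. Real data would first need the gene filter described in the entry, and the published method should be consulted for the gap statistic and outlier steps.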

BackSPIN

Iteratively splits a two-way sorted (by both genes and cells) expression matrix into two clusters containing independent cells and genes, for a maximum number of splits. The algorithm has a stopping condition to avoid splitting data that are very homogeneous.

Requires a reduced set of genes and the maximum number of splits allowed. The authors recommend selecting the top 5000 genes that have the largest residuals after fitting a simple noise model.

Package; command line; Unix/Linux; Python [97]
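
The gene-selection recommendation in the BackSPIN entry (keep the genes with the largest residuals from a simple noise model) might be sketched as follows. The table does not specify the noise model, so the straight-line fit of log CV against log mean used here is an assumption standing in for whatever the authors actually fit; the function name `top_residual_genes` and the toy data are hypothetical.

```python
import math


def top_residual_genes(expr, n_top):
    """Rank genes by residual from a log-log CV-versus-mean line.

    expr maps a gene name to its expression values across cells.
    Returns the names of the n_top genes with the largest residuals.
    """
    points = {}
    for gene, values in expr.items():
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        if mean <= 0 or var <= 0:
            continue  # unexpressed or constant genes have no defined log CV
        cv = math.sqrt(var) / mean  # coefficient of variation
        points[gene] = (math.log(mean), math.log(cv))
    if not points:
        return []

    # Ordinary least-squares line through the (log mean, log CV) points.
    xs = [x for x, _ in points.values()]
    ys = [y for _, y in points.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx > 0 else 0.0
    intercept = my - slope * mx

    # Residual: how much noisier a gene is than the fitted trend predicts.
    residuals = {g: y - (intercept + slope * x) for g, (x, y) in points.items()}
    return sorted(residuals, key=residuals.get, reverse=True)[:n_top]


# Toy data: three genes that follow the trend, one unusually variable gene,
# and one constant gene that is skipped.
expr = {
    "g_low": [10, 10, 11, 9],
    "g_mid": [100, 101, 99, 100],
    "g_high": [1000, 1001, 999, 1000],
    "noisy": [0, 40, 0, 0],
    "flat": [5, 5, 5, 5],
}
selected = top_residual_genes(expr, n_top=1)
print(selected)
```

With real data, `n_top` would be set to the recommended 5000 and the resulting gene list passed on to the clustering step.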

ZIFA

Models dropout rate as a function of expression in a factor analysis (linear dimension reduction) framework.

Requires normalized, log-transformed estimates of gene expression (zeros are not transformed).

Package; command line; Unix/Linux; Python [98]
(continued)
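
Two points from the ZIFA entry lend themselves to a short illustration: the input preparation (log-transform the expressed values, leave zeros alone) and the idea that dropout becomes less likely as expression rises. The sketch below is not the ZIFA package. The pseudocount of 1 in the log transform and the squared-exponential decay form exp(-decay * x^2) are assumptions to be checked against [98], and the names `log_transform_keep_zeros`, `dropout_probability`, and `decay` are hypothetical.

```python
import math


def log_transform_keep_zeros(matrix):
    """Apply log2(1 + x) to positive entries; zeros stay exactly zero."""
    return [[math.log2(1.0 + x) if x > 0 else 0.0 for x in row] for row in matrix]


def dropout_probability(log_expr, decay):
    """Probability of observing a zero for a given log-expression level.

    Assumed form: exp(-decay * log_expr ** 2), so the probability is 1 at
    zero expression and falls towards 0 as expression increases.
    """
    return math.exp(-decay * log_expr ** 2)


# Toy normalized expression matrix (rows are cells, columns are genes).
normalized = [[0, 3], [1, 0], [7, 15]]
Y = log_transform_keep_zeros(normalized)
print(Y)

# Dropout probability at a few log-expression levels for decay = 0.1.
for level in (0.0, 1.0, 2.0, 4.0):
    print(level, round(dropout_probability(level, decay=0.1), 3))
```

A matrix prepared like `Y` is the kind of input the entry describes; the factor analysis itself, which estimates the latent dimensions and the decay parameter jointly, is left to the package.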

Applications of Single-Cell Sequencing for Multiomics 343