Nature - USA (2019-07-18)


Article RESEARCH


[Figure: Extended Data Fig. 10, panels a–h. Recoverable panel labels and values:
a, Reads with CB that are not identical to the whitelist’s CB: >Hamm-1, 36.9%; Hamm-1, not significant (“Not significant”), 3.9%; Hamm-1, significant (“Replaced”), 59.2%.
b, Log2(number of supporting reads per barcode) and base quality at the differing base, Not significant versus Replaced (P < 10^-10 for both comparisons).
c, Number of supporting reads per candidate barcode and base quality at the differing base, by position (1–16) in the barcode that is 1 Hamming distance away from the whitelist.
d, Log2(number of supporting reads) versus median base quality at the differing base. Not significant: R^2 = 0.0061, P = 0.79. Replaced: R^2 = 0.75, P = 1.54 × 10^-5.
e, Density of prior and posterior probabilities for Not significant and Replaced barcodes.
f, Fraction of reads (%) classified as Hamm-1, Not significant or Replaced for synthetic CB with 1, 2 or 3 bases changed (synthetic CB generated by changing whitelist CB at random bases, ×100).
g, Determination of whether the replaced CB is identical to the original CB (Identical or Not identical).
h, Prediction of classification with the synthetic CB: sensitivity and precision.]
Extended Data Fig. 10 | Evaluation of barcode replacement in
IronThrone GoT processing. a, Fraction of reads with cell barcodes that
are not perfectly matched to the whitelisted cell barcodes from the species-
mixing experiment. ‘>Hamm-1’ denotes filtered reads with barcodes that
are more than one Hamming distance away from whitelisted barcodes
(n = 139,422 reads). ‘Not significant’ denotes filtered reads with barcodes
that are one Hamming distance away from the whitelisted barcodes, but
which have a low probability of originating from the barcode (posterior
probability < 0.99, n = 14,830 reads). ‘Replaced’ denotes rescued reads
with barcodes that have candidates that are one Hamming distance away
from the whitelisted barcodes, with statistical significance (posterior
probability ≥ 0.99, n = 224,085 reads). b, c, Number of supporting reads
per candidate barcode and base quality at the differing base positions
(b) and across base positions (c). Two-sided Wilcoxon rank-sum tests
were applied to compare not significant (n = 14,830) and replaced
(n = 224,085) barcodes. d, Correlation between the number of supporting
reads per candidate barcode and median base quality at the differing base
(two-tailed Pearson’s correlation, F-test). e, Distribution of prior and
posterior probabilities from not significant (n = 14,830) and replaced
(n = 224,085) barcodes. The dashed red line represents the posterior
probability cut-off (0.99). f–h, To further evaluate the efficiency of barcode
replacement, we generated synthetic cell barcodes by randomly changing
one base in whitelisted cell barcodes (n = 100 iterations). f, Percentage
of reads with cell barcodes that are not identical to the whitelisted
cell barcodes (n = 100 iterations). Percentages of replaced reads were
99.1% ± 0.001% (median ± absolute deviation) in simulations with 1
base changed, 1.1% ± 0.002% in simulations with 2 bases changed and
0.7% ± 0.001% in simulations with 3 bases changed. g, Determination of
whether replaced cell barcodes are identical to the original cell barcodes.
In simulations with 1 base change, the percentage of reads with replaced
cell barcodes that were identical to the original cell barcodes was 100% ± 0%
(median ± absolute deviation of 100 iterations). h, Estimation of
prediction power for classifying cell barcodes from simulations with 1 base
changed (n = 100 iterations).
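
The replacement rule described in a (rescue a read when a whitelisted barcode one Hamming distance away has posterior probability ≥ 0.99) and the synthetic-barcode check in f–h can be illustrated with a minimal Python sketch. The caption does not state how the prior and posterior are computed, so the model here is an assumption: the prior is taken as proportional to the number of supporting reads per candidate barcode, the likelihood as the Phred error probability at the differing base, and the posterior is normalized over all whitelisted candidates. The function names (`hamming1_neighbors`, `rescue_barcode`, `mutate`, `fraction_rescued_correctly`) and the uniform toy qualities are hypothetical and are not taken from IronThrone-GoT.

```python
import random

BASES = "ACGT"
POSTERIOR_CUTOFF = 0.99  # cut-off stated in the caption


def hamming1_neighbors(barcode):
    """Yield (position, neighbor) for each sequence one substitution away."""
    for pos, base in enumerate(barcode):
        for alt in BASES:
            if alt != base:
                yield pos, barcode[:pos] + alt + barcode[pos + 1:]


def phred_to_error(quality):
    """Convert a Phred score to the probability that the base call is wrong."""
    return 10 ** (-quality / 10)


def rescue_barcode(observed, qualities, whitelist_counts, cutoff=POSTERIOR_CUTOFF):
    """Return the whitelisted barcode that replaces `observed`, or None.

    Assumed model (not from the source): prior proportional to the read
    support of each candidate, likelihood equal to the error probability at
    the differing base, posterior normalized over all candidates.
    """
    if observed in whitelist_counts:
        return observed  # exact match, nothing to replace
    weights = {}
    for pos, candidate in hamming1_neighbors(observed):
        if candidate in whitelist_counts:
            weights[candidate] = (
                whitelist_counts[candidate] * phred_to_error(qualities[pos])
            )
    total = sum(weights.values())
    if total == 0:
        return None  # no usable whitelisted candidate at Hamming distance 1
    best = max(weights, key=weights.get)
    posterior = weights[best] / total
    return best if posterior >= cutoff else None


def mutate(barcode, n_changes, rng):
    """Create a synthetic barcode by substituting `n_changes` random positions."""
    chars = list(barcode)
    for pos in rng.sample(range(len(chars)), n_changes):
        chars[pos] = rng.choice([b for b in BASES if b != chars[pos]])
    return "".join(chars)


def fraction_rescued_correctly(whitelist_counts, n_changes, iterations=100, seed=0):
    """Fraction of synthetic barcodes replaced by their barcode of origin."""
    rng = random.Random(seed)
    originals = sorted(whitelist_counts)
    hits = 0
    for _ in range(iterations):
        original = rng.choice(originals)
        synthetic = mutate(original, n_changes, rng)
        qualities = [30] * len(synthetic)  # uniform toy qualities (assumption)
        hits += rescue_barcode(synthetic, qualities, whitelist_counts) == original
    return hits / iterations
```

In this sketch, if every pair of whitelisted barcodes differs at more than three positions, a synthetic barcode with one changed base has exactly one candidate and is always replaced by its barcode of origin, whereas a barcode with two changed bases has no candidate and is never replaced. A real whitelist is far denser than that, which may be why the simulations reported in f show a small non-zero replacement rate for 2 and 3 changed bases.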

