Nature - USA (2019-07-18)

Article RESEARCH

[Figure panels, Extended Data Fig. 2]
a, Amplicon read structure. GoT reads come from linear amplification; circGoT reads come from circular amplification with the PCR#2 Fw and PCR#2 Rv primers. R1.fastq contains the cell barcode and UMI, and R2.fastq contains the primer, shared and target sequences.
b, Pipeline flow chart:
(1) Identification of reads with proper priming: include reads with the known primer and 'shared' sequences, allowing a mismatch ratio ≤ m.
(2) Identification of cell barcodes (CBs) within the whitelists (CB lists published by 10x).
(3) Replacement of CBs that are not identical to a whitelist CB: among candidate CBs that are 1 Hamming distance away from a whitelisted CB, compute the probability that the observed CB deviated from the whitelisted CB because of a sequencing error at the differing base, and replace the observed CB with the whitelisted CB when this probability exceeds 0.99.
(4) Deduplication of reads: assess genotyping agreement among duplicate reads (reads with the same CB and UMI).
(5) Analysis of reads with CBs that are also present in the 10x scRNA-seq data.
c, Total UMI aligned to hg38 and mm10 (y axes) versus mutant CALR UMI fraction (x axes, 0 to 1). Panels are shown for mismatch ratio = 0, 0.2 and 0.6 and for three duplicate thresholds: no duplicate threshold, duplicate ≥ 2 and duplicate ≥ 3. Legend: murine cell (MUT CALR), human cell (WT CALR), multiplets. Panel annotations: 1291 / 1291, 1255 / 1291 and 1251 / 1291 cells.
d, Heat maps of precision, recall and F1 score. Axes: duplicate threshold (1 to 10) and mismatch ratio threshold (0 to 0.6).
e, Variable importance (mean decrease accuracy) for the following variables: ratio of barcodes replaced with whitelist; averaged base error in amplicon reads, primer sequence, shared sequence and target sequence; and number of total duplicates.
f, Cumulative errors (z-score) versus mismatch ratio (0 to 0.6), with curves for ratio of cell loss and out-of-bag errors of prediction.
g, Heat maps of ratio of cell loss (z-score) and out-of-bag errors (z-score). Axes: duplicate threshold (1 to 9) and mismatch ratio threshold (0 to 0.6).
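Read-processing steps (1), (3) and (4) of the flow chart in panel b can be sketched in Python as follows. This is a minimal, hypothetical sketch and not the IronThrone implementation. Every function name is illustrative, and the sketch rests on three assumptions:
- The mismatch ratio is the Hamming mismatch count divided by the length of the expected primer + shared sequence, with no indels.
- The "probability of a sequencing error at the differing base" is a posterior similar to the approach documented for 10x Cell Ranger. Each 1-Hamming-distance whitelist candidate is weighted by its exact-match abundance (plus a pseudocount) times the Phred error probability at the differing base.
- Genotypes of duplicate reads are collapsed by majority vote.

```python
from collections import Counter, defaultdict

BASES = "ACGT"


def mismatch_ratio(observed: str, expected: str) -> float:
    """Assumed definition: Hamming mismatches / expected length (no indels)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have equal length")
    mismatches = sum(1 for a, b in zip(observed, expected) if a != b)
    return mismatches / len(expected)


def passes_priming(primer_shared: str, expected: str, max_ratio: float = 0.2) -> bool:
    """Step (1): keep reads whose primer + shared region is within mismatch ratio m."""
    return mismatch_ratio(primer_shared, expected) <= max_ratio


def correct_barcode(cb, quals, whitelist_counts, min_prob=0.99):
    """Step (3) sketch: return a whitelisted CB, or None if no confident match.

    whitelist_counts (hypothetical input) maps each whitelisted CB to the number
    of reads that matched it exactly, used as a prior; quals are Phred scores.
    """
    if cb in whitelist_counts:
        return cb  # already identical to a whitelist CB (step 2)
    weights = {}
    for i, base in enumerate(cb):
        p_err = 10 ** (-quals[i] / 10)  # Phred score -> base error probability
        for alt in BASES:
            if alt == base:
                continue
            candidate = cb[:i] + alt + cb[i + 1:]
            if candidate in whitelist_counts:
                # prior (abundance + 1 pseudocount) x error likelihood at base i;
                # the 1/3 per-substitution factor is shared by all candidates and cancels
                weights[candidate] = (whitelist_counts[candidate] + 1) * p_err
    total = sum(weights.values())
    if total == 0:
        return None  # no whitelisted CB within 1 Hamming distance
    best, weight = max(weights.items(), key=lambda kv: kv[1])
    return best if weight / total > min_prob else None


def collapse_umis(reads, min_dup=2):
    """Step (4): group (cb, umi, genotype) reads, drop UMIs below the duplicate
    threshold, return {(cb, umi): (majority genotype, fraction of reads agreeing)}."""
    groups = defaultdict(list)
    for cb, umi, call in reads:
        groups[(cb, umi)].append(call)
    collapsed = {}
    for key, calls in groups.items():
        if len(calls) < min_dup:
            continue
        genotype, n = Counter(calls).most_common(1)[0]
        collapsed[key] = (genotype, n / len(calls))
    return collapsed


if __name__ == "__main__":
    # One dominant candidate (posterior about 0.995) -> corrected
    print(correct_barcode("AAAG", [30, 30, 30, 30], {"AAAC": 199, "AATG": 0}))  # AAAC
```

The default `max_ratio=0.2` simply echoes the region highlighted in panel f. In the sketch, an observed CB with two equally supported whitelist neighbours is left uncorrected (returns `None`), because neither posterior exceeds 0.99.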

Extended Data Fig. 2 | Optimization of parameters in processing targeted amplicon sequences in the IronThrone GoT pipeline.
a, Representation of amplicon reads. b, Flow chart of the GoT analysis pipeline (Methods). CB, cell barcode. c, Mouse (green) and human (blue) genome alignment of 10x data (y axes) against genotyping data by GoT (x axes) for various thresholds of minimum duplicate reads (across) and maximum mismatch ratio (down). d, Precision, recall and F1 score for combinations of minimum duplicate reads and maximum mismatch ratio. e, Importance (mean decrease accuracy) of each variable used to calculate tree splits in the random-forest classification test. f, Ratio of cell loss and genotyping errors (z-score, y axis) as a function of the mismatch ratio threshold (x axis). The area of intersection, around a mismatch ratio of 0.2, is highlighted in grey. g, Heat maps showing z-scores of the number of filtered cells (left) and predicted error rates (right) from random-forest classification tests for combinations of minimum duplicate reads and maximum mismatch ratio thresholds.
