Nature - USA (2020-08-20)

Nature | Vol 584 | 20 August 2020 | E21

[Figure 1, panels a to c. Text labels recovered from the figure:]

Panel a (schematic and read alignments)
Non-PCR-based hybrid-capture sequencing: read mapping of paired-end sequence reads to the reference genome (GRCh38) at the source APP site. Labelled cases: APP retrocopy (5′ UTR, 3′ UTR, poly-A tail; clipped retrocopy target site sequence) and APP vector contamination (APP insert, vector contaminant; clipped vector backbone sequence at the APP coding sequence start and end).
PCR-based assays (cloning, SMRT-seq) on NeuN+ neuronal nuclei, with PCR primers targeting coding exons: source APP (sample DNA) gives no PCR product owing to large gene size; APP insert (vector contaminant, coding exons only) and APP retrocopy (sample DNA) are indistinguishable.
Vector (pGEM-T Easy) backbone sequence flanking the APP insert, each side with a 3′ T overhang: CGCGAATTCACTAGTGAT and AATCGAATTCCCGCGGCCG.
Restriction sites: BstZI, NotI, EcoRI, SpeI, EcoRI, SacII, BstZI, NotI.
Genomic coordinate ticks on the read alignments: 25,881,660 bp, 25,881,670 bp, 25,881,680 bp; 26,170,610 bp, 26,170,620 bp, 26,170,630 bp.

Panel b (plot)
x axis: Exons (1 to 18). y axis: Clipped read fraction (%), ticks 0, 5, 10, 15, 20.
Legend: Estimated by APP vector clipped sequences; Expected fraction estimated from the Lee study DISH experiment.

Panel c (gel and junction sequences)
Gel: APP-751 and APP-695; size markers 2,000, 1,000, 650, 400 and 200; lane labels as extracted: 1–181–18N2–17.
Sequence homology between two junctions (junction, exons, shared sequence as printed):
R1/14 (exon 1, exon 14): GCTC
R2/14 (exon 2, exon 14): ACCAAGGA
R2/16 (exon 2, exon 16): AT
R2/17 (exon 2, exon 17): CA
R3/14 (exon 3, exon 14): AGCCAAC
R3/17 (exon 3, exon 17): GCAGTG / GCGGTG
R6/17 (exon 6, exon 17): AGATGGGAGTGAAGACAAAG / AGATGTGGGTTCAAACAAAG
R6/18 (exon 6, exon 18): GAGGA

Fig. 1 | APP vector contamination in the Lee study. a, APP vector contamination
and its manifestation in genome sequences. PCR-based assays in the Lee study^2
fail to distinguish between APP retrocopy and vector APP insert. Hybrid-capture
sequences from the Lee study show clipped reads with a vector backbone
sequence (pGEM-T Easy), including restriction sites at the multiple cloning site
and a 3′ T-overhang. b, Estimated fractions of cells with APP gencDNA at the exon
junctions in the Lee hybrid-capture data. All exon junction fractions (black dots)
are comparable to the fraction at the coding sequence ends with vector
backbone sequences (red dots). The dotted line above represents the
conservative estimate of expected fraction based on the Lee DISH experiment
(see Supplementary Methods); shaded area, 95% confidence interval.
c, Electrophoresis and sequencing of PCR products from the vector APP inserts
(APP-751/695) showing new APP variants as artefacts. Eight out of twelve IEJs
found both in our APP vector PCR sequencing and the Lee study RT–PCR results
are shown (Extended Data Fig. 3).
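
The clipped-read check described in panel a can be illustrated with a minimal sketch. It assumes the clipped portions of reads have already been extracted as plain strings, and it uses only the two pGEM-T Easy backbone sequences printed in the figure. The function names, the `min_len` threshold and the exact-substring matching are illustrative assumptions, not the procedure used in the analysis (see Supplementary Methods for that).

```python
# Backbone sequences flanking the APP insert, as printed in panel a of Fig. 1.
VECTOR_FLANKS = ("CGCGAATTCACTAGTGAT", "AATCGAATTCCCGCGGCCG")

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def looks_like_vector(clipped_seq, min_len=10):
    """Return True if the clipped sequence shares an exact stretch of at
    least min_len bases with a backbone flank or its reverse complement.

    min_len is a hypothetical threshold chosen for illustration.
    """
    seq = clipped_seq.upper()
    if len(seq) < min_len:
        return False
    targets = []
    for flank in VECTOR_FLANKS:
        targets.append(flank)
        targets.append(reverse_complement(flank))
    # Slide a window of min_len bases over the clipped sequence.
    for i in range(len(seq) - min_len + 1):
        kmer = seq[i:i + min_len]
        if any(kmer in target for target in targets):
            return True
    return False
```

For example, `looks_like_vector("CGCGAATTCACTAGTGAT")` is true, whereas a clipped sequence with no 10-base stretch in common with either flank is rejected. A real analysis would need to tolerate sequencing errors, which exact matching does not.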