Nature | Vol 584 | 20 August 2020 | E21
[Figure 1 image, panels a to c. Labels recoverable from the image:]
[Panel a, schematic. PCR-based assays (cloning, SMRT-seq): PCR primers targeting coding exons give no PCR product from the source APP (sample DNA) owing to large gene size, whereas an APP retrocopy (sample DNA; 5′ UTR, 3′ UTR, poly(A)) and the APP insert with coding exons only (vector contaminant) are indistinguishable. Non-PCR-based hybrid-capture sequencing of NeuN+ neuronal nuclei: paired-end sequence reads mapped to the reference genome (GRCh38) show clipped vector backbone sequence at the APP coding sequence start and end (APP vector contamination), as opposed to the clipped retrocopy target site sequence expected at the source APP site. Vector (pGEM-T Easy) backbone sequence flanking the APP insert: CGCGAATTCACTAGTGAT and AATCGAATTCCCGCGGCCG, each with a 3′ T overhang. Restriction sites: BstZI, NotI, EcoRI, SpeI, EcoRI, SacII, BstZI, NotI. Read alignments shown at 25,881,660 to 25,881,680 bp and 26,170,610 to 26,170,630 bp.]
[Panel b, plot. x axis: Exons (1 to 18); y axis: Clipped read fraction (%), 0 to 20. Legend: Estimated by APP vector clipped sequences; Expected fraction estimated from the Lee study DISH experiment.]
[Panel c, gel and sequence alignments. Size markers: 2,000, 1,000, 650, 400, 200. Lanes: 1–18, 1–18N, 2–17 for APP-751 and APP-695. Sequence homology between two junctions shown for R1/14, R2/14, R2/16, R2/17, R3/14, R3/17, R6/17 and R6/18.]
Fig. 1 | APP vector contamination in the Lee study. a, APP vector contamination
and its manifestation in genome sequences. PCR-based assays in the Lee study^2
fail to distinguish between APP retrocopy and vector APP insert. Hybrid-capture
sequences from the Lee study show clipped reads with a vector backbone
sequence (pGEM-T Easy), including restriction sites at the multiple cloning site
and a 3′ T-overhang. b, Estimated fractions of cells with APP gencDNA at the exon
junctions in the Lee hybrid-capture data. All exon junction fractions (black dots)
are comparable to the fraction at the coding sequence ends with vector
backbone sequences (red dots). The dotted line above represents the
conservative estimate of expected fraction based on the Lee DISH experiment
(see Supplementary Methods); shaded area, 95% confidence interval.
c, Electrophoresis and sequencing of PCR products from the vector APP inserts
(APP-751/695) showing new APP variants as artefacts. Eight out of twelve IEJs
found both in our APP vector PCR sequencing and the Lee study RT–PCR results
are shown (Extended Data Fig. 3).
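
The clipped-read signature described in panel a can be illustrated with a minimal sketch. It is not the pipeline used in this analysis: it assumes reads are available as SAM-style CIGAR strings and sequences, takes the two backbone sequences printed in panel a as the only vector reference, and uses a hypothetical exact-match threshold of k = 10 bases. All function names are illustrative.

```python
import re

# Backbone sequences flanking the APP insert, copied from panel a (pGEM-T Easy).
VECTOR_FLANKS = ("CGCGAATTCACTAGTGAT", "AATCGAATTCCCGCGGCCG")

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def soft_clips(cigar, seq):
    """Return (leading, trailing) soft-clipped bases for a SAM CIGAR string."""
    ops = [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]
    # Hard-clipped bases are absent from the read sequence, so ignore them.
    ops = [o for o in ops if o[1] != "H"]
    lead = seq[:ops[0][0]] if ops and ops[0][1] == "S" else ""
    trail = seq[-ops[-1][0]:] if len(ops) > 1 and ops[-1][1] == "S" else ""
    return lead, trail


def _kmers(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def clip_matches_vector(clip, k=10):
    """True if the clip shares an exact k-mer with either backbone flank.

    k = 10 is an assumed threshold for illustration, not a published value.
    """
    targets = set()
    for flank in VECTOR_FLANKS:
        targets |= _kmers(flank, k) | _kmers(revcomp(flank), k)
    return any(clip[i:i + k] in targets for i in range(len(clip) - k + 1))


def read_has_vector_clip(cigar, seq, k=10):
    """True if either soft-clipped end of the read looks like vector backbone."""
    return any(clip_matches_vector(c, k) for c in soft_clips(cigar, seq))
```

A read whose soft-clipped end at the APP coding sequence start or end matches the backbone would be flagged as consistent with vector contamination; a real analysis would need a full vector reference and tolerance for sequencing errors.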