Nature | Vol 578 | 13 February 2020 | 269
T:A pairing. This is consistent with in vitro data that show that purines
are more reactive than pyrimidines with mutagens in tobacco smoke^5.
As described above, an unexpectedly high fraction of cells in ex-smokers
had a near-normal mutational burden. These cells had considerably
lower proportions of SBS-4 mutations than cells with an increased
mutational burden in the same patients. Instead, the distribution of
signatures in these near-normal cells resembled that seen in never-smokers,
with prominent endogenous signatures such as SBS-5, SBS-1
Fig. 2 | Mutational signatures in normal bronchial epithelium. a, Stacked bar
plot showing the proportional contribution of mutational signatures to single-
base substitutions across the n = 632 colonies from normal bronchial cells,
extracted using a hierarchical Dirichlet process (HDP). Within each patient,
colonies are sorted from left to right by increasing mutational burden (bar chart
in dark grey above coloured signature-attribution stacks). The dashed black
vertical lines in current and ex-smokers denote the cut-off between cells with a
near-normal and an increased mutational burden. b, Trinucleotide context
spectrum on transcribed and untranscribed strands of two new SBS signatures
(Sig-A and Sig-B). The six substitution types are shown across the top. Within
each substitution type, the trinucleotide context is shown as four sets of eight
bars, grouped by whether an A, C, G or T, respectively, is 5′ to the mutated base,
and within each group of eight by whether A, C, G or T is 3′ to the mutated base.
The activity of the mutational signature on the untranscribed strand is shown in a
pale colour; on the transcribed strand it is shown in a darker colour. c, Number of
base substitutions attributed to the three endogenous signatures across the
cohort (y axis; n = 632 colonies) shown according to the age of the subject (x axis).
The black line represents the fitted effect of age, which was estimated from linear mixed-effects (LME)
models after correction for smoking status and within-patient correlation
structure. The blue shaded area represents the 95% CI for the fitted line. The
quoted P values for the fixed effects of age and smoking are derived from the full
LME models. d, Estimated effect sizes of age and smoking status, and between-patient
and within-patient standard deviations, for seven signatures (points) with 95% CIs
(horizontal lines). Estimates are derived from LME models (n = 632 colonies).
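For readers less familiar with the LME framework referred to in panels c and d, one minimal model consistent with the quantities reported there is sketched below. It is an illustration only. It assumes a single random intercept per patient, a common residual variance, and never-smokers as the reference category for smoking status. The authors' full specification may differ, for example in its variance structure or in how children are grouped.

$$
y_{ij} = \beta_0 + \beta_{\text{age}}\,\text{age}_i + \beta_{\text{ex}}\,\mathbb{1}[\text{ex-smoker}_i] + \beta_{\text{cur}}\,\mathbb{1}[\text{current smoker}_i] + b_i + \varepsilon_{ij},
\qquad b_i \sim \mathcal{N}(0, \sigma_b^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_w^2)
$$

Here $y_{ij}$ is the number of substitutions attributed to a given signature in colony $j$ from patient $i$. In this sketch, $\beta_{\text{age}}$ corresponds to the effect size of age (substitutions per cell per year) and $\beta_{\text{ex}}$ and $\beta_{\text{cur}}$ to the extra substitutions per cell associated with smoking status. The parameters $\sigma_b$ and $\sigma_w$ correspond to the between-patient and within-patient standard deviations shown in panel d.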
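The channel layout described for panel b can be made concrete with a short sketch. The Python below enumerates the 192 stranded channels (6 substitution types × 16 trinucleotide contexts × 2 strands) in the order given in the legend: substitution type first, then the 5′ base, then the 3′ base. The function name `stranded_channels` is hypothetical. The order of the two strand bars within each pair is also an assumption and does not come from the authors' code.

```python
from itertools import product

# Six substitution types, named by the pyrimidine (C or T) reference base,
# in the order shown across the top of panel b.
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
BASES = "ACGT"


def stranded_channels(strands=("transcribed", "untranscribed")):
    """Return (substitution, context, strand) tuples in panel-b order.

    Hypothetical helper: the order of the two strand bars within each
    pair is an assumption and can be changed via `strands`.
    """
    channels = []
    for sub in SUBSTITUTIONS:
        ref = sub[0]  # mutated (reference) base, C or T
        # product() varies the last element fastest, so the 5' base defines
        # the outer group of eight bars and the 3' base varies within it.
        for five_prime, three_prime in product(BASES, BASES):
            context = five_prime + ref + three_prime  # e.g. 'ACA'
            for strand in strands:
                channels.append((sub, context, strand))
    return channels


channels = stranded_channels()
print(len(channels))   # 192 = 6 types x 16 contexts x 2 strands
print(channels[0])     # ('C>A', 'ACA', 'transcribed')
print(channels[8])     # ('C>A', 'CCA', 'transcribed'): first bar of the second group of eight
```

The context strings follow the 5′ base, mutated base, 3′ base convention used for the axis labels of panel b (ACA, ACC, ACG, ACT, CCA, …). Each group of eight therefore covers one 5′ base, with four 3′ bases and two strands.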