270 | Nature | Vol 578 | 13 February 2020
Article
and Sig-A. Phylogenetically, cells with a near-normal mutational burden
showed polyclonal origins (Fig. 3a, Extended Data Fig. 6), suggesting
that they do not arise from the expansion of a single ancestral cell.
Signatures of indels and double-base substitutions that were
observed in normal bronchial epithelium matched those extracted
from lung cancers^24 and those generated in vitro by exposure of cells
to polycyclic aromatic hydrocarbons^19 (Extended Data Figs. 7, 8). A history
of tobacco smoking was particularly associated with a signature of
double-base substitutions at CpC (equivalently GpG) dinucleotides, a
finding that is in accordance with the high rates of C>A (G>T) single-base
substitutions in SBS-4. Similarly, tobacco exposure was associated
with an indel signature of single-base deletions of cytosines (guanines)
in our dataset. Together, these data suggest that the propensity of
polycyclic aromatic hydrocarbons in tobacco smoke to bind guanine
nucleotides can result in a range of mutation types even in normal bronchial
epithelial cells, including single-base substitutions, dinucleotide
substitutions and small indels.
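The strand equivalences used above (CpC with GpG, C>A with G>T, a deleted cytosine with a deleted guanine) follow from reverse complementation: a change read on one DNA strand is the same event as its reverse complement read on the other. The following is a minimal Python sketch of that bookkeeping only; it is not part of the study's analysis, and the function names are illustrative assumptions.

```python
# Illustrative sketch: strand equivalence of mutation labels.
# Function names are hypothetical and not taken from the study.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence as read on the opposite strand (5' to 3')."""
    return seq.translate(COMPLEMENT)[::-1]


def other_strand(ref: str, alt: str) -> tuple[str, str]:
    """Express a ref>alt change as it appears on the opposite strand."""
    return reverse_complement(ref), reverse_complement(alt)


def same_event(a: tuple[str, str], b: tuple[str, str]) -> bool:
    """True if two (ref, alt) labels describe the same change,
    either directly or via the opposite strand."""
    return a == b or a == other_strand(*b)


# Single-base substitution: G>T is C>A on the other strand.
sbs = other_strand("G", "T")

# Double-base substitution: GG>TT is CC>AA on the other strand.
dbs = other_strand("GG", "TT")

# Single-base deletion (alt is empty): a deleted G is a deleted C.
deletion = other_strand("G", "")

print(sbs, dbs, deletion)
```

Because every label has exactly one partner on the opposite strand, signature catalogues can report each change once under a fixed reference base, which is why the text gives the guanine form in parentheses.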
Driver mutations
To assess whether any mutations are under positive selection in normal
bronchial epithelium, we applied an algorithm, dNdScv, which identifies
and quantifies the number of excess non-synonymous mutations
compared with the number expected from the rate of synonymous
[Fig. 3 graphic not reproduced; recoverable panel content:
a, Phylogenetic trees for PD34215 (F, 73 y, never-smoker), PD34206 (M, 76 y, ex-smoker) and PD34211 (M, 74 y, current smoker). x axis: no. of substitutions. Branch colour key: SBS-1, Sig-A, SBS-5, SBS-2, SBS-13, SBS-18, SBS-16, SBS-4, Sig-B, unallocated.
b, No. of colonies (left) and no. of unique mutations (right) for TP53, NOTCH1, FAT1, CHEK2, PTEN, ARID1A, ARID2, IDH1, EP300, CREBBP and PIK3CA. Mutation-type key: nonsense, missense, splice, synonymous, multiple, frameshift.
c, Fraction of colonies (0 to 1.00) with 0, 1, 2 or 3 drivers per subject. Children: PD37455 (11 m), PD37456 (1 y), PD37453 (3 y). Never-smokers: PD37454 (59 y), PD34215 (73 y), PD34209 (75 y), PD37451 (80 y). Ex-smokers: PD34205 (54 y), PD34210 (68 y), PD30160 (71 y), PD37452 (75 y), PD34206 (76 y), PD26988 (81 y). Smokers: PD34204 (61 y), PD34207 (65 y), PD34211 (74 y).
d, Rows: patient, smoking (current smoker, ex-smoker, never-smoker), LOH, then TP53, NOTCH1, FAT1, CHEK2, ARID1A, PTEN, ARID2, IDH1, EP300, PIK3CA, CREBBP. Mutation-type key: LOH, missense, nonsense, frameshift, splice site, synonymous, multiple.
e, Frequency (0 to 1.00) of shared versus single-colony drivers for TP53, NOTCH1, other and all mutations.]
Fig. 3 | Driver mutations in normal bronchial epithelial cells. a, Phylogenetic
trees showing clonal relationships among normal bronchial cells in three
representative subjects. Branch lengths are proportional to the number of
mutations (x axis) specific to that clone or subclone. Each branch is coloured by
the proportion of mutations on that branch that are attributed to the various
SBS signatures. The driver mutations that were identified in each branch are
also shown (black, SBS; red, indel). b, Total number of colonies with mutations
(left) and number of unique mutations (right) in key cancer genes across the
sample set (n = 632). ** represents genes that are significant (q < 0.05 by
dNdScv) when correction for multiple-hypothesis testing is applied across all
coding genes; * represents genes that are significant (q < 0.05 by dNdScv)
when correction for multiple-hypothesis testing is applied across known driver
genes in lung cancers and normal squamous tissues (exact q values are
provided in Supplementary Table 4). c, Fraction of colonies with 0, 1, 2 or 3
driver mutations across the 16 subjects. d, Distribution of driver mutations
across colonies in the cohort, coloured by type of mutation. Loss of
heterozygosity (LOH) that affects driver mutations is also shown. e, Frequency
of driver mutations that are shared by more than one colony in a patient (dark
blue) versus those found in a single colony (light blue) across different cancer
genes.
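The two significance tiers in b (** and *) differ only in how many hypotheses enter the multiple-testing correction. The sketch below assumes a Benjamini-Hochberg adjustment, which the caption does not name, and uses invented P values and gene counts purely for illustration; it does not reproduce dNdScv or any value in Supplementary Table 4.

```python
# Illustrative sketch: effect of the hypothesis set on q values.
# ASSUMPTION: Benjamini-Hochberg adjustment; all numbers are invented.


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted q values in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that each
    # q value is capped by the q values of all larger P values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical gene with P = 1e-4; every other gene is set to P = 1.0.
p_gene = 1e-4
n_all_genes = 20_000   # hypothetical number of coding genes tested
n_known_drivers = 100  # hypothetical size of a restricted driver list

q_all = benjamini_hochberg([p_gene] + [1.0] * (n_all_genes - 1))
q_restricted = benjamini_hochberg([p_gene] + [1.0] * (n_known_drivers - 1))

print("significant across all genes:", q_all[0] < 0.05)
print("significant across known drivers:", q_restricted[0] < 0.05)
```

In this toy setting the same P value fails the genome-wide threshold but passes once testing is restricted to a short list of candidate genes, which is the distinction the two asterisk tiers encode.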