270 | Nature | Vol 578 | 13 February 2020
Article
and Sig-A. Phylogenetically, cells with a near-normal mutational burden
showed polyclonal origins (Fig. 3a, Extended Data Fig. 6), suggesting
that they do not arise from the expansion of a single ancestral cell.
Signatures of indels and double-base substitutions that were
observed in normal bronchial epithelium matched those extracted
from lung cancers^24 and those generated in vitro by exposure of cells
to polycyclic aromatic hydrocarbons^19 (Extended Data Figs. 7, 8). A history
of tobacco smoking was particularly associated with a signature of
double-base substitutions at CpC (equivalently GpG) dinucleotides—a
finding that is in accordance with the high rates of C>A (G>T)
single-base substitutions in SBS-4. Similarly, tobacco exposure was associated
with an indel signature of single-base deletions of cytosines (guanines)
in our dataset. Together, these data suggest that the propensity of
polycyclic aromatic hydrocarbons in tobacco smoke to bind guanine
nucleotides can result in a range of mutation types even in normal bronchial
epithelial cells, including single-base substitutions, dinucleotide
substitutions and small indels.
Driver mutations
To assess whether any mutations are under positive selection in normal
bronchial epithelium, we applied an algorithm, dNdScv, which identifies
and quantifies the number of excess non-synonymous mutations
compared with the number expected from the rate of synonymous
[Fig. 3 graphic. Panel a: phylogenetic trees for PD34215 (F, 73 y, never-smoker), PD34206 (M, 76 y, ex-smoker) and PD34211 (M, 74 y, current smoker); x axis, no. of substitutions; branches coloured by signature (SBS-1, Sig-A, SBS-5, SBS-2, SBS-13, SBS-18, SBS-16, SBS-4, Sig-B, unallocated) and annotated with driver genes. Panel b: no. of colonies and no. of unique mutations per gene (TP53, NOTCH1, FAT1, CHEK2, PTEN, ARID1A, ARID2, IDH1, EP300, CREBBP, PIK3CA), by mutation type (nonsense, missense, splice, synonymous, multiple, frameshift). Panel c: fraction of colonies with 0, 1, 2 or 3 drivers per subject, with subjects labelled as child, never-smoker, ex-smoker or smoker. Panel d: driver mutations per colony, with rows for patient, smoking status, LOH and gene. Panel e: frequency of shared versus single-colony drivers for TP53, NOTCH1, other genes and all mutations.]
Fig. 3 | Driver mutations in normal bronchial epithelial cells. a, Phylogenetic
trees showing clonal relationships among normal bronchial cells in three
representative subjects. Branch lengths are proportional to the number of
mutations (x axis) specific to that clone or subclone. Each branch is coloured by
the proportion of mutations on that branch that are attributed to the various
SBS signatures. The driver mutations that were identified in each branch are
also shown (black, SBS; red, indel). b, Total number of colonies with mutations
(left) and number of unique mutations (right) in key cancer genes across the
sample set (n = 632). ** represents genes that are significant (q < 0.05 by
dNdScv) when correction for multiple-hypothesis testing is applied across all
coding genes; * represents genes that are significant (q < 0.05 by dNdScv)
when correction for multiple-hypothesis testing is applied across known driver
genes in lung cancers and normal squamous tissues (exact q values are
provided in Supplementary Table 4). c, Fraction of colonies with 0, 1, 2 or 3
driver mutations across the 16 subjects. d, Distribution of driver mutations
across colonies in the cohort, coloured by type of mutation. Loss of
heterozygosity (LOH) that affects driver mutations is also shown. e, Frequency
of driver mutations that are shared by more than one colony in a patient (dark
blue) versus those found in a single colony (light blue) across different cancer
genes.
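The selection test described under Driver mutations, and the two significance tiers in the Fig. 3b legend, can be illustrated with a minimal Python sketch. It is not the dNdScv implementation: the ratio below is a crude per-site dN/dS, whereas dNdScv fits a more detailed substitution model. The sketch also assumes that q values come from a Benjamini-Hochberg correction. All function names, gene names, site counts and P values are hypothetical and chosen only for illustration.

```python
from typing import Dict


def dnds_ratio(n_nonsyn: int, n_syn: int,
               sites_nonsyn: float, sites_syn: float) -> float:
    """Crude dN/dS: non-synonymous mutations per non-synonymous site,
    divided by synonymous mutations per synonymous site.
    Values above 1 are suggestive of positive selection."""
    if n_syn == 0 or sites_nonsyn <= 0 or sites_syn <= 0:
        raise ValueError("need >= 1 synonymous mutation and positive site counts")
    return (n_nonsyn / sites_nonsyn) / (n_syn / sites_syn)


def excess_nonsyn(n_nonsyn: int, n_syn: int,
                  sites_nonsyn: float, sites_syn: float) -> float:
    """Observed non-synonymous count minus the count expected if
    non-synonymous sites mutated at the synonymous per-site rate."""
    expected = n_syn * sites_nonsyn / sites_syn
    return n_nonsyn - expected


def bh_qvalues(pvalues: Dict[str, float]) -> Dict[str, float]:
    """Benjamini-Hochberg adjusted P values (q values), keyed by gene.
    ASSUMPTION: BH is used here as a stand-in for the correction
    behind the q < 0.05 thresholds in the legend."""
    m = len(pvalues)
    # Walk from the largest P value down, tracking the running minimum
    # so that q values are monotone in P.
    ranked = sorted(pvalues.items(), key=lambda kv: kv[1], reverse=True)
    q: Dict[str, float] = {}
    running_min = 1.0
    for i, (gene, p) in enumerate(ranked):
        rank = m - i  # 1-based rank in ascending order of P
        running_min = min(running_min, p * m / rank)
        q[gene] = running_min
    return q


# Hypothetical P values: two genes of interest among 100 tested genes.
all_genes = {"GENE_A": 0.0001, "GENE_B": 0.004}
all_genes.update({f"OTHER_{i}": 0.5 for i in range(98)})

# Hypothetical short list standing in for "known driver genes".
known_drivers = {"GENE_A", "GENE_B", "OTHER_0", "OTHER_1"}

q_global = bh_qvalues(all_genes)
q_restricted = bh_qvalues({g: p for g, p in all_genes.items() if g in known_drivers})

if __name__ == "__main__":
    print("dN/dS:", dnds_ratio(30, 5, 300.0, 100.0))
    print("excess non-synonymous:", excess_nonsyn(30, 5, 300.0, 100.0))
    for gene in ("GENE_A", "GENE_B"):
        print(gene, "global q:", q_global[gene], "restricted q:", q_restricted[gene])
```

In this toy example, GENE_A passes q < 0.05 under both corrections (analogous to **), whereas GENE_B passes only when testing is restricted to the short list (analogous to *). Restricting the hypothesis set reduces the multiple-testing penalty, at the cost of considering only pre-specified genes.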