Science - USA (2022-04-22)


RESEARCH ARTICLE



CANCER GENOMICS


Substitution mutational signatures in whole-genome–sequenced cancers in the UK population


Andrea Degasperi1,2, Xueqing Zou1,2, Tauanne Dias Amarante1,2, Andrea Martinez-Martinez1,2,
Gene Ching Chiek Koh1,2, João M. L. Dias1,2, Laura Heskin1,2, Lucia Chmelova1,2, Giuseppe Rinaldi1,2,
Valerie Ya Wen Wang1,2, Arjun S. Nanda1,2, Aaron Bernstein1,2, Sophie E. Momen1,2, Jamie Young1,2,
Daniel Perez-Gil1,2, Yasin Memari1,2, Cherif Badja1,2, Scott Shooter1,2, Jan Czarnecki1,2, Matthew A. Brown3,4,
Helen R. Davies1,2, Genomics England Research Consortium†, Serena Nik-Zainal1,2*


Whole-genome sequencing (WGS) permits comprehensive cancer genome analyses, revealing mutational
signatures, the imprints of DNA damage and repair processes that have arisen in each patient’s cancer.
We performed mutational signature analyses on 12,222 whole-genome–sequenced tumor-normal
matched pairs from patients recruited via the UK National Health Service (NHS). We contrasted our
results with two independent cancer WGS datasets—from the International Cancer Genome Consortium
(ICGC) and the Hartwig Medical Foundation (HMF)—involving 18,640 whole-genome–sequenced cancers
in total. Our analyses add 40 single and 18 double substitution signatures to the current mutational
signature tally. We show for each organ that cancers have a limited number of common signatures and a
long tail of rare signatures, and we provide a practical solution for applying this concept of common
versus rare signatures to future analyses.


The global cancer burden was estimated at 19.3 million new cases and 10.0 million deaths in
2020 (1). Worldwide, cancer is the first or second leading cause of mortality before the age of
70 (1). The genome of a cancer is a highly distorted entity that has acquired thousands of genetic
aberrations since conception (2). If examined comprehensively, cancer genomes can thus reveal
insights into carcinogenesis (2).
Today, modern sequencing technologies have augmented the scale and rapidity of genome
resequencing (3), permitting whole-genome sequencing (WGS) approaches that provide an
all-inclusive perspective on cancer genomes (4). Beyond the handful of causative driver mutations,
WGS allows exploration of the full landscape of passenger mutations that describe the processes
that have arisen during tumorigenesis, resulting in patterns known as mutational signatures (5–7).
Whereas drivers become important targets for therapeutic intervention, mutational signatures
provide clues regarding historical environmental exposures and highlight potentially targetable
pathway defects (4, 6, 8, 9).
Substantial efforts by The Cancer Genome Atlas (TCGA) (10), the International Cancer Genome
Consortium (ICGC) (9, 11), and the Hartwig Medical Foundation (HMF) (12) have helped advance
cancer genomics considerably in recent years. However, an endeavor to generate whole cancer
genomes from national public health cancer services would be a welcome demonstration of how
cancer genomic data can be derived in real time and ultimately benefit patients and the
scientific community.
We examined a cohort of 15,838 whole-genome–sequenced cancers from patients recruited from all
13 UK National Health Service (NHS) Genomic Medicine Centres across England as part of the
Genomics England (GEL) 100,000 Genomes Project (100kGP) (7, 13) (GEL v8 data release). We report
the analysis of mutational signatures and highlight a conceptual advance achieved via the
examination of this substantial WGS collection. We add 40 single-base substitution (SBS)
mutational signatures and 18 double-base substitution (DBS) mutational signatures to the current
tally. We compare these additional signatures to known etiologies and end by suggesting
principles of how to meaningfully use mutational signatures in future analyses.

The GEL cohort
All 15,838 tumor-normal sample pairs were taken through 100kGP bioinformatic somatic-variant
analysis pipelines. We restricted our analysis to high-quality data derived from flash-frozen
material, involving 12,222 GEL tumor samples from 11,585 individuals (several participants had
synchronous or metachronous tumors). For this evaluation, the final dataset included a total of
298,694,545 substitutions, 2,675,617 double substitutions, 154,675,475 indels, and 1,958,105
rearrangements (Fig. 1, A and B, and tables S1 and S2) of 19 tumor types [skin, lung, stomach,
colorectal, bladder, liver, uterus, ovary, biliary, kidney, pancreas, breast, prostate, bone and
soft tissue, central nervous system (CNS), lymphoid, oropharyngeal, neuroendocrine tumors (NETs),
and myeloid].

Common and rare mutational signatures
The national GEL sequencing endeavor delivers thousands of samples for certain tumor types
(1009 lung, 1355 kidney, 2572 breast, and 1480 bone and soft-tissue cancers), one or two orders
of magnitude greater than the number of samples generated from previous WGS efforts for some
organs. This permits robust detection of signatures that are rare, i.e., those occurring in ≤1%
of the tumors. Furthermore, already-sequenced WGS cohorts, such as ~3000 primary cancers from
ICGC and ~3400 metastatic cancers from HMF, provide a powerful means of validating findings.
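
As a rough illustration of why cohort size matters for rare signatures, the sketch below takes the GEL cohort sizes quoted above and computes the expected number of tumors carrying a signature present at exactly 1% prevalence, together with the binomial probability of observing at least a handful of carriers. The independence assumption, the `MIN_CARRIERS` cutoff of 5, and the 100-sample comparison cohort are illustrative assumptions, not values from the study.

```python
from math import comb

# Cohort sizes quoted in the text for the largest GEL tumor types.
GEL_COHORT_SIZES = {
    "lung": 1009,
    "kidney": 1355,
    "breast": 2572,
    "bone_soft_tissue": 1480,
}

RARE_PREVALENCE = 0.01  # "rare" = occurring in <=1% of tumors (from the text)
MIN_CARRIERS = 5        # hypothetical: carriers wanted before attempting extraction
SMALL_COHORT = 100      # hypothetical: a cohort about one order of magnitude smaller


def prob_at_least(n: int, p: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, p), computed as 1 - P(X <= k - 1)."""
    below = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - below


def summarize(sizes, p=RARE_PREVALENCE, k=MIN_CARRIERS):
    """Map tumor type -> (expected carriers, probability of >= k carriers)."""
    return {name: (n * p, prob_at_least(n, p, k)) for name, n in sizes.items()}


for name, (expected, prob) in summarize(GEL_COHORT_SIZES).items():
    print(f"{name:>16}: expected carriers = {expected:6.2f}, P(>= {MIN_CARRIERS}) = {prob:.3f}")

small = prob_at_least(SMALL_COHORT, RARE_PREVALENCE, MIN_CARRIERS)
print(f"{'n = 100':>16}: P(>= {MIN_CARRIERS}) = {small:.4f}")
```

Under these assumptions, a cohort of about 100 tumors would rarely contain five carriers of a 1%-prevalence signature, whereas cohorts of 1000 or more usually would.
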
We performed mutational signature extractions confined to specific tumor types by means of an
updated signature extraction method (Fig. 1C, fig. S1, tables S3 to S6, and materials and
methods). Briefly, for each tumor type, we clustered mutational catalogs (counts of SBSs in
96-element form or DBSs in 78-element form), selecting only samples with recurrent, commonly
occurring profiles to perform signature extraction (fig. S1, A to C). Cases with unusual profiles
that were likely to have rare signatures were excluded from the first extraction, thus yielding a
set of highly accurate common signatures that are prevalent for a given tumor type. Next, we
fitted these common signatures to all samples; cases likely to have additional patterns not
fully explained by common signatures alone reported a high error (or discrepancy between the true
sample catalog and the reconstructed catalog) (fig. S1D). Potential additional signatures were
then extracted from these samples to obtain a set of rare signatures (fig. S1, E to H, and
materials and methods). Accordingly, we obtained a set of common signatures and a set of rare
signatures for each tumor type. In all, for SBSs, we identified 135 common signatures and 180
rare signatures in 19 tumor types within the GEL cancer cohort.
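
The fit-and-flag step described above can be sketched in a few lines. The example enumerates the 96 SBS channels (6 pyrimidine-referenced substitution classes × 16 flanking-base contexts), fits a set of common signatures to each sample catalog, and flags samples whose reconstruction error is high. It is a minimal sketch under stated assumptions: the signatures are toy profiles, the fit uses generic multiplicative-update non-negative least squares as a stand-in for the method in the paper's materials and methods, and the error metric and `threshold=0.2` are hypothetical choices.

```python
from itertools import product

SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
BASES = "ACGT"


def sbs96_channels():
    """Enumerate the 96 SBS channels, e.g. 'A[C>T]G' (5' base, substitution, 3' base)."""
    return [f"{five}[{sub}]{three}"
            for sub in SUBSTITUTIONS
            for five, three in product(BASES, repeat=2)]


def toy_signature(channels, substitution):
    """Hypothetical signature: probability spread evenly over one substitution class."""
    hits = [1.0 if f"[{substitution}]" in ch else 0.0 for ch in channels]
    total = sum(hits)
    return [h / total for h in hits]


def reconstruct(exposures, signatures):
    """Reconstructed catalog = sum over signatures of exposure * signature profile."""
    n = len(signatures[0])
    return [sum(e * sig[i] for e, sig in zip(exposures, signatures)) for i in range(n)]


def fit_exposures(catalog, signatures, iterations=200):
    """Non-negative least-squares fit via multiplicative updates (generic stand-in)."""
    k = len(signatures)
    exposures = [sum(catalog) / k] * k  # positive starting point
    eps = 1e-12                         # guards against division by zero
    for _ in range(iterations):
        recon = reconstruct(exposures, signatures)
        for j, sig in enumerate(signatures):
            numer = sum(s * c for s, c in zip(sig, catalog))
            denom = sum(s * r for s, r in zip(sig, recon))
            exposures[j] *= numer / (denom + eps)
    return exposures


def reconstruction_error(catalog, recon):
    """Hypothetical metric: summed absolute difference as a fraction of mutations."""
    return sum(abs(c - r) for c, r in zip(catalog, recon)) / sum(catalog)


def flag_unexplained(catalogs, signatures, threshold=0.2):
    """Return names of samples that the given signatures reconstruct poorly."""
    flagged = []
    for name, catalog in catalogs.items():
        exposures = fit_exposures(catalog, signatures)
        err = reconstruction_error(catalog, reconstruct(exposures, signatures))
        if err > threshold:
            flagged.append(name)
    return flagged


channels = sbs96_channels()
sig_a = toy_signature(channels, "C>T")     # toy "common" signature
sig_b = toy_signature(channels, "C>A")     # toy "common" signature
sig_rare = toy_signature(channels, "T>G")  # toy "rare" signature, withheld from the fit

catalogs = {
    "typical": reconstruct([300, 100], [sig_a, sig_b]),
    "unusual": reconstruct([200, 100, 200], [sig_a, sig_b, sig_rare]),
}
print(flag_unexplained(catalogs, [sig_a, sig_b]))  # prints ['unusual']
```

Here the `unusual` sample carries a third pattern absent from the common set, so about 40% of its mutations remain unexplained and it is flagged as a candidate for rare-signature extraction. The 78-channel DBS case would follow the same fit-and-flag logic with a different channel list.
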
To validate these common and rare signatures, we performed signature extractions in independent
cohorts of 3001 ICGC primary whole-genome–sequenced cancers (19 tumor types) and 3417 metastatic
HMF WGS samples (18 tumor types). We identified 135 common and 58 rare signatures in the ICGC
cohort and 135 common and 114 rare signatures in the HMF cohort (tables S7 to S10). We performed
an agnostic three-way signature comparison in 16 tissue types that were present in all three
cohorts (fig. S2) and found that signatures



Degasperi et al., Science 376, eabl9283 (2022) 22 April 2022 1 of 15


^1 Academic Department of Medical Genetics, School of Clinical Medicine, University of Cambridge, Cambridge CB2 0QQ, UK.
^2 Early Cancer Institute, University of Cambridge, Cambridge CB2 0XZ, UK.
^3 Genomics England, Queen Mary University of London, Dawson Hall, Charterhouse Square, London EC1M 6BQ, UK.
^4 Faculty of Life Sciences and Medicine, King’s College London, London SE1 9RT, UK.
*Corresponding author. Email: [email protected]
†Genomics England Research Consortium members and affiliations
are listed at the end of this paper.
