RESEARCH ARTICLE SUMMARY
CANCER GENOMICS
Substitution mutational signatures in whole-genome–sequenced cancers in the UK population
Andrea Degasperi, Xueqing Zou, Tauanne Dias Amarante, Andrea Martinez-Martinez, Gene Ching Chiek Koh,
João M. L. Dias, Laura Heskin, Lucia Chmelova, Giuseppe Rinaldi, Valerie Ya Wen Wang, Arjun S. Nanda,
Aaron Bernstein, Sophie E. Momen, Jamie Young, Daniel Perez-Gil, Yasin Memari, Cherif Badja, Scott Shooter,
Jan Czarnecki, Matthew A. Brown, Helen R. Davies, Genomics England Research Consortium, Serena Nik-Zainal*
INTRODUCTION: Mutational signatures, imprints of DNA damage and repair processes that have been operative during tumorigenesis, provide insights into environmental and endogenous causes of each patient’s cancer. Cancer genome sequencing studies permit exploration of mutational signatures. We investigated a very large number of whole-genome–sequenced cancers of many tumor types, substantially more than in previous efforts, to comprehensively reinforce our understanding of mutational signatures.
RATIONALE: We present mutational signature analyses of 12,222 whole-genome–sequenced cancers collected prospectively via the UK National Health Service (NHS) for the 100,000 Genomes Project. We identified single-base substitution (SBS) and double-base substitution (DBS) signatures independently in each organ. Exploiting this unusually large cohort, we developed a method to enhance discrimination of common mutational processes from rare, lower-frequency mutagenic processes. We validated our findings by independently performing analyses with data from two publicly available cohorts: 3001 primary cancers from the International Cancer Genome Consortium (ICGC) and 3417 metastatic cancers from the Hartwig Medical Foundation. We produced a set of reference signatures by comparing and contrasting the independently derived tissue-specific signatures and performing clustering analysis to unite mutational signatures from different tissues that could be due to similar processes. We included additional quality control measures such as dimensionality reduction of mixed signatures and gathered evidence that could help elucidate mechanisms and etiologies such as transcriptional and replication strand bias, associations with somatic drivers, and germline predisposition mutations. We also investigated additional mutation context and examined past clinical and treatment histories when possible, to explore potential etiologies.
RESULTS: Each organ contained a limited number of common SBS signatures (typically between 5 and 10). The number of common signatures was independent of cohort size. By contrast, the number of rare signatures was dependent on sample size, as the likelihood of detecting a rare signature is a function of its population prevalence. The same biological process produced slightly different signatures in diverse tissues, reinforcing that mutational signatures are tissue specific.
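The dependence of rare-signature discovery on cohort size can be illustrated with a simple binomial calculation. The sketch below is not taken from the article: it assumes samples are independent, that each carries the signature with a fixed population prevalence, and that a hypothetical `min_carriers` threshold stands in for whatever number of carriers a real extraction method would need. The prevalence value used at the end is purely illustrative.

```python
import math


def prob_detect(prevalence: float, n_samples: int, min_carriers: int = 1) -> float:
    """Probability that at least `min_carriers` of `n_samples` carry a signature.

    Assumption (not from the article): carriers follow a binomial
    distribution with success probability `prevalence`.
    """
    # P(X >= k) = 1 - sum_{i < k} C(n, i) p^i (1 - p)^(n - i)
    below = sum(
        math.comb(n_samples, i)
        * prevalence ** i
        * (1.0 - prevalence) ** (n_samples - i)
        for i in range(min_carriers)
    )
    return 1.0 - below


# Cohort sizes quoted in the summary; the prevalence is a made-up example.
cohort_sizes = {"ICGC": 3001, "Hartwig": 3417, "GEL": 12222}
illustrative_prevalence = 1 / 5000

for name, size in cohort_sizes.items():
    p = prob_detect(illustrative_prevalence, size)
    print(f"{name}: n={size}, P(at least one carrier)={p:.3f}")
```

Under these assumptions the detection probability rises monotonically with cohort size for any fixed prevalence, which is consistent with the observation that larger cohorts yield more rare signatures.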
Across organs, we clustered all tissue-specific signatures to ascertain mutational processes that were equivalent but occurring in different tissues (i.e., reference signatures). We obtained 82 high-confidence SBS reference signatures and 27 high-confidence DBS reference signatures. We compared these with previously reported mutational signatures, revealing 40 and 18 previously unidentified SBS and DBS signatures, respectively.
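As a rough illustration of how tissue-specific signatures might be grouped into shared reference signatures, the sketch below links any two signatures whose profiles are highly similar and reports the resulting groups. The summary does not state which similarity measure or clustering procedure was used, so cosine similarity, the 0.9 threshold, the single-linkage grouping, and the four-channel toy profiles are all assumptions made for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two signature profiles of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cluster_signatures(signatures, threshold=0.9):
    """Group signatures connected by cosine similarity >= threshold.

    Assumption: single-linkage grouping via union-find; this is an
    illustrative choice, not the procedure used in the article.
    """
    names = list(signatures)
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the group representative, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(signatures[a], signatures[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Toy four-channel profiles with hypothetical names.
toy = {
    "breast_A": [0.70, 0.10, 0.10, 0.10],
    "ovary_A": [0.65, 0.15, 0.10, 0.10],
    "lung_B": [0.05, 0.05, 0.80, 0.10],
}
clusters = cluster_signatures(toy)
print(clusters)
```

In this toy input the breast and ovary profiles differ only slightly and end up in one group, mirroring the idea that one process can leave slightly different signatures in different tissues.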
Because we are cognizant of increasing complexity in mutational signatures and want to enable general users, we developed an algorithm called Signature Fit Multi-Step (FitMS) that seeks signatures in new samples while taking advantage of our recent findings. In a first step, FitMS detects common, organ-specific signatures; in a second step, it determines whether an additional rare signature is also present.
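The two-step idea can be sketched in a few lines. This is not the FitMS implementation: the summary does not describe how exposures are fitted or how a rare signature is accepted, so the multiplicative-update non-negative least-squares fit, the error measure, the `min_gain` threshold, and the toy four-channel signatures below are all assumptions chosen to keep the example self-contained.

```python
def reconstruct(exposures, signatures):
    """Catalog predicted by summing each signature scaled by its exposure."""
    n_channels = len(signatures[0])
    return [
        sum(e * sig[c] for e, sig in zip(exposures, signatures))
        for c in range(n_channels)
    ]


def fit_exposures(catalog, signatures, iters=2000):
    """Non-negative least-squares fit using multiplicative updates."""
    k = len(signatures)
    exposures = [sum(catalog) / k] * k
    eps = 1e-12
    for _ in range(iters):
        recon = reconstruct(exposures, signatures)
        for j, sig in enumerate(signatures):
            num = sum(s * m for s, m in zip(sig, catalog))
            den = sum(s * r for s, r in zip(sig, recon))
            exposures[j] *= num / (den + eps)
    return exposures


def unexplained_fraction(catalog, exposures, signatures):
    """Sum of absolute residuals divided by the total mutation count."""
    recon = reconstruct(exposures, signatures)
    total = sum(catalog) or 1.0
    return sum(abs(m - r) for m, r in zip(catalog, recon)) / total


def fit_multi_step(catalog, common, rare, min_gain=0.05):
    """Step 1: fit common signatures. Step 2: try adding one rare signature."""
    names = list(common)
    sigs = [common[n] for n in names]
    exposures = fit_exposures(catalog, sigs)
    base_error = unexplained_fraction(catalog, exposures, sigs)
    best = {"rare": None, "exposures": dict(zip(names, exposures)), "error": base_error}

    for rare_name, rare_sig in rare.items():
        trial_sigs = sigs + [rare_sig]
        trial = fit_exposures(catalog, trial_sigs)
        error = unexplained_fraction(catalog, trial, trial_sigs)
        # Hypothetical acceptance rule: keep the rare signature only if it
        # reduces the unexplained fraction by at least `min_gain`.
        if base_error - error >= min_gain and error < best["error"]:
            best = {
                "rare": rare_name,
                "exposures": dict(zip(names + [rare_name], trial)),
                "error": error,
            }
    return best


# Toy signatures (hypothetical names and values).
common = {"c1": [0.7, 0.1, 0.1, 0.1], "c2": [0.1, 0.7, 0.1, 0.1]}
rare = {"r1": [0.05, 0.05, 0.1, 0.8], "r2": [0.1, 0.1, 0.7, 0.1]}

sample_common_only = [75.0, 45.0, 15.0, 15.0]  # 100*c1 + 50*c2
sample_with_rare = [79.0, 49.0, 23.0, 79.0]    # 100*c1 + 50*c2 + 80*r1

result_a = fit_multi_step(sample_common_only, common, rare)
result_b = fit_multi_step(sample_with_rare, common, rare)
print(result_a["rare"], result_b["rare"])
```

Restricting step 2 to a single additional rare signature follows the wording of the summary; any real acceptance rule would need to be calibrated rather than fixed at an arbitrary threshold.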
CONCLUSION: Mutational signature analysis of 18,640 cancers, the largest cohort of whole-genome–sequenced samples to date, has required methodological advances, permitting knowledge expansion. We have identified many previously unreported signatures and established the concept of common and rare signatures. The FitMS algorithm has been designed to exploit these advances to aid users in accurately identifying mutational processes in new samples.
368 | 22 APRIL 2022 • VOL 376 ISSUE 6591 | science.org | SCIENCE
The list of author affiliations is available in the full article online.
*Corresponding author. Email: [email protected]
Cite this article as A. Degasperi et al., Science 376, eabl9283 (2022). DOI: 10.1126/science.abl9283
READ THE FULL ARTICLE AT
https://doi.org/10.1126/science.abl9283
[Figure panels: (1) Analysis of 12,222 WGS cancers from UK NHS (GEL), 19 tissue-types, with common and rare signatures identified per tissue type. (2) Independent analysis of public WGS cancer cohorts: ICGC, 3,001 WGS, 19 tissue-types; Hartwig, 3,417 WGS, 18 tissue-types. (3) Relate samples across cohorts and tissue-types using Reference Signatures. (4) More rare signatures were discovered in larger cohorts; plots of number of signatures against number of samples for common and rare signatures (GEL, ICGC, Hartwig). (5) Example SBS116, profile over C>A, C>G, C>T, T>A, T>C, T>G; found in 3 samples (GEL: 1 breast, 1 ovary; Hartwig: 1 prostate; ICGC: 0). (6) Investigate new samples using FitMS: Step 1, fit common signatures; Step 2, attempt to find additional rare signatures.]
Discovery and application of common and rare mutational signatures. Analysis of three large whole-genome–sequenced cancer cohorts revealed that per-organ common signatures are limited in number, whereas numbers of rare signatures increase with increasing cohort size. Reference signatures permit comparisons across organs and cohorts. Henceforth, a new algorithm, FitMS, which accounts for common and rare signatures, can be used to analyze new samples. GEL, Genomics England cohort.