Degasperi et al., Science 376, eabl9283 (2022) 22 April 2022 2 of 15
[Figure 1, panels A to G: graphic not reproduced. Labels recoverable from the figure:
(A) Total WGS samples: GEL 12,222; ICGC 3,001; Hartwig 3,417.
(B) Number of samples and burden of SNVs and DNVs (log scale) per organ, by cohort. Organs: Skin, Lung, Esophagus, Stomach, Colorectal, Bladder, Liver, Head_neck, Oral_Oropharyngeal, Uterus, Ovary, Biliary, Kidney, Pancreas, Breast, Prostate, Bone_SoftTissue, CNS, Lymphoid, NET, Myeloid.
(C) Workflow, for each organ in each cohort (examples shown: Breast, Ovary, Stomach, Uterus): all samples; clustering on mutational profiles; exclusion of atypical profiles; samples with common profiles; extraction of common signatures; identify samples that contain additional rare signatures; common and rare organ-specific signatures. Pie chart, total SBS signatures across organs (757): GEL 135 common, 180 rare; ICGC 135 common, 58 rare; Hartwig 135 common, 114 rare.
(D) Number of common and rare SBS signatures in each organ, by cohort.
(E) Number of common SBS signatures versus number of samples in each organ, by cohort (Pearson corr. 0.14).
(F) Number of rare SBS signatures versus number of samples in each organ, by cohort (Pearson corr. 0.76).
(G) Organ-specific signatures of all three cohorts (757); clustering to determine Distinct Patterns (187): Recurrent Distinct Patterns (88), Mixed Distinct Patterns (29; determine composition by other Reference Signatures), Singleton Distinct Patterns (70); clustering into Recurrent Reference Signatures (74); Variants of Recurrent Reference Signatures (24); known signatures from other studies (8); additional previously unreported (38); Reference Signatures (120 SBS signatures): 82 QC green SBS signatures, 38 QC amber and red SBS signatures.]
Fig. 1. Whole-genome-sequenced cancers across three independent cohorts:
GEL, ICGC, and HMF. (A) WGS cases included in analyses. (B) Number of samples
and mutational burden of somatic single-nucleotide variants (SNVs) and
double-nucleotide variants (DNVs) across 21 whole-genome-sequenced tumor types from
the GEL, ICGC, and HMF cohorts. Not all tumor types (e.g., esophagus, head and
neck, oropharyngeal) are represented in all three cohorts. (C) Schematic
representation of the workflow of mutational signature analysis. Three cohorts (GEL,
ICGC, and HMF) were evaluated independently. For each organ in each cohort,
mutational catalogs were clustered and samples with atypical catalogs were
excluded from the extraction process. Samples with similar catalogs were subjected
to signature extraction to obtain a set of common organ-specific signatures. These
common signatures were fitted into all samples, highlighting samples with a high
error profile that were subsequently used to identify rare signatures. The pie
chart shows the total number of SBS signatures identified for each independent
extraction of each organ in all three cohorts. (D) Number of common and rare SBS
signatures in each cohort. (E) Common SBS signatures as a function of number
of samples analyzed. (F) Rare SBS signatures as a function of number of samples
analyzed. (G) Procedure to determine the reference signatures from all identified
cohort-organ signatures. Numbers refer to the SBS signature analysis. For details,
see materials and methods.
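
The fitting step in (C), in which common signatures are fitted to every sample and samples with a high error are flagged as candidates for rare signatures, can be illustrated with a minimal sketch. This is not the procedure used in the study, which is described in the materials and methods. It assumes a generic non-negative least-squares fit computed with multiplicative updates, a normalized absolute reconstruction error, and an arbitrary threshold of 0.2. The function names, the four-channel toy signatures, and the two sample catalogs are all hypothetical and chosen only so that the example is easy to follow.

```python
# Illustrative sketch only: toy signatures with 4 mutation channels
# (real SBS catalogs use 96 channels). All names and values are hypothetical.

def fit_exposures(catalog, signatures, n_iter=500):
    """Non-negative exposures minimizing squared reconstruction error.

    Uses multiplicative updates with the signatures held fixed:
    e_j <- e_j * (s_j . v) / (s_j . reconstruction).
    """
    n_channels = len(catalog)
    n_sigs = len(signatures)
    exposures = [sum(catalog) / n_sigs] * n_sigs  # positive starting point
    eps = 1e-12  # avoids division by zero
    for _ in range(n_iter):
        recon = [sum(exposures[j] * signatures[j][c] for j in range(n_sigs))
                 for c in range(n_channels)]
        for j in range(n_sigs):
            num = sum(signatures[j][c] * catalog[c] for c in range(n_channels))
            den = sum(signatures[j][c] * recon[c] for c in range(n_channels)) + eps
            exposures[j] *= num / den
    return exposures


def reconstruction_error(catalog, signatures, exposures):
    """Sum of absolute differences, normalized by the total mutation count."""
    recon = [sum(e * sig[c] for e, sig in zip(exposures, signatures))
             for c in range(len(catalog))]
    return sum(abs(v - r) for v, r in zip(catalog, recon)) / sum(catalog)


# Two hypothetical "common" signatures; each sums to 1 across channels.
common_signatures = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]

# s1 is an exact mixture of the two common signatures (100 and 50 mutations).
# s2 carries many mutations in a channel that neither signature explains.
catalogs = {
    "s1": [75, 15, 15, 45],
    "s2": [20, 10, 120, 20],
}

ERROR_THRESHOLD = 0.2  # arbitrary cutoff, assumed for illustration

flagged = []
for name, catalog in catalogs.items():
    exposures = fit_exposures(catalog, common_signatures)
    if reconstruction_error(catalog, common_signatures, exposures) > ERROR_THRESHOLD:
        flagged.append(name)
```

In this toy setting only `s2` is flagged, because most of its mutations fall in a channel that the common signatures cannot reproduce. In a real analysis the flagged samples would then be passed to a separate extraction step to look for rare signatures.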
RESEARCH | RESEARCH ARTICLE