Nature - USA (2020-02-13)


Methods


Data reporting
No statistical methods were used to predetermine sample size. The
experiments were not randomized and the investigators were not
blinded to allocation during experiments and outcome assessment.


Subjects
Subjects were recruited at University College London Hospitals (UCLH)
or Great Ormond Street Hospital (GOSH) and gave written informed
consent with approval of the Research Ethics Committee (REC reference
06/Q0505/12 and 11/LO/152, respectively). Details of the patients
studied are listed in Supplementary Table 1. All patients underwent
bronchoscopy as part of their clinical care. In adults, the bronchoscopy
procedure was performed for diagnostic or surveillance indications;
in children, it was undertaken for investigational procedures on congenital
tracheal abnormalities. For five patients with squamous cell
carcinomas or carcinoma in situ, biopsy of normal bronchial tissue
was taken from a site distant from the tumour.


Single-cell-derived colonies
Endobronchial biopsies were dissociated using 16 U/ml dispase in RPMI
for 20 min at room temperature. The epithelium was dissected away
from the underlying stroma and fetal bovine serum (FBS) was added
to a final concentration of 10%. Both the epithelium and stroma were
combined and digested in 0.1% trypsin/EDTA at 37 °C for 30 min. The
solution was neutralized with FBS to a final concentration of 10% and
added to the neutralized dispase solution^36. Cells were passed through
a 100-μm cell strainer and stained in sorting buffer (1× PBS, 1% FBS,
25 mM HEPES and 1 mM EDTA) with anti-CD45-PE (BD Pharmingen
555483, 1:200), anti-CD31-PE (BD Pharmingen 555446, 1:200) and
anti-EpCAM-APC (BioLegend 324208, 1:50) antibodies and DAPI (1 μg/ml). For
endobronchial brushings, no dissociation was carried out and the cell
suspension was passed through a 100-μm cell strainer before staining.
Cells were single-cell sorted on the basis of their expression
of CD45, CD31 and EpCAM, using a BD FACSAria Fusion. Each
DAPI−CD45−CD31−EpCAM+ cell was sorted into a single well of a 96-well plate,
pre-coated with collagen I and mitotically inactivated 3T3-J2 feeder
cells. Feeder cells were authenticated by whole-genome sequencing,
and were screened for mycoplasma contamination by PCR. Cells were
grown in fresh epithelial growth medium^37 (Dulbecco’s modified Eagle
medium (DMEM):F12 at a 3:1 ratio with penicillin–streptomycin, 5% FBS,
5 μM Y-27632, 5 μg/ml insulin, 25 ng/ml hydrocortisone, 0.125 ng/ml
epidermal growth factor, 0.1 nM cholera toxin, 250 ng/ml amphotericin
B and 10 μg/ml gentamicin). For the first week of culture, this medium
was supplemented with epithelial growth medium that had been conditioned
on growing epithelial cells, and with Y-27632 to a final concentration of 10 μM.
Epithelial cells were grown in 96-well plates for 2 weeks before being
passaged into 24-well plates and then into T25 flasks. Epithelial cells
were in culture for a total of about 25 days at 37 °C and 5% CO2, with 3
changes of medium per week. When cells reached 70–80% confluence
in T25 flasks, they were differentially trypsinized (making use of the
greater sensitivity of feeder cells to trypsin compared with epithelial
cells), generating a mostly pure population of epithelial cells. DNA was
then extracted using the PureLink Genomic DNA Mini Kit (Invitrogen).
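The sorting criterion above can be read as a simple Boolean gate followed by one-cell-per-well placement. The sketch below is a conceptual illustration only: it assumes marker positivity has already been called for each event (real gating works on fluorescence intensities chosen by the operator), and the `Event` type, the row-major A1 to H12 plate layout and the helper names are hypothetical, not features of the FACSAria software.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Event:
    """One cytometer event with marker calls already made (True = positive)."""
    event_id: int
    dapi: bool   # DAPI+ marks dead or permeabilized cells
    cd45: bool   # leukocyte marker
    cd31: bool   # endothelial marker
    epcam: bool  # epithelial marker


def passes_gate(e: Event) -> bool:
    """DAPI-CD45-CD31-EpCAM+: live, non-immune, non-endothelial epithelial cell."""
    return (not e.dapi) and (not e.cd45) and (not e.cd31) and e.epcam


def assign_to_wells(events: List[Event]) -> Dict[str, int]:
    """Place one gated cell per well of a 96-well plate (assumed row-major A1..H12)."""
    wells = [f"{row}{col}" for row in "ABCDEFGH" for col in range(1, 13)]
    gated = [e for e in events if passes_gate(e)]
    # zip stops at the shorter sequence, so at most 96 cells are placed
    return {well: e.event_id for well, e in zip(wells, gated)}


events = [
    Event(1, dapi=False, cd45=False, cd31=False, epcam=True),  # passes
    Event(2, dapi=True, cd45=False, cd31=False, epcam=True),   # dead cell
    Event(3, dapi=False, cd45=True, cd31=False, epcam=True),   # leukocyte
    Event(4, dapi=False, cd45=False, cd31=False, epcam=True),  # passes
]
print(assign_to_wells(events))  # {'A1': 1, 'A2': 4}
```

Requiring a single gated cell per well is what makes each downstream colony a candidate single-cell-derived clone; clonality was nevertheless verified afterwards from the sequencing data.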


Whole-genome sequencing
Paired-end sequencing reads (150 bp) were generated using the Illumina
HiSeq X Ten platform for 662 samples from 16 patients. The target coverage
was 15× per sample, except for 26 pilot samples derived from the first
patient (PD26988), for which it was 30×. For ten patients, blood DNA
samples were also sequenced as germline controls. For three patients,
samples of bulk squamous cell carcinoma or carcinoma in situ, collected
at the same time point or shortly afterwards (around four months later),
were sequenced, including two samples of carcinoma in situ that were
used in a previous study^38 (PD38326a and PD38327a, which are carcinomas
in situ derived from PD30160 and PD34210, respectively). We also
sequenced the whole genome of the pure mouse feeder cell layer.
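To make the coverage targets concrete, the mean coverage relation (total bases sequenced divided by genome size) can be inverted to estimate how many 150-bp read pairs each target implies. This is a back-of-the-envelope sketch: the genome size of about 3.1 Gb is an assumption, and duplicates, unmapped reads and mouse-derived reads are ignored.

```python
def read_pairs_needed(target_coverage: float, genome_size_bp: float,
                      read_length_bp: int) -> float:
    """Estimate read pairs needed for a target mean coverage.

    Uses coverage = (total bases sequenced) / (genome size); each
    paired-end fragment contributes 2 * read_length_bp bases.
    Duplicates and non-human (e.g. mouse) reads are ignored (assumption).
    """
    bases_needed = target_coverage * genome_size_bp
    return bases_needed / (2 * read_length_bp)


GENOME_SIZE_BP = 3.1e9  # assumed approximate human genome size

for coverage in (15, 30):
    pairs = read_pairs_needed(coverage, GENOME_SIZE_BP, 150)
    print(f"{coverage}x: ~{pairs / 1e6:.0f} million read pairs")
# 15x: ~155 million read pairs
# 30x: ~310 million read pairs
```

Because reads classified as mouse do not contribute to human coverage, a contaminated colony needs proportionally more raw sequencing to reach the same effective depth.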

Discrimination of human and mouse sequences
Bronchial epithelium samples were cultured on J2 mouse embryonic
feeder fibroblast cells, which resulted in varying degrees of contamination
with mouse DNA in the samples from bronchial cell colonies. To remove
mouse-derived sequencing reads, we used the Xenome algorithm^39
with default settings (k-mer size = 25). The Xenome algorithm classifies
reads from FASTQ files into five categories: graft (human), host (mouse),
ambiguous, both and neither. We confirmed that most of the sequencing
reads of a sample of pure human DNA were classified as human (98%),
whereas those of a sample of DNA derived from mouse feeder cells were
only rarely (2.8%) classified as human (Extended Data Fig. 2a). In
addition, we mapped sequencing reads of a DNA sample from mouse feeder
fibroblasts to the human reference genome, and confirmed that most of
the mouse-derived mutations had been successfully removed using Xenome
for selected samples with mouse contamination (Extended Data Fig. 2b).
Although all samples were negative for mycoplasma using standard
laboratory PCR testing, Xenome identified sequencing reads derived
from the mycoplasma genome in a subset of samples, and assigned
them to the ‘neither’ classification.
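The idea behind k-mer-based separation of graft and host reads can be illustrated with a toy classifier. This is not Xenome's implementation: Xenome uses k = 25, a compact k-mer index built from both reference genomes and its own decision rules. The sketch below assumes tiny made-up references, k = 4, and a simplified rule set that maps reads onto the same five category names.

```python
from typing import Set


def kmers(seq: str, k: int) -> Set[str]:
    """All overlapping k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify_read(read: str, human_idx: Set[str], mouse_idx: Set[str],
                  k: int) -> str:
    """Simplified rules (assumption, not Xenome's exact logic):
    human-only k-mers -> graft, mouse-only -> host, both kinds -> ambiguous,
    only k-mers shared by both genomes -> both, none found -> neither."""
    read_kmers = kmers(read, k)
    human_only = any(km in human_idx and km not in mouse_idx for km in read_kmers)
    mouse_only = any(km in mouse_idx and km not in human_idx for km in read_kmers)
    shared = any(km in human_idx and km in mouse_idx for km in read_kmers)
    if human_only and mouse_only:
        return "ambiguous"
    if human_only:
        return "graft"
    if mouse_only:
        return "host"
    if shared:
        return "both"
    return "neither"  # e.g. reads from an organism in neither index


K = 4  # toy value; the text uses Xenome's default of 25
human_idx = kmers("ACGTACGTTTGCA", K)  # made-up "human" reference
mouse_idx = kmers("ACGTACGAAACCC", K)  # made-up "mouse" reference
for read in ["GTTTGC", "GAAACC", "ACGTAC", "TTTTTT", "TGCAAAC"]:
    print(read, classify_read(read, human_idx, mouse_idx, K))
# GTTTGC graft / GAAACC host / ACGTAC both / TTTTTT neither / TGCAAAC ambiguous
```

The "neither" branch mirrors the observation above that mycoplasma reads, absent from both references, fall into that category.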
Having validated this approach, we ran Xenome for all bronchial epithelium
samples and aligned only reads that were classified as human to the
human reference genome (NCBI build 37d5) using the BWA-MEM algorithm.
The metrics of sequencing coverage and proportion of human-derived
reads are listed in Supplementary Table 2; 20 samples with an average
sequencing depth of less than 8× were excluded from further analysis
owing to lower estimated sensitivity, as described later
(Extended Data Fig. 2e).
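The depth cut-off described above amounts to a threshold on each sample's mean depth over human-aligned reads. A minimal sketch follows, assuming mean depths have already been computed per sample; the sample IDs and values are hypothetical.

```python
from typing import Dict, List, Tuple

MIN_MEAN_DEPTH = 8.0  # samples with mean depth below 8x were excluded


def split_by_depth(mean_depth: Dict[str, float],
                   min_depth: float = MIN_MEAN_DEPTH) -> Tuple[List[str], List[str]]:
    """Return (kept, excluded) sample IDs, each sorted alphabetically."""
    kept = sorted(s for s, d in mean_depth.items() if d >= min_depth)
    excluded = sorted(s for s, d in mean_depth.items() if d < min_depth)
    return kept, excluded


# Hypothetical sample IDs and depths, for illustration only.
depths = {"colony_01": 14.2, "colony_02": 7.9, "colony_03": 8.0}
kept, excluded = split_by_depth(depths)
print(kept)      # ['colony_01', 'colony_03']
print(excluded)  # ['colony_02']
```

A sample at exactly 8× is kept, matching the wording "less than 8×" for exclusion.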

Clonality of samples
To ensure that each sample was derived from a single cell, we visually
inspected the distribution of variant allele fractions (VAFs) of mutations:
632 clones had VAFs distributed around 50%, confirming that they were
derived from a single cell, but 10 clones had lower allele fractions,
suggesting that these colonies were oligoclonal (Extended Data Fig. 2d).
These samples were removed from further analyses (Supplementary Table 2).
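The reasoning behind the visual check is that heterozygous mutations in a colony grown from one diploid cell should sit near a VAF of 0.5, whereas mutations private to one cell in a mixed colony are diluted to lower VAFs. The sketch below turns that into a hypothetical automated proxy using the median VAF; the 0.35 cut-off and the read counts are illustrative assumptions, not values from the study, which relied on visual inspection.

```python
from statistics import median
from typing import Sequence


def vaf(alt_reads: int, total_reads: int) -> float:
    """Variant allele fraction: mutant reads / all reads at the site."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def looks_clonal(vafs: Sequence[float], min_median_vaf: float = 0.35) -> bool:
    """Hypothetical heuristic: flag a colony as clonal if its median VAF
    is near 0.5. The 0.35 cut-off is an assumption for illustration."""
    if not vafs:
        raise ValueError("need at least one mutation")
    return median(vafs) >= min_median_vaf


# Hypothetical (alt, total) read counts per mutation for two colonies.
clonal = [vaf(a, n) for a, n in [(8, 16), (7, 15), (9, 17), (6, 14)]]
mixed = [vaf(a, n) for a, n in [(3, 15), (4, 16), (2, 14), (5, 18)]]
print(looks_clonal(clonal), looks_clonal(mixed))  # True False
```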

Single-base-substitution calling
Single-base substitutions were called using the Cancer Variants through
Expectation Maximization (CaVEMan) algorithm^40 with copy-number
options of major copy number 5, minor copy number 2 and normal
contamination 0.1. To allow the discovery of early embryonic mutations,
we ran CaVEMan using an unmatched normal control. In addition to the
default ‘PASS’ filter, we removed variants with a median alignment score
(ASMD) < 120 and those with a clipping index (CLPM) > 0, to exclude
mapping artefacts. Variants identified in the mouse feeder fibroblast
DNA sample were also removed if they persisted in the call set.
Subsequently, for every mutation identified in any colony from each
patient, we counted the number of mutant and wild-type reads in all
bronchial samples from the same patient using the bam2R function of the
R package deepSNV^41, counting only bases with base quality ≥30 in reads
with mapping quality ≥30. Further filters described below were applied
to identify true somatic mutations and separate them from either
germline variants or recurrent sequencing errors.
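The post-calling filters and the quality-restricted read counting can be sketched as plain functions. This is a simplified stand-in, not CaVEMan's or deepSNV's code: the `Variant` record, its field names and the `(base, base_quality, mapping_quality)` pileup tuples are assumptions for illustration, whereas deepSNV's `bam2R` reads counts directly from BAM files.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    passed: bool  # CaVEMan default 'PASS' filter
    asmd: float   # median alignment score (ASMD), as named in the text
    clpm: float   # clipping index (CLPM), as named in the text


def post_filter(variants: Iterable[Variant],
                feeder_sites: Set[Tuple[str, int, str, str]]) -> List[Variant]:
    """Keep PASS calls with ASMD >= 120, CLPM == 0, absent from feeder DNA."""
    kept = []
    for v in variants:
        if not v.passed or v.asmd < 120 or v.clpm > 0:
            continue  # failed PASS or likely mapping artefact
        if (v.chrom, v.pos, v.ref, v.alt) in feeder_sites:
            continue  # also seen in the mouse feeder fibroblast sample
        kept.append(v)
    return kept


def count_alleles(pileup: Iterable[Tuple[str, int, int]], ref: str, alt: str,
                  min_base_q: int = 30, min_map_q: int = 30) -> Tuple[int, int]:
    """Return (mutant, wild-type) counts, using only bases with base quality
    >= 30 in reads with mapping quality >= 30."""
    n_alt = n_ref = 0
    for base, base_q, map_q in pileup:
        if base_q < min_base_q or map_q < min_map_q:
            continue
        if base == alt:
            n_alt += 1
        elif base == ref:
            n_ref += 1
    return n_alt, n_ref


calls = [
    Variant("1", 1000, "C", "T", True, 140, 0),   # kept
    Variant("1", 2000, "G", "A", True, 110, 0),   # ASMD too low
    Variant("2", 3000, "A", "G", True, 150, 2),   # clipped reads
    Variant("3", 4000, "T", "C", True, 145, 0),   # present in feeder DNA
    Variant("4", 5000, "C", "A", False, 150, 0),  # not PASS
]
feeder = {("3", 4000, "T", "C")}
print([v.pos for v in post_filter(calls, feeder)])  # [1000]
reads = [("T", 35, 60), ("C", 40, 60), ("T", 20, 60), ("T", 35, 10), ("G", 38, 60)]
print(count_alleles(reads, ref="C", alt="T"))  # (1, 1)
```

Counting in every sample from the same patient, rather than only in the colony where a mutation was called, is what makes the subsequent germline and artefact filters possible.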

Removing germline variants (binomial filter)
We fitted a binomial distribution to the total variant counts and total
depth at each single-base substitution site across all samples from one
patient. To differentiate somatic variants from germline variants, we
used a one-sided exact binomial test, with the null hypothesis that
these variants were drawn from a binomial distribution with a success