nt12dreuar3esd

Article

Analysis of data from Yost et al.^5
scRNA-seq counts and single-cell TCR sequences were obtained using
the GEOquery library in R and running the command getGEOSupp-
Files for GSE123813. Metadata consisting of the patient, treatment
status, and cluster assignment for each cell were obtained from the
corresponding authors for Yost et al.^5. We analysed expression data
that had matching metadata, resulting in 33,106 cells measured on
23,309 genes. Single-cell TCR sequences were available on 28,371
cells with matching metadata. Bulk TCR-seq sequences and template
counts were obtained from the ImmuneACCESS Web portal by Adap-
tive Biotechnologies (https://clients.adaptivebiotech.com/pub/yost-
2019-natmed).
The scTCR-seq data were provided as TCR-α and TCR-β nucleotide
and amino acid sequences without assigned clonotypes. We therefore
created clonotype groupings from pre- and post-treatment tumour
samples by matching all available TCR chains between cells of a given
patient, using a procedure analogous to our re-grouping of tumour,
NAT and blood clonotypes for our own dataset. Among the 11 patients
in the dataset, this procedure identified 17,369 distinct clonotypes.
Patient bcc.su003 had no clonotype matches between pre-treatment
and post-treatment tumour samples, and could not be analysed further.
The remaining patients had pre-treatment and post-treatment bulk
TCR-seq data from blood for five and six patients, respectively, with
patient bcc.su002 having no pre-treatment blood sample.
Bulk TCR-seq clonotypes were linked to tumour clonotypes by match-
ing the single TCR β-chain to any of the TCR β-chains from a clonotype
in the scTCR-seq data (Extended Data Fig. 7a). The bulk TCR-seq data
reports an 87-bp rearrangement sequence from each TCR β−chain that
includes the CDR3 sequence, so the scRNA-seq CDR3 (sc-CDR3-nt) was
required to match exactly at the nucleotide level to the corresponding
substring (bulk-CDR3-nt) in the bulk TCR-seq rearrangement. This
alignment process was facilitated by the fact that the immunoSEQ
output includes the amino acid CDR3 (bulk-CDR3-aa) sequence, which
we used to identify the corresponding bulk-CDR3-nt. We could then
create a hash table of all sc-CDR3-nt sequences and use that for lookup
from a given bulk-CDR3-nt sequence.
Because scTCR-seq clonotypes can contain TCR α-chains as well
as multiple TCR β-chains, the potential exists for a many-to-one cor-
respondence between a scTCR-seq clonotype from tumour and a bulk
TCR-seq clonotype from blood. For analyses requiring clones to be
classified by the presence or absence of a bulk TCR-seq match (that is,
blood association versus blood independence) (Fig. 3c), all matching
clonotypes were considered to have an association with blood. For
analyses requiring matching clone sizes between tumour and blood
(Fig. 3a, Extended Data Fig. 7b, c), multiple tumour clonotype matches
for a given blood clonotype were combined by summing the scTCR-seq
clone sizes and resolving multiple primary cell clusters by a majority
vote or leaving the primary cell cluster unassigned in case of a tie. For
the analysis comparing pre- and post-treatment bulk TCR sequences
(Extended Data Fig. 7d), clonotypes were matched when the entire
87-bp rearrangement sequences were identical.
Because bulk TCR-seq measures transcripts rather than cells, we
could not discriminate blood-expanded from non-expanded clones,
although we could distinguish between blood-independent and blood-
associated clones by the absence or presence of a matching transcript.
We assumed that if the number of transcripts per cell is relatively
uniform, the number of transcripts was approximately proportional
to the number of cells. Therefore, instead of measuring peripheral
clonal expansion, we measured blood clonal diversity using a Shan-
non entropy measure, the sum of p × log 2 (p), over all transcripts with
a count of 50 or more. This threshold was implemented because of the
heavily skewed distribution of transcript counts, with large numbers
of singleton transcripts that potentially introduced noise into the
calculations.

Completeness curves for Yost et al.^5 We used the transcripts field from each immunoSEQ analysis file for bulk TCR-seq from a pre- or post-treatment blood sample as input to the iNEXT procedure in the iNEXT package in R, with ‘abundance’ as the datatype parameter. The resulting interpolated, observed, and extrapolated values were then plotted using the m and SC variables in the iNextEst object for each sample (Extended Data Fig. 7e).

Comparison of TCR sequences with VDJdb We obtained the June 2018 (2018-06-04) release of VDJdb^54 (https:// github.com/antigenomics/vdjdb-db/releases). We analysed the file vdjdb.slim.txt to extract entries where the TCR species was ‘Homo- Sapiens’ and the epitope species was ‘CMV’, ‘EBV’ or ‘InfluenzaA’. We compared each clonotype from our data against the cdr3 field in the VDJdb database, requiring an exact match from any of the CDR3 amino acid sequences for the clonotype.

Comparison of TCR sequences with TCGA We obtained a set of 716,720 TCR sequences assembled from 9,142 bulk RNA-seq samples in TCGA using the TRUST algorithm^55. We obtained those sequences from the authors after gaining authorized access to sequence-level data in TCGA. Each sequence was labelled with a puta- tive CDR3 amino acid sequence in the header of the FASTA file. We compared each clonotype from our data, requiring an exact match from any of the CDR3 amino acid sequences for the clonotype against any substring of the TCGA-derived CDR3 sequence. Each FASTA file was annotated with an aliquot ID, which we linked to a cancer type using the R library GenomicDataCommons. We considered a TCGA-derived CDR3 sequence to be virally reactive when it was observed in three or more individuals, all with different cancer types, and no two individuals had the same cancer type.

Gene signatures for tissue expansion patterns We applied a logistic regression procedure similar to that used for finding characteristic biomarkers of T cell clusters, except that we classified cells in tumour tissue on the basis of their parent clones being tumour singletons, tumour multiplets, or dual-expanded clones. The logistic regression procedure was performed on each gene within each patient for each tissue expansion pattern using the scRNA-seq assay data to test all available genes. The library size was included as a covariate to help remove artefacts related to sequencing depth. Each logistic regression fit yielded a t-statistic for a given gene in a given tissue expansion pattern. For each tissue expansion pattern, t-statistics were combined across patients using the Rank Product procedure in the RankProd package in R, and using the resulting one-sided q-values for ranking genes. A total of 30 genes were selected for each gene expression signature. Because gene signatures are intended for use in bulk RNA-seq data from tumours, which contain non-T immune cells and non-immune cells, we considered the problem that our genes were selected within a restricted universe of T cells and that expression of our genes by non-T cells could introduce noise. We therefore examined the co-expression patterns of our gene signatures in bulk RNA-seq data from pre-treatment tumours, obtained from three clinical trials, as discussed below (Extended Data Fig. 9a). We computed the Pearson’s correlation coefficient of z-scores for each pair of genes in each signature across patients in all three trials, and performed hierarchical clustering of the correlation matrix. We cut the hierarchical clustering at the first level and used the larger of the two groups for the gene signatures. To compute a composite score, we used the t-statistics from the logistic regression signatures as weights. However, the relatively high degree of correlation of gene expression meant that weighting had relatively minor effects, and we obtained similar results by giving equal weighting to genes in our gene signatures.

nt12dreuar3esd

Get our desktop app

Company

Features

Documentation

Resources