nt12dreuar3esd

Analysis of clinical trial data
Bulk tumour RNA-seq data were analysed from three clinical trials:
(1) IMvigor210 (National Clinical Trials identifiers NCT02951767 and
NCT02108652)^18 , a single-arm trial studying locally advanced or meta-
static urothelial carcinoma and having 354 patients with bulk RNA-
seq data; (2) POPLAR (NCT01903993)^19 , a dual-arm trial comparing 93
patients with bulk RNA-seq data from non-small-cell lung carcinoma
treated with atezolizumab and 100 patients treated with the chemo-
therapeutic agent docetaxel; and (3) IMmotion150 (NCT01984242)^20 ,
a triple-arm trial of patients with renal cell carcinoma, treated with
atezolizumab (86 patients), atezolizumab plus the VEGF inhibitor
bevacizumab (88 patients), or the tyrosine kinase inhibitor sunitinib
(89 patients). RNA-seq counts were normalized by a trimmed mean of
M-values, log 2 -transformed, and converted to z-scores for each gene by
subtracting the means and dividing by the standard deviation across
all patients in the trial. Six patients in IMvigor210 had missing values
for PFS, and could not be included in survival analysis. Subsequent
clinical analysis was performed using the coxph function from the
survival package in R.
PFS hazard ratios were computed for each gene by dichotomizing
the expression of that gene based on the value being higher or lower
than the median, and then fitting a Cox proportional-hazards model
to the censored PFS survival times using the dichotomized variable.
To evaluate the overall set of hazard ratios for each gene signature, we
applied the gene set test proposed previously^53. Specifically, we first
confirmed from quantile plots that log-transformed hazard ratios
were normally distributed. We then found an intersection of genes
from each signature in a given clinical trial by matching the ENTREZ
gene IDs. The overall hazard ratio was computed as their geometric
mean, equivalent to an arithmetic mean of the log-transformed hazard
ratios. A z-statistic was computed as the mean of the log-transformed
hazard ratios multiplied by the square root of the number of intersect-
ing genes, and the P value was obtained by referring this value to the
normal distribution.
To evaluate combined signature scores, we found an intersection of
genes from each signature in a given clinical trial as before. The resulting
common genes in each signature were used to calculate an expression
score for each patient in each trial as a weighted mean z-score, using
mean t-statistics from the logistic regression models to weight each
gene. The signature score was dichotomized in each clinical trial based
on its value being higher or lower than the median. A Cox proportional-
hazards model of this dichotomized variable was fitted to the censored
PFS survival data in each arm of each clinical trial, and reported as the
hazard ratio and P value of the Wald statistic for each Kaplan–Meier
survival plot.
To compute interactions between each signature score and CD8A
expression, each score was dichotomized in each clinical trial based
on its value being higher or lower than the median. Patients were then
classified into one of four groups, depending on the combination of
these dichotomized variables. Survival analysis was performed on
these groups, with the hazard ratio and P values computed using the
dual-low group as the control. Interactions between the tumour mul-
tiplet and dual-expanded signatures were analysed analogously using
dichotomized values for these signature scores.

Statistical analyses
Aside from the clinical analyses described above, the following statisti-
cal tests were used in this study. (1) Chi-square tests were performed
using the chisq.test function in R. (2) Post hoc analyses of cells in a con-
tingency table for over-representation were performed as previously
described^56 in which a 2 × 2 table is created from the counts in a given
cell and the sum of remaining cells in the same row and column. For
each cell, a Fisher exact test was applied using the fisher.test function
in R with alt = “greater” to test the null hypothesis that the observed/

expected ratio ≤ 1, in which expected counts derive from the product of the row and column marginals. (3) Pearson’s correlation coefficients and associated two-sided P values were computed using the cor.test function in R to test the null hypothesis that the correlation coefficient is zero. (4) Tests of linear regression t-tests were performed using the lm function in R, followed by the summary function to obtain the two- sided P value of the null hypothesis that the regression coefficient is zero. In some cases, transformations of the variables were performed in addition to linear models, to test the null hypothesis that no rela- tionship exists between the variables. (5) t-tests were performed on log-transformed clone sizes using the t.test function in R to obtain the two-sided P value of the null hypothesis that the two groups have equal means. (6) One-sample z-tests were performed using the pnorm function in R to test the null hypothesis that values in a group have mean ≤ 0 (for the GSEA analysis in Extended Data Fig. 4g, with lower. tail set to FALSE) or mean ≥ 0 (for the hazard ratio analysis in Fig. 3a). The application of this test for GSEA was previously described^53. The value provided to pnorm was the mean of the values times the square root of the number of cases. For the GSEA analysis, in which biological insight was the main goal, no consideration was made for dependency among genes. For the hazard ratio analysis, we computed the correlation matrix R across the 30 genes based on the expression values across the patients in each clinical trial. The effective number of cases was given by the square root of 1TR1, where ‘1’ indicates a vector of ones. This procedure reduced the effective value of n by 20–25%. (7) Comparisons of correlation coefficients was performed using a Fisher’s z-test using the cocor function from the cocor package in R to test the null hypothesis that two correlation coefficients from separate samples are equal. No statistical methods were used to predetermine sample size. The experiments were not randomized unless otherwise stated. All figures of UMAP plots randomize the order in which cells are plotted. Investigators were not blinded to allocation during experiments and outcome assessment.

Reporting summary Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability FASTQ files containing raw reads from the scRNA-seq and scTCR-seq analyses have been deposited with the European Genome-phenome Archive (EGA) under studies EGAS00001003993 and EGAS00001003994, and datasets EGAD00001005464 and EGAD00001005465. These files are available under controlled access upon request to the Data Access Com- mittee, with contact information provided at EGA (https://www.ebi. ac.uk/ega/home). Processed output files from Cell Ranger, integrated assay results from Seurat, and metadata with UMAP coordinates, cluster assignments, and clonotypes are available from the NCBI GEO under accession GSE139555.

Code availability Computer code used to generate the analyses and figures in this paper are provided as a as a Supplementary File to the NCBI Gene Expression Omnibus (GEO) accession GSE139555.

Zheng, G. X. Y. et al. Massively parallel digital transcriptional profiling of single cells. Nat.
Commun. 8 , 14049 (2017).

Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177 , 1888–1902 (2019).

Aran, D. et al. Reference-based analysis of lung single-cell sequencing reveals a
transitional profibrotic macrophage. Nat. Immunol. 20 , 163–172 (2019).

Del Carratore, F. et al. RankProd 2.0: a refactored bioconductor package for detecting
differentially expressed features in molecular profiling datasets. Bioinformatics 33 , 2774–
2775 (2017).

Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing
and microarray studies. Nucleic Acids Res. 43 , e47 (2015).

nt12dreuar3esd

Get our desktop app

Company

Features

Documentation

Resources