nt12dreuar3esd

(Sean Pound) #1

Analysis of clinical trial data
Bulk tumour RNA-seq data were analysed from three clinical trials:
(1) IMvigor210 (National Clinical Trials identifiers NCT02951767 and
NCT02108652)^18 , a single-arm trial studying locally advanced or meta-
static urothelial carcinoma and having 354 patients with bulk RNA-
seq data; (2) POPLAR (NCT01903993)^19 , a dual-arm trial comparing 93
patients with bulk RNA-seq data from non-small-cell lung carcinoma
treated with atezolizumab and 100 patients treated with the chemo-
therapeutic agent docetaxel; and (3) IMmotion150 (NCT01984242)^20 ,
a triple-arm trial of patients with renal cell carcinoma, treated with
atezolizumab (86 patients), atezolizumab plus the VEGF inhibitor
bevacizumab (88 patients), or the tyrosine kinase inhibitor sunitinib
(89 patients). RNA-seq counts were normalized by a trimmed mean of
M-values, log 2 -transformed, and converted to z-scores for each gene by
subtracting the means and dividing by the standard deviation across
all patients in the trial. Six patients in IMvigor210 had missing values
for PFS, and could not be included in survival analysis. Subsequent
clinical analysis was performed using the coxph function from the
survival package in R.
PFS hazard ratios were computed for each gene by dichotomizing
the expression of that gene based on the value being higher or lower
than the median, and then fitting a Cox proportional-hazards model
to the censored PFS survival times using the dichotomized variable.
To evaluate the overall set of hazard ratios for each gene signature, we
applied the gene set test proposed previously^53. Specifically, we first
confirmed from quantile plots that log-transformed hazard ratios
were normally distributed. We then found an intersection of genes
from each signature in a given clinical trial by matching the ENTREZ
gene IDs. The overall hazard ratio was computed as their geometric
mean, equivalent to an arithmetic mean of the log-transformed hazard
ratios. A z-statistic was computed as the mean of the log-transformed
hazard ratios multiplied by the square root of the number of intersect-
ing genes, and the P value was obtained by referring this value to the
normal distribution.
To evaluate combined signature scores, we found an intersection of
genes from each signature in a given clinical trial as before. The resulting
common genes in each signature were used to calculate an expression
score for each patient in each trial as a weighted mean z-score, using
mean t-statistics from the logistic regression models to weight each
gene. The signature score was dichotomized in each clinical trial based
on its value being higher or lower than the median. A Cox proportional-
hazards model of this dichotomized variable was fitted to the censored
PFS survival data in each arm of each clinical trial, and reported as the
hazard ratio and P value of the Wald statistic for each Kaplan–Meier
survival plot.
To compute interactions between each signature score and CD8A
expression, each score was dichotomized in each clinical trial based
on its value being higher or lower than the median. Patients were then
classified into one of four groups, depending on the combination of
these dichotomized variables. Survival analysis was performed on
these groups, with the hazard ratio and P values computed using the
dual-low group as the control. Interactions between the tumour mul-
tiplet and dual-expanded signatures were analysed analogously using
dichotomized values for these signature scores.


Statistical analyses
Aside from the clinical analyses described above, the following statisti-
cal tests were used in this study. (1) Chi-square tests were performed
using the chisq.test function in R. (2) Post hoc analyses of cells in a con-
tingency table for over-representation were performed as previously
described^56 in which a 2 × 2 table is created from the counts in a given
cell and the sum of remaining cells in the same row and column. For
each cell, a Fisher exact test was applied using the fisher.test function
in R with alt = “greater” to test the null hypothesis that the observed/


expected ratio ≤ 1, in which expected counts derive from the product
of the row and column marginals. (3) Pearson’s correlation coefficients
and associated two-sided P values were computed using the cor.test
function in R to test the null hypothesis that the correlation coefficient
is zero. (4) Tests of linear regression t-tests were performed using the
lm function in R, followed by the summary function to obtain the two-
sided P value of the null hypothesis that the regression coefficient is
zero. In some cases, transformations of the variables were performed
in addition to linear models, to test the null hypothesis that no rela-
tionship exists between the variables. (5) t-tests were performed on
log-transformed clone sizes using the t.test function in R to obtain
the two-sided P value of the null hypothesis that the two groups have
equal means. (6) One-sample z-tests were performed using the pnorm
function in R to test the null hypothesis that values in a group have
mean ≤ 0 (for the GSEA analysis in Extended Data Fig. 4g, with lower.
tail set to FALSE) or mean ≥ 0 (for the hazard ratio analysis in Fig. 3a).
The application of this test for GSEA was previously described^53. The
value provided to pnorm was the mean of the values times the square
root of the number of cases. For the GSEA analysis, in which biological
insight was the main goal, no consideration was made for dependency
among genes. For the hazard ratio analysis, we computed the correla-
tion matrix R across the 30 genes based on the expression values across
the patients in each clinical trial. The effective number of cases was
given by the square root of 1TR1, where ‘1’ indicates a vector of ones. This
procedure reduced the effective value of n by 20–25%. (7) Comparisons
of correlation coefficients was performed using a Fisher’s z-test using
the cocor function from the cocor package in R to test the null hypoth-
esis that two correlation coefficients from separate samples are equal.
No statistical methods were used to predetermine sample size. The
experiments were not randomized unless otherwise stated. All fig-
ures of UMAP plots randomize the order in which cells are plotted.
Investigators were not blinded to allocation during experiments and
outcome assessment.

Reporting summary
Further information on research design is available in the Nature
Research Reporting Summary linked to this paper.

Data availability
FASTQ files containing raw reads from the scRNA-seq and scTCR-seq anal-
yses have been deposited with the European Genome-phenome Archive
(EGA) under studies EGAS00001003993 and EGAS00001003994, and
datasets EGAD00001005464 and EGAD00001005465. These files are
available under controlled access upon request to the Data Access Com-
mittee, with contact information provided at EGA (https://www.ebi.
ac.uk/ega/home). Processed output files from Cell Ranger, integrated
assay results from Seurat, and metadata with UMAP coordinates, cluster
assignments, and clonotypes are available from the NCBI GEO under
accession GSE139555.

Code availability
Computer code used to generate the analyses and figures in this paper
are provided as a as a Supplementary File to the NCBI Gene Expression
Omnibus (GEO) accession GSE139555.


  1. Zheng, G. X. Y. et al. Massively parallel digital transcriptional profiling of single cells. Nat.
    Commun. 8 , 14049 (2017).

  2. Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177 , 1888–1902 (2019).

  3. Aran, D. et al. Reference-based analysis of lung single-cell sequencing reveals a
    transitional profibrotic macrophage. Nat. Immunol. 20 , 163–172 (2019).

  4. Del Carratore, F. et al. RankProd 2.0: a refactored bioconductor package for detecting
    differentially expressed features in molecular profiling datasets. Bioinformatics 33 , 2774–
    2775 (2017).

  5. Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing
    and microarray studies. Nucleic Acids Res. 43 , e47 (2015).

Free download pdf