biological processes, and ‘3.1-MT’ against the gene set for mitochondrial
genes, as shown in Extended Data Fig. 4g.


Comparison of clusters with external single-cell gene signatures
We computed reference gene signatures from the scRNA-seq data of
the datasets for Guo et al.,^3 Zhang et al.^4 and Yost et al.^5. The first two
datasets had count matrices that were already normalized by library
sizes in the form of transcripts per million (TPM) matrices, and we
computed this same normalization for the third dataset. For each
dataset, we considered only genes with at least one non-zero TPM value
across all cells, and computed the log2-transformed expression value
log2(TPM + 1), where 1 represents a pseudocount. For each cluster,
based on the metadata from the original analyses, we obtained robust
centroids by performing a 10% trimmed mean on log-transformed
expression values for each gene across cells in the cluster. We used
these centroids as reference gene signatures for the R package SingleR,
which assigned a reference cluster label to each cell in our dataset.
Gene expression measurements from our dataset were the cell counts
computed for each sample before integration, again considering only
genes with at least one non-zero TPM value across all cells in the sample
and computing the log2-transformed expression value log2(TPM + 1).
We cross-tabulated counts of cells based on both our internal cluster
assignments and the external assignments computed by SingleR, and
normalized the resulting matrix first by the external assignments and
then by our internal assignments to generate heat maps for Extended
Data Fig. 4a.
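
The centroid computation described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function names are hypothetical, the per-million scaling ignores gene length, and the trimming convention (dropping floor(n x 0.10) values from each end, as in R's mean(x, trim = 0.1)) is an assumption, because the text does not state how the 10% was split.

```python
import math


def log2_per_million(counts, pseudocount=1.0):
    """Scale one cell's counts to sum to one million, then take log2(x + pseudocount).

    Assumption: library-size scaling only, with no gene-length correction.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("cell has no counts")
    return [math.log2(c / total * 1e6 + pseudocount) for c in counts]


def expressed_gene_indices(matrix):
    """Indices of genes (columns) with at least one non-zero value across cells (rows)."""
    return [g for g, column in enumerate(zip(*matrix)) if any(v > 0 for v in column)]


def trimmed_mean(values, trim=0.10):
    """Mean after dropping floor(n * trim) values from each end (assumed convention)."""
    ordered = sorted(values)
    k = math.floor(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def cluster_centroids(expr, clusters, trim=0.10):
    """Per-cluster, per-gene trimmed mean of log-transformed expression.

    expr:     {cell_id: [log expression per gene]}
    clusters: {cell_id: cluster label from the original analysis}
    """
    rows_by_cluster = {}
    for cell, label in clusters.items():
        rows_by_cluster.setdefault(label, []).append(expr[cell])
    return {
        label: [trimmed_mean(column, trim) for column in zip(*rows)]
        for label, rows in rows_by_cluster.items()
    }
```

The trimmed mean makes each centroid less sensitive to a small number of cells with extreme expression of a given gene than an ordinary mean would be.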


Display of gene expression
Bar plots in Extended Data Fig. 4b are computed using the scaled data
from the integrated assay using Seurat. Because the integrated assay
requires that genes be selected as variable in each of the component
datasets, some genes could not be depicted this way. In particular, the
PD1 gene was not included in the integrated assay, so its expression in
Extended Data Fig. 4d is taken from counts from the original RNA data.


Analysis of data from Guo et al.^3
Single-cell RNA-seq counts were obtained using the GEOquery library
in R and running the command getGEOSuppFiles for GSE99254. Expression
data were obtained over 23,459 genes in 12,348 cells. Metadata
consisting of the patient, cluster assignment, and tissue source for
each cell were downloaded from the Gene Expression Omnibus (GEO)
project GSE99254
(https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE99254).
TCR sequences and clonotype groupings were
obtained from supplementary table 2 of Guo et al.^3. TCR sequences
were obtained for 10,202 cells in 7,398 distinct clonotypes.


Analysis of data from Zhang et al.^4
Single-cell RNA-seq counts were obtained using the GEOquery library in
R and running the command getGEOSuppFiles for GSE108989. Expression
data were obtained over 23,459 genes in 11,140 cells. Metadata
consisting of the patient, cluster assignment, and tissue source for
each cell were downloaded from GEO project GSE108989
(https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE108989).
TCR sequences
and clonotype groupings were obtained from supplementary table 4
of Zhang et al.^4. TCR sequences were obtained for 9,878 cells in 7,274
distinct clonotypes.


Signature for resident memory T cells
Genes were identified as being positively associated (13 genes) or
negatively associated (18 genes) with resident memory T cells, based
on Fig. 3a from a published study of tissue-resident versus circulating
T cells^8. The D4S234E gene was identified as a synonym for NSG1 (Entrez
ID 27065). Genes were matched against those in the integrated assay
from Seurat, showing an overlap with four positively associated genes
(CRTAM, ITGAE, DUSP6 and RGS1) and three negatively associated ones
(FAM65B, STK38 and KLF2). Although only a few genes showed overlap,
the published study indicates that genes in this signature share highly
correlated gene expression. The expression of each gene was converted
to z-scores by subtracting the mean expression across samples and
dividing by the standard deviation, and we computed a positive signature
score for each T cell as the mean z-score from the four positively
associated genes and a negative signature score similarly from the
three negatively associated genes. The overall score was the difference
between the positive and negative signature scores.
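
As an illustration of this scoring scheme, the sketch below converts each gene to z-scores and takes the difference between the mean z-scores of the two gene sets. It is not the authors' code: the function names are hypothetical, each gene is standardized across all cells supplied, and the sample standard deviation (n - 1 denominator) is an assumption, because the text does not specify which estimator was used.

```python
from statistics import mean, stdev


def zscores(values):
    """Standardize one gene's expression across cells (sample SD, assumed)."""
    mu = mean(values)
    sd = stdev(values)  # raises if fewer than two cells; a constant gene gives sd == 0
    return [(v - mu) / sd for v in values]


def signature_score(expr, positive_genes, negative_genes):
    """Per-cell score: mean z-score of positive genes minus mean z-score of negative genes.

    expr: {gene: [expression per cell]}, with every list in the same cell order.
    """
    z = {gene: zscores(expr[gene]) for gene in positive_genes + negative_genes}
    n_cells = len(next(iter(z.values())))
    scores = []
    for i in range(n_cells):
        positive = mean(z[gene][i] for gene in positive_genes)
        negative = mean(z[gene][i] for gene in negative_genes)
        scores.append(positive - negative)
    return scores
```

Because each gene is standardized before averaging, a highly expressed gene does not dominate the score; each gene in a set contributes with equal weight.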

Signature for terminal versus stem-like exhaustion
Microarray data, in the form of log2-transformed robust multichip
average (RMA) normalized intensities, were obtained from samples
GSM2227309 to GSM2227314 from GEO project GSE84105^9 using
the getGEO procedure of the GEOquery package in R. Differentially
expressed probes between the three CXCR5+ mouse samples and the
three TIM3+ mouse samples were found using the lmFit, contrasts.fit,
eBayes, and topTable procedures in the limma package in R, at a threshold
log2-transformed fold change of three or more. These probes were
assigned to genes using the annotate and mouse4302.db packages, and
then translated into human equivalents using the homologene package
in R. The resulting 58 CXCR5-associated genes and 42 TIM3-associated
genes were compared against our scRNA-seq dataset using the scaled
data array from the integrated assay computed by Seurat, yielding 12
CXCR5-associated genes (GPR183, CCR6, CRTAM, SLFN5, TNFRSF25, JUN,
IFI16, SLAMF6, XCL1, IL7R, EMB and SATB1) and 9 TIM3-associated genes
(ADAM8, PRDM1, CDKN2A, CCL4, CD7, AHR, GPR56, GZMA and CISH)
in common. The expression of each gene was converted to z-scores
by subtracting the mean expression across samples and dividing by
the standard deviation, and we computed for each T cell a CXCR5
signature score as the mean z-score from the CXCR5-associated genes
and a TIM3 signature score similarly from the TIM3-associated genes.
The overall score for terminal exhaustion versus stem-like exhaustion
was the difference between the TIM3 and CXCR5 signature scores.

Integration of scRNA-seq and scTCR-seq data
One issue in integrating scRNA-seq and scTCR-seq data is that the
clonal expansion pattern information from TCR-seq is applied to
clones, whereas cluster assignments from RNA-seq are applied to
individual cells, based on their transcriptional profiles. Therefore,
a clone may contain cells assigned to several different clusters
(Extended Data Fig. 4h, i). To integrate the data, we assigned tissue
and blood expansion patterns from clones to their constituent cells.
Clones were assigned a primary cluster, based on the cluster with the
largest representation of cells in the clone. In cases of ties, in which
the two largest representative clusters had equal counts, we assigned
no primary cluster to that clone.
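
The primary-cluster rule can be expressed compactly. The following is a minimal sketch with a hypothetical function name; it takes the cluster labels of the cells in one clone and returns the most frequent label, or None when the two most frequent labels are tied.

```python
from collections import Counter


def primary_cluster(cell_clusters):
    """Return the most represented cluster in a clone, or None on a tie or empty clone."""
    top_two = Counter(cell_clusters).most_common(2)
    if not top_two:
        return None
    if len(top_two) == 2 and top_two[0][1] == top_two[1][1]:
        return None  # the two largest clusters have equal counts
    return top_two[0][0]
```

Under this rule a singleton clone always takes the cluster of its only cell, whereas a two-cell clone split across two clusters receives no primary cluster.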
Another consideration is that it was possible for cells to be assigned
to the non-T cell category based on the scRNA-seq assay, but given a
clonotype based on the scTCR-seq assay. For analyses of scTCR-seq
data alone, such cells would be included on the basis of their clonotype.
For analyses of scRNA-seq data alone, such cells would be excluded for
lacking a T cell cluster. However, for integrated analyses of scRNA-seq
and scTCR-seq data, it is unclear whether such cells should be included.
As indicated in Extended
Data Fig. 1a, the assignment of clonotypes to non-T cells was relatively
uncommon in our dataset. In this study, we included such cells in
integrated analyses with a special cluster of ‘non-T’, computed primary
clusters as described above, and then assigned no primary T cell cluster
to clones that had ties for the two largest representative clusters
(with one potentially being non-T) or had non-T as the primary cluster.
Cells were then excluded from the integrated analysis if their parent
clone lacked a primary T cell cluster. The result of this procedure was
to include non-T cells with a clonotype at the outset, but then exclude
most of them (including all singletons) because the primary cluster of
their clone was non-T.
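
The filtering procedure for integrated analyses can be sketched as follows. This is an illustrative reading of the text, not the original implementation: the function names, the input layout and the cluster labels in the usage note are hypothetical, and the literal label "non-T" stands in for the special cluster described above.

```python
from collections import Counter, defaultdict

NON_T = "non-T"  # special cluster label for cells not classified as T cells


def primary_t_cell_cluster(labels):
    """Primary cluster of a clone, or None if tied or if the primary cluster is non-T."""
    top_two = Counter(labels).most_common(2)
    if len(top_two) == 2 and top_two[0][1] == top_two[1][1]:
        return None  # tie between the two largest clusters (one may be non-T)
    winner = top_two[0][0]
    return None if winner == NON_T else winner


def cells_for_integrated_analysis(cells):
    """Keep cells whose parent clone has a primary T cell cluster.

    cells: list of (cell_id, clone_id, cluster_label) tuples.
    Returns {cell_id: primary cluster of the parent clone} for retained cells.
    """
    labels_by_clone = defaultdict(list)
    for _, clone, label in cells:
        labels_by_clone[clone].append(label)
    primary = {
        clone: primary_t_cell_cluster(labels)
        for clone, labels in labels_by_clone.items()
    }
    return {
        cell: primary[clone]
        for cell, clone, _ in cells
        if primary[clone] is not None
    }
```

With this logic, a non-T cell is retained only when it belongs to a clone whose other cells give that clone a T cell primary cluster; a singleton non-T cell is always dropped, consistent with the description above.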