Article RESEARCH
[Figure, panels a–e. All t-SNE panels use axes t-SNE1 and t-SNE2; # cells = 18,722.
a, t-SNE coloured by patient ID: ET01, ET02, ET03, ET04, ET05.
b, Heat map of relative expression (low to high) across clusters E/B/M, MkP, EP2, EP1, EP-cc, MEP-cc, MEP, HSPC1, HSPC2, HSPC3, IMP1, IMP2, IMP-cc, NP1, NP2, M/D1, M/D2, PreB1, PreB2. Highlighted genes: CD79A, DNTT, IRF7, ELANE, AZU1, MPO, SPINK2, CD52, HOPX, AVP, CA1, PLEK, DAD1, ITGA2B, HDC, CLC, TK1, BLVRB, PDLIM1, HBB, FCER1A, TYMS, LYZ, IRF8.
c, t-SNE panels showing relative expression (low to high) of AVP, HOPX, CLC, IRF8, AZU1, ITGA2B, CA1, BLVRB, PLEK, DNTT.
d, t-SNE labelled by progenitor subset: HSPC, IMP, NP, MEP, EP, E/B/M, M/D, PreB, MkP.
e, t-SNE coloured by genotype: WT (n = 9,338), MUT (n = 7,276), NA (n = 2,108).]
Extended Data Fig. 4 | Integration of samples from patients with
essential thrombocythaemia and assignment of progenitor subsets.
a, t-SNE projection of CD34+ progenitor cells from samples ET01–ET05,
after integration and batch correction using the Seurat package (Methods).
b, Heat map of top ten differentially expressed genes for clusters; lineage-
specific genes from a previous publication^26 are highlighted (Methods).
c, Representative lineage-specific genes projected onto the t-SNE
representation of CD34+ cells from samples from patients with essential
thrombocythaemia. d, t-SNE projection of CD34+ cells from samples
ET01–ET05 after applying a deep generative modelling approach for
the single-cell analysis using the scVI package (Methods)^19, showing
assignments of progenitor subsets as determined after clustering the cells
using the Seurat package. e, Genotyping data from GoT are projected onto
the t-SNE representation generated after the scVI analysis of progenitor
cells from samples ET01–ET05. Cells without any GoT data are labelled
NA (not assignable).
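
The labelling rule described for panel e (cells without any GoT data are labelled NA) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `label_genotypes`, the barcodes and the genotype calls below are hypothetical and serve only to show how every cell in an embedding might receive one of the three labels WT, MUT or NA.

```python
from collections import Counter


def label_genotypes(cell_barcodes, got_calls):
    """Map every cell to 'WT', 'MUT' or 'NA'.

    cell_barcodes: iterable of cell identifiers present in the embedding.
    got_calls: dict of barcode -> 'WT' or 'MUT' for cells with genotyping data.
    Cells absent from got_calls are labelled 'NA' (not assignable).
    """
    return {bc: got_calls.get(bc, "NA") for bc in cell_barcodes}


# Toy, made-up barcodes and calls (assumption: not data from the study).
cells = ["cell1", "cell2", "cell3", "cell4"]
calls = {"cell1": "WT", "cell2": "MUT", "cell4": "MUT"}

labels = label_genotypes(cells, calls)
counts = Counter(labels.values())  # per-genotype cell counts, as in a legend
print(counts)
```

Because every cell receives exactly one label, the per-genotype counts always sum to the total number of cells, which matches the figure, where the WT, MUT and NA counts add up to 18,722.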