Nature - USA (2020-01-23)

(Antfer) #1

was aligned to genome to calculate the FPKM values of genes, and ana-
lysed by Seurat (v.2.3.4).


Identification of differentially expressed genes between groups
Differentially expressed genes between two groups were obtained by
using edgeR^54. Genes with uncorrected P values (likelihood ratio tests)
smaller than 0.01, absolute fold difference larger than 2 and median
of FPKM larger than 1 in one group were regarded as differentially
expressed genes.
For HESX1+T− and HESX1−T+ cells, EPIs at 14 d.p.f. were classified as two
groups, HESX1+T− and HESX1−T+, based on HESX1 and T expression in
these cells. Cells positive for T (FPKM > 1) and negative for HESX1 (FPKM
< 1) belonged to the HESX1−T+ group, and cells negative for T (FPKM < 1)
and positive for HESX1 (FPKM > 1) belonged to HESX1+T− group. The
lists of down- and upregulated genes in HESX1+T− cells compared to
HESX1−T+ cells are shown in Supplementary Table 3.


Co-expression gene network analysis
In total, 181 scRNA-seq profiles of 7–9-d.p.f. blastocysts (in Supplemen-
tary Table 1) were used to perform co-expression gene networks (11
cells were not included owing to high expression of two lineage-marker
genes). These 181 cells were assigned as EPI, PrE and TrB cells based on
expression of selected marker genes. We filtered the genes by keeping
those with log 2 -scaled expression (FPKM + 1) values larger than 4 in at
least one sample and with correlation coefficients of at least 0.4 with
the feature vector of one of the three cell types (that is, EPI, PrE and TrB).
Filtered genes were used to construct co-expression networks of genes
with WGCNA^55. Three gene modules with the highest correlation coef-
ficient values in the three cell types were used to draw gene modules
for each cell type. Only transcription factors in each module were kept
when visualizing the three gene modules with Cytoscape (v.3.6.1)^56 ,^57.
The eigengene value matrix between each scRNA-seq profile and the
nine identified modules calculated by WGCNA analysis was used to
perform a 2D hierarchical clustering and visualized with pheatmap
function of the R platform.


GO and KEGG pathway analysis
Enriched GO terms and KEGG pathways for the genes related to dif-
ferent types of cells were identified using KOBAS3^58. Significant GO
terms and KEGG pathways were visualized with the ggplot function in
the ggplot2 package in R.


Gene regulatory network analysis
The PluriNetWork^37 was trimmed by keeping genes and their relations,
if genes had at least 10 FPKM in 60% of the subtype of EPI cells or if
the gene expression levels were three times higher in the subtype of
EPI cells than other cells. The trimmed networks were visualized with
Cytoscape (v.3.6.1)^55 ,^56.


Comparisons of human and monkey EPI development by
analysing scRNA-seq profiles
The genome and annotation of cynomolgus monkey (version mfa5.0)
were downloaded from the NCBI Genome database. scRNA-seq profiles
of 213 cynomolgus monkey (Macaca fascicularis) EPI cells reported
previously^39 were aligned to the monkey genome with the same options
when analysing human scRNA-seq profiles. The 222 human EPI and 213
cynomolgus monkey EPI scRNA-seq profiles were combined to keep
the FPKM values of the 16,487 homologous genes. The 16,487 com-
mon genes of human and cynomolgus monkey were filtered to keep
12,475 genes with log 2 (FPKM + 1) values of at least 4 in at least one of
the 222 human or 213 cynomolgus monkey cells. Raw FPKM values of
the 12,475 genes for the 222 human or 213 cynomolgus monkey cells
were normalized with the ‘normalize data’ and scaled with the ‘scale
data’ of the Seurat package in R^51. Normalized and scaled FPKM values
were used to perform PCA analysis by using the prcomp function in


R. PCA results were visualized with MatLab (MathWorks) for the first
three principle components. The second and third components were
also used to visualize the PCA results with the ggplot function in the
ggplot2 package of R. A total of 966 homologous genes in human and
monkey that contributed highly to PC1 (with absolute PC1 values of
more than 2 s.d. of the 12,475 genes) were clustered and visualized
with the heatmap function of the R platform. The 294 and 672 genes
with PC1 values of >2 and <−2 s.d. were used to perform GO enrichment
analysis with KOBAS3^58. In total, 1,152 genes with significant loading
scores for PC2 and PC3 (radius of PC2 and PC3 > 3 s.d and none of the
1,151 gene overlapping with 996 genes that had significant scores for
PC1 loading) were clustered and visualized with the pheatmap function
of the R platform.

Statistical analysis
No statistical methods were used to predetermine sample size. The
experiments were not randomized, and investigators were not blinded
to allocation during experiments and outcome assessment. Errors
and error bars represent s.e.m. from a minimum of five independent
embryos unless otherwise indicated. Figures display representative
results. Unless otherwise specified, the results were the same across
all the embryos analysed. For cell number, the significance difference
between two samples was evaluated by unpaired two-sample Student’s
t-test using Excel software (2016). For gene expression, the differences
in different cell types were analysed by Wilcoxon rank-sum test. P < 0.05
was considered as statistically significant differences.

Reporting summary
Further information on research design is available in the Nature
Research Reporting Summary linked to this paper.

Data availability
Source Data for Extended Data Figs. 1f, k–m, 4b, f, p, 5g, h, l, m, p, q, r, 9f,
10d–h are provided with the paper. scRNA-seq data have been depos-
ited in the Gene Expression Omnibus (GEO) under accession number
GSE136447 (scRNA-seq data). The SC3-seq data of cynomolgus mon-
key embryos (for Extended Data Fig. 10) are GEO accession GSE74767
(ref.^39 ); the scRNA-seq data of human pre-implantation embryos (for
Extended Data Fig. 3d, e) are with GEO accession GSE66507 (ref.^16 ) and
GSE36552 (ref.^17 ) and ArrayExpress accession E-MTAB-3929 (ref.^18 ).


  1. Gardner, D. K., Lane, M., Stevens, J., Schlenker, T. & Schoolcraft, W. B. Blastocyst score
    affects implantation and pregnancy outcome: towards a single blastocyst transfer. Fertil.
    Steril. 73 , 1155–1158 (2000).

  2. Xiang, L., Yin, Y. & Li, T. Protocol for a developmental landscape of 3D-cultured human
    pre-gastrulation embryos. Protoc. Exch. https://doi.org/10.21203/rs.2.16169/v1 (2019).

  3. Picelli, S. et al. Full-length RNA-seq from single cells using Smart-seq2. Nat. Protocols 9 ,
    171–181 (2014).

  4. Ewels, P., Magnusson, M., Lundin, S. & Käller, M. MultiQC: summarize analysis results for
    multiple tools and samples in a single report. Bioinformatics 32 , 3047–3048 (2016).

  5. Rosenbloom, K. R. et al. The UCSC Genome Browser database: 2015 update. Nucleic
    Acids Res. 43 , D670–D681 (2015).

  6. Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory
    requirements. Nat. Methods 12 , 357–360 (2015).

  7. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25 ,
    2078–2079 (2009).

  8. Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-
    seq reads. Nat. Biotechnol. 33 , 290–295 (2015).

  9. Harrow, J. et al. GENCODE: the reference human genome annotation for The ENCODE
    Project. Genome Res. 22 , 1760–1774 (2012).

  10. Frankish, A. et al. GENCODE reference annotation for the human and mouse genomes.
    Nucleic Acids Res. 47 , D766–D773 (2019).

  11. Okonechnikov, K., Conesa, A. & García-Alcalde, F. Qualimap 2: advanced multi-sample
    quality control for high-throughput sequencing data. Bioinformatics 32 , 292–294
    (2016).

  12. Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell
    transcriptomic data across different conditions, technologies, and species. Nat.
    Biotechnol. 36 , 411–420 (2018).

  13. Trapnell, C. et al. The dynamics and regulators of cell fate decisions are revealed by
    pseudotemporal ordering of single cells. Nat. Biotechnol. 32 , 381–386 (2014).

  14. Qiu, X. et al. Reversed graph embedding resolves complex single-cell trajectories.
    Nat. Methods 14 , 979–982 (2017).

Free download pdf