Single-cell analysis reveals lineage segregation in early post-implantation mouse embryos

The mammalian post-implantation embryo has been extensively investigated at the tissue level. However, to unravel the molecular basis for the cell-fate plasticity and determination, it is essential to study the characteristics of individual cells. In particular, the individual definitive endoderm (DE) cells have not been characterized in vivo. Here, we report gene expression patterns in single cells freshly isolated from mouse embryos on days 5.5 and 6.5. Initial transcriptome data from 124 single cells yielded signature genes for the epiblast, visceral endoderm, and extra-embryonic ectoderm and revealed a unique distribution pattern of fibroblast growth factor (FGF) ligands and receptors. Further analysis indicated that early-stage epiblast cells do not segregate into lineages of the major germ layers. Instead, some cells began to diverge from epiblast cells, displaying molecular features of the premesendoderm by expressing higher levels of mesendoderm markers and lower levels of Sox3 transcripts. Analysis of single-cell high-throughput quantitative RT-PCR data from 441 cells identified a late stage of the day 6.5 embryo in which mesoderm and DE cells emerge, with many of them coexpressing Oct4 and Gata6. Analysis of single-cell RNA-sequence data from 112 cells of the late-stage day 6.5 embryos revealed differentially expressed signaling genes and networks of transcription factors that might underlie the segregation of the mesoderm and DE lineages. Moreover, we discovered a subpopulation of mesoderm cells that possess molecular features of the extraembryonic mesoderm. This study provides fundamental insight into the molecular basis for lineage segregation in post-implantation mouse embryos.

The mammalian post-implantation embryo has been extensively investigated at the tissue level. However, to unravel the molecular basis for the cell-fate plasticity and determination, it is essential to study the characteristics of individual cells. In particular, the individual definitive endoderm (DE) cells have not been characterized in vivo. Here, we report gene expression patterns in single cells freshly isolated from mouse embryos on days 5.5 and 6.5. Initial transcriptome data from 124 single cells yielded signature genes for the epiblast, visceral endoderm, and extra-embryonic ectoderm and revealed a unique distribution pattern of fibroblast growth factor (FGF) ligands and receptors. Further analysis indicated that early-stage epiblast cells do not segregate into lineages of the major germ layers. Instead, some cells began to diverge from epiblast cells, displaying molecular features of the premesendoderm by expressing higher levels of mesendoderm markers and lower levels of Sox3 transcripts. Analysis of single-cell high-throughput quantitative RT-PCR data from 441 cells identified a late stage of the day 6.5 embryo in which mesoderm and DE cells emerge, with many of them coexpressing Oct4 and Gata6. Analysis of single-cell RNA-sequence data from 112 cells of the late-stage day 6.5 embryos revealed differentially expressed signaling genes and networks of transcription factors that might underlie the segregation of the mesoderm and DE lineages. Moreover, we discovered a subpopulation of mesoderm cells that possess molecular features of the extraembryonic mesoderm. This study provides fundamental insight into the molecular basis for lineage segregation in post-implantation mouse embryos.
The development of mammalian embryo from a fertilized egg to a gastrulating embryo consists of a precisely controlled series of lineage specification and axis-patterning events. A group of pluripotent cells, the epiblast (EPI), 3 is set aside from two extra-embryonic lineages, the trophectoderm (TE) and primitive endoderm (PrE), during the first few days of mouse embryonic development (1). Cells from the EPI, PrE, and TE of preimplantation embryos can be clearly distinguished according to their quantitative gene expression profiles (2)(3)(4)(5). Following implantation at about embryonic day (E) 4.5, the PrE gives rise to the parietal endoderm (PE), which directly contacts the maternal tissue, and the visceral endoderm (VE), which remains in contact with the embryo and further develops into the endoderm of the visceral yolk sac. The TE develops into the ectoplacental cone (EPC) and extra-embryonic ectoderm (EXE), which are progenitors of the placenta. The EPI transforms into the egg cylinder, an elongated cup-like single layer of an epithelial structure, which later gives rise to both somatic tissues and the germ cell lineage of the embryo proper (6).
At around E6.0, a unique group of VE cells assemble at the distal tip of the egg cylinder and move to the prospective anterior side of the embryo to form the anterior visceral endoderm (AVE). The EPI cells converge toward the posterior proximal pole of the embryo to form the primitive streak (PS). Over the next 24 h, the PS lengthens and ultimately occupies the entire proximal-distal length of the posterior side of the embryo (1, 6 -9). The PS formation morphologically marks the onset of gastrulation, through which three primary germ layers are generated, and the basic body plan of the embryo is established. EPI cells that undergo epithelial-mesenchymal transition and ingress at the PS constitute the mesoderm (ME) and definitive endoderm (DE), whereas EPI cells that do not pass through the PS specify the neuroectoderm (NE) and surface ectoderm (6,10). The anterior and posterior regions of the PS are associated with distinct ME and DE lineages. The anterior and intermediate regions of the PS give rise to the lateral plate, paraxial and cardiac mesoderm, whereas the extreme anterior tip of the PS give rise to the prechordal plate, the notochord, the node, and the DE cell lineage (6,11). The posterior PS cells give rise to extraembryonic mesoderm (EXEM) and blood islands. Nevertheless, signaling-induced cell regionalization does not necessarily indicate an irreversible lineage commitment. The cells may retain pluripotency when they are subject to a different environment (12)(13)(14).
Various strategies have been utilized to identify key signaling pathways and developmentally regulated transcription factors in post-implantation mouse embryos at the tissue level. For examples, Wnt, Bmp4, FGF, and Nodal signaling are known to be critical for the cell lineage allocation and axis patterning in the post-implantation mouse embryo (1,(7)(8)(9)(15)(16)(17)(18)(19)(20)(21). However, the molecular basis for the cell-fate plasticity and lineage segregation remains elusive. A recently developed technique of single-cell RNA sequencing (scRNA-Seq) and single-cell highthroughput quantitative RT-PCR (qRT-PCR) provide opportunities to address these fundamental questions at a high resolution. Thus far, the single-cell techniques have been used to reveal pluripotency state transition and lineage segregation of embryonic stem cells and distinct cell types of preimplantation embryos (2)(3)(4)(5)(22)(23)(24)(25)(26). More recently, the technology was also used to decipher the mesodermal lineage diversification toward the hematopoietic system in the post-implantation embryo (27,28), and to compare the pluripotency state between preimplantation and post-implantation embryos (5,29). However, it still remains a challenge to obtain gene expression profiles of embryonic cells that specify earliest in the EPI and to know how they segregate into ME and DE in the PS in vivo.
In this study, we have used scRNA-Seq in combination with single-cell qRT-PCR to investigate the transcription signature and molecular heterogeneity in freshly isolated cells from mouse embryos on embryonic days (E) 5.5 and E6.5. The singlecell gene expression dataset allows us to visualize the differentiation state of individual cells at early post-implantation stages, improving our understanding of how early embryonic cells make cell fate decision into ME and DE lineages and potentially guiding in vitro differentiation of pluripotent stem cells for the clinical use.

Unique transcriptional signatures of the EPI, VE, and EXE at the early post-implantation stage
Mouse embryos undergo rapid growth at E5.5 and E6.5 (Fig.  1A). To obtain the global picture of transcriptional signatures of individual cells at these stages, we initially generated the transcriptomes of 124 cells from three embryos (E5.5 (I), E5.5 (II), and E6.5 (III)) by the scRNA-Seq method (supplemental Table  S1). Briefly, embryos containing only the EPI, VE, and EXE were dissociated into single-cell suspension after the PE and EPC were removed with digestion. mRNA of each cell was reversetranscribed and amplified to obtain cDNA, and the expression of the EPI marker Oct4 (30 -34), the VE marker Gata6 (35)(36)(37), and the EXE marker Hand1 (38,39) was examined by qRT-PCR. Because of our primary interest in EPI cells, most of the cells selected for scRNA-Seq were expressing Oct4 (108 Oct4 ϩ cells in total, 19 cells in each of the two E5.5 embryos, and 70 cells in the E6.5 embryo). Theoretically, there are about 120 cells in the epiblast of the E5.5 mouse embryo, and about 660 cells in the epiblast of the E6.5 mouse embryo (40). Thus, sequenced EPI cells accounted for about 10% of total EPI cells in each embryo. In addition to 108 EPI cells, we sequenced 8 Gata6 ϩ and 8 Hand1 ϩ cells, making a sum of 124 initially sequenced cells.
Principal component analysis (PCA) on transcriptome data of 124 samples was conducted to visualize the relationship among distinct cell types in the embryo. As anticipated, data clustered according to the expression of Oct4, Gata6, or Hand1 (Fig. 1B). Unsupervised hierarchical clustering of expression profiles was consistent with the PCA results (supplemental Fig.  S1A). Analysis of the correlation between different samples identified a pair of highly similar EPI cells (arrow in supplemental Fig. S1A), whose correlation coefficient was greater than 0.99. The correlation coefficient between other pairs of samples was 0.56 -0.89, revealing variable correlations among EPI cells at the early post-implantation stages.
With the availability of transcriptome data, we sought for genes specifically expressed in the three clusters (Mann Whitney U test, false discovery rate (FDR) Ͻ5%) (supplemental Table S2). As expected, there were many signature genes for EPI, VE, and EXE cells (supplemental Table S2 and Fig. 1C). Thus, the three clusters were concordantly defined as the EPI, VE, and EXE. Moreover, to obtain comprehensive signature genes of these three cell types, we compared our dataset with that from a recently published study (27), generating common lists of cell type-specific genes (supplemental Fig. S1B Table S2). Analyzing datasets from both studies, we noticed that Sox2 was expressed by the majority of EXE and EPI cells but was rarely expressed by VE cells,

Single-cell insight into the key events around gastrulation
whereas Otx2 was expressed by the most of VE and EPI cells but was rarely expressed by EXE cells (supplemental Table S2). The finding is in agreement with their known distributions (41,42). Identification of these cell type-specific genes will aid in our understanding of how different cell types form and interact during early embryonic development.
Notably, many ligands and receptors of FGF signaling showed cell type-specific expression patterns ( Fig. 1D and supplemental Table S2). For example, Fgf15 and Fgf4 were specifically expressed in EPI cells, whereas Fgfbp1 and Fgfr4 were enriched in VE and EXE cells. Interestingly, Fgf5, a well known marker for EPI cells (12), was expressed in the most of the EPI cells as well as in all of the VE cells but in none of the EXE cells. Fgfr2 was highly expressed by all of EXE cells but rarely detected in VE or EPI cells. The finding suggests that the expression of FGF ligands and receptors are spatially regulated in embryonic and extraembryonic cells.

Pre-MEN cells diverge from the EPI cells
We then focused our analyses on EPI cells. The anteriorposterior polarity of the mouse embryo is established at around E6.0, marked by the establishment of the AVE and formation of the PS. NE forms later in the anterior side, although the ME and DE are derived from the PS region at the posterior side of the embryo (1,6). We speculated that cells from the anterior and posterior parts of the EPI could be distinguished by their expression patterns of germ layer markers and that distinct molecular subtypes of these two regions could be identified. Therefore, we analyzed the expression of an annotated set of 90 expressed germ layer markers ( Fig. 2A), chosen because they embryos I, II, and III with transcriptome data as an input. Different symbols are used to indicate the embryo membership of sequenced cells, and different colors of the symbol are used to present the key molecular feature of the cells in terms of expression of Oct4, Gata6, and Hand1. RPKM Ͼ1 was considered expressed. Cells expressing Oct4, Gata6, and Hand1 formed distinct clusters, which were defined as the EPI, VE, and EXE, respectively. Most of the cells expressed only one of the markers (indicated by brown, rose, or light blue colors), except three VE cells that expressed a high level of Gata6 and a low level of Oct4 (indicated by a deep blue color) and one EPI cell that expressed a high level of Oct4 and a low level of Hand1 (indicated by black color). C, the heatmap showing expression patterns of representative specific genes in EPI, VE, and EXE cells. The whole list of genes specific to each cell type is provided in supplemental Table S2. The upper bar indicates the embryo membership of cells, and the lower bar indicates the lineage of cells. The left-hand-side bar indicates different categories of specific genes. Cells were clustered by the euclidian distance and ward linkage. D, differentially expressed FGF ligands and receptors in EPI, VE, and EXE cells, which are arranged in the same order and denoted in the same way as in C.
are known to regulate the specification of early lineages or are expressed in specific regions of early embryos, in the 108 sequenced EPI cells. The markers were divided into the following five groups: the anterior ME/DE/anterior PS/node; the posterior PS; the PS or the posterior EPI; the NE; and the epidermis (supplemental Experimental procedures). The first three groups were all related to the formation of the mesendoderm (MEN) in the PS.
We found that the most of these 90 markers scattered in different EPI cells, whereas certain markers were expressed by nearly all EPI cells examined, including Tdgf1, Nanog, Ifitm3, Otx2, Pou3f1, Churc1, and Krt18 ( Fig. 2A and supplemental Table S3). To learn whether some EPI cells were biased toward distinct subtypes, we further analyzed the expression of the 90 markers by PCA, as PC projections of genes could help identify the most information-rich genes in classifying cell types (2). PC projections of both cells and genes were calculated. The PC1 axis reflected a gene's expression percentage in the cell population, whereas the PC2 axis seemed to associate with the germ layer differentiation status. Marker genes related to MEN such as Fgf8, Eomes, Nanog, Tdgf1, Wnt3, and Nodal were located in the bottom region of the PC2 axis, whereas an NE marker Sox3 was located on the top of the PC2 axis (Fig. 2, B and C). Moving average analysis by ordering the cells according to their PC2 scores confirmed an inverse correlation between the posterior markers and Sox3 (Fig. 2D). This result suggests that cells positioning in the bottom region of the PC2 axis might be the perspective MEN lineage (described as pre-MEN cells hereafter). Notably, some of the

Single-cell insight into the key events around gastrulation
pre-MEN cells were obtained from E5.5 embryos (Fig. 2B), implicating that EPI cells begin to exhibit different characteristics even at E5.5 when distinct germ layers have not formed.
To substantiate our findings further, we carried out singlecell high-throughput qRT-PCR analysis of 98 Oct4 ϩ Gata6 Ϫ Hand1 Ϫ cells from additional three embryos (E5.5 (IV), E5.5 (V), and E6.5 (VI)). Out of the set of 90 germ layer marker genes, 45 genes were detected in at least one of these cells (supplemental Table S3). The posterior markers Fgf8, Eomes, Nanog, Tdgf1, Wnt3, and Nodal showed the same reverse correlation with Sox3 along the PC2 axis (supplemental Fig. S2, A-C). The pre-MEN cells were again identified, including EPI cells from E5.5 embryos (supplemental Fig. S2A). This result supports the existence of pre-MEN cells in the EPI at the early post-implantation stage.
Therefore, pre-MEN cells seem to prime toward the MEN lineage compared with the rest of the EPI cells at the early postimplantation stages, discriminated by elevated expression of a combination of posterior markers and reduced expression of Sox3. These cells could represent one type of the earliest specified cells among all EPI cells after implantation.

E6.5_Late embryos have distinct characteristics compared with E6.5_Early embryos
To further investigate gene expression profiles of single embryonic cells, we also conducted single-cell high-throughput qRT-PCR for 343 Oct4 ϩ cells collected from an additional four embryos (E6.5 (VII, VIII, IX, and X)), using primers for the set of 65 successfully detected lineage marker genes (supplemental Tables S4 and S5). We unexpectedly found that these cells had markedly increased percentages of cells double-positive for Oct4 and Gata6 (Oct4 ϩ Gata6 ϩ ) (Fig. 3A, dark green and light green). This caught our attention, as expression of Oct4, Gata6, and Hand1 was largely mutually exclusive in an individual cell from the embryos that we examined earlier (embryos I-VI). green colors, respectively), increased in cells from E6.5_Late embryos (embryos VII, VIII, IX, and X), compared with cells from E5.5 and E6.5_Early embryos (embryos II, III, IV, V, and VI). Embryo I was an E5.5 embryo, for which the expression of Oct4, Gata6, and Hand1 was only partially analyzed. Thus, cells from the embryo I were not used in the statistical analysis in A. B, coexpression of Gata6 and Oct4 in an E6.5_Late embryo (XII) but not in an E6.5_Early embryo (XI). Whole embryos were double-immunostained with anti-Oct4 and anti-Gata6 antibodies. Confocal images were acquired as z-stacks of xy images. However, only one of the xy images is shown for clarity. Images containing Gata6 staining were processed once by the median filter (at the parameter 3*3) in the Image-Pro Plus software to reduce background noises. Schematic figures of E6.5_Early and E6.5_Late embryos are shown on the right side. Scale bars, 100 m. C, the heatmap showing the decrease in the expression of some pluripotency markers in E6.5_Late embryos. Only cells being single-positive for Oct4 (Oct4 ϩ Gata6 Ϫ Hand1 Ϫ ) were included in the analysis. The levels of Fgf4, Dppa2, Dppa4, and Gdf3 were measured by single-cell high-throughput qRT-PCR and normalized to the level of Gapdh. D, an image of the embryo E6.5_Late (VII). Scale bar, 100 m.
Both Oct4 and Gata6 have been reported to be expressed in the PS region at the gastrulation stage, in addition to their respective distribution in the EPI and VE before gastrulation (31,32,35,43,44). We inferred that Oct4 ϩ Gata6 ϩ cells could be in the PS region of the late-stage E6.5 embryos after gastrulation had begun (embryos VII-X were named E6.5_Late and their EPI was named Late EPI). In contrast, the E6.5 embryos we examined earlier (embryos III and VI) contained very few Oct4 ϩ Gata6 ϩ cells, being named E6.5_Early. Furthermore, the E5.5 and E6.5_Early embryos (I-VI), either having no PS or a very small one, were uniformly named as Early embryos, and their EPI was named Early EPI accordingly.
To provide further evidence for the existence of E6.5_Early and E6.5_Late embryos, we first examined the coexpression of Oct4 and Gata6 at a single cell level through immunofluorescence staining. In an E6.5_Early embryo (embryo XI), Oct4 and Gata6 were detected in EPI and VE, respectively, whereas in an E6.5_Late embryo (embryo XII), the two proteins were found coexpressed in a subset of cells (yellow) clustering in the PS region, in addition to their respective distribution in EPI and VE cells (Fig. 3B). Second, based on the PCA map constructed using qRT-PCR data of the 65 successfully detected germ-layer markers (supplemental Fig. S3, A and B, and Table S5), the majority of Oct4 ϩ Gata6 ϩ cells formed a cluster distinct from the majority of Oct4 ϩ Gata6 Ϫ cells in the E6.5_Late embryos, although a few Oct4 ϩ Gata6 Ϫ cells also appeared in the Oct4 ϩ Gata6 ϩ cluster. Some MEN markers (such as Bmp7, Cdh2, Evx1, Lhx1, and T) became coexpressed in cells of the Oct4 ϩ Gata6 ϩ cluster (supplemental Fig. S3C), in which some NE markers (Pou3f1, Sox2, and Sox3) were down-regulated (one-sided Mann-Whitney U test, FDR Ͻ0.1). Our results support the notion that the most of the Oct4 ϩ Gata6 ϩ were MEN cells from the PS region. Interestingly, PCA analysis of cells in the Oct4 ϩ Gata6 ϩ cluster revealed that DE signature genes Cer1, Sox17, and Foxa2 were clustered together (supplemental Fig. S3, D and E), suggesting that cells in the Oct4 ϩ Gata6 ϩ cluster might segregate into DE and ME lineages. Third, we compared expression levels of some pluripotency-associated markers (Dppa2, Dppa4, Fgf4, and Gdf3) between the Oct4 ϩ cells from E5.5 and E6.5_Early embryos and those Oct4 ϩ cells from E6.5_Late embryos, and we found greatly reduced expression of these four genes in the Oct4 ϩ cells from E6.5_Late embryos ( Fig. 3C and supplemental Table S5). This finding also indicates that E6.5_Late embryos were at a later developmental stage than E6.5_Early embryos. Taken together, our results suggest that E6.5_Late embryos are distinct from E6.5_Early embryos, with MEN cells (many of them are Oct4 ϩ Gata6 ϩ ) emerging from the PS region of E6.5_Late embryos. The morphology of one E6.5_Late embryo is shown in Fig. 3D.

Segregation of E6.5_Late cells into ME and DE lineages
To obtain a genome-wide transcriptional profiling of cells in E6.5_Late embryos, we conducted scRNA sequencing for 76 cells from the Oct4 ϩ Gata6 ϩ cluster and 36 cells from the Oct4 ϩ Gata6 Ϫ cluster (supplemental Fig. S3, A and B). Whole transcriptome data of these 112 E6.5_Late cells (embryos VII, VIII, and X) were compared with those of 124 cells from the early embryos (embryos I-III) to construct a PCA map. The segregation of embryonic (EPI) and extraembryonic (VE and EXE) lineages could be seen from the PC4-PC5-PC6 projection (supplemental Fig. S4A). Of the 112 E6.5_Late cells, two cells clustered with the eight early VE cells (thus named "VE-like" in supplemental Table S1), and four cells clustered with the eight early EXE cells (thus named "EXE-like" in supplemental Table  S1). The rest of 106 E6.5_Late cells clustered with the 108 early EPI cells. These 214 cells were considered as embryonic cells and used for further analyses.
We next applied the diffusion map dimensionality reduction (28) to the 214 cells using the expression of the same 90 germlayer markers ( Fig. 2A) as an input. This analysis revealed that a proportion of E6.5_Late cells split into two branches, and most of the cells on the two branches were Oct4 ϩ Gata6 ϩ (deep green and light green in supplemental Fig. S4B). The PCA map revealed that the cells on one branch should be DE cells, because they expressed higher levels of DE signature genes (Cer1, Foxa2, Hhex, and Sox17); and the cells on the other branch should be ME cells, as they enriched ME signature genes (Mesp1, Lefty2, Ifitm1, and Evx1) (supplemental Fig. S4C). We then divided the 214 cells into four groups according to their distribution on the diffusion map: EPI, DE, ME, and a group containing three intermediate cells and one cell that did not seem to cluster with any other cells (these four cells were designated as "Other" hereafter) (Fig. 4A). The EPI cluster included four subgroups as follows: Early pre-MEN, the rest of the Early EPI, Late pre-MEN, and the rest of Late EPI (Fig. 4A). On the diffusion map, pre-MEN cells from the Early EPI (defined in Fig. 2) were closer to DE and ME cells than the rest of Early EPI cells (Fig. 4A), and the Late EPI subgroup also contained some cells close to the ME and DE cells (named Late pre-MEN cells), suggesting the continued existence of pre-MEN cells in the E6.5_Late embryos. To avoid biased grouping caused by the utilization of only 90 germ layer markers, we combined the 90 germ layer markers with all genes associated with GO categories "Mesoderm," "Endoderm," and "Ectoderm" to generate a list of 822 marker genes (supplemental Table S6). With the extended markers, cells were similarly clustered into the three major groups (EPI, DE, and ME) on the diffusion map (supplemental Fig. S4D), validating the segregation of DE and ME lineages. However, Early pre-MEN cells mixed with the rest of Early EPI cells, and Early EPI was separated from Late EPI. This might be caused by the inclusion of some developmental stageassociated genes in the 822 genes in addition to germ layer markers, which could mask the germ layer-associated feature.
To gain detailed information for the distribution of marker genes in different groups of cells, we compared the expression of the 90 germ-layer markers (divided into five groups in Fig.  2A) among EPI, DE, and ME groups and marked genes specific for each group (Mann-Whitney U test, FDR Ͻ0.05) (Fig. 4B). As expected, most markers of the first three groups were enriched in DE or ME cells. Interestingly, NE markers Six1 (45) and Six3 (46) and epidermis marker Krt8 (12) were found to enrich in the DE lineage in the E6.5_Late embryos, suggesting their dynamic distribution in different lineages. Moreover, the DE and ME cells significantly coexpressed their respective lineage markers, whereas pre-MEN cells did not coexpress the DE or ME markers (Fig. 4C). Among 90 markers, Cer1, Krt18, Mesp1, and Lefty2 were with the most extreme PC loadings when DE and ME cells were clustered by PCA (supplemental Fig. S4C), indicating that they were the most discriminating markers for the DE (Cer1, Krt18) and ME (Mesp1, Lefty2) lineages, respectively. The expression of these four markers was plotted on the diffusion map (Fig. 4D), together with Fgf8 and Sox3, which were up-regulated and down-regulated, respectively, in Early pre-MEN cells (Fig. 2). Collectively, this analysis first reveals the molecular feature in the segregation of the DE and ME in postimplantation embryos at a single cell level.

Lineage-specific genes across developmental stages
To look for more stage-specific and lineage-specific genes that were differentially regulated during segregation of DE and ME lineages, we compared transcriptome data between Early EPI and Late EPI as well as among Late EPI, DE, and ME. As cDNA libraries for E6.5_Late embryonic cells were constructed using two different methods, we analyzed data from the two batches separately for comparisons and selected the common genes (supplemental Table S6) to minimize batch effects. 5600 genes were down-regulated from Early EPI to Late EPI (onesided Mann-Whitney U test, FDR Ͻ0.25). Many of down-regulated genes associated with the GO term "cellular metabolic process" (Rank 1, p Ͻ 10 Ϫ121 ) or "gene expression" (Rank 24, p Ͻ 10 Ϫ38 ). Although GO analyses with the 5600 genes did not identify terms related to "pluripotency" (p Ͼ 0.1), many pluripotency-associated genes were down-regulated from Early EPI to Late EPI, including Dppa2, Dppa4, Fgf4, Gdf3 (consistent with Fig. 3C), and Utf1. 149 genes were up-regulated from Early EPI to Late EPI (one-sided Mann-Whitney U test, FDR Ͻ0.25). Many of the up-regulated genes are related to the GO terms "nucleic acid metabolic process" (Rank 1, p Ͻ 10 Ϫ21 ), "gene expression" (Rank 28, p Ͻ 10 Ϫ14 ), or "developmental process" (Rank 29, p Ͻ 10 Ϫ14 ), including Gdf9, Gsk3b, Hmga2, Lef1, Lhx1, Mycn, Pten, Sall4, and Sox11.
In addition, we found 10 genes enriched in ME cells compared with DE cells (Ccnd2, Cdh11, Gas1, Meis2, Mesp1, Mycn, Rbms1, Snai1, Wnt3, and Wnt5a), and 129 genes enriched in DE cells compared with ME cells (one-sided Mann-Whitney U test, FDR Ͻ0.25) (supplemental Table S6). Interestingly, "Wnt signaling pathway" was identified for genes enriched in the DE group compared with the ME group (Rank 14, p Ͻ 10 Ϫ6 ), including Wnt receptor genes (Fzd5, Fzd6, and Fzd8) and genes involved in "negative regulation of Wnt signaling pathway" (Rank 28, p Ͻ 10 Ϫ5 , Cdh1, Cer1, Gsc, Sfrp1, Sfrp5, Shisa2, Six3, and Sox17) (supplemental Table S6 and Fig. 5A). In contrast, Wnt agonists Wnt3 and Wnt5a as well as Wnt downstream gene Ccnd2 were enriched in ME cells compared with DE cells (Fig. 5A). These results suggest that Wnt signaling might function in both autocrine and paracrine manners in DE and ME lineages. It will be interesting to test the function of these molecules during the segregation of DE and ME lineages.
To characterize transcriptional regulation networks that might underlie the segregation of DE and ME lineages, unsupervised hierarchical clustering analysis was performed based on the Connection Specificity Index (CSI) (21, 47) of differentially expressed transcription factors (TFs, GO: 0003700) among Late EPI, DE, and ME lineages (supplemental Table S6). TFs formed three major module cliques (MC1-MC3, Fig. 5B), in which ME-, EPI-, and DE-specific TFs were enriched, respectively (Fig. 5C). There were also highly connected subclusters (sMC1-sMC3). Genes co-up-regulated in ME and DE cells (Lhx1, Gata6, and Mixl1) appeared in sMC1, whereas sMC2 included genes specific for ME cells (Mesp1, Snai1, Wnt5a, and Meis2), and sMC3 contained genes specific for EXEM cells (Hand1, Tbx3, Msx2, discussed below). Coexpression networks based on the expression correlation CSI recapitulated MC1-3 (Fig. 5D), suggesting that TFs of each group are systematically regulated. Because negative correlation often occurs between genes regulating alternative cellular states (21,48), the negative correlation between TFs of MC3 and MC1 (Otx2/Zfp516, Otx2/Tbx3, Otx2/Hand1, Foxa2/Lef1, Aff1/Tbx3, and Aff1/ Msx2) suggested that these TFs might account for the fate choice of DE and ME lineages. Mycn, which was expressed at a higher level in both Late EPI and ME cells than in DE cells (one sided Mann-Whitney U test, FDR Ͻ0.25), might also play a

Single-cell insight into the key events around gastrulation
critical role in the segregation of DE and ME lineages, because it had negative correlations with several MC3 TFs (Sox17, Six3,  Hhex, Prdm1, Foxa1, and Foxa2). Our finding would be of great help to deepen the understanding of lineage segregation of DE and ME in vivo.
It is important to understand the regulation of pluripotencyassociated genes during the in vivo development process. Plu-ripotency-associated genes were selected from published studies (5,49) and GO terms related to pluripotency (supplemental Table S6). Some pluripotency-associated genes also act as lineage specifiers having additional functions during early phases of germ layer specification and commitment, such as Sox2 (50), Tdgf1 (51, 52), and Tbx3 (53,54). We performed CSI analyses on differentially expressed pluripotency-associated genes among Early EPI, Late EPI, DE, and ME lineages (Mann-Whitney U test, FDR Ͻ0.25) (supplemental Table S6 and Fig. S5, A-C). Three major module cliques (MC1-MC3) were identified, which contained genes enriched in Early EPI, E6.5_Late cells (including Late EPI, DE, and ME cells), and DE cells, respectively (supplemental Fig. S5B). Genes in a sub-module clique (sMC2) were highly connected and enriched in the Early EPI, including Fgf4, Med30, Paf1, Dppa4, Dppa5a, Gdf3, Dnmt3l, and Dppa2 (supplemental Fig. S5, A and B). Genes in another sub-module clique (sMC1) were enriched in both Early EPI and Late EPI, including Cdh1, Sox2, Dnmt3a, Dnmt3b, Utf1, L1td1, and Phc1 (supplemental Fig. S5, A and B). Hierarchical clustering revealed that the majority of cells from E5.5 and E6.5_Early embryos were in the same cluster, separating from cells of E6.5_Late embryos, using either differentially expressed genes (supplemental Fig. S5B) or all 249 pluripotency-associated genes as the input (data not shown), suggesting that cells from E6.5_Early embryos were more similar to cells from E5.5 embryos than to cells from E6.5_Late embryos. Coexpression networks on the basis of CSI revealed the connections among pluripotency-associated genes (supplemental Fig. S5C). In particular, we found that Mycn, which was negatively connected with several DE-specific genes (Fzd5, Sfrp1 and Kit), was closely related to Tet1. It has recently been reported that TET-mediated DNA demethylation regulates the expression of Lefty2 and is essential for the development of ME (55). The finding hints of a potential role of Mycn for the ME formation. It will be interesting to investigate how different modules or submodules are regulated during embryonic developmental processes.

Subpopulation of ME cells exhibits characteristics of the EXEM
We consistently noticed a positive correlation among genes Hand1, Tbx3, and Bmp4 when cells from E6.5_Late embryos were analyzed by PCA using the germ layer markers (supplemental Fig. S3, D and E). The analyses related to the expression correlation CSI of TFs also revealed that Tbx3, Hand1, and Msx2 formed a special cluster (sMC3 in Fig. 5B). To precisely illustrate such a pattern, we analyzed the cells on the two branches of diffusion map (DE, ME, and Other clusters in Fig.  4A, together called MEN cells, all from E6.5_Late embryos) by PCA using RNA-Seq data of the 822 markers as an input. As expected, DE and ME cells could be discriminated from PC2 to PC3 axes (Fig. 6A). Interestingly, genes such as Hand1, Tbx3, Bmp4, Msx2, Igf2, Krt18, Krt8, and Foxf1 were at the top end of the PC4 axis and were enriched in cells at the top end of PC4. In contrast, genes such as Tdgf1, Mixl1, Fgf8, Otx2, Dusp6, and Zic2 were at the bottom end of the PC4 axis and were downregulated in cells at the top end of PC4 (Fig. 6, A and B). A recent study reported that the expression of Hand1, Bmp4, Tbx3, Msx2, Krt8, Krt18, and Foxf1 was higher and that the expression of Otx2, Tdgf1, Dusp6, Mixl1, Lefty2, and Lhx1 was lower in EXEM cells (known to arise from the posterior PS) compared with cells from the embryonic part of the ME in E7.0 and E7.5 mouse embryos (27). These results support the notion that the cells at the top end of PC4 should be EXEM cells. Gene Set Enrichment Analysis (GSEA) (55, 56) further revealed the similarities between the gene set down-regulated or up-regulated in E6.5_Late EXEM cells and the gene set down-regulated (Gene set 1, normalized enrichment score (NES) ϭ 1.89, FDR ϭ 0, Fig. 6C) or up-regulated (Gene set 2, NES ϭ Ϫ2.18, FDR ϭ 0, Fig. 6D) in E7.0/E7.5 EXEM cells (27), respectively.

Discussion
Here, we apply scRNA-Seq and high-throughput qRT-PCR approaches to investigate gene expression patterns in nearly 600 cells harvested from E5.5, E6.5_Early, and E6.5_Late mouse embryos. Our study provides rich single-cell gene expression data for visualizing the pluripotency and differentiation state of cells at the early post-implantation stages. The distinct groups of cells identified in this study (pre-MEN, DE, ME, and EXEM) are summarized in Fig. 7. Genes enriched in these groups should be important for corresponding developmental processes. The transcriptomes revealed here for single cells of both embryonic and extra-embryonic origins would enhance our understanding how different cell lineages are continuously specified and established after implantation.
Our RNA-Seq data were mainly generated from Oct4 ϩ embryonic cells. Analyses of these data reveal that the pluripotency and differentiation status of embryonic cells changed greatly around the gastrulation stage. Interestingly, we found that EPI cells from E6.5_Early embryos were more similar to those from E5.5 embryos than to those from E6.5_Late embryos ( Fig. 3C and supplemental Fig. S5B). The EPI contains founder cells of all somatic lineages in amniotes. One of the key unsolved questions is when and how the MEN and NE are segregated. In E5.5 and E6.5_Early embryos, it was difficult to identify typical pre-MEN cells or pre-NE cells, because many MEN and NE markers are scattered in a broad range of cells (Fig. 2A). The expression of multiple kinds of pluripotency markers and coexpression of different categories of germ-layer markers in individual EPI cells would provide the molecular basis for the developmental plasticity of cells at the early stages. A previous study proposed that an initial phase of stochastic gene expression followed by signal reinforcement may drive lineage segregation through antagonistically separating a cohort of initially equivalent cells when PrE and EPI segregate within the inner cell mass of mouse blastocysts (4). Similar mechanisms might underlie the segregation of MEN and NE within the post-implantation EPI. However, the emergence of pre-MEN cells and DE and ME cells but not NE cells in our study suggests that the NE might form later than the MEN. This is consistent with the fact that signals secreted by the anterior MEN are essential for the induction of the NE (6). Pre-MEN cells had higher levels of Nodal, Wnt3, and Fgf8 ligands, which are all signaling molecules required for the generation of the MEN (6). Previous reports suggested that potential common MEN progenitors exist transiently that subsequently give rise to either DE or ME (57)(58)(59)(60)(61) and that Cdh1, Pdgfra, Gsc, and Foxa2 are the key markers for MEN progenitor cells in vitro (59). We found that these markers were coexpressed in a small subset of our pre-MEN cells (3 of 21 identified pre-MEN cells expressed 3 or 4 of these four markers). We consider that pre-MEN cells are more prone to differentiate into MEN cells than the rest of EPI cells. However, further experimental evidence is needed to verify this notion and to test whether signal enforcement is required for further differentiation of pre-MEN cells.

Single-cell insight into the key events around gastrulation
Our results indicate that ME and DE cells emerge around E6.5, when some ME markers and DE markers became significantly coexpressed in ME cells and DE cells, respectively. Using our RNA-seq data, we identified many genes specifically expressed in the DE and ME. These results are highly consistent with a previous in vitro result that Foxa2, Sox17, Cdh1, and Krt18 are selective markers for the DE, whereas Cad11 and Pdgfra are selective markers for the ME (supplemental Table  S6) (59), providing the evidence for the reliability of our data. It is interesting that Wnt ligands (Wnt3 and Wnt5a) were up-regulated in ME cells, whereas Wnt receptors and some negative components of the Wnt pathway were enriched in DE cells (Fig.  5A). In mice, Wnt3 is necessary for the formation of the PS and emergence of the DE (62). Wnt signaling should be required for DE induction and maintenance, because conditional loss of ␤-catenin in MEN cells results in the production of ectopic cardiac mesoderm at the expense of the DE (63) and conditional deletion of ␤-catenin in DE and VE results in the loss of Sox17 expression (64), which is necessary for the segregation of the gut endoderm from the ME (65). Enriched expression of Wnt ligands in ME cells might offer a paracrine mechanism for the induction of DE formation, whereas expression of negative components of the Wnt pathway in DE could be a feedback strategy to confine the signal at a balanced level. Our work provides systematic and novel insights into how the segregation of ME and DE is regulated in terms of both transcriptional networks and signaling cross-talk.
Most of the DE and ME cells coexpressed Oct4 and Gata6, indicating important roles of Oct4 and Gata6 in the MEN development. It was reported that the Gata6 transcript was detected in the ME and DE in wild-type mice (35,44). However, chimeric experiments showed that Gata6 Ϫ/Ϫ cells contributed effectively to the heart and gut. It is possible that Gata4 or other members of the Gata family could compensate for the absence of Gata6 in the MEN development. Noticeably, some Oct4 ϩ Gata6 Ϫ cells clustered with Oct4 ϩ Gata6 ϩ cells and also acquired ME-or DE-related characteristics in E6.5_Late embryos (supplemental Fig. S3, A and B), suggesting that Gata6 might not be the earliest marker of cells on the differentiation path. Nevertheless, our finding is consistent with the importance of Oct4 in cells of the PS region (43,66).
Recently, two groups reported their single-cell transcriptomic analyses of mouse early post-implantation embryos (21,27,29). However, characteristics of the pre-MEN cells and the divergence between ME and DE cells were not included in these studies. We analyzed those two datasets using our methods to check whether they could validate our finding in terms of pre-MEN, DE, and ME cells (supplemental Fig. S6). The 481 E6.5 embryonic cells in Scialdone's dataset (27) were from seven embryos, which possibly contained both E6.5_Early and E6.5_Late embryos. They all expressed high levels of Oct4 (RPKM Ͼ5.88), in contrast to their 20 EXE or VE cells, which expressed low levels of Oct4 (RPKM between 0 and 3.39). 29 of their embryonic cells clustered with our ME cells in the diffusion map using expression of the 90 germ-layer markers as an input (supplemental Fig. S6A). Their embryonic cells contained 13 Oct4 ϩ Gata6 ϩ cells in total, of which 10 cells were in the ME cluster. Therefore, their data support the notion that most of the Oct4 ϩ Gata6 ϩ cells are MEN cells from the PS region. The rest of cells (452/481) clustered with our EPI cells. The pre-MEN cells could be identified from 452 cells in the EPI cluster, although Early pre-MEN cells from E6.5_Early embryos and Late pre-MEN cells from E6.5_Late embryos could not be distinguished (supplemental Fig. S6B). Genes enriched in DE or ME cells, including 5730457N03Rik, Amot, Ccnd2, Cdh11, Eomes, Evx1, Fgf8, Foxa2, Frzb, Gsc, Prtg, T, Tdgf1, Tgfbr3, and Wnt3, were up-regulated, and genes such as Dusp4, Epcam,

Single-cell insight into the key events around gastrulation
Fgf4, Igfbp2, Krt18, Phc1, Sox3, Uchl1, and Utf1 were downregulated, in both Scialdone's (27) pre-MEN cells and our pre-MEN cells (Mann-Kendall tests over PC2, FDR Ͻ0.1) (supplemental Table S3). None of the Scialdone et al. (27) cells were similar to our DE cells (supplemental Fig. S6A). In another study by Nakamura et al. (29), 16 cells from E5.5 embryos and 18 cells from E6.5 embryos were sequenced and analyzed. Diffusion map analysis showed that some of Nakamura's embryonic cells (all expressing T) were also similar to our ME cells, although the size of their samples was small, and no cells were similar to our DE cells (supplemental Fig. S6C). In addition, two published studies (21,27) analyzed genes and signals involved in the anterior-posterior regionalization in the PS of E7.0 and E7.5 embryos. Our study extends this issue to an earlier stage (E6.5) and uncovers more aspects of signals and transcription factors that may govern the segregation of EXEM cells and embryonic ME cells. Finally, we examined whether EXEM cells could be identified from the Scialdone et al. (27) or the Nakamura et al. (29) E6.5 ME cells. Our analysis showed that no EXEM cells were identified from the Scialdone et al. (27) dataset (supplemental Fig. S6D), whereas one cell from the Nakamura et al. (29) dataset was close to the EXEM cluster we identified (supplemental Fig. S6E). The lack of DE and EXEM cells in these two datasets might be due to technical reasons, especially in the manner to dissociate embryonic cells. It is also possibly caused by the mouse strain difference or stage difference.
Collectively, our large scale single-cell gene expression analysis of early post-implantation mouse embryos should be valuable for understanding the pluripotency and differentiation states of cells at this particular developmental window. Our identification of specific groups with characteristic gene expression patterns of pre-MEN, ME, DE, and EXEM cells is significant for understanding how the early somatic lineages are initiated. The in-depth analysis of these data would reveal further insights into the mammalian development and help to develop efficient strategies to differentiate pluripotent stem cells into regenerative medicine-relevant cells.

Embryo collection and single cell isolation
Surgical procedures were performed in compliance with protocols approved by the Animal Committee of the Institution of Health Sciences. Embryos of the C57BL/6 strain were collected at E5.5 and E6.5 (noon of the day when the vaginal plug was detected was designated as E0.5) in M15 medium. One single embryo was used at a time, which was incubated in EGTA/PBS and then treated by trypsin. The PE and EPC were easily removed, and the remaining embryo was transferred into DMEM with 10% FBS before dissociating into single-cell suspension. The single cells were transferred into a lysate buffer by a glass pipette assembled on a micromanipulator. The embryos used are listed in supplemental Table S4.

Preparation of single-cell cDNAs
For cells from E5.5 and E6.5_Early embryos (I-VI), and cells from embryos E6.5_Late (IX) and E6.5_Late (X), cDNAs were prepared according to previous methods (67,68). Briefly, oli-go(dT) primers, which were complementary to poly(A) tails of mRNAs, were incorporated into the 5Ј-end of the first-strand cDNAs during the reverse transcription step. Then a poly(A) tail was added to the 3Ј-end of the first-strand cDNAs by terminal deoxynucleotidyltransferase. After the second strands were synthesized, cDNAs were amplified by 20 ϩ 12 cycles of PCR. For cells from embryos E6.5_Late (VII) and E6.5_Late (VIII), mRNAs were reverse-transcribed by SMARTScribe TM reverse transcriptase and then directly amplified by 18 ϩ 12 cycles. To find genes differentially expressed among Early EPI, Late EPI, DE, and ME groups, cells from embryos VII/VIII were considered as one batch, and cells from embryo X were considered as the other batch, and the two batches of EPI, DE, and ME cells were compared separately. The common differentially expressed genes were obtained.

Single-cell qRT-PCR or high-throughput qRT-PCR
Amplified cDNAs without purification from single cells were equally diluted (1:10) and combined with the TaqMan Universal PCR Master Mix, primers, and probes and examined either by qRT-PCR in 384-plates on 7900HT PCR System (ABI) or by high-throughput qRT-PCR in 96.96 Dynamic Arrays on Bio-Mark System (Fluidigm). C t values were calculated by the system software. Sequences of TaqMan probes and corresponding primers are shown in supplemental Experimental procedures.

Single-cell RNA-seq and data processing
The samples were purified either by gel electrophoresis or Agencourt AMPure XP beads and subsequently sequenced by single-end sequencing at 50 bp length on the Illumina Hiseq2000 or Hiseq2500 platform. Reads that contained poly(A), low quality, and adapters were pre-filtered before mapping. The remaining reads were mapped to the mm10 genome. Base calls were performed using CASAVA version 1.7, and sequences were aligned with tophat 2.0. After mapping, only unique reads that mapped to a single locus of the genome were used for downstream analyses. Only samples with unique reads of Ͼ0.5 million were used. Cufflinks 2.2.1 was used for calling RPKM values.
Genes that were detected (RPKM Ͼ0.01) in at least one cell were kept. During purification, transcripts with a length Ͼ500 bp were recovered to reduce the contamination of short primer polymers. Accordingly, we removed the transcripts shorter than 500 bp from the gene list.
PCA, diffusion map analysis, GO analysis, GSEA, and hierarchical clustering analyses were achieved on the R platform. PCA and diffusion map analyses are both feature-extraction methods, which are frequently used in single-cell studies (2,26,28,29). Mann-Whitney U test and Mann-Kendall tests were also performed on the R platform. Multiple test correction was performed to adjust p values using the Benjamini-Hochberg method.
The pairwise expression Pearson correlation coefficient (PCC) of TFs or pluripotency genes was used to calculate CSI scores (21,47). Hierarchical clustering was performed based on the CSI scores. Next, the CSI coexpression network was constructed based on CSI scores. We also added the edges with negative PCC, where corresponding CSI scores were calculated based on the absolute pairwise PCC.
Confocal images were acquired using a Zeiss LSM 880 NLO laser-scanning microscope as z-stacks of xy images taken at 5-m z-intervals (ϫ20). Raw data were processed using the Image Pro-Plus software.

Availability of data and materials
The RNA-Seq data set supporting results of this article is available in the GEO database (www.ncbi.nlm.nih.gov), GSE70713.