Read pairs with more than 10% low-quality bases, adapter contaminants, or artificial sequences introduced during the experimental processes were trimmed, and the cleaned reads were aligned to the human hg19 reference using TopHat (v2.0.12) with default settings (Trapnell et al., 2009). In addition, the 92 ERCC spike-in sequences were added to the reference annotation as extra artificial transcripts. Cufflinks (v2.2.1) with default parameters was then used to assemble the transcripts and to quantify the transcription levels of annotated genes as FPKM (fragments per kilobase of transcript per million mapped reads) (Trapnell et al., 2010). For each single-cell RNA-seq dataset, linear regression was used to fit the averaged, log2-transformed transcription levels of the 92 exogenous ERCC spike-in RNAs against the provided number of spike-in molecules per lysis reaction, and the absolute mRNA abundance in each single cell was calculated by normalizing against the spike-in RNAs (Treutlein et al., 2014). The expression level of repetitive elements was quantified as the read count of repetitive elements per million RefSeq-mappable reads, counting only uniquely mapped reads located within annotated repetitive elements. Other published data, including those from human implantation embryos, human naïve ESCs, in vitro human PGCLCs, and mouse PGCs, were obtained from GEO (Irie et al., 2015; Seisenberger et al., 2012; Takashima et al., 2014; Yamaguchi et al., 2013; Yan et al., 2013); only the raw FASTQ reads were downloaded, and these were processed with our analysis pipelines.
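The spike-in calibration described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study. It assumes that, for a single cell, log2(molecules per lysis reaction) is regressed on log2(FPKM) over the spike-ins detected with FPKM > 0, and that the fitted line is then used to convert the FPKM of an endogenous gene into an estimated molecule count. The regression direction, the exclusion of undetected spike-ins, and all function names and spike-in labels are assumptions made for illustration.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2:
        raise ValueError("at least two points are needed to fit a line")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def calibrate_spike_ins(spike_fpkm, spike_molecules):
    """Fit log2(molecules) against log2(FPKM) for one cell.

    spike_fpkm:      hypothetical dict, spike-in label -> FPKM in this cell
    spike_molecules: hypothetical dict, spike-in label -> molecules added
                     per lysis reaction (must be > 0)
    Spike-ins with FPKM <= 0 are skipped (an assumption of this sketch),
    because their log2 value is undefined.
    """
    xs, ys = [], []
    for label, fpkm in spike_fpkm.items():
        if fpkm > 0 and label in spike_molecules:
            xs.append(math.log2(fpkm))
            ys.append(math.log2(spike_molecules[label]))
    return fit_line(xs, ys)


def estimate_molecules(fpkm, slope, intercept):
    """Convert a gene's FPKM to an estimated molecule count via the fit."""
    if fpkm <= 0:
        return 0.0  # undetected genes are reported as zero molecules
    return 2 ** (slope * math.log2(fpkm) + intercept)
```

In use, `calibrate_spike_ins` would be called once per cell with that cell's spike-in FPKM values, and `estimate_molecules` would then be applied to each annotated gene using the returned slope and intercept. A per-cell fit is what lets the spike-ins absorb cell-to-cell differences in capture and amplification efficiency.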