Identification of H4K20me3- and H3K4me3-associated RNAs using CARIP-Seq expands the transcriptional and epigenetic networks of embryonic stem cells

RNA has been shown to interact with various proteins to regulate chromatin dynamics and gene expression. However, it is unknown whether RNAs associate with epigenetic marks such as post-translational modifications of histones, including histone 4 lysine 20 trimethylation (H4K20me3) or trimethylated histone 3 lysine 4 (H3K4me3), to regulate chromatin and gene expression. Here, we used chromatin-associated RNA immunoprecipitation (CARIP) followed by next-generation sequencing (CARIP-Seq) to survey RNAs associated with H4K20me3- and H3K4me3-marked chromatin on a global scale in embryonic stem (ES) cells. We identified thousands of mRNAs and noncoding RNAs that associate with H4K20me3- and H3K4me3-marked chromatin. H4K20me3- and H3K4me3-interacting RNAs are involved in chromatin organization and modification and RNA processing, whereas H4K20me3-only RNAs are involved in cell motility and differentiation, and H3K4me3-only RNAs are involved in metabolic processes and RNA processing. Expression of H3K4me3-associated RNAs is enriched in ES cells, whereas expression of H4K20me3-associated RNAs is enriched in ES cells and differentiated cells. H4K20me3- and H3K4me3-interacting RNAs originate from genes that co-localize with features of active chromatin, including transcriptional machinery and active promoter regions, and the histone modification H3K36me3 in gene body regions. We also found that H4K20me3 and H3K4me3 are associated with distinct gene features including transcripts of greater length and exon number relative to unoccupied transcripts. H4K20me3- and H3K4me3-marked chromatin is also associated with processed RNAs (exon transcripts) relative to unspliced pre-mRNA and ncRNA transcripts. In summary, our results provide evidence that H4K20me3- and H3K4me3-associated RNAs represent a distinct subnetwork of the ES cell transcriptional repertoire.

Regulation of gene expression is a dynamic process that is governed in part by 1) sequence-specific transcription factors and epigenetic regulators binding to regions of accessible chromatin; 2) post-transcriptional splicing events, where RNA binding proteins bind nascent RNA and facilitate splicing and translation (1)(2)(3)(4); and 3) post-translational events, such as modification of histone tails (e.g. lysine methylation) (5-7), which regulates chromatin structure (8) (e.g. packaging of DNA into chromatin) and gene expression by influencing the activity of epigenetic modifiers and the transcriptional state of the underlying DNA sequence. Combined, these processes are important for controlling gene expression networks that promote self-renewal or differentiation of embryonic stem (ES) 2 cells. The advent of technologies to profile RNA-protein interactions has demonstrated that DNA-binding factors associate with RNAs to regulate transcriptional and post-transcriptional processes (9 -11). Although the functional role of RNA in chromatin organization has been a focus of previous work (12,13), recent studies have uncovered interactions between RNA and proteins, including long noncoding RNAs (lncRNAs), which influence chromatin structure by binding and regulating activity of chromatin modifying enzymes (14 -18). Xist is an example of a lncRNA that facilitates chromatin organization by interacting with epigenetic repressors to inactive one X chromosome in female mammalian cells (19). Recent work has also identified RNA binding by proteins associated with chromatin (18), suggesting that a broad set of RNAs may interact with chromatin constituents. Although these studies provide insight into RNA-protein interactions, it is unclear whether RNAs associate with post-translationally modified histones globally. Identifying interactions between RNAs and histone modifications may provide insight into mechanisms of regulation of gene expression in ES cells. Association of RNAs with histone modifications may regulate genome structure and expression patterns by mediating communication networks between the genome and chromatin-associated proteins. Also, interactions between RNAs and chromatin may demarcate euchromatin or heterochromatin domains to facilitate maintenance of self-renewal or differentiation, thus influencing cell fate decisions. For example, chromatin compaction and gene repression may be regulated by heterochromatin-associated transcripts, whereas euchromatin formation and gene activation may be regulated by euchromatin-associated transcripts. In addition, association of RNAs with multiple histone modifications may also regulate gene activity.
Post-translational modification of histones is thought to regulate chromatin structure (20) in part by marking active and inactive gene regulatory networks and by influencing the transcriptional state of the underlying DNA sequence. H4K20 methylation is associated with several cellular processes including transcriptional regulation (21), DNA repair (22,23), DNA replication (24), chromosome condensation (25), and genome stability (23,26). H4K20me1 is enriched at active genes (27,28), whereas H4K20me3 is predominantly associated with hetereochromatic regions. However, recent findings suggest that H4K20me3 co-localizes with H3K4me3 at a subset of regions (29,30), suggesting that H4K20me3 marks transcriptionally dynamic regions in ES cells. In addition, H3K4me3 is enriched at the transcriptional start sites (TSSs) of active genes (28,(31)(32)(33)(34), where it serves in part as a platform for RNA polymerase II (RNAPII) binding and gene activation (7,35,36).
Here, we identified RNAs associated with H4K20me3-and H3K4me3-marked chromatin on a global scale in ES cells by performing chromatin-associated RNA immunoprecipitation (CARIP) followed by next-generation sequencing (CARIP-Seq). Using this approach, we identified common and distinct sets of H4K20me3-and H3K4me3-associated RNAs involved in a diverse set of biological processes. We also found that H4K20me3 and H3K4me3 RNAs co-localize in promoter and gene body regions of active genes with histone modifications such as H3K36me3 and transcriptional machinery. Moreover, H4K20me3 and H3K4me3 are associated with distinct gene features including transcripts of greater length and exon number relative to unoccupied transcripts. H4K20me3 and H3K4me3marked chromatin is also associated with processed RNAs (exon transcripts) relative to unspliced pre-mRNA transcripts. Altogether, our results support a model whereby H4K20me3and H3K4me3-associated RNAs constitute a distinct subnetwork of the ES cell transcriptional repertoire.

Use of CARIP-Seq to identify H4K20me3-and H3K4me3associated RNAs on a global scale in ES cells
Native RNA immunoprecipitations (RNA) previously demonstrated that the histone methyltransferase, Suv420h2, which deposits H4K20me3, interacts with lncRNAs to facilitate recruitment to chromatin and condensation (37). However, it is unclear whether H4K20me3 directly interacts with RNAs. Moreover, whereas H4K20me3 has recently been shown to co-localize with activating histone modifications, such as H3K4me3, at a subset of regions (30), it is unknown whether H3K4me3 also interacts with RNAs. Therefore, to identify RNAs associated with H4K20me3 and H3K4me3, we surveyed H4K20me3-and H3K4me3-RNA interactions on a global scale in ES cells using a newly developed method, which we have termed CARIP-Seq (see "Experimental procedures") (38), by modifying existing ChIP-Seq (39,40) and RIP (41,42) protocols. CARIP-Seq employs a formaldehyde cross-linking step to preserve RNA-protein interactions postlysis in a similar manner as RIP-ChIP (41) and fRIP-Seq (18). However, although RIP-ChIP and fRIP-Seq utilize a lower concentration of formaldehyde (0.1%), CARIP-Seq utilizes a similar concentration of formaldehyde (1%) as ChIP-Seq. To avoid over-cross-linking, which may lead to lower RNA recovery, we reduced the crosslinking time typically used in ChIP-Seq experiments from 10 to 1 min. We hypothesized that a temporal reduction in crosslinking combined with a reduced number of sonication cycles (8 -10) to lyse cells and shear chromatin, relative to ChIP-Seq (14 -18 cycles), would preserve RNA-protein interactions at condensed heterochromatin regions, which are more refractory to fragmentation by sonication relative to open euchromatic regions such as promoters and enhancers. Following sonication, sheared chromatin is incubated with a ChIP-qualified antibody and paramagnetic beads and washed, and RNA is subsequently purified. H4K20me3-and H3K4me3-associated RNAs were immunoprecipitated using the newly developed protocol CARIP using ES cells, followed by construction of cDNAs libraries. CARIP-Seq libraries were then generated using a modified RNA-Seq protocol (see "Experimental procedures") and subjected to next-generation sequencing. To identify RNAs that associate with H4K20me3 and H3K4me3 on a genome-wide scale in ES cells, we performed CARIP-Seq using 4 million ES cells (see "Experimental procedures"). Spatial Clustering for Identification of ChIP-Enriched Regions (SICER) software (43) was used to identify CARIP-enriched islands (FDR Ͻ 0.05; fold change Ͼ 1.5; see "Experimental procedures") relative to input RNA. Using this approach, we identified 6,105 H4K20me3 and 5,863 H3K4me3 CARIP-Seq peaks in the mouse ES cell genome. To test for specific enrichment, we compared H4K20me3 and H3K4me3 CARIP-Seq data to IgG CARIP-Seq data. These results showed that H4K20me3 and H3K4me3 antibodies enriched for bulk RNA relative to IgG control (Fig. S1, A and B). We observed greater enrichment of RNA in H4K20me3 CARIP samples at H4K20me3 CARIP-defined peaks, whereas we observed increased enrichment of RNA in H3K4me3 CARIP samples at H3K4me3 CARIP-defined peaks (Fig. S1, A and B). A direct comparison of RNA-Seq data with H4K20me3 and H3K4me3 CARIP-Seq data demonstrated that H4K20me3 and H3K4me3 CARIP-Seq peaks overlap a subset of expressed transcripts in ES cells (Fig. 1A). The CARIP-Seq signal demonstrates that although the RNA is associated with H4K20me3 or H3K4me3, it does not necessarily indicate that it interacts with the locus shown. This is evident when comparing CARIP-Seq signals to H4K20me3 or H3K4me3 ChIP-Seq peaks (Fig. 1A). Moreover, scatter plots demonstrated that H4K20me3 and H3K4me3 CARIP-Seq samples exhibit distinct signals (Fig. 1B). A comparison with RNA-Seq data revealed that H4K20me3 and H3K4me3 CARIP-Seq samples are enriched at a subset of ESC-expressed genes (Fig.  1C). Annotation of H4K20me3 and H3K4me3 CARIP-Seq enriched islands using HOMER software (44) revealed that

H4K20me3-and H3K4me3-associated RNAs in ES cells
H4K20me3 and H3K4me3 CARIP peaks are predominantly enriched nearby intronic/gene body regions (61 and 67%) (Ϯ2 kb of TSS) and exonic regions (10 and 5%), relative to random genomic sequences of comparable size and frequency, which were used as controls. Moreover, although H4K20me3 and H3K4me3 CARIP peaks are also located in intergenic regions (18 and 19%), their enrichment in these regions is lower than random genomic sequences (Fig. 1D). A region to gene association analysis also showed that H4K20me3 and H3K4me3 CARIP-Seq peaks are largely enriched in gene body regions (downstream of TSS regions) (Fig. 1E). These findings indicate that H4K20me3 and H3K4me3 CARIP-Seq signals originate mainly from genic regions. We also evaluated the relative enrichment of activating (H3K4me3 and H3K36me3) and repressive (H4K20me3 and H3K9me3) histone modifications and transcriptional machinery constituents at genes whose transcripts interact with H4K20me3 or H3K4me3, referred to hereafter as H4K20me3 and H3K4me3 CARIP-Seq peaks. H3K4me3 and RNAPII are enriched at promoter regions of actively transcribed genes in ES cells (45), H3K36me3 and RNAPII Ser2P (46) are enriched in gene body regions of actively transcribed genes, and H4K20me3 and H3K9me3 are enriched at inactive genes or nongenic regions (47). An evaluation of signals at H4K20me3 or H3K4me3 CARIP-Seq peaks showed that densities of features enriched in gene body regions, including H3K36me3 and RNA-PII Ser2P, were higher relative to densities of histone modifications enriched at heterochromatin regions (H4K20me3 and H3K9me3) (Fig. 1F). Moreover, densities of features enriched at promoter regions, including H3K4me3 and RNAPII, exhibited lower densities relative to gene body features, but elevated levels relative to heterochromatin features (Fig. 1F). In contrast, an evaluation of densities at H4K20me3 ChIP-Seq peaks revealed decreased levels of H4K20me3 and H3K4me3 CARIP-Seq signals, H3K36me3, RNAPII, RNAPII Ser2P, and H3K4me3 ChIP-Seq levels relative to H4K20me3 and H3K9me3 ChIP-Seq levels (Fig. 1G, left panel). In addition, analysis of densities at H3K4me3 ChIP-Seq peaks revealed similar levels of H4K20me3 and H3K4me3 CARIP-Seq signals relative to H3K36me3 and RNAPII Ser2P but lower levels relative to H3K4me3 and RNAPII (Fig. 1G, right panel). Moreover, heat maps of H4K20me3 and H3K4me3 CARIP-Seq densities at TSSs to transcription end sites (TESs) also revealed greater enrichment of H3K4me3 CARIP signals relative to H4K20me3 CARIP signals in gene body regions of active genes (Fig. 1H). To evaluate the expression state of H4K20me3-and H3K4me3-associated RNAs in ES cells, we compared these transcripts with RNA-Seq transcriptome data from ES cells and embryoid body (EB) differentiated cells (48) using gene set enrichment analysis (49). Results from these analyses demonstrate that expression of H3K4me3-associated RNAs is enriched in ES cells relative to EB differentiated cells (Fig. 1I). These results also show that although expression of H4K20me3-associated RNAs is mainly enriched in ES cells, some are enriched in differentiated cells (Fig. 1I). To further investigate the distribution of expression of H4K20me3-and H3K4me3-associated transcripts, we evaluated their expression state relative to all genes and ES cell-expressed genes (RPKM Ͼ 1). Although these results confirm that most H4K20me3-and H3K4me3-associated transcripts are expressed in ES cells, these results also demonstrate that H4K20me3-and H3K4me3-associated transcripts exhibit a similar distribution of expression relative to all ES cell-expressed genes (Fig. 1J). In addition, these results demonstrate that CARIP-Seq captures a dynamic range of abundantly expressed transcripts and RNAs with a lower expression level.
We also evaluated the breadth of H4K20me3 and H3K4me3 CARIP-Seq peaks relative to other histone modifications, including H4K20me3 and H3K36me3 and elongating RNAPII Ser2P. These results reveal that the distribution of breadth is overall similar between H4K20me3 and H3K4me3 CARIP-Seq peaks and the H4K20me3 and H3K36me3 histone modifications and transcriptional machinery component, RNAPII Ser2P (Fig. 1K).

Transcriptome analysis of H4K20me3-and H3K4me3-associated RNAs
To determine whether H4K20me3-and H3K4me3-associated RNAs represent a distinct subset of the ES cell transcriptional repertoire that is expressed in pluripotent ES cells or lineage-specific cells, we compared the RNA-Seq transcriptomes of undifferentiated ES cells, epiblast stem cells (EpiSCs), trophoblast (TS) stem cells, EB differentiated ES cells, differentiated trophoblast cells, and mouse embryonic fibroblasts (MEFs) with H4K20me3-and H3K4me3-associated RNAs. ES cells, EpiSCs, and TS cells can be derived from blastocyst-stage embryos. ES cells are fully pluripotent stem cells, EpiSCs express OCT4 and retain pluripotent characteristics, and TS cells represent a multipotent stem cell. Hierarchical clustering analysis (HCA) followed by K-means clustering (KMC) to further refine major patterns of gene expression variability demonstrated that although overall H4K20me3 and

Regulation of cell migration
Epithelial cell diff.

H4K20me3-and H3K4me3-associated RNAs in ES cells
transcriptional circuitry (OCT4/POU5F1, NANOG, and SOX2) ( Fig. 2A, right panels). Next, we identified transcripts that are differentially associated with H4K20me3 or H3K4me3 (RPKM Ͼ 1, fold change Ͼ 1.5) and clustered these genes using HCA (Fig. 2B). DAVID (50) was used to functionally annotate transcripts associated with H4K20me3 and H3K4me3. Using this approach, we identified gene ontology (GO) terms enriched in H4K20me3-associated transcripts such as cell redox homeostasis, regulation of cell motility/migration, placental development, and epithelial cell differentiation (Fig. 2C, top panel) and GO terms enriched in H3K4me3-associated transcripts such as negative regulation of skeletal muscle differentiation, metabolic processes (e.g. macromolecular, ncRNA), and RNA processing (Fig. 2C, bottom  panel).
GREAT gene ontology software was used to functionally annotate transcripts associated with H4K20me3 only and with H3K4me3 only. Results from these analyses reveal enrichment of multiple GO terms, including biological processes enriched in H3K4me3-associated transcripts such as chromosome organization, chromatin modification, histone modification, ncRNA processing, and DNA repair, whereas H4K20me3associated transcripts were enriched with biological process GO terms such as RNA processing, noncoding RNA (ncRNA) metabolic processing, DNA repair, and histone modification. In addition, molecular function GO terms enriched in H3K4me3-associated transcripts include Ran GTPase binding, histone binding, and demethylase activity (Fig. 2D, top right panel), whereas H4K20me3-associated transcripts were enriched with GO terms such as histone acetyltransferase activity, repressing transcription factor binding (Fig. 2D, bottom right panel).

Relationship between H4K20me3-and H3K4me3-associated transcripts and local chromatin
Next, we investigated the overlap between H4K20me3 or H3K4me3 CARIP-Seq peaks and transcription factor (TF) or epigenetic regulator binding, or occupancy of histone modifications. To this end, we evaluated the percentage of overlap between H4K20me3 or H3K4me3 CARIP-Seq peaks and random genomic sequences of comparable size and frequency, which were used as controls, with a compendium of TF and epigenetic regulator (e.g. histone methyltransferase and histone demethylase) and histone modification ChIP-Seq peaks. Some of the features are positively correlated with promoter or gene body regions, respectively. Our results demonstrate that H4K20me3 and H3K4me3 CARIP-Seq peaks highly overlap binding sites of components of the transcriptional machinery such as RNAPII, initiated RNAPII (Ser5P), elongating RNAPII (Ser2P), E2f1, H3K36me3, histone modifications enriched in promoter (H3K4me3, H3K4me2) and enhancer (H3K27ac) regions, histone-modifying enzymes including KDM5B, and multiple transcription factors including ESRRB, KLF4, MYC, STAT3, and OCT4 (Fig. 3A). However, we observed a greater overlap of ZFX, NANOG, N-MYC, MYC, and KLF4 binding sites with H3K4me3 CARIP-Seq peaks relative to H4K20me3 CARIP-Seq peaks (Fig. 3A). These results further suggest that H3K4me3-associated transcripts are aligned with transcriptionally active features in ES cells.
We then evaluated whether H4K20me3 or H3K4me3 exhibits sequence composition binding preference in transcripts. To this end, we searched for motifs enriched in transcript sequences that associate with H4K20me3 or H3K4me3 using MEME-ChIP (51). Using this approach, we discovered several statistically significant motifs in a transcript-wide search, including motifs resembling STAT3 and MYC DNA-binding motifs in H4K20me3-associated transcripts, and sequences resembling NANOG DNA-binding motifs in H3K4me3-associated transcripts (Fig. 3, F and G) enriched in transcript sequences that associate with H4K20me3 and H3K4me3.
The enrichment of H3K4me3, H3K36me3, and RNAPII-Ser2P in promoter and regions spanning gene bodies at genes where H4K20me3-and H3K4me3-associated RNAs are transcribed is exemplified at the ES cell enriched gene, Zfp42 (Fig.  4A). In this case, H3K4me3 is enriched nearby the CARIP-Seq peaks, suggesting that the H3K4me3-Zfp42-associated tran- ChIP-X genes evaluated using Network2canvas annotation of KMC clusters demonstrates that pluripotency and epigenetic regulators are associated with H4K20me3 and H3K4me3 CARIP-Seq peaks. Each node (square) represents a gene list (e.g. H4K20me3 or H3K4me3 associated RNAs associated with a ChIP-X gene-set library). The brightness (white) of each node is determined by its p value. B, HCA analysis of differentially enriched RNAs between H4K20me3 and H3K4me3 CARIP data. Green line, RNAs associated with H4K20me3; red line, RNAs associated with H3K4me3. C, GO functional annotation of H4K20me3-associated and H3K4me3-associated RNAs analyzed using DAVID. D, GREAT (73) GO analysis of H4K20me3-specific and H3K4me3-specific CARIP-Seq peaks. Biological process, molecular function, and mouse phenotype are shown.

H4K20me3-and H3K4me3-associated RNAs in ES cells
script may interact directly with the locus shown. However, H4K20me3 is enriched downstream of the CARIP-Seq peaks, suggesting that the H4K20me3-Zfp42-associated transcript does not interact with the locus shown. At another locus, H4K20me3 and H3K4me3 ChIP-Seq signals overlap with H4K20me3 and H3K4me3 CARIP-Seq signals in promoter and gene body regions (Fig. 4B). In addition, we observed H4K20me3-only (Fig. 4C) and H3K4me3-only associated transcripts (Fig. 4D)

H4K20me3-and H3K4me3-associated RNAs in ES cells
features including promoters, exons, introns, and gene body regions.

H4K20me3 and H3K4me3 associate with various features of transcripts
An abundance of recent studies have suggested that chromatin and co-transcriptional splicing are interconnected (46,(52)(53)(54)(55)(56)(57)(58). To investigate whether H4K20me3 or H3K4me3 has a binding preference for certain structural features of transcripts, we computed the density of H4K20me3 and H3K4me3 CARIP-Seq signals in exon, intron, and gene body regions. Specifically, we compared the density of H4K20me3 and H3K4me3 CARIP-Seq signals at exon versus intron regions (Fig. 5A), exon versus gene body regions (Fig. 5B), and intron versus gene body regions (Fig. 5C). These comparisons revealed that both H4K20me3 and H3K4me3 CARIP-Seq signals are greater in exons relative to introns, higher in exons relative to gene body regions, and higher in gene body regions relative to introns.
We next investigated properties of RNA that determine H4K20me3 and H3K4me3 binding. Previous work demonstrated that histone-modifying enzymes, such as EZH2 and HDAC1, exhibit greater affinity for longer transcripts (18,59,60), whereas other chromatin constituents such as DNMT1, CHD4, and CTCF bind to shorter genes (18). To evaluate the preference for length of transcript binding by H4K20me3, we compared the length of gene isoforms with or without H4K20me3 binding. We found that H4K20me3 and H3K4me3 exhibit a binding preference for longer genes (Fig. 5D). We further investigated a correlation between expression level and transcript length for all genes in the mouse genome, genes expressed in ES cells, and H4K20me3-and H3K4me3-associated transcripts. The results from these analyses reveal that ESC-enriched genes with a longer length were expressed at a slightly lower level than genes with a shorter length (Fig. S1C)

H4K20me3-and H3K4me3-associated RNAs in ES cells
associated transcripts, where, in general, H4K20me3-and H3K4me3-associated transcripts with a longer length were expressed at a slightly lower level than transcripts with a shorter length (Fig. S1C). Because longer genes usually have a greater number of exons, we next asked whether the RNA length preference for H4K20me3 and H3K4me3 associations is due to a greater number of exons in the occupied transcripts. These results demonstrated that H4K20me3 and H3K4me3 preferentially associate with transcripts with a greater number of exons relative to transcripts that do not associate with H4K20me3 or H3K4me3 (Fig. 5E).
To distinguish the roles for exon number and transcript length in H4K20me3-and H3K4me3-RNA binding preference, we performed a regression for transcript length and H4K20me3 or H3K4me3 CARIP-Seq densities for genes with varying numbers of exons. For genes with a greater number of exons, we observed a positive correlation between H4K20me3and H3K4me3-RNA-binding signals and transcript length (Fig. 5F). In contrast, for genes with few exons, transcript length and HK40me3 and H3K4me3-RNA-binding signals were negatively correlated (Fig. 5F). Overall, these results suggest that binding of H4K20me3 and H3K4me3 to RNA is influenced by the structural properties of RNA such as exon number and transcript length. Although our results are in agreement with previous results, which demonstrate that the PRC2 subunit EZH2 exhibits greater binding to longer transcripts, H4K20me3 and H3K4me3 also bind to shorter RNAs with fewer numbers of exons.

H4K20me3 and H3K4me3 associate with noncoding RNAs
Previous work has shown that lncRNAs interact with the H4K20me3 methyltransferase, Suv420h2, where lncRNAs recruit Suv420h2 to chromatin (37). In addition, the H3K4 methyltransferase MLL1 has been shown to interact with ncRNAs (61), and WDR5, a component of the complex that mediates H3K4me3, has been shown to interact with the lncRNA HOTTIP (62). Although our results described above demonstrate that H4K20me3 and H3K4me3 interact with various transcripts in ES cells, to investigate whether H4K20me3 and H3K4me3 binding preference for certain structural features of transcripts extends to ncRNAs, we computed the density of H4K20me3 and H3K4me3 CARIP-Seq signals in exon, intron, and gene body regions as described above. To this end, we evaluated the densities of H4K20me3 and H3K4me3 CARIP-Seq signals at exon versus introns (Fig. 6A), exon versus gene body regions (Fig. 6B), and intron versus gene body regions (Fig. 6C) of ncRNAs. In a similar manner as described above for all genes, H4K20me3 and H3K4me3 CARIP-Seq signals are greater in exons relative to introns, similar between exons and gene body regions, and similar between introns and gene body regions of ncRNAs.
In addition, we investigated properties of ncRNAs that determine H4K20me3 and H3K4me3 binding. We evaluated the preference for length of transcript binding by H4K20me3 or H3K4me3 by comparing the length of ncRNAs with or without an H4K20me3 or H3K4me3 association. Using this approach, we discovered that H4K20me3 and H3K4me3 have a preference for binding ncRNAs of longer length (Fig. 6D). We also investigated whether ncRNA length preferences for H4K20me3 or H3K4me3 associations is related to a greater number of exons in the bound ncRNAs. Our results show that H4K20me3 and H3K4me3 preferentially associated with ncRNAs with a greater number of exons relative to unoccupied ncRNAs (Fig. 6E).
We also evaluated the roles for exon number and transcript length in determining H4K20me3-or H3K4me3-ncRNA binding preferences. To this end, we performed a regression for transcript length and H4K20me3 and H3K4me3 CARIP-Seq densities for ncRNAs with varying numbers of exons. In contrast to all H4K20me3-RNA associations, for genes with a low or high number of exons, we observed a positive correlation between H4K20me3-ncRNA-binding signals and transcript length (Fig. 6F, left panel). In addition, for H3K4me3-RNA associations, for genes with a low number of exons, we observed a negative correlation between H4K20me3-ncRNA-binding signals and transcript length, whereas for genes with a high number of exons, we observed a positive correlation between H4K20me3-ncRNA-binding signals and transcript length (Fig.  6F, right panel). Enrichment of H4K20me3 and H3K4me3 CARIP-Seq signals in exon and intron regions of noncoding genes is exemplified at the lncRNA Xist (Fig. 6G). Altogether, these results further suggest that associations between H4K20me3 or H3K4me3 and RNA are affected by structural properties of RNAs including exon number and transcript length.

Discussion
Here, we have utilized a novel protocol, CARIP-Seq, to identify RNAs associated with the histone modifications, H4K20me3 and H3K4me3, on a global scale in ES cells. CARIP-Seq is a useful method to interrogate genome-wide association of RNAs with chromatin constituents. Using this approach, we identified a diverse set of mRNAs and ncRNAs that associate with H4K20me3 and H3K4me3. These results describing thousands of RNA co-factors for H4K20me3 and H3K4me3 may represent a feature of heterochromatin and euchromatin regulation in ES cells. H4K20me3-and H3K4me3-marked chromatin enriches for exon isoforms relative to unspliced pre-mRNA isoforms. A-C, the scatter plots shows the gene's RPKM contribution to exon versus unspliced pre-mRNA intron isoforms (A), exon versus unspliced pre-mRNA isoforms (gene body; TSS to TES) (B), and unspliced intron isoforms versus full-length unspliced pre-mRNA isoforms (gene body; TSS to TES) (H4K20me3 or H3K4me3 CARIP-Seq log2 RPKM at exon, intron, or gene body regions) (C). D, cumulative distribution function for the transcript length of H4K20me3-or H3K4me3-occupied or unoccupied RNAs. Note that H4K20me3-and H3K4me3-marked chromatin is associated with longer RNAs. p value Ͻ 2.2E-16 (Kolmogorov-Smirnov test). E, cumulative distribution function for the number of exons in H4K20me3-or H3K4me3-occupied or unoccupied RNAs. Note that H4K20me3-or H3K4me3-marked chromatin is associated with RNAs with a greater number of exons. p value Ͻ2.2E-16 (Kolmogorov-Smirnov test). F, correlation between transcript length of H4K20me3-or H3K4me3-associated RNAs and density. We regressed density against transcript length for genes with two, four, six, or eight exons. These results demonstrate that the density of H4K20me3-and H3K4me3-associated RNAs correlates with transcript length and exon number.

H4K20me3-and H3K4me3-associated RNAs in ES cells
Our results also demonstrate that H4K20me3-and H3K4me3-associated transcripts originate from genes that colocalize with features of active chromatin in promoter and gene body regions, such as the histone modification, H3K4me3, and transcriptional machinery component RNAPII in promoter regions, and elongating RNAPII (RNAPII Ser2P) and H3K36me3 in gene body regions of actively transcribed genes. We also show that H4K20me3 and H3K4me3 interact with specific features of RNAs, such as exons and introns. In addition, we demonstrate that H4K20me3 and H3K4me3 exhibits a preference for binding to RNA transcripts with a greater number of exons. Previous work revealed that histone-modifying enzymes such as the PRC2 component EZH2 and the histone demethylase HDAC1 bind longer transcripts (18,59,60), whereas other epigenetic regulators such as the DNA methyltransferase DNMT1, the chromatin remodeler CHD4, and the insulator CTCF bind shorter transcripts (18). Our results are in alignment with the binding preference length of the aforementioned histone modifying enzymes, where we found that H4K20me3 and H3K4me3 exhibit a binding preference for longer transcripts. Combined, these findings suggest that the analyzed histone-modifying enzymes and the H4K20me3 and H3K4me3 histone modification have a binding preference for longer RNA transcripts.
Although the vast majority of previously described RNA and chromatin-associated protein interactions involve ncRNAs, strikingly, we identified more mRNAs associated with H4K20me3 and H3K4me3 and with greater enrichment relative to ncRNAs, implicating a potentially novel role for mRNAs as regulators of the genome. We also discovered RNA binding preferences for H4K20me3 and H3K4me3 using a genomewide transcript analysis and several motifs enriched in H4K20me3-or H3K4me3-associated transcripts. For example, we identified motifs resembling STAT3 and MYC DNA-binding motifs in H4K20me3-associated transcripts and sequences resembling NANOG DNA-binding motifs in H3K4me3-associated transcripts.
Although H4K20me3-marked chromatin is thought to be a repressive histone modification, where it is enriched at pericentric hetereochromatin and is involved in repressing transcription of repetitive DNA elements (23,63,64), our data overall suggest that H4K20me3 interacts with many transcripts from loci that are actively transcribed in ES cells. Although H4K20me3 is mainly associated with heterochromatin regions, a recent study demonstrated that a subset of H4K20me3marked regions contain H3K4me3 (30). These previous findings demonstrating that H4K20me3 marks transcriptionally dynamic regions in ES cells is in alignment with results from this study, which show that expression of H4K20me3-associated RNAs is predominantly enriched in undifferentiated ES cells relative to differentiated cells (Fig. 1I). H4K20me3-and H3K4me3-associated RNAs may serve as regulators of genome structure and gene expression by facilitating communication between the genome and chromatin-associated proteins. Interactions between H4K20me3 or H3K4me3 and pluripotency or lineage-specific transcripts may also facilitate distinct heterochromatin or euchromatin boundaries during self-renewal and differentiation, and interactions between H4K20me3 or H3K4me3 and ESC-enriched transcripts may also dictate ES cell fate decisions by supporting a positive feedback loop by communicating external LIF signals and an intrinsic OCT4centric transcriptional network to the H4K20me3-marked genome. It is also possible that the RNA-chromatin interactions described in this study occur transiently before the RNAs are translated or that some RNAs associate with chromatin on a more permanent basis and avoid translation, which has previously been suggested (18). However, additional studies are needed to determine the role for chromatin-associated RNAs in regulating chromatin. Moreover, although it is possible that some histone modification-RNA interactions captured using CARIP-Seq may represent co-localization of nascent RNAs with chromatin containing specific histone modifications, because we observed enrichment of CARIP-Seq signals throughout the gene bodies of many genes (Fig. 1H) and not increased enrichment near the 5Ј region of genes (TSS), these results argue against indirect pulldown of nascent RNAs as the primary explanation for these interactions. It is also unlikely that CARIP-Seq signals represent association of H4K20me3 RNA with non-chromatin-bound histones, because the sonication protocol used for CARIP-Seq would likely result in oversonication of nonchromatin-bound RNAs, because of increased sensitivity of naked DNA or RNA to shearing. Our results show that RNA fragments used for CARIP-Seq are 500 -700 bp (Fig. S1D), a size that is unlikely to be achieved following sonication of nonchromatin-bound RNAs, thus arguing against the possibility of CARIP-Seq signals representing association of H4K20me3 RNA with non-chromatin-bound histones.
In addition, H4K20me3-mediated heterochromatin formation and gene silencing may occur in the presence of H4K20me3-associated transcripts. Along this line, previous work showed that a lncRNA-mediated mechanism guides the H4K20me3-methyltransferase, Suv420h2, to specific loci to facilitate chromatin compaction (37). Although these results show that interaction between a lncRNA and Suv420h2 triggers H4K20me3 deposition, it is unclear from these results whether H4K20me3 interacts with RNAs to regulate heterochromatin formation. Additional studies will be required to investigate the role for H4K20me3-associated RNAs in these processes.

Conclusions
In summary, we describe the application of a novel method, CARIP-Seq, to identify H4K20me3 and H3K4me3 associated transcripts on a genome-wide scale in ES cells. These results provide novel insight into the epigenetic and transcriptional repertoires of ES cells.

ES cell culture
ES cells were cultured as previously described with minor modifications (45,46,48). Briefly, R1 ES cells were cultured on irradiated MEFs in DMEM, 15% FBS medium containing Leukemia inhibitory factor (LIF) (ESGRO) at 37°C with 5% CO 2 . For CARIP-Seq experiments, ES cells were cultured on gelatincoated dishes in ES cell medium containing 1.5 M CHIR99021 (GSK3 inhibitor) for several passages to remove feeder cells. ES cells were passed by washing with PBS (Gibco), and dissociating

CARIP-Seq analysis
CARIP-Seq experiments were performed by modifying ChIP-Seq (39,40,45,48) and RNA-Seq experiments. ES cells were harvested by washing with room temperature PBS and dissociating using 0.25% trypsin. Following centrifugation, ES cells were resuspended in prewarmed medium (15% FBS (Millipore) and high-glucose DMEM with glutamine (Gibco)) at a concentration of 2 million cells/ml. To cross-link RNA and chromatin, formaldehyde (Sigma) was added to a final concentration of 1%, and the cells were incubated for 1 min at 37°C in a water bath. Cross-linking was quenched by addition of chilled glycine (Sigma) to a final concentration of 125 mM, and rotated at room temperature for 5 min. The cells were pelleted by centrifuging at 1500 ϫ g for 5 min, the medium/cross-linking solution was aspirated, and the pellet was washed with ice-cold PBS. The cells were washed a second time with PBS. Fixed cell pellets were flash frozen using liquid nitrogen and stored at Ϫ80°C.
The monoclonal H3K4me3 antibody (Millipore, 17-614) and the H4K20me3 antibody (Millipore, 07-463) were obtained from EMD Millipore. Sonicated cell extracts equivalent to 5 million cells were used for CARIP assays. To bind antibody to paramagnet beads, 40 l of protein G paramagnetic Dynabeads (Invitrogen) were first washed with PBS and incubated in PBS with 4 g of H4K20me3 antibody or 4 g of H3K4me3 antibody at room temperature for 40 min while rotating. The beads were washed twice with PBS, and sonicated chromatin extract was added to the beads and incubated overnight at 4°C while rotating to immunoprecipitate H4K20me3-and H3K4me3-associated factors and RNA. The next day, the beads were washed twice with radioimmune precipitation assay buffer (TE, pH 8.0, 1% Triton X-100, 0.1% sodium deocycholate, 0.1% SDS), twice with radioimmune precipitation assay buffer with 0.3 M NaCl (Sigma), twice with LiCl buffer (0.5% sodium deocycholate, 0.5% NP40 (Sigma), 0.25 M LiCl (Sigma), once with TE and 0.2% Triton X-100 (Sigma), and once with TE. The samples were then resuspended in TE buffer. To de-cross-link, the samples were incubated for 4 h at 65°C in TE buffer (pH 8.0) with 1% SDS and 5 l of proteinase K (20 mg/ml; Roche). RNA was extract using QIAzol (Qiagen) following the manufacturer's instructions and eluted in nuclease-free water.
Double-stranded cDNA was generated using a Super-Script double-stranded cDNA synthesis kit (Invitrogen) following the manufacturer's instructions. cDNA was fragmented using a water bath sonifier (Diagenode Bioruptor Pico) and subjected to library preparation as described previously (29,(45)(46)(47)(48). Briefly, CARIP-enriched cDNA was end-repaired using the End-It DNA end-repair kit (Epicenter), followed by addition of a single A nucleotide and ligation of custom Illumina adapters, as previously described (29,40,48). PCR was performed using Phusion High Fidelity PCR master mix. CARIP libraries were sequenced on Illumina HiSeq platforms according to the manufacturer's protocol. Sequence reads were mapped to the mouse genome (mm9) using bowtie2 (65) with default settings.
CARIP-Seq read enriched regions (peaks) were identified using SICER software (43) with a window size setting of 200 bp, a gap setting of 400 bp, and a FDR setting of 0.001. The SICER compare function was used to compare multiple samples (FDR Ͻ 0.001, fold change Ͼ 1.5). The read per base per million reads measure was used to quantify the density at genomic regions from ChIP-Seq data sets. We have also applied the Kolmogorov-Smirnov test to obtain p value statistics and compare densities at genomic regions. The RPKM measure (66) was also used to quantify RNA levels of a gene from CARIP-Seq data.

RNA-Seq analysis
Poly(A) mRNA was purified using a Dynabeads mRNA purification kit. Double-stranded cDNA was generated using a Super-Script double-stranded cDNA synthesis kit (Invitrogen). cDNA was subjected to library preparation as described above. RNA-Seq libraries were sequenced on an Illumina HiSeq platform according to the manufacturer's protocol. The RPKM measure (66) was used to quantify the mRNA expression level of a gene from RNA-Seq data. Differentially expressed genes were identified using edgeR (FDR Ͻ 0.001; fold change Ͼ 1.5; RPKM Ͼ 3) (67).

ChIP-Seq analysis
ChIP-Seq experiments were performed as previously described with minor modifications (29,39,40,45,47,48). The H3K36me3 antibody (ab9050) was obtained from Millipore. Briefly, 15e6 mouse ES cells (R1) were harvested and chemically cross-linked with 1% formaldehyde (Sigma) for 5-10 min at 37°C and subsequently sonicated. Sonicated cell extracts equivalent to 5e6 cells were used for ChIP assays. ChIP-enriched DNA was end-repaired using the End-It DNA end-repair kit (Epicenter), followed by addition of a single A nucleotide, and ligation of custom Illumina adapters. PCR was performed using Phusion high-fidelity PCR master mix. ChIP libraries were sequenced on Illumina HiSeq platforms according to the manufacturer's protocol. Sequence reads were mapped to the mouse genome (mm9) using bowtie2 (65) with default settings. ChIP-Seq read enriched regions (peaks) were identified relative to control input DNA using SICER software (43) with a window size setting of 200 bp, a gap setting of 400 bp, and a FDR setting of 0.001. The read per base per million reads

H4K20me3-and H3K4me3-associated RNAs in ES cells
measure was used to quantify the density at genomic regions from ChIP-Seq data sets. Additional ChIP-Seq data sets were also analyzed in this manuscript, including RNAPII and RNA-PII Ser2P (GSE94739), H4K20me3 and H3K4me3 (GSE94086), and H3K4me3 (GSE53087) ChIP-Seq data.