Paradoxical Role of DNA Methylation in Activation of FoxA2 Gene Expression during Endoderm Development

Background: The transcription factor FoxA2 is a key mediator of endoderm development and pancreas gene expression. Results: DNA methylation is paradoxically associated with active expression of the FoxA2 gene. Conclusion: Developmental control genes such as FoxA2 may be regulated by a novel genetic mechanism. Significance: Clarification of this mechanism may provide a better understanding of how developmental pathways are activated.

The transcription factor FoxA2 is a master regulator of endoderm development and pancreatic beta cell gene expression. To elucidate the mechanisms underlying the activation of the FoxA2 gene during differentiation, we have compared the epigenetic status of undifferentiated human embryonic stem cells (hESCs), hESC-derived early endoderm stage cells (CXCR4+ cells), and pancreatic islet cells. Unexpectedly, a CpG island in the promoter region of the FoxA2 gene displayed paradoxically high levels of DNA methylation in expressing tissues (CXCR4+, islets) and low levels in nonexpressing tissues. This CpG island region was found to repress reporter gene expression and bind the Polycomb group protein SUZ12 and the DNA methyltransferase (DNMT)3b preferentially in undifferentiated hESCs as compared with CXCR4+ or islet cells. Consistent with this, activation of FoxA2 gene expression, but not CXCR4 or SOX17, was strongly inhibited by 5-aza-2′-deoxycytidine and by knockdown of DNMT3b. We hypothesize that in nonexpressing tissues, the lack of DNA methylation allows the binding of DNA methyltransferases and repressing proteins, such as Polycomb group proteins; upon differentiation, DNMT activation leads to CpG island methylation, causing loss of repressor protein binding. These results suggest a novel and unexpected role for DNA methylation in the activation of FoxA2 gene expression during differentiation.

Development of multicellular organisms involves progression from pluripotency through multipotency, leading ultimately to terminal differentiation. This process is mediated by transcription factors and epigenetic modifications that orchestrate the accompanying changes in gene expression; in turn this creates heritable cellular memories characteristic of both specific lineages and cell types (1,2). Human embryonic stem cells (hESCs) are a valuable tool for studying differentiation and development (3), because they possess the potential to differentiate into cells of all three major germ layers, thus permitting access to cell populations corresponding to early stages of development. This provides an opportunity to define the mechanisms controlling establishment of the primary germ cell layers. This is valuable for expanding our understanding of basic developmental biology, as well as for establishing a practical source of desired cell populations for cell replacement therapy.
Upon differentiation of stem cells, pluripotency is gradually lost as cells become lineage-specified and ultimately acquire cell-specific functions. The choice between pluripotency or differentiation depends on a specific set of genes, primarily transcription regulators. The classic "pluripotency factors" Oct4, Nanog, and SOX2 act to maintain the pluripotent state and to block differentiation. On the other hand, "master regulators" of differentiation act in the opposite fashion to promote differentiation (4).
Genes of the FoxA family are expressed at an early stage of mouse development during the formation of definitive endoderm (5-8). The first to be activated is FoxA2, initially in the anterior primitive streak and the node at embryonic day 6.5 in the mouse (6,9). FoxA2 (together with family member FoxA1) is required for proper development of endoderm-derived organs such as liver, pancreas, lung, and prostate and is thus considered a master regulator of early endoderm formation and endoderm lineage establishment (10). FoxA2 has been described as a "pioneer factor" that binds to the chromatin of a progenitor cell prior to target gene activation along with factors that help to modify the chromatin upon gene activation during cellular differentiation (10-12). FoxA2 is also expressed later in development, in endoderm-derived tissues, including liver, lung, stomach, pancreas, small intestine, and colon (13). In pancreatic islets, FoxA2 is a direct activator of Pdx1 (14), which places it near the top of the pancreatic transcription factor hierarchy. Currently, little information is available regarding the precise molecular mechanisms that control the activation of this gene during different stages of development (15).
In this study, we report that a CpG island in the promoter region of the FoxA2 gene paradoxically shows high levels of DNA methylation in endoderm lineage FoxA2-expressing tissues and low levels in nonexpressing tissues. Using a reporter gene assay, this CpG island region was found to inhibit reporter gene expression, an effect that was abolished by in vitro methylation of the inhibitory fragment. Furthermore, using a ChIP assay, we observed that this region is bound by the Polycomb group protein SUZ12 and the DNA methyltransferase DNMT3b in hESCs, where this region is unmethylated; this binding is accompanied by high levels of H3K27me3. In addition, inhibition of methylation by 5-aza-2′-deoxycytidine (5-aza-dC) or directed DNMT3b knockdown led to decreased activation of FoxA2 during stem cell differentiation toward endoderm progenitors. These results indicate that methylation of the CpG island plays a key role in FoxA2 gene activation in both early and late stages of FoxA2 expression.

Cell Culture
The following established cell lines were used in this study: 293T (human embryonic kidney cells) and MIN6 (mouse β cells) (16). 293T cells were grown in DMEM supplemented with 10% FCS and penicillin (200 IU/ml) with streptomycin (100 µg/ml). MIN6 cells were grown on Falcon tissue culture plates in DMEM supplemented with 15% FCS, penicillin/streptomycin, 2 mM L-glutamine, 5.6 mM glucose, and 0.5% β-mercaptoethanol. All experiments using MIN6 cells were performed with passages not higher than 33 (17).

Flow Cytometry
Cells were dissociated using enzyme-free Hanks'-based cell dissociation buffer (Invitrogen, 13150-016) for 15 min, followed by quenching with 10% FCS in PBS. Staining of cells was carried out in PBS containing 3% FCS using the following antibody (from BD Biosciences): phycoerythrin mouse anti-human CD184 (CXCR4, catalog number 555974). Propidium iodide (Biotium 40016) (2 µg/ml) was used to mark dead cells. Suspended cells were filtered through a 40-µm nylon strainer (BD Falcon) and analyzed/sorted by FACSAria (BD).

Mini Chromatin Immunoprecipitation (miniChIP)
Preparation of Soluble Chromatin-Between 10⁵ and 10⁶ cells were resuspended in 1 ml of serum-containing medium and fixed by the addition of formaldehyde (1% final concentration) for 10 min at room temperature with gentle mixing. Fixation was stopped by adding freshly prepared glycine (0.125 M) and incubating for 5 min at room temperature. Next, the cells were spun down (8000 rpm, 4°C) and washed once (4000 rpm, 4°C) in cold PBS. Cells were either fast frozen on dry ice and stored at −70°C or lysed in 1 ml of lysis buffer (25 mM Tris-HCl, pH 8.1, 150 mM NaCl, 1% Nonidet P-40, 0.5% SDS, 1% deoxycholate, 1% Triton X-100) for 10 min on ice, and the lysate was sonicated using a Bioruptor 300 (Diagenode) in an ice bath. To achieve an average fragmentation of 200-600 bp, 30-45 sonication cycles of 30 s, each with a 30-s interval between them, were required according to cell type (45 cycles for MIN6 cells and 30 cycles for hESC). Next, the lysed chromatin was centrifuged for 10 min at 13,000 rpm and 4°C, and the supernatant was either stored at −70°C or used directly for immunoprecipitation.
Immunoprecipitation Reaction-Sonicated chromatin from 10⁵ cells was precleared by adding 50 µg/ml salmon sperm DNA, 100 µg/ml yeast tRNA, 1 µg/ml BSA (final concentrations), and 35 µl of 50% (w/v) protein A-Sepharose beads (in 150 mM TSE buffer: 2 mM EDTA, 1% Triton X-100, 20 mM Tris, pH 8.1, 0.1% SDS, and 150 mM NaCl) and incubating for 30 min at 4°C on a rotating wheel. In addition, the desired amount of protein A-Sepharose beads was precleared using the same conditions. Following preclearance, the chromatin was centrifuged (13,000 rpm, 15 min, 4°C), and the supernatant was transferred to a new tube. From 1.1 ml of precleared chromatin, 50 µl was set aside as input sample and stored at −20°C, and 500 µl was taken for incubation with the desired antibodies or sera. The chromatin and antibodies were incubated overnight at 4°C on a rotating wheel. On the next day, the extracts were incubated with 35 µl of 50% precleared protein A-Sepharose suspension for 2 h at 4°C. Next, the beads were sequentially washed once with TSE-150, three times with TSE-500 (2 mM EDTA, 1% Triton X-100, 20 mM Tris, pH 8.1, 0.1% SDS, and 500 mM NaCl), and two times with TE (10 mM Tris-HCl, pH 8.1, 0.1 mM EDTA). The immune complexes were then eluted by incubating the beads with 100 µl of elution buffer (1% SDS, 0.1 M NaHCO3) for 30 min. The eluates and input samples were heated at 65°C overnight to reverse the formaldehyde cross-links. The eluates were purified using MinElute columns (Qiagen) according to the manufacturer's instructions and eluted in 35 µl of water. A 2-µl sample was used for amplification by quantitative real time PCR as described (17).
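The text does not state how ChIP enrichment was quantified from the real time PCR data (it refers to Ref. 17). One common convention, shown here only as an illustrative sketch, is the "percent of input" calculation. The function name, the assumption of perfect doubling per PCR cycle, and the assumption that input and immunoprecipitated samples are purified into equal volumes are ours and are not taken from the protocol above; only the 50 µl input and 500 µl immunoprecipitation volumes come from the text.

```python
import math

def percent_input(ct_ip, ct_input, input_fraction):
    """Percent of input chromatin recovered in an immunoprecipitation.

    ct_ip          -- qPCR threshold cycle of the immunoprecipitated sample
    ct_input       -- qPCR threshold cycle of the input sample
    input_fraction -- input volume relative to the chromatin volume used
                      for the IP (here 50 ul / 500 ul = 0.1)

    Assumption (not from the source): amplification efficiency is 100%,
    i.e. the template doubles with every cycle.
    """
    # Shift the input Ct to the value it would have had if 100% of the
    # IP chromatin amount had been loaded instead of a fraction of it.
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)

# Hypothetical Ct values, for illustration only.
fraction = 50.0 / 500.0
print(percent_input(28.0, 25.0, fraction))
```

With these hypothetical values, an immunoprecipitated sample that crosses threshold three cycles after a 10% input corresponds to 1.25% of input.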

Real Time Quantitative PCR
Total RNA was extracted from cell cultures using the Tri-Reagent procedure (Molecular Research Center Inc.) according to the manufacturer's instructions. RNA from sorted populations was isolated using RNeasy MinElute Cleanup kit (Qiagen 74204). DNA was eliminated using TURBO DNA-free kit (Ambion AM1907). DNase I-treated RNA was reverse transcribed into cDNAs using AffinityScript reverse transcriptase (Agilent Technologies) with 10 µM random hexamer primers (Roche), as recommended by the manufacturer (Invitrogen). Transcript levels were measured using real time quantitative PCR on a 7300 ABI real time PCR system using Power SYBR green PCR master mix (Applied Biosystems). The levels of each gene were normalized using β-actin as an endogenous control mRNA.
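Normalization to an endogenous control is often expressed with the comparative threshold cycle (2^-ΔΔCt) calculation. The text states only that β-actin was used as the endogenous control; the use of the 2^-ΔΔCt formula, the function name, and the example Ct values below are assumptions made for illustration.

```python
def relative_expression(ct_target, ct_actin, ct_target_ref, ct_actin_ref):
    """Fold change of a target gene relative to a reference sample.

    Each target Ct is first normalized to the beta-actin Ct of the same
    sample (delta Ct); the reference sample's delta Ct is then subtracted
    (delta delta Ct). Assumes perfect doubling per cycle, which is an
    assumption of this sketch and not a statement from the source.
    """
    delta_sample = ct_target - ct_actin
    delta_reference = ct_target_ref - ct_actin_ref
    return 2.0 ** -(delta_sample - delta_reference)

# Hypothetical Ct values: the target crosses threshold three cycles
# earlier in the differentiated sample, with unchanged beta-actin.
print(relative_expression(22.0, 18.0, 25.0, 18.0))
```

Under these assumed values the target gene would be reported as 8-fold up-regulated relative to the reference sample.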

Bisulfite Sequencing Analysis
Bisulfite conversion of genomic DNA was done using the EpiTect bisulfite kit (Qiagen 59104) according to the manufacturer's instructions. PCR primers were designed using Methyl Primer Express software v1.0. Specific fragments were amplified by PCR (Roche: FastStart Taq 250U (5 units/µl) 12-032-953-001) using 20 ng of DNA following bisulfite treatment and 2.5 µl of mixed primers (0.5 µM) under the following conditions: 95°C for 5 min; 5 cycles of 95°C for 1 min, 53°C for 3 min, and 72°C for 3 min; 40 cycles of 95°C for 30 s, 55°C for 45 s, and 72°C for 45 s; followed by an additional incubation at 72°C for 7 min. Following PCR amplification, the DNA was extracted from the gel using the RBC gel extraction kit, cloned into the Promega pGEM-T vector system, and transformed into competent bacteria, which were grown on LB agar plates containing ampicillin, 0.2 mM X-gal, and 0.1 mM isopropyl β-D-thiogalactopyranoside. The DNA from positive clones was isolated using a mini plasmid isolation kit and sequenced using T7 or Sp6 primers. Obtained sequences were aligned with unconverted genomic DNA sequence using the biq-analyzer tool, and DNA methylation levels were calculated. DNA samples were purchased from Biochain Inc.
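The methylation level of a region can be summarized as the fraction of methylated CpGs among all CpGs scored across the sequenced clones. The sketch below illustrates that calculation under simplifying assumptions that are ours: clone reads are already aligned to the unconverted reference without gaps, all sequences are on the top strand, and the toy sequences are invented. It is not a description of how the biq-analyzer tool works.

```python
def cpg_positions(reference):
    """Return the index of the C in every CpG dinucleotide of the reference."""
    return [i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"]

def methylation_percent(reference, clones):
    """Percentage of methylated CpGs over all scored CpGs in all clones.

    After bisulfite conversion and PCR, a C at a CpG position indicates a
    methylated cytosine and a T indicates an unmethylated one. Any other
    base at that position is treated as uninformative and skipped.
    """
    methylated = 0
    scored = 0
    for clone in clones:
        for pos in cpg_positions(reference):
            base = clone[pos]
            if base == "C":
                methylated += 1
                scored += 1
            elif base == "T":
                scored += 1
    return 100.0 * methylated / scored if scored else float("nan")

# Invented toy data: a reference with two CpGs and three aligned clones.
reference = "ACGTTCGA"
clones = ["ACGTTCGA", "ATGTTCGA", "ATGTTTGA"]
print(methylation_percent(reference, clones))
```

In this toy example three of the six scored CpGs are methylated, giving 50%.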

Transient Transfections
Transfections of 293T or MIN6 cells were carried out using the cationic polymer transfection reagent jetPEI (Polyplus Transfection) according to the manufacturer's instructions. Transfections of hESCs were carried out using the lipid-polymer mixture TransIT-2020 (Mirus) according to the manufacturer's instructions. Briefly, hESCs were grown for 24 h on growth factor-reduced Matrigel. On the day of transfection, 200,000 cells/well in a 24-well plate coated with Matrigel were transfected with a mixture of 0.5 µg of DNA (consisting of 100 ng of reporter construct, 50 ng of internal control, and 350 ng of pUC18) and 2 µl of TransIT-2020 reagent in Opti-MEM medium (according to the manufacturer's protocol; Mirus). Cells were harvested 48 h after transfection, and cell extracts were subjected to protein concentration determination and luciferase reporter enzyme assays as described (19).

Plasmid Constructions
Plasmids for promoter activity measurements were constructed using pGL3-basic vector (Promega). The region upstream from FoxA2 (FoxA2 gene promoter region) was generated by PCR using primers 5′-CGGTGGAGTGATGAAGTTGCTCC-3′ (top) and 5′-GCCGCCTCGGCTCTCCG-3′ (bottom) with human genomic DNA as template. The PCR fragment was subcloned using the CloneJET™ PCR cloning kit (Fermentas) and ligated to pGL3-basic at the BglII site (creating construct +CpG2). Fragments from this vector were generated by PCR using the following primers: forward −CpG2 5′-TCACAGGCTAACCCAGAACAGA-3′ (top) and the same bottom primer, and forward −E 5′-CATCATTGATTCCTGGATTCTTC-3′ (top) and the same bottom primer. The PCR fragments were subcloned into the pGL3 vector using BglII and XhoI sites.
In Vitro Methylation-Plasmids were in vitro methylated using HhaI or HpaII methyltransferases (New England Biolabs) as follows: 5-10 µg of plasmid were incubated for 1.5 h at 37°C with the specified enzyme (1 unit/µg of DNA) in the presence of 1× buffer and 0.32 mM S-adenosyl methionine. After the reaction, the DNA was phenol-chloroform-extracted, ethanol-precipitated, and resuspended in water. Methylated DNA was digested with methylation-sensitive restriction enzymes (HhaI or HpaII, respectively; New England Biolabs) and resolved on agarose gel to estimate the levels of methylation. Methylated DNA was used in transient transfection assays.
siRNA Transfection-hESCs were grown for 2 days on growth factor-reduced Matrigel. Cells were transfected with DharmaFECT2 (Dharmacon) transfection reagent mixture, according to the manufacturer's instructions. Briefly, 95 µl of 1× siRNA buffer (Dharmacon B-002000-UB-100) were mixed with ON-TARGET PLUS pool siRNA (Dharmacon) (final concentration, 50 nM) and 100 µl of DMEM F12 (mix 1). Each siRNA pool contained four different siRNAs targeting DNMT3b. A nontarget siRNA pool was used as control. 4 µl of DharmaFECT2 transfection reagent mixture were mixed with 196 µl of DMEM F12 (mix 2). Both mixtures were incubated separately for 5 min at room temperature and then combined and incubated for an additional 20 min at room temperature (mix 3). Mix 3 was combined with 2 × 10⁶ hESCs in a 6-cm plate coated with Matrigel in the presence of 10 µM ROCK inhibitor (Sigma Y-27632). After 48 h, cells were either harvested for RNA and protein purification or permitted to differentiate (20) for 3 days and then harvested for RNA.

RESULTS
DNA Methylation at the FoxA2 Locus-The FoxA2 locus contains three CpG islands: CpG1 and CpG2a are upstream of the minimal promoter, whereas CpG3 encompasses the transcription start site (TSS) and the majority of the transcribed region (Fig. 1A). To determine the methylation state of the FoxA2 gene during development, we performed bisulfite DNA sequencing. Genomic DNA of different cell types was treated with sodium bisulfite, and PCRs were performed to permit analysis of regions within and outside of the three CpG islands. Bisulfite sequencing of CpG3 (10% of total length; 321 bp of 3368 bp), which covers the FoxA2 TSS, was performed in four cell types: hESCs and human skeletal muscle, in which the FoxA2 gene is inactive; and hESC-derived definitive endoderm (CXCR4+) and human pancreatic islets, in which the FoxA2 gene is active. No significant methylation was observed in any of these cell types (<2%; data not shown), suggesting that this CpG island does not have a regulatory role in FoxA2 gene expression. By contrast, analysis of CpG2a revealed a paradoxical pattern of methylation: low levels of methylation were observed in tissues that do not express FoxA2 (hESC and skeletal muscle), and high levels were observed in FoxA2-expressing tissues (pancreatic islets, liver, and colon) (Fig. 1B). Furthermore, CXCR4+ cells (early definitive endoderm cells in which FoxA2 expression is commencing) also exhibited high levels of methylation (Fig. 1B).
By inspecting the sequence downstream of CpG2a, we observed that the CG-rich region extends ~250 bp beyond the core CpG island. We designated this CpG-rich region as CpG2b (Fig. 1A) and determined methylation levels. Again, we observed relatively low levels of DNA methylation in nonexpressing tissues (hESC and skeletal muscle) and higher levels in expressing tissues (CXCR4+ cells, islets, liver, and colon) (Fig. 1C). These results indicate that CpG2b (although it is not identified by the computer algorithm used) is differentially methylated by a similar mechanism as CpG2a: this unusual methylation pattern may play a role in activating FoxA2 gene expression.
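The text does not name the algorithm used to call CpG islands. For orientation only, the sketch below applies the widely cited Gardiner-Garden and Frommer criteria (length of at least 200 bp, GC content of at least 50%, and an observed/expected CpG ratio of at least 0.6) to a single sequence. The choice of these thresholds and the function names are assumptions; a CG-rich stretch such as CpG2b could fail one of these thresholds and still be CpG-dense.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count under independence is (C count * G count) / length.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp

def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the assumed length, GC content, and obs/exp thresholds."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction >= min_gc and obs_exp >= min_obs_exp

# Invented sequences, for illustration only.
print(looks_like_cpg_island("CG" * 100))  # long and CpG-dense
print(looks_like_cpg_island("CG" * 50))   # CpG-dense but shorter than 200 bp
```

A real island finder scans a sliding window along the genome; this sketch evaluates only one candidate fragment at a time.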
Consistent with these results, in a genome-wide nonquantitative study (20), CpG2a showed an expression-dependent pattern of differential methylation: FoxA2-expressing tissues (pancreas, liver, and several types of cancer cell) showed high levels of methylation, whereas nonexpressing tissues (including brain, hESC, skeletal muscle, and sperm) showed low levels of methylation. Interestingly, CpG1 does not display the paradoxical methylation pattern seen with CpG2 (20) and hence is probably less important for regulation of the FoxA2 gene.
CpG2 as a Negative Regulatory Element-An effective approach to assess the possible role of CpG2 in regulating FoxA2 gene transcription is a reporter gene assay, because plasmids propagated in Escherichia coli have unmethylated CpGs, which can be efficiently methylated in vitro using the enzymes HhaI and HpaII (21). Upon transfection into mammalian cells, CpG sequences do not become methylated (21). Several DNA fragments derived from the FoxA2 promoter were ligated upstream of the luciferase reporter gene (Fig. 2A). Construct +CpG2 contains the entire fragment spanning CpG2 to the TSS. From +CpG2, a fragment of 675 bp that contains the entire CpG2 (2a+b) was deleted, leaving the core promoter and enhancer (22) of the FoxA2 gene (construct −CpG2). Construct −E contains only the core promoter without the enhancer region. The three constructs were transfected into undifferentiated hESCs, the mouse pancreatic beta cell line MIN6, and 293T cells, and luciferase levels were measured. The presence of the promoter fragment that contains CpG2 (construct +CpG2) did not produce a significant increase in activity in any cell type tested, as compared with the pGL3-basic vector (Fig. 2B). On the other hand, deletion of the 675 bp containing the CpG2 (construct −CpG2) led to a significant increase in promoter activity in all cell types, but most strikingly in MIN6 cells (Fig. 2B). This indicates 1) that the 675-bp fragment contains a negative regulatory element (note that the CpG island is unmethylated in these constructs, because plasmids are prepared from E. coli) and 2) that the −CpG2 fragment contains a positive regulatory element that is preferentially active in beta cells. Consistent with this, deletion of the enhancer region (construct −E) led to a decrease in promoter activity only in MIN6 cells (Fig. 2B).
FIGURE 1. Bisulfite sequencing of the FoxA2 CpG islands. A, the FoxA2 locus contains three CpG islands: CpG1 localized between −1800 and −1400 (relative to the TSS), CpG2a/b localized between −950 and −375, and CpG3 localized between −3 and +3400. The region of CpG2a/b that was analyzed by bisulfite sequencing (B and C) is indicated. B, bisulfite sequencing of CpG2a. Genomic DNA from undifferentiated hESC, hESC-derived definitive endoderm FACS-sorted CXCR4+, human skeletal muscle, pancreatic islets, liver, and colon was treated with sodium bisulfite and amplified by PCR. PCR products were sequenced and compared with the genomic unconverted DNA sequence. Empty circles, unmethylated CpGs; filled circles, methylated CpGs. The percentage of methylation is indicated and calculated from total number of CpGs. The region displayed (135 bp long) is a representative portion of the sequenced region. C, bisulfite sequencing of CpG2b, using DNA from tissues indicated in B (above). The percentage of methylation is indicated and calculated from total number of CpGs. The region (145 bp long) displayed is a representative portion of the sequenced region.
DNA Methylation Abolishes the Negative Effect of CpG2-To extend this analysis, we tested whether the 675-bp fragment displays a negative effect when removed from the context of the intact promoter and whether this effect is influenced by DNA methylation. For this, we inserted the following three fragments into the pGL2-control vector, a luciferase plasmid that contains the SV40 promoter and enhancer: the 675-bp fragment (CpG2a+2b), a fragment containing only CpG2a, and a fragment containing only CpG2b (designated as a+b, a, and b, respectively in Fig. 2, A and C). These fragments were inserted between the SV40 enhancer and promoter, upstream of the luciferase gene. All plasmids were tested in unmethylated form and following in vitro methylation by HhaI or HpaII methylase. In their unmethylated state, all three fragments produced a 30-50% decrease in promoter activity relative to control plasmid (Fig. 2D, black bars), indicating that all three fragments contain functional negative regulatory elements. Methylation with HpaII eliminated the negative activity of both the a+b and a fragments (Fig. 2D, gray bars). Methylation with HhaI reduced the negative activity only in the 2a fragment (Fig. 2D, white bars), perhaps because of the absence of HhaI sites in the CpG2b region (Fig. 2C). These results are consistent with the idea that methylation of CpG2 plays a positive role in regulating FoxA2 gene promoter activity.
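Whether a fragment can be methylated by a given enzyme depends on the presence of its recognition sequence: the HhaI methylase acts at GCGC and the HpaII methylase at CCGG. A minimal sketch for locating these sites in a fragment follows. The function name and the example sequence are invented for illustration; the actual CpG2a and CpG2b sequences are not given in the text and are not reproduced here.

```python
# Recognition sequences of the two methylases used for in vitro methylation.
SITES = {"HhaI": "GCGC", "HpaII": "CCGG"}

def find_sites(seq, site):
    """Return the 0-based start positions of every occurrence of site in seq.

    Overlapping occurrences are reported (for example, GCGCGC contains two
    GCGC sites), which str.count() alone would miss.
    """
    seq = seq.upper()
    hits = []
    start = seq.find(site)
    while start != -1:
        hits.append(start)
        start = seq.find(site, start + 1)
    return hits

# Invented example fragment, not a FoxA2 sequence.
fragment = "AAGCGCGCTTCCGGAA"
for enzyme, site in SITES.items():
    print(enzyme, find_sites(fragment, site))
```

A fragment for which `find_sites` returns an empty list for GCGC would be expected to remain unmethylated after HhaI methylase treatment, which is the situation suggested above for CpG2b.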
Binding Analysis of Repressive Proteins to the CpG Island-DNA methylation typically correlates with gene silencing and can repress gene expression by directly blocking access of transcription regulatory factors to target DNA. In the FoxA2 gene, there is apparently a paradoxical opposite effect of DNA methylation on gene expression. DNA methylation at FoxA2 CpG2 occurs selectively in expressing tissues; methylation at the CpG island appears to activate gene expression, whereas in nonexpressing tissues, a lack of methylation inhibits gene expression. To address the mechanisms involved, we used ChIP assay to examine the binding of regulatory proteins and the histone methylation status of the CpG island region. As control, we examined the binding to the coding sequence (CDs) of the FOXA2 gene, which is not expected to be recognized selectively by transcription factors.
SUZ12 is a member of the Polycomb group protein family and is a component of the complex PRC2 (Polycomb repressive complex) that acts to repress transcription (23). To determine whether SUZ12 binds to CpG2 in vivo, we performed ChIP analysis using hESCs, CXCR4+, and islet cells with anti-SUZ12 antibody. SUZ12 occupancy on the CpG island region was compared with its binding to Pax1 CDs, which is known to be bound selectively by SUZ12 in ESCs (24) and hence serves as a positive control, and also with its binding to FoxA2 CDs and the Pax1 CpG island, which are not expected to bind the SUZ12 protein. In hESCs, SUZ12 binds FoxA2 CpG2 to a similar extent as to the Pax1 CDs (Fig. 3A). Furthermore, SUZ12 occupancy in hESC was significantly higher than its occupancy in CXCR4+ and islet cells (Fig. 3A), whereas relatively low levels of binding were observed in the FoxA2 CDs and Pax1 CpG island in all of the cells tested.
DNA methylation is typically removed following zygote formation and re-established around the time of implantation; the closely related de novo DNA methyltransferases DNMT3a and DNMT3b are required for this process (25,26). During embryonic development, DNMTs are recruited by PRC2 proteins such as EZH2 and SUZ12 and act to methylate the DNA (27). DNMT3b occupancy on CpG2 was compared with its binding to Pax1 CDs and to FoxA2 CDs and Pax1 CpG island, which are not expected to bind the DNMT3b protein. In hESCs, DNMT3b binds the FoxA2 CpG2 to a similar extent as the Pax1 CDs. DNMT3b occupancy in hESCs was significantly higher than its occupancy in CXCR4+ and islet cells (Fig. 3B). A low extent of binding was observed for the FoxA2 CDs and Pax1 CpG island in all cells tested.
DNMT3a occupancy on the FoxA2 CpG2 region was also tested and compared with its binding to the CpG island in the DMRT2 gene, a target of DNMT3a in hESCs (28), and with FoxA2 CDs and Pax1 CpG island, which are not expected to bind the DNMT3a protein (Fig. 3C). In hESCs, DNMT3a occupancy on the DMRT2 CpG island was significantly higher than its occupancy in CXCR4+ and islet cells. On the other hand, a low level of binding was observed in FoxA2 CpG2, CDs, and Pax1 CpG island in all cells tested. We therefore conclude that in hESCs, DNMT3b, but not DNMT3a, binds preferentially to FoxA2 CpG2.
Polycomb group proteins mediate di- and trimethylation of histone H3K27 (H3K27me2 and H3K27me3) (23), a stable epigenetic mark associated with gene silencing. The levels of H3K27me3 were examined by ChIP in hESCs, CXCR4+, and islet cells using a specific antibody (Fig. 3D). In accordance with gene expression patterns, high levels of K27 methylation were observed at FoxA2 CpG2 and the protein coding region (CDs) in hESCs as compared with CXCR4+ and islet cells. For the Pax1 gene (CpG island and CDs), we observed the expected high levels of K27 methylation in all cells tested (because this gene is silent in these cells).
Inhibition of DNMTs during Differentiation-To test the hypothesis that DNA methylation is involved in activation of FoxA2 gene transcription during differentiation, we treated cells with 5-aza-dC, a potent inhibitor of DNA methylation. We tested the effect of 5-aza-dC on expression of the pluripotency marker Oct4 and the definitive endoderm genes CXCR4, SOX17, and FoxA2. Oct4 expression was down-regulated by ~40% (29), whereas CXCR4 and SOX17 were up-regulated (5- and 4,000-fold, respectively) following exposure to differentiation conditions for 3 days (18), in the absence or in the presence of 5-aza-dC (Fig. 4A), demonstrating that under these conditions 5-aza-dC does not block the differentiation process. FoxA2 expression, on the other hand, was up-regulated by 600-fold following differentiation, but this effect was significantly inhibited (3-fold) by treatment with 5-aza-dC (Fig. 4A). These results suggest that DNA methylation is necessary for efficient activation of FoxA2 gene expression in differentiating hESCs; this may involve methylation of CpG2 by DNMT3b.
To directly test the role of DNMT3b in activation of FoxA2 gene expression, we inhibited the expression of DNMT3b by siRNA knockdown. Following siRNA transfection, DNMT3b RNA levels were measured using quantitative real time PCR.
Transfection of undifferentiated hESCs with DNMT3b siRNA led to a 60% reduction in DNMT3b mRNA levels as compared with control siRNA (data not shown). hESCs were exposed to siRNA directed against DNMT3b, or control siRNA, for 2 days followed by 3 days of differentiation. Following this combined protocol of gene knockdown and differentiation, RNA was extracted, and gene expression levels were measured using real time PCR. Treatment of cells with either control siRNA or siRNA directed against DNMT3b did not affect global differentiation as seen by increased expression of the endoderm markers CXCR4 and SOX17 (Fig. 4B). As expected, following differentiation, levels of DNMT3b were reduced (to 17%, siCT); treatment with siDNMT3b led to a further reduction (to 7%) (Fig. 4B). Levels of FoxA2, on the other hand, were elevated as expected following differentiation but were significantly decreased (~2-fold) following treatment with the siRNA directed against DNMT3b (Fig. 4B).

DISCUSSION
We have identified a CpG island located in the promoter region of the FoxA2 gene that appears to directly regulate expression of the gene according to its methylation status. Although DNA methylation typically correlates with gene silencing, in this case, we observed the opposite pattern, namely high levels of DNA methylation in expressing tissues (CXCR4+, islets, liver, and colon) and low levels in nonexpressing tissues. Many tissue-specific genes and genes overexpressed in tumors show methylation of CpG islands downstream of the TSS (30), showing that methylation at these regions does not block transcription (31). However, most DNA methylation studies have focused on proximal promoter regions: consistently, single-gene studies have shown that promoter methylation correlates with reduced transcriptional efficiency.
Given the low extent of methylation of CpG islands in the human genome, we assume that at the onset of gastrulation, an as yet unknown mechanism distinguishes FoxA2 CpG2 for methylation in cells of the endoderm lineage, as opposed to the bulk of CpG islands in the genome. A key question arising from our study is what mechanisms lead to this selective methylation of CpG2. One possibility is binding of factors that direct methylation to this region. A plausible candidate for this is the Polycomb group complex (PRCs). Polycomb group proteins are considered as negative regulators of gene expression: they are recruited to the promoter region and act to repress transcription by recruitment of DNA methyltransferases that directly methylate the DNA and promote binding of chromatin remodeling proteins. Indeed, using ChIP assay, we observed that the CpG2 region is bound by the Polycomb group protein SUZ12 and the DNA methyltransferase DNMT3b preferentially in undifferentiated hESCs as compared with CXCR4+ or islet cells. PRC proteins may be recruited to the genomic location by sequence-specific factors that bind directly to the DNA. Consistent with this, it has been shown that de novo methylation patterns can be directed by specific cis-acting methylation-determining regions that precisely recapitulate methylation patterns that arise during differentiation (32); this study also showed that methylation-determining regions can bind transcription factors and protect the region from undergoing methylation. This scenario may apply in hESC or nonendodermal tissues, where the CpG is unmethylated, whereas in endodermal tissues, such factors may be absent or unable to bind, allowing the recruitment of PRCs and DNMTs that methylate the region.
Recently, a methodology was reported (33,34) that combines bisulfite conversion with ChIP and deep sequencing, thereby allowing a direct genome-wide interrogation of two epigenetic marks on the same DNA molecule. It was shown that depending on CpG density, H3K27me3 and DNA methylation either co-occur (low CpG density) or are mutually exclusive (high CpG density). Because CpG2 has high CpG density, it may be expected to show mutually exclusive H3K27me3 modification and DNA methylation. Indeed CpG2 shows high levels of H3K27me3 when the CpG island is unmethylated (hESC) and lower levels of H3K27me3 when the CpG island is methylated (CXCR4+ and islets). The low levels of H3K27me3 in FoxA2-expressing tissues may help to explain how the FoxA2 gene is active, even though its promoter is methylated. Further analysis of the chromatin state around the CpG2 region is needed to understand how the DNA methylation does not repress FoxA2 gene expression.
Typically, DNA methylation of gene promoters is associated with transcriptional silencing (35). Only in rare cases does gene methylation correlate with expression. Generally such methylation is located at regions far upstream of the TSS, e.g. the IL-8 gene and the imprinted Igf2 gene (36). In a recent genome-wide methylation analysis (20), it was shown that tissue-specific methylation of CpG islands in intragenic regions is associated with gene activation. It was suggested that these sites may bind methylation-sensitive repressors that suppress distant promoters. Alternatively, these regions may contain promoters for antisense RNAs that are active when this region is unmethylated (37). This assumption was supported by the presence of active chromatin modifications (H3K4me3) that correlate with TSS sites (38).
Thus, direct evidence for transcriptionally active yet heavily methylated promoters is rare (39,40). In the FoxA2 gene promoter, CpG2 was shown to be methylated preferentially in expressing tissues. In addition, using a reporter gene assay, we showed that the CpG island in its unmethylated form has a negative effect on transcription; this effect is abolished when the CpG island is methylated, consistent with the notion that methylation may block binding of repressing proteins. The presence of methylated DNA within the promoter region of a gene in expressing tissues raises the question of how methylation contributes to the activation of transcription. Two plausible mechanisms observed under situations of imprinting (41) are alternate promoter usage and the presence of an insulator. In the alternate promoter model, methylation blocks access of the transcription machinery to the CpG island, allowing the use of the basic core promoter of the gene, whereas in tissues where the CpG island is unmethylated, transcription initiates from the CpG island in preference to the basic core promoter. In the case of FoxA2, bioinformatics analysis did not identify promoter elements in the CpG2 region (data not shown). In the second mechanism, the methylated region contains a binding site for an insulator protein such as CTCF that blocks the effect of an enhancer located upstream of this region on a downstream gene. In the case of FoxA2, this possibility is more likely because in a genome-wide chromatin state analysis performed in human pancreatic islets (42), there is some evidence for such an upstream enhancer according to the presence of H3K4me around the CpG1 region. In addition, in a genome-wide analysis of CTCF binding performed in hESCs, a binding site for CTCF was identified around the CpG2 region (43).
Based on these results, we hypothesize that in hESCs, a lack of DNA methylation permits binding of a sequence-specific repressor protein that mediates the binding to this region of SUZ12, a component of PRC2, which also includes the proteins EED and EZH2. EZH2 directly methylates H3K27, a modification that also contributes to the repression of the gene in hESCs. SUZ12 in turn recruits DNMT3b. We propose that upon differentiation, DNMT activation leads to CpG island methylation, which in turn causes loss of repressor protein binding. This is consistent with results showing that following dense DNA hypermethylation of promoters, EZH2 is no longer required to maintain DNA methylation and therefore is released (44). A combination of activating transcription factors that begin to be expressed in CXCR4+ cells, including transcription factors that activate the FoxA2 enhancer (22), together with alterations in chromatin modifications such as H3K27 trimethylation (repressive) and H3 acetylation or H3K4 dimethylation (activating), may then lead to activation of FoxA2 gene expression (Fig. 5). It is possible that pioneer factors play a role in this process, perhaps by recognizing the locus while it is still in its closed chromatin conformation or by interacting with the methylated CpG2. We have performed a bioinformatic analysis to detect potential transcription factor binding sites in CpG2 using MatInspector software. Among the factors identified are NEUROD, PTF1, PAX4/6, C/EBP, HNF6, and HNF1, which are implicated in regulation of pancreas and liver genes (45).
An additional important question to be resolved is whether DNMT3b is recruited by the Polycomb proteins but is inactive in hESCs or is recruited later. According to the ChIP assay, DNMT3b is already present on the CpG2 region in hESCs and may thus contribute to the low levels of methylation (20%) observed in these cells. The signal that activates DNMT to methylate CpG2 is unknown. Because Polycomb complexes contain several enzymatic activities, e.g. histone methylation, ubiquitination, and sumoylation, a post-translational modification that directly or indirectly alters DNMT activity and/or recruitment is a plausible mechanism.
To our knowledge, this study is the first to identify methylation-dependent gene activation as a mechanism for activating developmental regulators. Methylation at the promoter region of the FoxA2 gene, both in early stages of development (e.g. CXCR4+, definitive endoderm) and in several endoderm-derived tissues (liver, pancreas, and colon), marks this as a key developmental and lineage commitment mechanism in endoderm fate decisions during development. More detailed molecular dissection of this process will permit identification of the participants and determine whether such mechanisms control activation of additional developmental regulators and cell lineages.