New 5' regions of the murine and human genes for DNA (cytosine-5)-methyltransferase.

DNA (cytosine-5)-methyltransferases (EC 2.1.1.37) maintain patterns of methylated cytosine residues in the mammalian genome; faithful maintenance of methylation patterns is required for normal development of mice, and aberrant methylation patterns are associated with certain human tumors and developmental abnormalities. The organization of coding sequences at the 5′-end of the murine and human DNA methyltransferase genes was investigated, and the DNA methyltransferase open reading frame was found to be longer than previously suspected. Expression of the complete open reading frame by in vitro transcription-translation and by transfection of expression constructs into COS7 cells resulted in the production of an active DNA methyltransferase of the same apparent mass as the endogenous protein, while translation from the second in-frame ATG codon produced a slightly smaller but fully active protein. Characterization of mRNA 5′ sequences and the intron-exon structure of the 5′ region of the murine and human genes indicated that a previously described promoter element (Rouleau, J., Tanigawa, G., and Szyf, M. (1992) J. Biol. Chem. 267, 7368-7377) actually lies in an intron that is more than 5 kilobases downstream of the transcription start sites.

The regulatory system that controls DNA methylation in mammals is poorly understood but must be complex. Genomic methylation patterns are reshaped during gametogenesis and early development and undergo programmed alterations during cellular differentiation (1). Methylation patterns are responsible for the repression of parasitic sequence elements (2-4) and the expression status of genes subject to genomic imprinting and X inactivation (5,6). Abnormalities of methylation patterns have been observed in several pathological states (7-12). However, very little is known of the regulation of DNA methylation or of the control of DNA methyltransferase biosynthesis. Precise knowledge of the structure and function of the enzyme and its gene will be essential for a full understanding of the mechanisms that control methylation patterns.
Human DNA methyltransferase is encoded by DNMT, which has been mapped to 19p13.2 (13); murine DNA methyltransferase is encoded by Dnmt on proximal chromosome 9 (14). Eukaryotic DNA methyltransferases are complex proteins that appear to have arisen via gene fusion events that yielded a large (1,000 amino acid) N-terminal regulatory domain fused to a C-terminal domain that is very closely related to prokaryotic cytosine-5 methyltransferases (reviewed in Ref. 15). Past studies of both the murine and human genes, including overexpression of presumed full-length cDNAs, have resulted in the production of a protein with properties similar to those associated with the endogenous gene product (16-18). These include preferential methylation of hemimethylated DNA, nuclear localization, and recruitment to replication foci during S phase (17). Studies of the murine gene have identified a genomic region of 2 kb 5′ to the first in-frame ATG codon in the existing sequence, which was described as possessing promoter activity and which was said to be associated with transcription start sites (19). In addition, this promoter activity was described as responsive to signals transmitted via the Ras signal transduction pathway (20). This observation could be most important since dysregulated expression of both the murine and human genes has been associated with the neoplastic state (21-23).
Despite the above data, several findings suggest that both the murine and human DNA methyltransferases could have additional 5′-coding sequences and transcription start sites different from those previously defined for the murine gene. First, the existing human cDNA contains a reading frame free of stop codons, which extends at least 120 codons upstream of the first in-frame ATG codon (13). Second, there is a region within the published genomic sequence from the murine gene (19) that is upstream of the previously defined transcription start site and that has extensive homology to part of the human DNA methyltransferase cDNA (13). Third, recent sequence analysis of the chicken (24) and sea urchin DNA methyltransferase cDNAs (Aniello, F., Locascio, A., Fucci, L., Geraci, G., and Branno, M. (1996) GenBank accession no. Z50183) has revealed close conservation of sequence with the murine and human proteins except for additional 5′ extensions not present in the existing human and murine cDNAs.
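The first observation above (a reading frame that stays open upstream of the first in-frame ATG) can be checked mechanically. The following is a minimal sketch, not the procedure used in this study: the function names and the toy sequence are hypothetical, and the standard stop codons TAA, TAG, and TGA are assumed.

```python
# Illustrative sketch only: function names and the toy sequence are hypothetical.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def upstream_open_codons(seq: str, atg_index: int) -> int:
    """Count in-frame codons upstream of the ATG at atg_index (0-based)
    that are read before a stop codon or the start of the sequence."""
    seq = seq.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("atg_index does not point at an ATG codon")
    count = 0
    pos = atg_index - 3
    while pos >= 0:
        if seq[pos:pos + 3] in STOP_CODONS:
            break  # the open frame ends at the first upstream stop codon
        count += 1
        pos -= 3
    return count


def upstream_in_frame_atgs(seq: str, atg_index: int) -> list:
    """Return 0-based positions of in-frame ATG codons inside the open
    stretch upstream of the ATG at atg_index."""
    seq = seq.upper()
    start = atg_index - 3 * upstream_open_codons(seq, atg_index)
    return [i for i in range(start, atg_index, 3) if seq[i:i + 3] == "ATG"]


if __name__ == "__main__":
    # Toy sequence: TAG GCC ATG GCT GCC GAA ATG CCC
    toy = "TAGGCCATGGCTGCCGAAATGCCC"
    print(upstream_open_codons(toy, 18))    # open codons upstream of the ATG at 18
    print(upstream_in_frame_atgs(toy, 18))  # candidate upstream start codons
```

An open frame that runs to the 5′ end of a cDNA without meeting a stop codon, as in the human cDNA described above, is consistent with a clone truncated at its 5′ end, but it does not by itself locate the true start codon.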
To resolve the above issues, we have further analyzed 5′ sequences in mRNA and both genomic and cDNA material from the murine and human DNA methyltransferases. Extensive analysis by a variety of methods indicates that the murine and human mRNAs and proteins are longer and that the transcription and translation start sites lie further 5′ than previously reported (13,14). No transcription initiation in the vicinity of a previously described promoter element (19) was observed. The contribution of these newly described 5′ regions to enzyme function is explored, and the potential consequences for regulation of gene expression are discussed.

* This work was supported by National Institutes of Health Grants GM00616 and CA60610 (to T. H. B.) and CA43318 (to S. B. B.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EBI Data Bank with accession number(s) X14805, X63692.
§ The first two authors made equal contributions to this study, and the order of their names is arbitrary.
‖ To whom all correspondence should be addressed.

EXPERIMENTAL PROCEDURES
Identification of Murine DNA Methyltransferase 5′-cDNA Clones by Screening of cDNA Libraries-An 8.5-day-old mouse embryo cDNA library in λgt10 was provided by Dr. Brigid Hogan (Vanderbilt University), and a 7.5-day-old mouse embryo cDNA library in λZAP was provided by Dr. John Gearhart (Johns Hopkins University). The adult mouse testis library (λgt10 5′-stretch) was purchased from Clontech. Filters were probed with 5′ probes as described (26), and phage cDNA inserts were subcloned into plasmid pCRII (Invitrogen) or pBluescript SK+ (Stratagene) and sequenced by standard methods (26). The multiple sequence alignment of Fig. 5 was performed by Clustal W (27).
5′-RACE Analysis of Murine DNA Methyltransferase mRNA-Purified total RNA from cultured murine erythroleukemia cells was subjected to RACE as described by Frohman (28). The forward primer was RNase H/3 (Fig. 1 and Table I). PCR products were isolated from 1% agarose gels, cloned into pBluescript SK M13+, and sequenced (26).
RNase , and cross-linked to the membrane by UV irradiation (26). Hybridization, stripping, and reprobing were performed as in Shackleford and Varmus (30).
Identification of 5′-Human DNA Methyltransferase Sequences through Screening of Cosmid DNA-HindIII restriction digests of a chromosome 19 cosmid, which had been shown to encode most of the human DNA methyltransferase gene (13), were probed with the most 5′ 118 base pairs of the previously reported cDNA of the human gene (13) to identify 5′-genomic regions. A 2.0-kb fragment from cosmid R26299 (13) was identified, cloned into plasmid pBluescript SK- (Stratagene), and sequenced (26).

Characterization of DNA Methyltransferase Transcripts in Human Cells-DNA methyltransferase transcripts were characterized by a reverse transcriptase (RT)-PCR procedure in which 0.5 μg of total RNA was initially reverse transcribed with 20 units of RNase H− reverse transcriptase (Superscript; Life Technologies, Inc.) and then amplified with 0.5 units of Taq polymerase (Boehringer Mannheim) employing the oligonucleotide primers described in Table I and Figs. 1 and 2. Transcripts were also detected in 2.0 μg of poly(A)-selected RNA by Northern analysis, probed with probe A (Fig. 1) and with a probe representing the 3′-most 2.5 kb of the human DNA methyltransferase cDNA (Far 3′ probe) as described previously (13).
Expression of Human DNA Methyltransferase Gene cDNAs in COS7 Cells and in an in Vitro Transcription-Translation System-For expression in COS7 cells, two human DNA methyltransferase sequences were cloned into expression vector pSVK3 (Pharmacia), which expresses these sequences from the SV40 early promoter. The longer sequence, containing the most 5′ sequences, was prepared by excising an XbaI-SmaI fragment from the HindIII cosmid fragment (described above) and ligating these sequences into pBluescript containing the previously reported complete cDNA for human DNA methyltransferase (13). A 5.2-kb fragment was excised with FspI and ScaI and blunt-end ligated into the expression vector to create the construct HMT5.2. The shorter, previously reported cDNA (13) was subcloned into the same vector to produce construct HMT4.7. Each construct (6 μg) was transfected, using 36 μl of lipofectamine (Life Technologies, Inc.) as per the manufacturer's protocol, into 2 to 3 × 10^6 COS7 cells, which were harvested 48 h later. DNA methyltransferase protein was detected in whole cell lysates by a previously described Western blot procedure using an antibody (JH1147) specific for the human protein (18). DNA methyltransferase activity in the cell sonicates was measured exactly as previously reported (18,22).
The creation of the fibroblast cell line stably expressing the 4.7-kb human DNA methyltransferase cDNA, 90/SV, is described elsewhere (18).
For in vitro transcription-translation expression in the TNT (Promega) system, the same constructs described above were subjected to transcription and translation in the presence of [35S]methionine, and the protein products were visualized by fluorography after electrophoresis on SDS-polyacrylamide gels.

FIG. 1. ... (15). B, intron-exon structure of the 5′ region of the murine Dnmt gene. Exons are depicted as boxes and introns as single lines; lengths are given in nucleotides. 5′-ends were mapped by RACE cloning, RNase H mapping, and sequencing of ends from cDNA clones. The 5′-end of the mRNA had been provisionally assigned to the site of the EcoRI site in the third exon. The position of the previously described promoter (19) is shown by a hatched box. Oligonucleotides used in RNase H mapping and 5′-RACE analysis are shown as filled arrowheads, which point toward the 3′-end of the oligonucleotide. Regions whose sequence has not been determined are given as dashed lines. C, intron-exon structure of the 5′ region of the human DNMT gene. The 37-nucleotide exon appears within the 5′-flanking sequence as published by Rouleau et al. (19). Symbols are as in B. The sequences and positions of the oligonucleotides used in RT-PCR analysis and RACE analysis are shown in B and C and are given in Table I.

... not present in the original murine cDNA clones (14) suggested that these latter clones were truncated at their 5′-ends. 5′-RACE of murine RNA yielded an additional 230 nucleotides of cDNA upstream of the EcoRI site that had been provisionally identified as the 5′-end of the murine Dnmt cDNA (14). This new sequence contained a single in-frame ATG codon (Figs. 1 and 2). The 5′ region of the human DNMT gene was found to have a structure consistent with that of the murine cDNA.
A 5′ probe (nucleotides 1-118) of the published cDNA (13) was used to screen HindIII restriction fragments of a chromosome 19 cosmid clone (R26299), which had been shown to encompass the DNMT gene. Cloning and sequence analysis of a 2.0-kb fragment identified an exon corresponding to the 5′-most 158 nucleotides in the murine RACE clone (Figs. 1 and 2). The in-frame ATG codon was present at corresponding locations in both the murine and human sequences (Figs. 1 and 2). The DNA methyltransferase cDNAs have been renumbered to accommodate the additional sequence. Extensive screening of murine cDNA libraries with cDNA probes just 3′ of the 5′-most EcoRI site identified 11 clones whose 5′-ends were distributed between nucleotides 14 and 128 of the 5′-RACE product (Fig. 1). Seven of these cDNA clones were obtained from libraries prepared from 7.5-day-old mouse embryos, three were from 8.5-day-old embryos, and one was from an adult mouse testis cDNA library. With the exception of two cDNA clones from the 7.5-day-old mouse embryo cDNA library, which had 5′-ends well downstream of the putative promoter (Fig. 1), all cDNA clones and 5′-RACE products terminated well upstream of the previously described promoter, and all contained a 37-nucleotide exon that is present in the published sequence and was assumed to be part of the 5′-flanking region of the murine gene (19). This exon is flanked by consensus splice donor and acceptor sites and lies between nucleotides 1185 and 1221; the predominant start site had been identified as nucleotide 2004 (19).

Identification of New
RNase H mapping was used to obtain a reverse transcriptase-independent estimate of the extent of 5′ sequence in the murine cDNA (29). Oligonucleotides complementary to regions of the mRNA (Table I and Figs. 1 and 2) were hybridized to total RNA from murine erythroleukemia cells and digested with RNase H. The products were subjected to denaturing electrophoresis in agarose gels and analyzed by RNA blot hybridization. As shown in Fig. 3, the lengths of the cleavage products place the 5′-end of the mRNA within ~80 nucleotides of the end of the 5′-RACE product; there was no evidence of 5′-ends in the vicinity of the previously described promoter element. Probes from near the 5′-end (nucleotides 39-111) and from a more 3′ region (nucleotides 292-557) hybridized to the same bands, indicating a homogeneous population of mRNA 5′ sequences (Fig. 3). These data indicate that there is a single transcription start site, or a tight cluster of such sites, no more than 80 nucleotides from the 5′-end of the RACE clone shown in Fig. 1. The results of RACE analysis, RNase H mapping, and analysis of 5′ clones from cDNA libraries were in good agreement.

FIG. 2. ... (19). The bold sequence at nucleotide 80 (Ggtaggt) is a consensus splicing donor signal. The bold sequence at nucleotides 160-196 of the murine sequence was previously reported to be in the 5′-flanking sequence of the DNA methyltransferase gene (19); note that nucleotides 81-117 (bold) of the human sequence are highly homologous to this sequence. The amino acid sequence encoded by this exon is highly conserved from sea urchin to human (Fig. 5).
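The RNase H mapping estimate described above reduces to simple arithmetic: an oligonucleotide directs cleavage at a known cDNA coordinate, so the length of the 5′ cleavage product minus the cloned sequence upstream of the oligonucleotide gives the number of nucleotides not yet represented in the clones. The sketch below illustrates this under stated assumptions; the function name and all numbers are hypothetical and are not measurements from this study.

```python
# Illustrative sketch only: names and numbers are hypothetical, not data from the paper.

def unmapped_five_prime_nt(fragment_len: int, oligo_start: int) -> int:
    """Estimate how many mRNA nucleotides lie 5' of the cloned sequence.

    fragment_len: length (nt) of the 5' RNase H cleavage product read from the gel.
    oligo_start:  1-based cDNA coordinate of the first mRNA nucleotide covered by
                  the oligonucleotide; cleavage is assumed to occur at this position.
    """
    cloned_upstream = oligo_start - 1  # cloned nucleotides 5' of the cleavage site
    return fragment_len - cloned_upstream


if __name__ == "__main__":
    # Hypothetical example: a 380-nt product from an oligonucleotide starting at
    # cDNA position 301 implies roughly 80 nt 5' of the cloned sequence.
    print(unmapped_five_prime_nt(380, 301))
```

Because fragment lengths are read from an agarose gel, an estimate of this kind is accurate only to within the resolution of the gel, which is why the text reports an approximate bound rather than an exact 5′-end.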
Relationship of the New 5′ Sequences to the DNA Methyltransferase Gene Transcription Product-The new 5′ sequences were found to be present in all of the mRNA produced from the murine and human DNMT genes. For the analysis of the human mRNA, RT-PCR analyses of mRNA from different cell lines, using primers (oligonucleotides 74, 75, and 209; Table I and Figs. 1 and 2) in the new upstream sequence 5′ to the new in-frame ATG site paired with downstream primers (oligonucleotides 292A and 27A) in the known coding region of the gene (Table I and Figs. 1 and 2), revealed distinct PCR products of the predicted sizes (Fig. 4A). Furthermore, Northern blot analyses with a 220-base pair probe encompassing these same upstream sequences revealed strong hybridization to the same 5.2-kb transcript previously defined with probes from the formerly characterized human cDNA (Fig. 4B and Ref. 13).
The RNase H mapping data of Fig. 3 confirm that the newly identified sequences are included within all or nearly all of the Dnmt mRNA population in murine erythroleukemia cells, and no evidence of alternative transcriptional start sites or alternative splicing pathways was observed.
The above data confirm the existence of an upstream exon (which contains an in-frame ATG codon) separated from the next downstream exon by an intron that is at least 3 kb in length in the human gene and at least 1.2 kb in the murine gene (Figs. 1 and 2). The exon sequences and intron-exon structures of the 5′-ends of the mouse Dnmt and human DNMT genes are well conserved (Figs. 1 and 2). A comparison of the new N-terminal amino acid sequences of the human and murine DNA methyltransferases with those of the chicken and sea urchin cDNAs is shown in Fig. 5. There is extensive similarity among all four DNA methyltransferases when aligned with respect to their most upstream candidate translation start sites. The second ATG codon is not conserved in the chicken and sea urchin sequences (Fig. 5).
The 5′-Most ATG Codon Directs Synthesis of a DNA Methyltransferase Protein of the Same Size as the Endogenous Protein-To evaluate the contribution of the new 5′-transcript regions to methyltransferase proteins, we compared the translation products of the extended human sequences with the previously characterized truncated human cDNA. In vitro transcription and translation of the cDNA containing the new upstream human sequence (HMT5.2) resulted in a product that was approximately 10,000 daltons larger in mass than that produced from the previous, truncated DNA methyltransferase cDNA (HMT4.7) (Fig. 6C). The apparent preference for translation from the new upstream ATG was also observed in transfected COS7 cells. Transient expression of the longer and shorter human cDNAs from an SV40 promoter in COS7 cells followed by Western blot analysis using a human DNA methyltransferase-specific antibody showed an increase in expression of a Mr 190,000 product from the former and a Mr 180,000 product from the latter (Fig. 6A). The Mr 190,000 product from cells transfected with the construct containing the new DNA methyltransferase sequence comigrated with the endogenous DNA methyltransferase protein present in untransfected COS7 cells. These data indicate that both the endogenous and exogenously expressed DNA methyltransferase initiate from the upstream ATG site identified in the new 5′ sequences.

FIG. 4. ... (Table I and Figs. 1 and 2 for primer locations) in the ethidium bromide-stained agarose gels. B, Northern blot analysis. Full-length 5.2-kb DNA methyltransferase transcripts are detected by Northern blot of poly(A)+ RNA from NCI-H249 cells hybridized with the 220-base pair 5′ region probe (Probe A) shown in Fig. 1 and with a cDNA probe that covered the terminal 2.5 kb of the DNA methyltransferase cDNA clone (Far 3′ probe).
Consequences of the New N-terminal DNA Methyltransferase Regions for Enzyme Activity-The longer and shorter DNA methyltransferase proteins discussed above were expressed in COS7 cells, and the relative preference for unmethylated and hemimethylated substrates was determined. A 2-3-fold increase in maintenance methylating activity was seen in whole cell lysate from cells transfected with either the longer or shorter human DNAs (Table II). This increased activity was consistent with the 2-4-fold increase in DNA methyltransferase protein detected by Western analysis (Fig. 6A). Importantly, both constructs generated proteins with the same preference for hemimethylated DNA since very low de novo methylating activity was noted for both the long and short proteins relative to the distinct increases in maintenance methylation (Table II). Thus, the newly defined upstream DNA methyltransferase regions do not appear, by the assays utilized, to be required for enzyme activity nor to impart novel regulation of enzyme activity as compared to the previously described shorter proteins. These results are in agreement with previous proteolysis experiments (31) and studies of N-terminal truncations expressed in cultured cells (17).

DISCUSSION
The data of this study clarify the structure of the murine and human DNA methyltransferases and identify a common sequence at the N terminus of all metazoan DNA methyltransferases characterized to date. The N-terminal sequences are conserved from sea urchin to human, and while the function of the new sequences is not known, secondary structure calculations by the PredictProtein program (32) predict a long helix-turn-helix structure that is rich in leucine residues. This conserved N-terminal region may have a conserved function in all four DNA methyltransferases, although there is at present no indication as to the nature of the function.
Proteolytic removal of sequences at the N terminus has been shown to have no effect on enzyme activity (31), and removal of the N-terminal 200 amino acids does not affect nuclear localization or ability to target to replication foci during S phase (17).
The results presented here must be reconciled with previous descriptions of a DNA methyltransferase promoter, which has now been shown to lie in an intron. No mRNA 5′-ends were found to map to the vicinity of this putative promoter. There are at least two more exons contained in the major transcript of the murine and human genes that lie upstream of the previously reported transcription start site and promoter region of the murine gene (19). It is now apparent that this previously defined promoter lies in an intron that is located at least 5 kb downstream of exonic sequences present in both the human and murine DNA methyltransferase mRNAs. RNase H mapping indicated that about 80 additional nucleotides lie 5′ of the sequences identified in the present study. These last 80 nucleotides are encoded by one or more upstream exons but have proven refractory to cloning and are likely to lie in a CpG island; such sequences are often very difficult to clone as the high G + C content of these islands impedes the progress of reverse transcriptase. However, all the available data suggest that the promoter sequence characterized by Rouleau et al. (19) lies in an intron well downstream of the large majority of mRNA 5′-ends.
While there appears to be a common transcription start site in all tissues, there is evidence of an alternative translation start site in mouse oocytes, in which DNA methyltransferase protein (which is present in very large amounts) comigrates with the product of in vitro transcription-translation of a truncated DNA methyltransferase mRNA that lacks the first ATG codon (33). All other cell lines and tissue types have been found to contain a single species of Mr 190,000, which corresponds to translation from the first ATG. However, the second ATG codon is in a context that is a closer fit to the consensus defined by Kozak (34). As mentioned earlier, sequences at the 5′-end are not known to participate in any of the activities of DNA methyltransferase, but it is possible that usage of alternative translation start sites does affect the biological activities of the translation products. It is notable in this regard that the shorter form of the enzyme is associated with the oocyte cortex rather than the nucleus (33).
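The comparison of ATG contexts mentioned above can be illustrated with a simple scoring rule. The sketch below assumes the commonly cited form of the Kozak consensus, GCC(A/G)CCATGG, in which a purine at position -3 and a G at position +4 (with the A of ATG as +1) are the most influential positions. The function name, the three-level classification, and the test sequences are hypothetical; the actual contexts of the two DNA methyltransferase ATG codons are not reproduced here.

```python
# Illustrative sketch only: names, classification labels, and sequences are hypothetical.

def kozak_context(seq: str, atg_index: int) -> str:
    """Classify the context of the ATG at atg_index (0-based) by the two key
    Kozak positions: a purine at -3 and a G at +4 (A of ATG = +1)."""
    seq = seq.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("atg_index does not point at an ATG codon")
    if atg_index < 3 or atg_index + 3 >= len(seq):
        raise ValueError("need at least 3 nt upstream and 1 nt downstream of the ATG")
    minus3 = seq[atg_index - 3]   # position -3
    plus4 = seq[atg_index + 3]    # position +4
    hits = (minus3 in "AG") + (plus4 == "G")
    return ("weak", "adequate", "strong")[hits]


if __name__ == "__main__":
    print(kozak_context("GCCACCATGGCT", 6))  # both key positions match
    print(kozak_context("TTTATTATGCTT", 6))  # only position -3 matches
    print(kozak_context("TTTTTTATGTTT", 6))  # neither key position matches
```

A rule of this kind only ranks sequence contexts; as the oocyte data above indicate, which start codon is actually used in a given cell type has to be determined experimentally.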
In summary, the original murine and human DNA methyltransferase cDNAs were found to be truncated at their 5′-ends; most of the missing sequences have now been cloned and sequenced. These sequences are conserved among the four known metazoan DNA methyltransferases but are of unknown function. The finding that they are encoded by exons well upstream of what has been proposed to be the promoter for the DNA methyltransferase gene (19,20,25), and the fact that this putative promoter had previously been termed unusual because of a lack of sequence motifs characteristic of mammalian promoters (19), make it highly unlikely that this element actually determines the transcription start site. Accurate characterization of the sequence elements and environmental factors that control the production of DNA methyltransferase will be essential for an understanding of the regulation of this enzyme in normal and pathological states.