Identification of a conserved α-helical domain at the N terminus of human DNA methyltransferase 1

In vertebrates, DNA methyltransferase 1 (DNMT1) contributes to preserving DNA methylation patterns, ensuring the stability and heritability of epigenetic marks important for gene expression regulation and the maintenance of cellular identity. Previous structural studies have elucidated the catalytic mechanism of DNMT1 and its specific recognition of hemimethylated DNA. Here, using solution nuclear magnetic resonance spectroscopy and small-angle X-ray scattering, we demonstrate that the N-terminal region of human DNMT1, while flexible, encompasses a conserved globular domain with a novel α-helical bundle-like fold. This work expands our understanding of the structure and dynamics of DNMT1 and provides a structural framework for future functional studies of this new domain.

DNA methylation is a major epigenetic modification that regulates chromatin structure and various biological processes in mammals (1-4). DNA methylation is carried out by four members of the DNA methyltransferase (DNMT) protein family, the best characterized of which is DNMT1. DNMT1 is a 1616-amino-acid protein that contains a replication foci-targeting sequence (RFTS) domain, two bromo-adjacent homology (BAH) domains, and a C-terminal methyltransferase domain (Fig. 1A). While absent in lower species, DNMT1 is highly conserved in vertebrates, from Xenopus laevis to human.
DNA methylation by DNMTs predominantly targets palindromic CpG sites, with a strong preference for CpG sites in a hemimethylated state, although asymmetric methylation at non-CpG sites has also been observed (5). Recent studies have revealed that the establishment and maintenance of DNA methylation involve all DNMTs to varying degrees, acting in conjunction with DNA demethylases to maintain a dynamic equilibrium between methylation gain and loss (6). Consequently, knowledge of the DNMT structures is essential for elucidating the specific role played by each member in DNA methylation maintenance.
In the case of DNMT1, many structures containing the RFTS, BAH, and catalytic domains have been determined, shedding light on the mechanisms of methylation (7-17). These studies have deepened our understanding of how DNMT1 acts, particularly in relation to pathologic DNMT1 variants implicated in degenerative disorders of the nervous system (18-21). All structural studies so far have focused exclusively on the segment from residue 350 to the C terminus of DNMT1. The N-terminal region of DNMT1 has received scant attention and has been described as disordered (12), even though its limited resistance to proteolysis suggested that it might contain folded segments (22). Here, using nuclear magnetic resonance (NMR) spectroscopy and small-angle X-ray scattering (SAXS), we identify a hitherto unreported folded domain within the N-terminal region of DNMT1.

Identification of a folded domain at the N terminus of DNMT1
We initiated our studies using a recombinant DNMT1 fragment encompassing residues 16 to 134, selected based on predicted secondary structure elements (data not shown) and designated DNMT1NL. The ¹H-¹⁵N heteronuclear single-quantum coherence (HSQC) spectrum of DNMT1NL showed well-dispersed signals overall, with some variation in signal intensities, indicating that the protein contains both structured and disordered regions (Fig. 1B). Further inspection of the ¹⁵N relaxation data collected on DNMT1NL revealed that the terminal segments comprising residues 16 to 21 and 94 to 134 were intrinsically disordered, with elevated ¹⁵N R1 and decreased ¹⁵N R2 relaxation rates and decreased steady-state ¹⁵N-{¹H} heteronuclear nuclear Overhauser effects (NOEs) compared with the rest of the protein (Fig. 1C). The rotational correlation time (τc) estimated from the average R1 and R2 values for DNMT1NL was 8.1 ± 2.2 ns, indicating that DNMT1NL is monomeric in solution. By truncating the C-terminal unstructured region, we produced a shorter construct (residues 16-93), denoted DNMT1N (Fig. 1C). DNMT1N is also monomeric in solution based on its τc value of 5.4 ± 0.6 ns. Compared with the ¹H-¹⁵N HSQC spectrum of DNMT1NL, the spectrum of DNMT1N showed better signal separation and more homogeneous signal intensities (Fig. 1, B and D). We therefore used DNMT1N for subsequent structural studies.
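A τc estimate of this kind can be sketched with the widely used rigid-rotor approximation τc ≈ √(6R2/R1 − 7)/(4πνN), where νN is the ¹⁵N Larmor frequency. The text does not state which model or spectrometer field was used, so both the formula choice and the 60.8 MHz example frequency (a 600 MHz spectrometer) are assumptions, and the rates in the usage line are hypothetical.

```python
import math


def tau_c_from_rates(r1: float, r2: float, larmor_15n_hz: float) -> float:
    """Estimate the isotropic rotational correlation time (s) from 15N R1 and R2.

    Rigid-rotor approximation (assumed here): tau_c = sqrt(6*R2/R1 - 7) / (4*pi*nu_N).
    It assumes isotropic tumbling and negligible fast internal motion, so only
    residues from the structured core should enter the averaged R1 and R2.
    """
    ratio = r2 / r1
    if ratio <= 7.0 / 6.0:
        # The square-root argument would be zero or negative.
        raise ValueError("R2/R1 is too small for this approximation")
    return math.sqrt(6.0 * ratio - 7.0) / (4.0 * math.pi * abs(larmor_15n_hz))


# Hypothetical averaged rates (s^-1) at an assumed 15N frequency of 60.8 MHz.
tau = tau_c_from_rates(r1=1.5, r2=15.0, larmor_15n_hz=60.8e6)
print(f"tau_c = {tau * 1e9:.2f} ns")
```

Which residues are averaged matters: disordered segments such as residues 94 to 134 of DNMT1NL have lower R2/R1 ratios and would pull the estimate down.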
The differences in Cα, Cβ, N, and HN chemical shifts between DNMT1N and DNMT1NL were mostly negligible, except near the C terminus of DNMT1N (around Glu93), as expected, and in the regions around Ser35 and Leu46-Gln54, where small chemical shift differences were observed in the overlaid ¹H-¹⁵N HSQC spectra of DNMT1N and DNMT1NL (Fig. 1B). Interestingly, these regions contain negatively charged residues. The detectable chemical shift perturbations might result from weak, transient electrostatic interactions with the extended disordered region of DNMT1NL (residues 94-134).
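Construct-to-construct differences like these are often summarized per residue as a weighted combined amide perturbation, Δδ = √(ΔδH² + (αΔδN)²). This is a common convention rather than the procedure stated in the text; the weighting α = 0.14, the 0.05 ppm cutoff, and the residue shifts in the example are all assumptions.

```python
import math


def combined_csp(d_h: float, d_n: float, alpha: float = 0.14) -> float:
    """Weighted combined amide chemical shift perturbation (ppm).

    d_h, d_n: 1H and 15N shift differences (ppm) between two constructs.
    alpha (assumed) rescales 15N differences onto the 1H ppm scale.
    """
    return math.sqrt(d_h ** 2 + (alpha * d_n) ** 2)


def perturbed_residues(a: dict, b: dict, cutoff: float = 0.05) -> list:
    """Residues in both {residue: (H_ppm, N_ppm)} maps whose combined CSP exceeds cutoff."""
    hits = []
    for res in sorted(a.keys() & b.keys()):
        (h_a, n_a), (h_b, n_b) = a[res], b[res]
        if combined_csp(h_b - h_a, n_b - n_a) > cutoff:
            hits.append(res)
    return hits


# Hypothetical (1H, 15N) shifts in ppm for two residues in each construct.
short = {35: (8.31, 120.4), 60: (7.950, 118.20)}
long_ = {35: (8.35, 120.9), 60: (7.951, 118.21)}
print(perturbed_residues(short, long_))
```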

The RDC restraints were instrumental in refining and validating the relative orientations of the different regions of DNMT1N. Residues 55 to 90, which span α-helices 3 and 4, have uniformly negative RDC values except for the residues connecting these two helices (Fig. 3, A and B). The RDC values for residues 16 to 54, corresponding to α-helices 1 and 2, are less uniform, indicating that α-helices 1 and 2 are oriented differently from α-helices 3 and 4. The experimentally measured and back-calculated RDCs agree closely (Pearson's r = 0.97; Fig. 3C).
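The sign pattern above follows from how an RDC depends on bond orientation relative to the alignment tensor. A minimal sketch of the standard expression D = Da[(3cos²θ − 1) + (3/2)R sin²θ cos 2φ] is shown below, using the Da (−11.00, units assumed to be Hz) and rhombicity (0.61) values reported for the structure calculations; the bond vectors are hypothetical and assumed to be already expressed in the tensor's principal-axis frame.

```python
import math


def rdc_from_tensor(theta: float, phi: float, da: float, rh: float) -> float:
    """RDC for a bond at polar angles (theta, phi), in radians, in the tensor frame."""
    return da * ((3.0 * math.cos(theta) ** 2 - 1.0)
                 + 1.5 * rh * math.sin(theta) ** 2 * math.cos(2.0 * phi))


def rdc_from_vector(v, da: float, rh: float) -> float:
    """Same as rdc_from_tensor, but takes an (x, y, z) bond vector in the tensor frame."""
    x, y, z = v
    norm = math.sqrt(x * x + y * y + z * z)
    cos_theta = max(-1.0, min(1.0, z / norm))  # clamp against rounding error
    return rdc_from_tensor(math.acos(cos_theta), math.atan2(y, x), da, rh)


DA, RH = -11.00, 0.61  # values reported for the structure calculations

# Two hypothetical, nearly parallel N-H vectors (as within one helix) and one
# perpendicular vector.
for v in [(0.1, 0.0, 1.0), (0.0, 0.15, 1.0), (1.0, 0.0, 0.0)]:
    print(v, round(rdc_from_vector(v, DA, RH), 2))
```

Because N-H vectors within a regular α-helix point roughly along the helix axis, a single helix tends to give RDCs of similar sign and size, which is the kind of uniformity reported for α-helices 3 and 4.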

SAXS analysis of DNMT1 N-terminal domain
We used SAXS (24, 25) to examine the global fold and oligomerization state of DNMT1N and thereby further evaluate the NMR-derived structure (Fig. 3D and Table S1). The SAXS Guinier plot (26) of DNMT1N was characteristic of a homogeneous sample (Fig. 3E). Furthermore, the Kratky and Porod-Debye plots (27) showed that the protein was globular, with limited flexibility in the N and C termini (Fig. 3F). We derived a radius of gyration (Rg) of 15.4 Å and a maximum dimension (Dmax) of 46.7 Å for DNMT1N, consistent with a monomeric state (Fig. 3G). The overall shape of DNMT1N was calculated by ab initio model reconstruction using GASBOR and DAMMIF from the ATSAS SAXS data analysis software package (28). GASBOR reconstructs the protein structure as a chain-like ensemble of dummy residues, whereas DAMMIF performs the reconstruction through an assembly of densely packed spheres. The NMR structure fitted well into the envelopes generated by both DAMMIF and GASBOR (Fig. 3H). Further evaluation using FoXS (29) showed that the SAXS data are highly consistent with the DNMT1N NMR ensemble (Fig. 3I). Together, the SAXS and NMR data indicate that DNMT1N is a monomer in solution.
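FoXS reports a χ² between the experimental profile and the profile computed from a model. The sketch below shows only the core of such a comparison, a reduced χ² after fitting one intensity scale factor c in closed form; FoXS itself also adjusts excluded-volume and hydration-layer parameters, which are omitted here. The arrays are assumed to share a common q grid, and the example profile is hypothetical.

```python
import numpy as np


def saxs_chi2(i_exp, sigma, i_calc):
    """Reduced chi^2 between experimental and computed SAXS intensities.

    The scale c minimizing sum(((I_exp - c*I_calc)/sigma)^2) has the closed form
    c = sum(w*I_exp*I_calc) / sum(w*I_calc^2), with weights w = 1/sigma^2.
    Returns (chi2, c).
    """
    i_exp = np.asarray(i_exp, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    i_calc = np.asarray(i_calc, dtype=float)
    w = 1.0 / sigma ** 2
    c = np.sum(w * i_exp * i_calc) / np.sum(w * i_calc ** 2)
    residuals = (i_exp - c * i_calc) / sigma
    return float(np.mean(residuals ** 2)), float(c)


# Hypothetical computed profile, noise level, and "experimental" data.
i_calc = np.array([1.00, 0.80, 0.50, 0.20, 0.05])
sigma = np.full(5, 0.02)
i_exp = 2.0 * i_calc + np.array([0.01, -0.02, 0.0, 0.015, -0.005])
chi2, scale = saxs_chi2(i_exp, sigma, i_calc)
print(f"chi^2 = {chi2:.2f}, scale = {scale:.3f}")
```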

DNMT1 N adopts a novel fold
A search using the DALI server against the Protein Data Bank and the AlphaFold-predicted human proteome (30-32) identified an N-terminal region of the predicted human DNMT1 model that closely matches our DNMT1N NMR structure (Fig. 4A). The root mean square deviation (r.m.s.d.) between the lowest-energy NMR structure and the predicted model was 1.56 Å for the backbone Cα, C, and N atoms of residues Glu22 to Leu90 (Fig. 4B). No other predicted protein structures exhibited a similar arrangement of four α-helices; we therefore conclude that DNMT1N adopts a novel fold. No other secondary structure elements were predicted between DNMT1N and the RFTS domain. RoseTTAFold (33) also produced a structure comparable to that of DNMT1N (r.m.s.d. = 2.31 Å for the backbone Cα, C, and N atoms of residues Glu22 to Leu90), but with a 33-residue C-terminal helical extension of the fourth α-helix that is not present in the experimental structure (Fig. 4C).
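Backbone r.m.s.d. values such as those above are computed after optimal rigid-body superposition of matched atoms. A minimal sketch using the Kabsch algorithm follows; it assumes the N, Cα, and C coordinates of Glu22 to Leu90 (69 residues, 207 atoms) have already been extracted from both models in the same atom order. File parsing is not shown, and the coordinates in the example are synthetic.

```python
import numpy as np


def kabsch_rmsd(p, q) -> float:
    """RMSD between matched N x 3 coordinate sets after optimal superposition.

    Both sets are centered, the rotation minimizing squared deviations comes
    from the SVD of the 3 x 3 covariance matrix (Kabsch algorithm), and the
    determinant check prevents an improper rotation (reflection).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("coordinate sets must have matching N x 3 shapes")
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    u, _s, vt = np.linalg.svd(p.T @ q)
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rotation.T - q
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


# Synthetic stand-in for 69 residues x 3 backbone atoms = 207 atoms.
rng = np.random.default_rng(0)
nmr = rng.normal(scale=10.0, size=(207, 3))
angle = np.deg2rad(40.0)
rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle), np.cos(angle), 0.0],
               [0.0, 0.0, 1.0]])
predicted = nmr @ rz.T + np.array([5.0, -3.0, 2.0]) + rng.normal(scale=1.0, size=(207, 3))
print(f"backbone r.m.s.d. = {kabsch_rmsd(nmr, predicted):.2f} Å")
```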

Discussion
We discovered a new folded domain of unknown function at the N terminus of DNMT1 (DNMT1N). Given the chromatin association properties of DNMT1 (34), we investigated the binding of DNMT1NL to the nucleosome core particle, but detected no interaction (data not shown). However, several clues suggest that DNMT1N has regulatory roles. Notably, through alternative RNA splicing of a sex-specific exon, DNMT1 from mammalian oocytes lacks a segment that matches DNMT1N and is sequestered in the cytoplasm (35). Moreover, there is evidence that DNMT1N interacts with SNAIL1, a transcriptional repressor of E-cadherin, and it has been proposed that DNMT1 promotes E-cadherin gene expression by impeding the interaction of SNAIL1 with the E-cadherin promoter (36, 37). DNMT1N has also been reported to interact with DMAP1, a protein that preferentially activates DNMT1-mediated DNA methylation at sites of homologous recombination repair in response to DNA double-strand breaks (38, 39). In addition, deletion of DNMT1N in breast cancer cell lines was shown to diminish the ubiquitylation-dependent degradation of DNMT1 induced by the histone deacetylase inhibitor LBH589, resulting in genomic hypermethylation (40). Consistently, an isoform of DNMT1 that lacks the N-terminal domain exhibited higher stability than full-length DNMT1 in vivo (41, 42). The underlying mechanism is unclear but likely involves crosstalk among several post-translational modifications of DNMT1, such as methylation and acetylation. Intriguingly, Lys70 in DNMT1N was found to be methylated by the protein methyltransferase G9a (43, 44). Whether this modification contributes to the regulation of DNMT1 levels in cells has not been investigated.
In conclusion, because DNMT1N adopts a novel fold, its structure cannot be used to infer a possible function by homology. However, based on what has been published so far, we speculate that this helical domain is a protein-interaction module. The structure of DNMT1N will be helpful for the rational design of single-point mutations aimed at deciphering the function of this domain using cell biology approaches.

Protein expression and purification
The N-terminal domain of human DNMT1 (residues 16-134), denoted DNMT1NL, was cloned with a tobacco etch virus protease-cleavable N-terminal His6-tag in a pET15b-derived expression system. A shorter version (residues 16-93), denoted DNMT1N, was made by inserting a stop codon (TAA) after Glu93. All proteins were produced in E. coli BL21(DE3) cells grown in M9 medium prepared with ¹⁵N-labeled NH4Cl and unlabeled or ¹³C-enriched glucose. The cells were initially grown at 37 °C to an A600 of 0.5, then at 15 °C to an A600 of 0.6, before induction with 1 mM isopropyl β-D-1-thiogalactopyranoside for 16 h. The harvested cells were lysed using an EmulsiFlex C5 homogenizer (Avestin). The proteins were initially purified by Ni²⁺-nitrilotriacetic acid agarose chelation chromatography (QIAGEN) using buffers of 50 mM sodium phosphate, pH 7.5, 300 mM NaCl containing 5, 20, and 200 mM imidazole for the binding, washing, and elution steps, respectively. The His6-tags were cleaved by overnight incubation with tobacco etch virus protease at 4 °C. The proteins were further purified by size-exclusion chromatography on a HiLoad 16/60 Superdex 75 column (Cytiva) with a running buffer of 50 mM sodium phosphate, pH 7.5, 300 mM NaCl.
The ¹⁵N NMR relaxation studies were carried out on both ¹⁵N-labeled DNMT1N and DNMT1NL. Backbone amide ¹⁵N longitudinal (R1) and transverse (R2) relaxation rates and ¹⁵N-{¹H} steady-state NOEs were measured on these samples and analyzed using established methods (49-51). Ten relaxation delays (100, 300, 500, 600, 800, 1000, 1200, 1500, 1600, and 2000 ms) were used for R1, and 11 (4, 8, 16, 20, 28, 32, 40, 60, 80, 100, and 200 ms) were used for R2. The ¹⁵N-{¹H} NOE ratios were obtained from a reference experiment without proton irradiation and a steady-state experiment with proton irradiation for 3 s. The standard deviations of the ¹⁵N-{¹H} NOEs were calculated from the measured background noise levels, as previously reported (49), using Equation 1:

$$\frac{\sigma_{\mathrm{NOE}}}{|\mathrm{NOE}|} = \sqrt{\left(\frac{\sigma_{I_{\mathrm{sat}}}}{I_{\mathrm{sat}}}\right)^{2} + \left(\frac{\sigma_{I_{\mathrm{unsat}}}}{I_{\mathrm{unsat}}}\right)^{2}}, \qquad \mathrm{NOE} = \frac{I_{\mathrm{sat}}}{I_{\mathrm{unsat}}} \tag{1}$$

where Isat and Iunsat are the measured intensities of the resonances in the presence and absence of proton saturation, respectively, and σIsat and σIunsat are the standard deviations of the noise in the corresponding spectra.
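A rough way to turn the delay series above into rates, and to propagate spectral noise into the NOE uncertainty by standard error propagation for a ratio, σNOE = |NOE|·√((σsat/Isat)² + (σunsat/Iunsat)²), is sketched below. The rate fit is a simplified, unweighted log-linear regression of I(t) = I0·exp(−Rt), not the exponential curve fitting performed in SPARKY; the decays are synthetic and the noise values hypothetical.

```python
import numpy as np


def fit_rate(delays_s, intensities):
    """Relaxation rate (s^-1) from an unweighted linear fit of ln(I) versus delay.

    Model: I(t) = I0 * exp(-R * t), so ln(I) = ln(I0) - R * t.
    """
    t = np.asarray(delays_s, dtype=float)
    log_i = np.log(np.asarray(intensities, dtype=float))
    slope, _intercept = np.polyfit(t, log_i, 1)
    return float(-slope)


def noe_with_error(i_sat, i_unsat, sigma_sat, sigma_unsat):
    """Steady-state NOE ratio and its standard deviation from spectral noise."""
    noe = i_sat / i_unsat
    err = abs(noe) * np.sqrt((sigma_sat / i_sat) ** 2 + (sigma_unsat / i_unsat) ** 2)
    return float(noe), float(err)


# R1 and R2 delays from the Experimental procedures, converted from ms to s.
r1_delays = np.array([100, 300, 500, 600, 800, 1000, 1200, 1500, 1600, 2000]) / 1000.0
r2_delays = np.array([4, 8, 16, 20, 28, 32, 40, 60, 80, 100, 200]) / 1000.0

# Synthetic, noise-free decays for hypothetical R1 = 1.6 s^-1 and R2 = 12 s^-1.
print(fit_rate(r1_delays, 1e6 * np.exp(-1.6 * r1_delays)))
print(fit_rate(r2_delays, 1e6 * np.exp(-12.0 * r2_delays)))
print(noe_with_error(i_sat=8.0e5, i_unsat=1.0e6, sigma_sat=3.0e4, sigma_unsat=4.0e4))
```

With real, noisy intensities a direct nonlinear fit of the exponential is preferable, because the logarithm distorts the noise at long delays.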
For structure determination of DNMT1N, distance restraints were obtained from the analysis of 3D […]. In total, 937 NOE-based distance restraints were used, categorized into seven bins with upper limits of 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, and 6.0 Å. The structure calculations also included 132 backbone dihedral angle (φ and ψ) restraints derived from the analysis of Hα, HN, ¹³Cα, ¹³Cβ, ¹³C′, and ¹⁵N chemical shifts using TALOS+ (52), and 47 of the 60 measured ¹H-¹⁵N RDC restraints. The RDCs were measured in a 5% pentaethylene glycol monododecyl ether (C12E5)/95% n-hexanol mixture using 2D ¹H-¹⁵N IPAP HSQC experiments (53-55). The alignment tensor magnitude (Da) and rhombicity used in the structure calculations were −11.00 and 0.61, respectively. The structures were calculated and refined with XPLOR-NIH using a simulated annealing protocol with torsion angle dynamics (56, 57). A total of 200 structures were initially calculated, of which the 30 lowest-energy structures were used for further refinement.
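The seven upper-limit bins listed above can be applied as sketched below. How NOE intensities were converted to distances is not stated in the text, so the isolated-spin-pair r⁻⁶ calibration against a reference peak (distance_from_intensity, with hypothetical reference values) is an assumption; only the bin edges come from the text.

```python
import bisect

UPPER_LIMITS = [3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0]  # Å, bin edges from the text


def distance_from_intensity(intensity: float, ref_intensity: float, ref_distance: float) -> float:
    """Assumed isolated-spin-pair calibration: r = r_ref * (I_ref / I)^(1/6)."""
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


def upper_bound_bin(distance: float) -> float:
    """Tightest upper limit >= distance; weaker peaks are capped at the loosest bin."""
    i = bisect.bisect_left(UPPER_LIMITS, distance)
    return UPPER_LIMITS[min(i, len(UPPER_LIMITS) - 1)]


# Hypothetical cross-peak intensities calibrated against a 2.8 Å reference peak.
for intensity in [2.0e6, 5.0e5, 6.0e4, 5.0e3]:
    r = distance_from_intensity(intensity, ref_intensity=1.0e6, ref_distance=2.8)
    print(f"{intensity:.1e} -> {r:.2f} Å -> upper bound {upper_bound_bin(r)} Å")
```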
A total of 60 measured RDCs, 47 of which were used in the structure calculations, were compared with RDCs back-calculated from the 30 NMR structures using PALES (58), giving a Pearson's linear correlation coefficient of 0.97. The quality factor Q (59), which evaluates the agreement between the RDCs back-calculated from the structures and the observed RDCs, was used as a figure of merit for the goodness of fit of the calculated structures to the experimental data (Equation 2):

$$Q = \frac{\mathrm{RMS}\left(D_{\mathrm{calc}} - D_{\mathrm{obs}}\right)}{\mathrm{RMS}\left(D_{\mathrm{obs}}\right)} \tag{2}$$

In Equation 2, Q is the quality factor; RMS stands for root mean square; and Dcalc and Dobs are the back-calculated and measured residual dipolar couplings, respectively.
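Given paired observed and back-calculated couplings, Q = RMS(Dcalc − Dobs)/RMS(Dobs) and Pearson's r can be computed directly. The sketch below uses hypothetical RDC values; it does not reproduce PALES, which performs the back-calculation from the structures itself.

```python
import math


def rms(values) -> float:
    """Root mean square of a sequence of numbers."""
    values = list(values)
    return math.sqrt(sum(v * v for v in values) / len(values))


def q_factor(d_obs, d_calc) -> float:
    """Q = RMS(D_calc - D_obs) / RMS(D_obs) for paired RDC lists."""
    return rms(c - o for o, c in zip(d_obs, d_calc)) / rms(d_obs)


def pearson_r(x, y) -> float:
    """Pearson's linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical observed and back-calculated 1H-15N RDCs (Hz).
d_obs = [-10.0, -8.0, 5.0, 12.0]
d_calc = [-9.0, -8.5, 4.0, 11.0]
print(f"Q = {q_factor(d_obs, d_calc):.3f}, r = {pearson_r(d_obs, d_calc):.3f}")
```

Since 13 of the 60 measured RDCs were not used as restraints, evaluating Q over only those 13 would give a cross-validated estimate, a common complement to the overall value.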

Small-angle X-ray scattering
The SAXS data were collected at the SIBYLS beamline 12.3.1, Advanced Light Source, Lawrence Berkeley National Laboratory, on several DNMT1N samples (Table S1). For each sample, scattering intensities were measured at three protein concentrations (1, 2, and 3 mg/ml), which showed no concentration dependence. Three exposure times (0.5, 1.0, and 5.0 s) were used for each sample, and the data were monitored for radiation damage-dependent aggregation. Scattering data were plotted as a function of q = 4π sin(θ/2)/λ, where θ is the scattering angle and λ is the X-ray wavelength, after subtracting the buffer-only scattering from each curve. The curves were rescaled for solute concentration and extrapolated to infinite dilution. All data analyses were performed using PRIMUS, version 3.0, from ATSAS 2.4.2 (28). GNOM was used to generate the pair distance distribution function, P(r), from which the maximum particle dimension (Dmax) was estimated. The radius of gyration (Rg) was estimated from the Guinier plot (61). Divergent low-q data points exhibiting beam-stop scattering artifacts and data points with q > 0.25 Å⁻¹ were excluded from the Guinier and P(r) analyses. The GNOM output was used as input for DAMMIF to calculate the overall shape of DNMT1N. Twenty independent runs were conducted, and the resulting models were averaged with DAMAVER to build a consensus molecular envelope. An ab initio envelope was also generated with GASBOR for comparison. SUPCOMB was used to superimpose the ab initio envelopes and NMR structures. All of these programs (PRIMUS, GNOM, DAMMIF, DAMAVER, GASBOR, and SUPCOMB) are part of the ATSAS 2.4.2 package (28).
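The q definition and the Guinier estimate of Rg can be sketched as follows: ln I(q) = ln I(0) − (Rg²/3)q² is fitted over the range q·Rg < 1.3 (the limit used in Fig. 3E), refining the range as the Rg estimate changes. This is an illustrative stand-in for the PRIMUS analysis, not its implementation; the initial guess and the synthetic profile (generated with Rg = 15.4 Å) are assumptions, and beam-stop-affected points are assumed to have been removed beforehand.

```python
import numpy as np


def q_from_angle(theta_rad, wavelength_a):
    """Momentum transfer q = 4*pi*sin(theta/2)/lambda (Å^-1); theta is the scattering angle."""
    return 4.0 * np.pi * np.sin(np.asarray(theta_rad, dtype=float) / 2.0) / wavelength_a


def guinier_rg(q, intensity, rg_guess, qrg_max=1.3, max_iter=20):
    """Fit ln I = ln I0 - (Rg^2/3) q^2 over q*Rg < qrg_max, refining the range.

    Returns (Rg, I0) in the units implied by q.
    """
    q = np.asarray(q, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    rg = float(rg_guess)
    mask = q * rg < qrg_max
    for _ in range(max_iter):
        if mask.sum() < 3:
            raise ValueError("too few points in the Guinier region")
        slope, intercept = np.polyfit(q[mask] ** 2, np.log(intensity[mask]), 1)
        if slope >= 0:
            raise ValueError("non-negative Guinier slope; Guinier fit not applicable")
        rg = float(np.sqrt(-3.0 * slope))
        new_mask = q * rg < qrg_max
        if np.array_equal(new_mask, mask):
            break
        mask = new_mask
    return rg, float(np.exp(intercept))


# Synthetic Guinier-only profile up to the q = 0.25 Å^-1 cutoff used in the text.
q = np.linspace(0.01, 0.25, 200)
profile = 100.0 * np.exp(-(q * 15.4) ** 2 / 3.0)
print(guinier_rg(q, profile, rg_guess=20.0))
```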

Data availability
Coordinates for the NMR ensemble have been deposited in the Protein Data Bank with accession number 8V9U. NMR chemical shift assignments have been deposited in the Biological Magnetic Resonance Data Bank with accession number 31134.
Supporting information: This article contains supporting information.

Figure 1. Identification of a folded segment at the N terminus of DNMT1. A, domain structure of human DNMT1 (hDNMT1). B, overlay of the ¹H-¹⁵N HSQC spectra of DNMT1NL (aa 16-134, red) and DNMT1N (aa 16-93, cyan). C, R1, R2, and ¹⁵N-{¹H} NOE values for DNMT1N (cyan) and DNMT1NL (red) plotted against residue number, with the corresponding secondary structure elements of DNMT1N shown. The R1 and R2 values were calculated using SPARKY 3.115, with errors determined from relaxation curve fitting. For ¹⁵N-{¹H} NOEs, average values ± standard deviation are shown, calculated as explained in the Experimental procedures. D, ¹H-¹⁵N resonance assignments for DNMT1N; side chain signals of asparagine and glutamine residues are indicated by horizontal lines. BAH, bromo-adjacent homology; DNMT1, DNA methyltransferase 1; HSQC, heteronuclear single-quantum coherence; NOE, nuclear Overhauser effect; RFTS, replication foci-targeting sequence.

Figure 2. Three-dimensional structure of DNMT1N. A, ensemble of the final 30 lowest-energy NMR structures of DNMT1N. B, DNMT1N structure in cartoon representation, with residues at the interfaces of α-helices 2, 3, and 4 labeled and shown in stick representation. C, DNMT1N structure in cartoon representation, with residues at the interface of α-helices 1 and 2 labeled and shown in stick representation. D, electrostatic surface potential of DNMT1N calculated using APBS in PyMOL. DNMT1, DNA methyltransferase 1; NMR, nuclear magnetic resonance.

Figure 3. Validation of the DNMT1N structure. A, representative 2D IPAP ¹H-¹⁵N HSQC spectra of DNMT1N from which RDCs were measured. Left, spectrum of the isotropic DNMT1N sample. Right, spectrum of DNMT1N aligned in 5% C12E5/95% n-hexanol. B, experimentally measured backbone ¹H-¹⁵N RDCs for DNMT1N; secondary structure elements are shown at the top. C, comparison between experimental and back-calculated ¹H-¹⁵N RDCs. For each back-calculated RDC, the mean ± standard deviation over the 30 NMR structures is shown. D, SAXS scattering data of DNMT1N. E, Guinier plot at low angles (q·Rg < 1.3), where Rg is the radius of gyration. F, Kratky plot (left) and Porod-Debye plot (right) showing a linear plateau (red line), consistent with a globular protein with limited flexibility. G, pair distance distribution function showing a Dmax of 46.7 Å. H, left, DNMT1N NMR structure rigid-body-docked into the ab initio molecular envelope of DNMT1N generated using DAMMIF. Right, DNMT1N NMR structure docked into the GASBOR bead model of DNMT1N. I, SAXS scattering curve back-calculated from the lowest-energy NMR structure (red) overlaid on the experimental scattering data (black). The goodness of fit, χ², is indicated. DNMT1, DNA methyltransferase 1; HSQC, heteronuclear single-quantum coherence; NMR, nuclear magnetic resonance; RDC, residual dipolar coupling; SAXS, small-angle X-ray scattering.

Figure 4. DNMT1 structure prediction using AlphaFold2 and RoseTTAFold. A, cartoon representation of the AlphaFold2-predicted human DNMT1NL structure, color-coded by the per-residue confidence metric pLDDT. B and C, NMR structure of DNMT1N overlaid on models generated using AlphaFold2 (B) and RoseTTAFold (C). The r.m.s.d. values calculated for the backbone Cα, C, and N atoms of residues Glu22 to Leu90 are indicated. DNMT1, DNA methyltransferase 1; NMR, nuclear magnetic resonance.