Structural Basis of the Versatile DNA Recognition Ability of the Methyl-CpG Binding Domain of Methyl-CpG Binding Domain Protein 4

Background: Methyl-CpG binding domain protein 4 (MBD4) is a DNA glycosylase that excises mismatched bases generated at methylated CpG sequences.
Results: We report the biochemical and structural properties of the methyl-CpG binding domain of MBD4 (MBD MBD4).
Conclusion: MBD MBD4 recognizes a wide range of 5-methylcytosine modifications via an extensive hydration network.
Significance: This study provides new insight into the structural mechanism of the broad base recognition that is unique to MBD MBD4.

The methyl-CpG binding domain (MBD) protein MBD4 participates in DNA repair as a glycosylase that excises mismatched thymine bases in CpG sites and also functions in transcriptional repression. Unlike other MBD proteins, MBD4 recognizes not only methylated CpG dinucleotides ( 5m CG/ 5m CG) but also T/G mismatched sites generated by spontaneous deamination of 5-methylcytosine ( 5m CG/TG). The glycosylase activity of MBD4 is also implicated in active DNA demethylation initiated by the deaminase-catalyzed conversion of 5-methylcytosine to thymine. Here, we report the crystal structures of the MBD of MBD4 (MBD MBD4 ) complexed with 5m CG/ 5m CG and 5m CG/TG. The crystal structures show that the DNA interface of MBD4 has flexible structural features and harbors an extensive water network that supports its dual base specificities. Combined with the results of biochemical analyses, the crystal structure of MBD4 bound to 5-hydroxymethylcytosine further demonstrates that MBD MBD4 is able to recognize a wide range of 5-methylcytosine modifications through the unique water network. The versatile base recognition ability of MBD MBD4 implies multifunctional roles for MBD4 in the regulation of dynamic DNA methylation patterns coupled with deamination and/or oxidation of 5-methylcytosine.
DNA methylation is the most prominent epigenetic modification in higher eukaryotic genomes (1, 2). In mammals, DNA methylation occurs mainly at the C5 position of symmetrically arranged cytosines in CpG dinucleotides and plays essential roles in various cellular events, such as gene repression, imprinting, X-chromosome inactivation, suppression of repetitive genomic elements, and carcinogenesis (3). Recent studies have shown that DNA methylation can be actively reversed and that its pattern is dynamically altered in mammalian cells (4-7). Although the underlying molecular mechanism is not fully understood, active DNA demethylation has been proposed to involve further oxidation or deamination of 5-methylcytosine (5m C) followed by base excision repair (6-13). The successive oxidation of 5m C to 5-hydroxymethylcytosine (hm C), 5-formylcytosine (fo C), and 5-carboxylcytosine (ca C), catalyzed by TET proteins, has attracted much attention as a crucial process in DNA demethylation. Furthermore, demethylation pathways are thought to involve the spontaneous or enzymatic deamination of 5m C or hm C and subsequent base excision repair of the resulting mismatched thymine or 5-hydroxymethyluracil (hm U) base (6, 9, 11-13). Therefore, precise interpretation and regulation of the modification status of 5m C are required for various epigenetic events in cells.
MBD (methyl-CpG binding domain) proteins are archetypal mediators of DNA methylation marks. They recognize methyl-CpG sites (5m CG/ 5m CG) through a conserved MBD and recruit transcriptional repressors or chromatin modifiers to these sites (14). MBD4, a member of the MBD family, contains a C-terminal DNA glycosylase domain in addition to an N-terminal MBD. MBD4 functions in DNA mismatch repair as a T/G or U/G mismatch glycosylase and also in transcriptional repression through its recruitment of Sin3A and HDAC1 (15, 16).
The glycosylase activity of MBD4 specifically excises a mismatched thymine or hm U base generated by deamination of 5m C or hm C at a CpG site; thus, MBD4 is thought to participate both in DNA repair in the CpG context and in DNA demethylation (14). The importance of MBD4 in maintaining genomic integrity has been demonstrated by the increased frequency of C to T transitions at CpG sites in MBD4−/− mice (17) and by the frequent MBD4 mutations found in various human carcinomas with microsatellite instability (18). Moreover, MBD4 contributes, together with thymine DNA glycosylase (TDG), to stimulus-dependent active DNA demethylation at specific genomic loci (19).
Previous structural studies of MBD1, MBD2, and MeCP2 demonstrated how MBDs specifically recognize 5m CG/ 5m CG sites (20-22). In addition to fully methylated CpG, however, the MBD of MBD4 binds to T/G mismatched base pairs that result from asymmetrical deamination of one 5m C in 5m CG/ 5m CG dinucleotides (16). Recent structural and biochemical studies of the glycosylase domain of MBD4 suggest that the specificity of full-length MBD4 for 5m CG/TG is provided by MBD MBD4 (23-25), because the glycosylase domain recognizes the mismatched thymine or hm U base but not the adjacent 5m C/G base pair. Thus, recognition of methylated DNA by MBD MBD4 appears to be indispensable for the multifunctional roles of MBD4 in the regulation and maintenance of DNA methylation patterns.
Here, we present the crystal structures of the MBD of MBD4 (MBD MBD4) in complex with a DNA fragment containing either the 5m CG/ 5m CG site or its deamination product, 5m CG/TG. The structures reveal a unique, flexible DNA interface of MBD MBD4 accompanied by an extensive water network. Our structural and biochemical data demonstrate that, in addition to 5m CG/ 5m CG and 5m CG/TG, the DNA interface of MBD MBD4 can accommodate hm C and its further oxidation or deamination products. We also determined the crystal structure of MBD MBD4 bound to a methylated CpG site containing hm C (5m CG/ hm CG) and found that the water network at the DNA interface of MBD MBD4 can be finely tuned to accommodate various modified pyrimidine rings. Together, our structural and biochemical studies reveal the molecular basis of the broad base recognition ability of MBD MBD4, which underlies DNA methylation and gene regulation involving MBD4.

EXPERIMENTAL PROCEDURES
Protein Expression and Purification-A DNA fragment encoding MBD MBD4 (residues 69-136) was amplified by PCR and cloned into the bacterial expression vector pGEX4T-3 (GE Healthcare Biosciences), which had been engineered to express recombinant proteins with an N-terminal tandem fusion tag of glutathione S-transferase (GST) and small ubiquitin-like modifier-1 (SUMO-1). The GST-SUMO-1-MBD MBD4 fusion was overexpressed in Escherichia coli strain BL21(DE3). Cells were grown at 37°C in Luria-Bertani (LB) medium containing 50 µg/ml ampicillin to an optical density of 0.5-0.6 at 660 nm and then induced with 0.2 mM isopropyl β-D-thiogalactopyranoside for 15 h at 18°C. Cells were harvested by centrifugation and lysed by sonication in 50 mM Tris-HCl, pH 8.0, containing 300 mM NaCl, 1 mM dithiothreitol (DTT), 5% glycerol, 0.1% Triton X-100, and 1 mM phenylmethylsulfonyl fluoride. The clarified lysate was loaded onto glutathione-Sepharose 4 Fast Flow beads (GE Healthcare), and GST-SUMO-1-fused MBD MBD4 was eluted with elution buffer containing 10 mM glutathione. Tag-free MBD MBD4 was prepared by SENP2 protease treatment and further purified by sequential chromatography on HiTrap Heparin HP and HiLoad 16/60 Superdex 75 columns (GE Healthcare). The purified protein, in a final buffer of 10 mM Hepes-NaOH, pH 7.4, 150 mM NaCl, and 2 mM DTT, was concentrated using an Amicon Ultra concentrator with a 3,000 molecular weight cut-off membrane (Millipore). To introduce selenomethionine, Leu-116 of MBD4 was substituted with methionine. Selenomethionine-containing MBD MBD4 was expressed in modified M9 medium (26), and the selenomethionine-labeled L116M mutant was purified by the same procedure as the native protein.
Crystallization, Data Collection, and Structure Determination-MBD MBD4 at a concentration of 200-800 µM was mixed with each DNA fragment at a 1:1 molar ratio. Crystals were obtained by vapor diffusion at 20°C using PEG 10,000 or PEG 1500 as the precipitant; details of the crystallization conditions are listed in Table 1. MBD MBD4 bound to 14- and 11-bp oligomers containing 5m CG/TG crystallized in orthorhombic C2221 and triclinic P1 forms, respectively. The orthorhombic form contains one protein-DNA complex in the asymmetric unit, whereas the triclinic form contains two protein molecules and one DNA oligomer. The complexes of MBD MBD4 with 14-bp oligomers containing 5m CG/ 5m CG or 5m CG/ hm CG crystallized in the C2221 form. All crystals were flash-frozen at 100 K in cryoprotectant containing 20% ethylene glycol. X-ray diffraction data sets were collected at a wavelength of 1.0000 Å on beamlines BL-5A, BL-17A, NE3A, and NW12 at the Photon Factory (Tsukuba, Japan) and beamline BL-38 at SPring-8 (Harima, Japan) and were processed with HKL2000 (27). The phases of the selenomethionine derivative MBD MBD4 L116M complexed with the 14-bp 5m CG/TG fragment were determined by single-wavelength anomalous dispersion using SOLVE and RESOLVE (28, 29). The initial model was built with COOT (30) and refined against the native data with the PHENIX suite (31), yielding a crystallographic R factor of 18.8% and a free R factor of 22.4% at 2.0 Å resolution. The triclinic structure of the MBD MBD4 -5m CG/TG complex and the structures of the MBD MBD4 -5m CG/ 5m CG and MBD MBD4 -5m CG/ hm CG complexes were solved by molecular replacement using the orthorhombic MBD MBD4 -5m CG/TG structure as the search model. The stereochemical quality of the final models was assessed with MolProbity (32). The sequences of the DNA fragments used for crystallization are summarized in Table 2.
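The R factor and free R factor quoted above both compare observed structure-factor amplitudes with those calculated from the model; R free is computed only on a small set of reflections (5% in this work) that are excluded from refinement. The following is a minimal Python sketch of both quantities, assuming plain lists of amplitudes and no scale factor between |Fobs| and |Fcalc|. Refinement programs such as PHENIX additionally apply scaling and bulk-solvent corrections, which are omitted here. The function names are illustrative only.

```python
import random


def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|); no scale factor is applied."""
    num = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    den = sum(abs(fo) for fo in f_obs)
    return num / den


def split_free_set(n_reflections, free_fraction=0.05, seed=0):
    """Randomly flag a fraction of reflection indices as the free (test) set."""
    rng = random.Random(seed)
    n_free = round(free_fraction * n_reflections)
    free = set(rng.sample(range(n_reflections), n_free))
    work = [i for i in range(n_reflections) if i not in free]
    return work, sorted(free)


def r_work_and_free(f_obs, f_calc, free_fraction=0.05, seed=0):
    """R over the working set and over the held-out free set."""
    work, free = split_free_set(len(f_obs), free_fraction, seed)
    r_work = r_factor([f_obs[i] for i in work], [f_calc[i] for i in work])
    r_free = r_factor([f_obs[i] for i in free], [f_calc[i] for i in free])
    return r_work, r_free


if __name__ == "__main__":
    # Toy amplitudes (hypothetical), not data from this study: (1 + 2 + 0) / 60.
    print(r_factor([10.0, 20.0, 30.0], [11.0, 18.0, 30.0]))
```

Because the free reflections never influence the model, a gap between R work and R free (here 18.8% versus 22.4%) is the usual indicator of how much the model may be overfitting the working data.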
The crystallographic data, data collection statistics, and refinement statistics are summarized in Table 1. All structural figures were produced using PyMOL (43).
DNA Binding Assays-Isothermal titration calorimetry (ITC) measurements were performed on an iTC200 microcalorimeter (MicroCal, USA) at 25°C. The protein solution was dialyzed against ITC buffer (25 mM Hepes-NaOH, pH 7.4, 100 mM NaCl, and 0.1 mM tris(2-carboxyethyl)phosphine). Each annealed DNA duplex was dried and dissolved in ITC buffer. The DNA solution (10-20 µM) in the calorimetric cell was titrated with a 100-400 µM protein solution. Binding constants were calculated by fitting the data with the ITC data analysis module of Origin 7.0 (OriginLab). Competitive binding assays were also performed in ITC buffer. The upper strand of the 14-bp 5m CG/ 5m CG DNA fragment was radiolabeled at the 5′ end with T4 polynucleotide kinase (TOYOBO, Japan) and [γ-32P]ATP (Muromachi Kagaku, Tokyo), mixed with a 1.2-fold amount of the complementary strand, and annealed. The radiolabeled 5m CG/ 5m CG fragment and MBD protein were mixed at concentrations of 1 and 3 µM, respectively. Nonlabeled competitor DNA fragment (0, 1, 2, or 4 µM) was then added, and the solution was incubated for 30 min at 4°C and analyzed by native gel electrophoresis. DNA bands were visualized with a Fuji BAS-2000 phosphorimager. The DNA content of each band was quantified from the band density as an amount relative to the total input DNA. The relative values were normalized against the control lane and plotted against the amount of competitor DNA. The curve for each experiment was fitted by nonlinear least squares using Morrison's equation (33).
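Morrison's equation is the quadratic tight-binding expression originally written for enzyme inhibition when the inhibitor concentration is comparable to the enzyme concentration. The section does not state exactly how it was parameterized for the EMSA competition curves. The sketch below is therefore an assumption: the competitor is treated as the "inhibitor" of labeled-probe binding, the total binding-site concentration (e_total) is held fixed, and a single apparent constant k_app is fitted. A log-spaced grid search stands in for a proper nonlinear least-squares optimizer, and the data are simulated from the model rather than taken from the study.

```python
import math


def morrison_fraction(i_total, e_total, k_app):
    """Morrison quadratic: fraction of signal remaining at competitor concentration i_total.

    e_total and k_app must use the same concentration unit as i_total (here uM).
    """
    b = e_total + i_total + k_app
    # Discriminant >= (e_total - i_total)^2 >= 0, so the square root is always real.
    occupied = (b - math.sqrt(b * b - 4.0 * e_total * i_total)) / 2.0
    return 1.0 - occupied / e_total


def fit_k_app(competitor, signal, e_total):
    """Least-squares fit of k_app by a log-spaced grid search over 0.001-100 uM."""
    best_k, best_sse = None, float("inf")
    for step in range(-300, 201):
        k = 10.0 ** (step / 100.0)
        sse = sum((morrison_fraction(i, e_total, k) - y) ** 2
                  for i, y in zip(competitor, signal))
        if sse < best_sse:
            best_k, best_sse = k, sse
    return best_k, best_sse


if __name__ == "__main__":
    competitor_uM = [0.0, 1.0, 2.0, 4.0]  # competitor doses used in the assay
    # Hypothetical normalized signals simulated with k_app = 0.5 uM and e_total = 1 uM.
    signal = [morrison_fraction(i, 1.0, 0.5) for i in competitor_uM]
    k, sse = fit_k_app(competitor_uM, signal, e_total=1.0)
    print(f"fitted k_app ~ {k:.2f} uM (SSE {sse:.1e})")
```

The quadratic form is used instead of a simple hyperbola because, at 1-4 µM competitor and 1-3 µM probe and protein, the free ligand concentration is not approximately equal to the total, which is the regime that Morrison's treatment addresses.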
Glycosylase Assays-Glycosylase assays were performed in a 10-µl reaction mixture containing 10 mM Tris-HCl, pH 8.0, 0.1 mM EDTA, and 0.1 mg/ml BSA. The synthetic oligonucleotides GGCTAAATACCTGGGCTXGAAGTGAACTGATTGCC, where X indicates T, hm U, ca C, 5m C, hm C, or fo C, were labeled with T4 polynucleotide kinase and [γ-32P]ATP and annealed with a complementary strand containing 5m C at the central CpG site. Each 32P-labeled DNA duplex (40 or 400 nM) was incubated with 200 or 2000 nM human TDG or mouse MBD4 at 37°C for 1 h. Reactions were terminated by adding 10 µl of stop solution containing 0.2 M NaOH and 20 mM EDTA, followed by incubation at 95°C for 10 min. For the fo C-containing oligonucleotide, this incubation was carried out at 70°C for 5 min to avoid strand cleavage under the alkaline conditions independent of enzymatic activity. After addition of 60 µl of 10 M urea, 20 µl of each sample was subjected to electrophoresis on a 20% polyacrylamide gel containing 9 M urea, and the bands were visualized with a Fuji BAS-2000 phosphorimager.
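The amounts in this protocol can be worked through directly. A 10-µl reaction at 40 nM duplex contains 0.4 pmol of DNA; after the 10-µl stop solution and 60 µl of urea are added, the 20 µl loaded is one quarter of the 80-µl total. The small Python sketch below assumes ideal mixing and complete volume transfer, and the helper name pmol is hypothetical.

```python
def pmol(conc_nM, volume_ul):
    """Amount in pmol: nM x uL gives fmol, so divide by 1000 for pmol."""
    return conc_nM * volume_ul / 1000.0


REACTION_UL = 10.0  # reaction volume
STOP_UL = 10.0      # NaOH/EDTA stop solution
UREA_UL = 60.0      # 10 M urea added before loading
LOADED_UL = 20.0    # volume applied to the gel

# Fraction of each sample that reaches the gel: 20 / (10 + 10 + 60).
loaded_fraction = LOADED_UL / (REACTION_UL + STOP_UL + UREA_UL)

for dna_nM, enzyme_nM in [(40.0, 200.0), (400.0, 2000.0)]:
    per_reaction = pmol(dna_nM, REACTION_UL)
    on_gel = per_reaction * loaded_fraction
    ratio = enzyme_nM / dna_nM  # molar excess of enzyme over substrate duplex
    print(f"{dna_nM:g} nM DNA: {per_reaction:g} pmol per reaction, "
          f"{on_gel:g} pmol loaded, enzyme:DNA = {ratio:g}:1")
```

In both conditions the enzyme is in 5-fold molar excess over the substrate duplex, so the two concentration sets differ in absolute amount but not in enzyme-to-substrate ratio.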
Accession Codes-Coordinates and structure factors have been deposited in the Protein Data Bank (PDB) under accession codes 3VXV, 3VXX, 3VYB, and 3VYQ.

RESULTS
Dual Binding Specificity of MBD MBD4 for 5m CG/ 5m CG and 5m CG/TG Sites-The DNA binding properties of mouse MBD MBD4 were examined quantitatively by ITC using 14-bp double-stranded DNA oligomers containing a single CpG site in various modification or mismatch states (Tables 2 and 3). In agreement with a previous report (16), MBD MBD4 bound tightly to 5m CG/TG with a dissociation constant (K D) of 98.8 nM but did not interact with nonmethylated CG/CG or hemimethylated 5m CG/CG. The affinity of MBD MBD4 for the 5m CG/ 5m CG site (K D, 97.5 nM) was comparable with that for the 5m CG/TG site. In contrast, MBD MBD1 exhibited an approximately 6-fold greater affinity for the 5m CG/ 5m CG site (K D, 72.5 nM) than for the 5m CG/TG site (K D, 458 nM). The binding specificities of MBD MBD4 and MBD MBD1 were also assessed by a competitive electrophoretic mobility shift assay (EMSA), in which the 32P-labeled 5m CG/ 5m CG oligomer bound to the MBD was competed with nonlabeled fragments. Both nonlabeled 5m CG/TG and 5m CG/ 5m CG fragments efficiently competed with the labeled 5m CG/ 5m CG oligomer for binding to MBD MBD4, whereas the nonlabeled 5m CG/TG fragment did not disrupt the interaction between MBD MBD1 and the 5m CG/ 5m CG site (Fig. 1). Thus, MBD MBD4 is unique among MBD family proteins in its dual DNA binding ability, even though the key residues for recognition of methylated CpG sites are almost completely conserved among MBD MBD1, MBD MBD2, and MBD MeCP2 (Fig. 2A).
Table 1 footnote (fragment): [...] where the free reflections (5% of the total used) were held aside for R free throughout refinement.
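To put the ITC dissociation constants on an energy scale, the fold differences can be converted into binding free energies with ΔG = RT ln K D, assuming a 1 M standard state and 25°C as in the ITC measurements. The minimal Python sketch below also assumes the reported K D values can be treated as thermodynamic constants; the dictionary simply restates the values given in this section.

```python
import math

R_J_PER_MOL_K = 8.314462618  # gas constant
T_K = 298.15                 # 25 degC, the ITC temperature used here
J_PER_KCAL = 4184.0


def binding_free_energy_kcal(kd_molar):
    """Standard binding free energy dG = RT ln(K_D), 1 M standard state."""
    return R_J_PER_MOL_K * T_K * math.log(kd_molar) / J_PER_KCAL


# K_D values in molar units, as reported in this section (ITC).
KD = {
    ("MBD4", "5mCG/TG"): 98.8e-9,
    ("MBD4", "5mCG/5mCG"): 97.5e-9,
    ("MBD1", "5mCG/TG"): 458e-9,
    ("MBD1", "5mCG/5mCG"): 72.5e-9,
}

for protein in ("MBD4", "MBD1"):
    kd_tg = KD[(protein, "5mCG/TG")]
    kd_mm = KD[(protein, "5mCG/5mCG")]
    fold = kd_tg / kd_mm  # > 1 means weaker binding to the T/G mismatch site
    ddg = binding_free_energy_kcal(kd_tg) - binding_free_energy_kcal(kd_mm)
    print(f"{protein}: {fold:.2f}-fold, ddG = {ddg:.2f} kcal/mol")
```

For MBD1, the roughly 6.3-fold preference for 5m CG/ 5m CG corresponds to about 1.1 kcal/mol, whereas the two MBD4 constants differ by only about 1%, which is consistent with the dual specificity described above.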
Crystal Structures of MBD MBD4 Complexed with Methylated CpG and Its Deamination Product-To understand the structural basis of the unique DNA binding properties of MBD4, we solved the crystal structures of MBD MBD4 in complex with DNA fragments containing 5m CG/TG or 5m CG/ 5m CG sequences (Table 2). The structures of the MBD MBD4 -5m CG/TG complex were determined in orthorhombic C2221 and triclinic P1 forms (Fig. 2, B and C, Table 1). In the C2221 form, the C-terminal segments of MBD MBD4 (residues 121-136) are exchanged between 2-fold symmetry-related molecules, resulting in a swapped dimer linked through a disulfide bond (Fig. 2B). However, dimer formation was not observed in our gel-filtration experiments (data not shown); therefore, the swapped dimer is interpreted as a crystallographic artifact. Despite the C-terminal swapping, the C2221 form shares the core fold and the DNA binding interface with the P1 form. In this report, the recognition of 5m CG/TG by MBD MBD4 is discussed on the basis of the higher resolution C2221 structure. The DNA strand containing the mismatched T is hereafter referred to as the "lower strand," and the other strand as the "upper strand" (Fig. 2D).
Similar to other MBD family proteins, MBD MBD4 adopts an overall fold consisting of one α-helix (α1) and three β-strands (β1-3) (Fig. 2D) (20-22). The overall structures of MBD MBD4 and MBD MeCP2 superimpose well, with a root mean square deviation of 1.59 Å for 54 equivalent Cα atoms. The T/G mismatch DNA fragment bound to MBD MBD4 adopts the canonical B-form conformation in both crystal structures. The 5m CG/TG site is recognized by MBD MBD4 via the major groove, as previously observed in other MBD-5m CG/ 5m CG complexes (Fig. 2D) (20-22). Phosphate backbone recognition is also conserved among MBD family members: the positive end of the α1 helix dipole is placed in the major groove and capped by a phosphate group of the DNA backbone, and residues 85-89 in the L1 loop, which connects β1 and β2, help hold the phosphate backbone by making extensive electrostatic contacts (Fig. 2D).
In the triclinic crystal structure, one of the MBD MBD4 molecules in the asymmetric unit binds to the 5m CG/TG site in the conserved manner, whereas the other protein molecule interacts with the junction between two neighboring DNA fragments that are continuously linked through base stacking interactions (Fig. 2C). As described below, the latter protein-DNA interaction suggests a possible mode of nonspecific DNA binding by MBD4. MBD MBD4 complexed with 5m CG/ 5m CG crystallized in the C2221 form, and its structure was determined at 2.2 Å resolution (Table 1). The overall structure of the 5m CG/ 5m CG complex is almost identical to that of the 5m CG/TG complex in the orthorhombic crystal (root mean square deviation, 0.16 Å for 57 equivalent Cα atoms).
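The Cα root mean square deviations quoted above (1.59 Å over 54 atoms; 0.16 Å over 57 atoms) are computed after an optimal rigid-body superposition of equivalent atoms. The section does not name the superposition program. The sketch below uses the Kabsch algorithm as a common stand-in and assumes that the equivalent atoms are already paired in the same order in both arrays. The coordinates are random placeholders, not atoms from the deposited models.

```python
import numpy as np


def kabsch_rmsd(mobile, target):
    """RMSD (same units as input) between two (N, 3) sets after optimal superposition."""
    p = np.asarray(mobile, dtype=float)
    q = np.asarray(target, dtype=float)
    p_c = p - p.mean(axis=0)  # remove translation
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c           # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # avoid an improper rotation (reflection)
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c @ rot.T - q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ca_a = rng.normal(scale=10.0, size=(54, 3))  # hypothetical C-alpha coordinates
    theta = 0.7
    rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
    ca_b = ca_a @ rz.T + np.array([5.0, -3.0, 2.0])  # rotated, translated copy
    print(kabsch_rmsd(ca_a, ca_b))  # essentially zero for a pure rigid-body copy
```

Because the superposition removes all rigid-body differences, the remaining RMSD reflects genuine conformational differences. This is why the 0.16 Å value indicates that binding 5m CG/ 5m CG versus 5m CG/TG barely changes the protein.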
MBD MBD4 Recognizes the 5m CG/ 5m CG Sequence via Conserved Arginine Fingers-The overall 5m CG/ 5m CG recognition mode of MBD MBD4 is essentially analogous to that of MBD MeCP2, MBD MBD1, and MBD MBD2. Arg-84 and Arg-106, which are completely conserved in the MBD family (Fig. 2A), recognize the symmetrically arranged guanine bases of the 5m CG/ 5m CG sequence in a manner similar to that of other MBD proteins (Fig. 2D). Arg-84 and Arg-106 are hereafter termed Arg finger-1 and Arg finger-2, respectively. The guanidino group of Arg finger-1 donates hydrogen bonds to the O6 and N7 atoms of the guanine base in the lower DNA strand, whereas Arg finger-2 recognizes the guanine base in the upper strand with an analogous hydrogen-bonding pattern (Fig. 3, A and B). The aliphatic portion of each Arg finger side chain makes van der Waals contacts with the 5-methyl group of the 5m C base adjacent to its interacting guanine (Fig. 3, C and D). In addition, the main chain carbonyl group of Arg finger-2 forms a C-H···O hydrogen bond (3.8 Å) with the 5-methyl group of the 5m C base in the upper strand (Fig. 3D).
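The contacts described above are distinguished largely by their heavy-atom distances: conventional hydrogen bonds between N and O atoms, and longer C-H···O interactions such as the 3.8 Å and 3.9 Å contacts to the 5-methyl group. The Python sketch below classifies a contact by distance alone. It uses the 3.2 Å hydrogen-bond cutoff from the figure legends and an assumed 4.0 Å C···O cutoff for C-H···O contacts. Real assignments also consider hydrogen positions and angles, which this simplification ignores, and the coordinates in the usage example are hypothetical.

```python
import math

HBOND_MAX = 3.2  # donor-acceptor cutoff in angstroms (figure legends use < 3.2 A)
CH_O_MAX = 4.0   # C...O cutoff for C-H...O contacts; an assumed value


def classify_contact(atom_a, atom_b):
    """Classify a contact from two (element, (x, y, z)) tuples using distance only."""
    (elem_a, xyz_a), (elem_b, xyz_b) = atom_a, atom_b
    dist = math.dist(xyz_a, xyz_b)
    elems = {elem_a, elem_b}
    if elems <= {"N", "O"} and dist < HBOND_MAX:
        return "hydrogen bond", dist
    if elems == {"C", "O"} and dist <= CH_O_MAX:
        return "C-H...O", dist
    return "none", dist


if __name__ == "__main__":
    # Hypothetical: a carbonyl O placed 3.8 A from a methyl carbon.
    print(classify_contact(("O", (0.0, 0.0, 0.0)), ("C", (3.8, 0.0, 0.0))))
```

A distance-only rule is enough to reproduce the categories used in the text, but it would also flag geometrically impossible contacts, so it is best treated as a first-pass filter before the geometry is inspected.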
In MBD MeCP2, the positions of both Arg fingers are stabilized through interactions with conserved acidic residues (Fig. 3, E and F) (21). Similarly, the orientation of Arg finger-1 of MBD MBD4 is defined by its intramolecular interaction with a conserved acidic residue, Asp-94. The side chain carboxyl group of Asp-94 forms salt bridges with the guanidino group of Arg finger-1, holding the arginine side chain in a conformation suitable for recognition of the 5m CG sequence (Fig. 3A). Asp-94 also forms a C-H···O hydrogen bond (3.9 Å) with the 5-methyl group of the 5m C base in the lower strand (Fig. 3C). In contrast, Arg finger-2 lacks such an intramolecular lock because Glu-137 of MeCP2 is replaced by Ser-110 in MBD4 (Fig. 2A). Nevertheless, the position of Arg finger-2 in the 5m CG/ 5m CG complex superimposes well on that in the MBD MeCP2 -5m CG/ 5m CG complex (Fig. 3, B and F).
The Water Network in the MBD MBD4 -DNA Interface-The most significant structural difference between MBD MBD4 and other MBD proteins is the orientation of the conserved tyrosine residue, Tyr-96, on the DNA binding surface (Fig. 3B). The corresponding tyrosine residues of MBD MeCP2, MBD MBD1, and MBD MBD2 point toward the 5m C base in the lower strand and are held there by hydrophobic interactions with the aliphatic side chains of surrounding residues (20-22). In the crystal structure of the MBD MeCP2 -DNA complex, the side chain of the corresponding residue, Tyr-123, recognizes the 5m C base via two water-mediated interactions (Fig. 3F) (21). Previous mutational analyses of MBD1 and MeCP2 suggested that this conserved Tyr residue is critical for DNA binding (20, 21). In MBD MBD4, by contrast, the side chain of Tyr-96 is flipped out of the DNA interface and makes water-mediated interactions with the phosphate backbone of the lower DNA strand (Fig. 3B). Its aromatic side chain is stabilized by a stacking interaction with the compact hydrophobic side chain of Val-80 (Fig. 4). Notably, despite the absence of the tyrosine hydroxyl group at the common position, the MBD MBD4 -DNA interface retains the hydration water molecules involved in recognition of the lower strand 5m C (Fig. 3, B and F). Water molecules W1, W2, and W3 form van der Waals interactions with the 5-methyl group of the lower strand 5m C in a manner similar to that observed in the MBD MeCP2 -DNA complex. The coordination of three other water molecules surrounding the upper strand 5m C base (W4, W5, and W6 in the MBD MBD4 -DNA complex) is also conserved between the MBD MBD4 -DNA and MBD MeCP2 -DNA complexes (Fig. 3, A and E). MBD MBD4 and MBD MeCP2 share a recognition scheme for the upper strand 5m C base in which a water molecule (W4 in the MBD4-5m CG/ 5m CG complex or W4′ in the MeCP2-DNA complex) bridges the N4 atom of the base and the carboxyl group of the conserved Asp residue.
In the vacant space generated by the flipping of Tyr-96, the water network is further extended at the MBD MBD4 -DNA interface. For example, W1 forms a hydrogen-bonding network with surrounding water molecules (Figs. 3B and 4), whereas its counterpart in the MBD MeCP2 -DNA complex (W1′ in Fig. 3F) is fixed by hydrogen bonds with the protein residues Tyr-123 and Arg-133. The hydrogen-bonding network within the DNA interface of MBD MBD4 is also maintained through water-mediated interactions between the phosphate groups of the DNA backbone and the side chains of Asp-94 and Lys-104 (Fig. 4). Thus, the DNA interface of MBD MBD4 contains more open space filled with ordered water molecules than the interfaces of other MBDs.
Figure legend (fragment): An ~1.5-fold excess of the nonlabeled 5m CG/ 5m CG fragment was required to obtain the same competitive effect as nonlabeled 5m CG/TG, indicating that MBD MBD4 has similar affinities for 5m CG/TG and 5m CG/ 5m CG sites.
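The observation that W1 is anchored through other waters rather than directly by protein atoms can be framed as a graph question. Polar atoms are nodes, an edge joins any pair closer than a hydrogen-bond cutoff, and a breadth-first search finds the shortest bridge from a given water to the protein. The Python sketch below assumes the 3.2 Å cutoff from the figure legends. The atom names other than W1 and all coordinates are hypothetical placeholders, not values from the deposited structures.

```python
import math
from collections import deque

HBOND_MAX = 3.2  # polar-atom distance cutoff in angstroms (assumed, from figure legends)

# Hypothetical polar atoms: a chain of waters leading to one protein atom.
ATOMS = {
    "W1": (0.0, 0.0, 0.0),
    "Wa": (2.8, 0.0, 0.0),
    "Wb": (5.6, 0.0, 0.0),
    "Prot:O1": (8.4, 0.0, 0.0),
}


def build_network(atoms, cutoff=HBOND_MAX):
    """Return an adjacency dict linking atoms closer than the cutoff."""
    names = list(atoms)
    adj = {n: set() for n in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(atoms[a], atoms[b]) < cutoff:
                adj[a].add(b)
                adj[b].add(a)
    return adj


def shortest_path(adj, start, goal_prefix):
    """Breadth-first search from start to the nearest atom whose name has goal_prefix."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node != start and node.startswith(goal_prefix):
            return path
        for nxt in sorted(adj[node]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


if __name__ == "__main__":
    network = build_network(ATOMS)
    direct = any(n.startswith("Prot:") for n in network["W1"])
    print("W1 bonded directly to protein:", direct)
    print("bridge to protein:", shortest_path(network, "W1", "Prot:"))
```

Applied to real coordinates, the number of intervening waters on the shortest path gives a simple measure of how indirectly a recognition water is held. This is the contrast drawn above between W1 in MBD4 and W1′ in MeCP2.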
Recognition of 5m CG/TG by the Flexible DNA Binding Surface of MBD MBD4 -The hydrogen bonding pattern of the T/G mismatched base pair in the MBD MBD4 -5m CG/TG complex is identical to that observed in the crystal structure of a DNA oligomer with a T/G mismatch (PDB entry 113D) (34). The T/G mismatch still allows two hydrogen bonds to form between the bases, thus creating an overall shape similar to that in Watson-Crick base pairing. However, the mismatched thymine base is shifted 1-2 Å toward the major groove side of the DNA duplex (Fig. 5A). The base stacking interactions with neighboring pairs are unaffected by the mismatched pair (35), and the entire DNA binding mode common to MBDs is retained in the MBD MBD4 -5m CG/TG complex.
The guanine base in the T/G mismatch is recognized by Arg finger-2 through a hydrogen-bonding pattern analogous to that in the MBD MBD4 -5m CG/ 5m CG complex (Fig. 5, A and B). However, compared with the 5m CG/ 5m CG complex, the side chain of Arg finger-2 is shifted by ~0.8 Å to form an additional hydrogen bond with the protruding carbonyl group at the 4-position of the thymine ring (Fig. 5C). Apart from the movement of Arg finger-2, there are no significant differences between the protein structures in the 5m CG/ 5m CG and 5m CG/TG complexes. The 5-methyl group of the thymine base is recognized via contacts with Arg finger-1, Asp-94, and water molecules, as observed for the lower strand 5m C in the 5m CG/ 5m CG complex (Fig. 5B). Notably, the water molecules at the protein-DNA interface are rearranged by the local conformational changes that accompany binding to 5m CG/TG versus 5m CG/ 5m CG (Fig. 5C).
The 5m C base of 5m CG/TG in the upper strand is also recognized by Arg finger-1 in the manner common to MBDs (Fig. 5B). Arg finger-2 retains its van der Waals contacts with the upper 5m C base via its aliphatic moiety despite its movement.
In contrast to Arg finger-2 of MBD MBD4, Arg finger-2 of MBD MBD1 or MBD MeCP2 is presumably unable to recognize the protruding mismatched base because its side chain is fixed by interactions with conserved acidic residues (Fig. 3F) (20-22). Indeed, MBD MBD1 bound considerably more weakly to 5m CG/TG than to 5m CG/ 5m CG (Fig. 1; Table 3). Thus, the flexibility of Arg finger-2, conferred by the lack of an intramolecular lock, appears to be indispensable for T/G mismatch recognition.
The Nonspecific DNA Binding Mode of MBD MBD4 -The nonspecific DNA binding mode of MBD4 is observed in the crystal structure of the triclinic form of the MBD4-5m CG/TG complex (Fig. 2C). In the nonspecific complex, MBD MBD4 also binds DNA via the major groove. The phosphate backbone recognition scheme involving the α1 helix and the L1 loop is essentially identical to that in the specific complex (Fig. 6A).
In the nonspecific complex, the dynamic movement of Arg finger-2, which takes place in the vacant space generated by the flipping of Tyr-96, is of particular interest. Arg finger-2, which points toward the target base in the specific complexes, adopts a completely different conformation and forms a hydrogen bond with the phosphate backbone (Fig. 6B), thereby reinforcing DNA duplex binding. This unique flexibility of Arg finger-2 in MBD4 presumably facilitates nonspecific DNA interaction and may support a sliding mode prior to target recognition (Fig. 6B). Consistent with these structural observations, MBD MBD4 bound nonmodified CpG DNA more strongly than MBD MBD1 in our electrophoretic mobility shift assay (data not shown).
The DNA Binding Surface of MBD4 Tolerates Binding to Oxidation and Deamination Products of 5m C-The structural features of the protein-DNA interface suggest that MBD MBD4 can accommodate modifications bulkier than the methyl group at the 5-position of cytosine. We therefore examined the binding of MBD MBD4 to methylated CpG fragments containing hm C, hm U, fo C, or ca C (Fig. 7A). In a competitive EMSA, the nonlabeled 5m CG/ hm CG, 5m CG/ hm UG, and 5m CG/ fo CG fragments competed with a 32P-labeled 5m CG/ 5m CG duplex for binding to MBD MBD4. Based on the competitive EMSA and ITC data, the affinity of MBD MBD4 for 5m CG/ hm CG, 5m CG/ hm UG, and 5m CG/ fo CG was estimated to be ~2- to 3-fold weaker than its affinity for 5m CG/ 5m CG (Fig. 7, B and D, and Table 3). However, the 5m CG/ ca CG and hm CG/ hm CG fragments bound MBD MBD4 more weakly than the other modified fragments (Fig. 7B). In contrast, MBD MBD1 exhibited strict specificity for 5m CG/ 5m CG (Fig. 7, C and E): its affinity for 5m CG/ hm CG (K D, 1.04 µM) was more than 10-fold weaker than that for 5m CG/ 5m CG (K D, 72.5 nM) (Table 3). Combined with the structural data, these findings suggest that MBD MBD4 can bind methylated CpG sequences that have undergone further asymmetric oxidative modification.
To better understand the structural basis of the versatile DNA binding ability of MBD MBD4, we determined its crystal structure bound to a 5m CG/ hm CG fragment at 2.4 Å resolution (Table 1). Hydroxylation of the 5-methyl group of 5m C perturbs neither the canonical hydrogen-bonding pattern of the C/G base pair nor the overall DNA binding mode of MBD MBD4 (Fig. 8A). The unambiguous electron density for the hydroxyl group of hm C indicates restricted rotation of the 5-hydroxymethyl moiety relative to the pyrimidine ring (Fig. 8A); intriguingly, the hydroxyl group makes an intrabase hydrogen bond with the amino group at the 4-position in addition to a hydrogen bond with a water molecule at the DNA interface. The 5-hydroxymethyl moiety also donates C-H···O hydrogen bonds to the carbonyl of Asp-94 and to a phosphate group of the DNA backbone, resulting in a tetrahedral arrangement around the 5-hydroxymethyl carbon (Fig. 8B). Thus, the positional preference of the hydroxyl group is ensured by the intrabase hydrogen bond and the tetrahedral configuration around this carbon, despite close contacts with the neighboring base on the 5′ side. The flexible DNA interface of MBD MBD4 is likely to have enough space to accommodate the hm U or fo C base as well as hm C. In contrast, the relatively low affinity of MBD MBD4 for 5m CG/ ca CG is presumably caused by electrostatic repulsion between the 5-carboxyl group of the base and the side chain carboxyl of Asp-94.
FIGURE 5 (legend, partial). [...] (asterisk) is shown on both sides of the schematic DNA drawing for convenience. W and Eg represent ordered water molecules and an ethylene glycol molecule, respectively. In the right panel, W4 and W5 correspond to Wat-301 of chain A and Wat-214 of chain B in the MBD MBD4 -5m CG/ 5m CG complex structure (PDB code 3VXX), respectively. C, comparison of the orientation of Arg finger-2 and the hydration water network at the protein-DNA interfaces of the different complexes. The ribbon represents the structure of the 5m CG/ 5m CG complex. D, model of MBD MBD4 bound to the 5m CG/TG sequence in the direction opposite to that observed in the crystal. The crystal structure of MBD MBD4 is shown as a green ribbon, with Arg finger-2 as green sticks; the model in the reverse binding mode is shown in blue. The guanidino group of Arg finger-1 in the model is overlaid onto that of Arg finger-2 in the crystal structure. The model suggests steric hindrance between the side chain of Asp-94 and the mismatched thymine base in the reverse binding mode.
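Whether the substituents around the 5-hydroxymethyl carbon form a tetrahedral arrangement can be checked numerically from coordinates. One compares every pairwise angle at the central carbon with the ideal value of arccos(-1/3), about 109.47°. The Python sketch below makes two assumptions: the positions of the C-H···O acceptors stand in for the hydrogen directions, which is an approximation, and real coordinates would be read from the deposited model (PDB 3VYB). The example uses idealized placeholder vertices only.

```python
import math

IDEAL_TETRAHEDRAL = math.degrees(math.acos(-1.0 / 3.0))  # about 109.47 degrees


def angle_deg(center, a, b):
    """Angle a-center-b in degrees from Cartesian coordinates."""
    va = [ai - ci for ai, ci in zip(a, center)]
    vb = [bi - ci for bi, ci in zip(b, center)]
    dot = sum(x * y for x, y in zip(va, vb))
    cos_t = dot / (math.hypot(*va) * math.hypot(*vb))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))  # clamp rounding error


def tetrahedral_deviation(center, neighbors):
    """Largest deviation (degrees) of all pairwise neighbor angles from the ideal value."""
    devs = []
    for i in range(len(neighbors)):
        for j in range(i + 1, len(neighbors)):
            devs.append(abs(angle_deg(center, neighbors[i], neighbors[j]) - IDEAL_TETRAHEDRAL))
    return max(devs)


if __name__ == "__main__":
    # Idealized placeholder geometry: a regular tetrahedron around the origin.
    center = (0.0, 0.0, 0.0)
    vertices = [(1.0, 1.0, 1.0), (1.0, -1.0, -1.0), (-1.0, 1.0, -1.0), (-1.0, -1.0, 1.0)]
    print(round(tetrahedral_deviation(center, vertices), 6))
```

A small maximum deviation for the ring carbon, hydroxyl oxygen, and the two acceptor directions would support the tetrahedral arrangement described for Fig. 8B. Larger deviations would point to strain or to acceptors that are not aligned with C-H bonds.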

DISCUSSION
The crystal structures of MBD MBD4 complexed with 5m CG/TG, 5m CG/ 5m CG, and 5m CG/ hm CG provide new insight into the structural mechanism underlying the versatile base recognition by MBD4. The broad base specificity of MBD MBD4 is implicated in the heterochromatin localization and enzymatic activity of MBD4 associated with methylated DNA regions. In contrast to MBD MBD1, MBD MBD4 binds not only 5m CG/ 5m CG but also a variety of modified pyrimidine bases within a methylated CpG site, including deamination and/or oxidation products of 5m C, as in 5m CG/TG, 5m CG/ hm CG, 5m CG/ hm UG, and 5m CG/ fo CG. MBD MBD4 shares its overall DNA recognition mode with other MBDs. The important role of water molecules in target base recognition is highlighted by their conserved positions in the MBD MBD4 -5m CG/ 5m CG and MBD MeCP2 -5m CG/ 5m CG complexes (Fig. 3, A, B, E, and F) (21). However, local structural differences between MBD MBD4 and other MBDs have a large impact on the DNA binding properties of MBD MBD4. In particular, the structural features around the conserved Tyr-96 and Arg finger-2 that are unique to MBD MBD4 confer plasticity on the DNA binding surface and allow versatile base recognition (Fig. 5C). As a consequence of the flipped Tyr-96 side chain, a more extensive water network is established at the DNA interface of MBD MBD4 than at the MBD MeCP2 -DNA interface. Intriguingly, the hydration water molecules responsible for base recognition (W1, W2, and W3) are held in position through the solvent network rather than through interactions with protein residues (Fig. 4). A comparison of the water structures around the lower strand target bases (5m C, mismatched T, and hm C) highlights the plastic arrangement of the ordered water molecules at the DNA interface, where the water-mediated hydrogen-bonding network of MBD MBD4 is finely tuned to accommodate each of the modified bases.
Compared with the lower strand target base recognition, the interface with the upper strand 5m C more strictly maintains the structural features conserved in other MBDs, including the conformation of Arg finger-1, which is fixed by the aspartic acid and the hydration water structure (Fig. 3, A and E). This structural feature indicates that symmetric oxidative modification of both 5m C bases in the CpG sequence perturbs MBD MBD4 binding. In fact, our DNA binding data combined with previously reported data demonstrate that neither MBD4 nor the other MBD proteins are capable of binding to the symmetrical hm CG/ hm CG site (Fig. 7, B and C) (36,37). MBD MBD4 does not make contact with bases other than the CpG sequence and is able to bind to the symmetric 5m CG/ 5m CG site equally in both directions, as observed in the flipping motion of MBD MBD1 on its target DNA (38). In contrast, the tight recognition of 5m CG by Arg finger-1 presumably prevents the flipping motion of MBD MBD4 on asymmetric target sequences, such as 5m CG/TG, 5m CG/ hm CG, and 5m CG/ hm UG (Fig. 5D).
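The directional argument can be written as a small yes/no rule. The Python sketch below is a qualitative model under our own simplifying assumption, not a quantitative binding model: the strand engaged by Arg finger-1 must carry 5m C, and the opposite strand must carry one of the pyrimidines that MBD MBD4 is reported to accommodate (5m C, T, hm C, hm U, fo C). ca C is left out because binding to 5m CG/ ca CG is weak. The set names and the function `binding_orientations` are hypothetical.

```python
# Simplified, qualitative rule distilled from the text (an assumption, not a
# fitted model). Arg finger-1 requires 5mC on the strand it reads.
FINGER1_BASES = frozenset({"5mC"})
# The opposite strand, read by Arg finger-2 and the water network, tolerates
# several modified pyrimidines. caC is omitted because 5mCG/caCG binds weakly.
FINGER2_BASES = frozenset({"5mC", "T", "hmC", "hmU", "foC"})

def binding_orientations(upper, lower):
    """List orientations in which MBD_MBD4 could engage a CpG site.

    `upper` and `lower` are the pyrimidines of the two CpG strands.
    'forward' places Arg finger-1 on `upper`; 'reverse' places it on `lower`.
    """
    orientations = []
    if upper in FINGER1_BASES and lower in FINGER2_BASES:
        orientations.append("forward")
    if lower in FINGER1_BASES and upper in FINGER2_BASES:
        orientations.append("reverse")
    return orientations

for site in [("5mC", "5mC"), ("5mC", "T"), ("5mC", "hmC"), ("hmC", "hmC")]:
    print(site, binding_orientations(*site))
# ('5mC', '5mC') ['forward', 'reverse']
# ('5mC', 'T') ['forward']
# ('5mC', 'hmC') ['forward']
# ('hmC', 'hmC') []
```

Under this rule the symmetric 5m CG/ 5m CG site admits two orientations, asymmetric sites such as 5m CG/TG admit one, and the symmetric hm CG/ hm CG site admits none, matching the qualitative observations above. Real affinities are graded, which a yes/no rule deliberately ignores.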
Despite the broad spectrum of MBD MBD4 binding targets, full-length MBD4 exhibits glycosylase activity only toward mismatched thymine and hm U bases (Fig. 9) (24,39). The oxidative products of 5m C, such as hm C, fo C, and ca C, are not excised by MBD4, whereas TDG excises fo C and ca C (7, 10). These findings indicate partial functional redundancy as well as a possible functional difference between MBD4 and TDG (7,10). The glycosylase domain itself exhibits substrate specificity for T/G or hm U/G mismatched bases regardless of the methylation status of the adjacent C/G base pair (23,25). Therefore, the DNA binding of MBD MBD4 is presumably a prerequisite for the intrinsic glycosylase activity of MBD4 toward the mismatched bases generated in methylated CpG sites. Additionally, isolated MBD MBD4 has been shown to inhibit the catalytic activity of the glycosylase domain toward a single 5m CG/TG site in vitro (40), suggesting that the DNA substrate is transferred from MBD MBD4 to the glycosylase domain only in full-length MBD4. The unidirectional binding of MBD MBD4 to the 5m CG/TG or 5m CG/ hm UG site could facilitate its synergistic action with the C-terminal glycosylase domain in DNA mismatch repair processes. It remains unclear whether the binding of MBD MBD4 to 5m C, hm C, or fo C targets the glycosylase domain to neighboring 5m CG/TG or 5m CG/ hm UG sites.
FIGURE 8. Structure of MBD MBD4 bound to 5m CG/ hm CG. A, structure of the DNA interface in the MBD MBD4 -5m CG/ hm CG complex. The structure of the hm CG binding site is shown in the same orientation as Fig. 3B. The mFo − DFc simulated annealing omit map (>3.0σ) for the hydroxyl group of the hm C base is shown as magenta mesh. Water molecules are represented as small red spheres. Black dotted lines indicate hydrogen bonds (<3.2 Å). W1 and W2 represent the water molecules in the PDB file of the MBD MBD4 -5m CG/ hm CG complex structure (PDB code 3VYB); W1, Wat-102 in chain C; W2, Wat-106 in chain C. B, the tetrahedral configuration around the carbon atom in the 5-hydroxymethyl group. The black, orange, and red dotted lines represent a hydrogen bond, a C-H···O hydrogen bond, and an unfavorable close contact, respectively.
FIGURE 9. Glycosylase activities of MBD4 and TDG. The glycosylase activity of the full-length MBD4 protein for mismatched, deamination, and/or oxidation products in the context of the 5m CG/ 5m CG sequence was assessed by NaOH cleavage of the resulting apyrimidinic site. We observed a significant digestion band for the strand containing either T or hm U in a mismatched wobble base pair. 5m C, hm C, fo C, and ca C bases, each of which forms a canonical Watson-Crick base pair, were not removed by MBD4, whereas human TDG exhibited activity toward fo C and ca C in addition to T and hm U bases.
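The excision results summarized for Fig. 9 can be tabulated to make the overlap between MBD4 and TDG explicit. The Python sketch below records, as sets, only the bases for which cleavage is reported; a base missing from a set simply means no activity is reported for it here. The names `REPORTED_SUBSTRATES`, `glycosylases_for`, `shared_substrates`, and `tdg_only_substrates` are hypothetical.

```python
# Bases whose excision is reported for Fig. 9 (5mCG/5mCG sequence context).
REPORTED_SUBSTRATES = {
    "MBD4": frozenset({"T", "hmU"}),
    "TDG": frozenset({"T", "hmU", "foC", "caC"}),
}

def glycosylases_for(base):
    """Return the enzymes reported to excise `base`, in alphabetical order."""
    return sorted(enzyme for enzyme, bases in REPORTED_SUBSTRATES.items()
                  if base in bases)

def shared_substrates():
    """Bases excised by both enzymes: the partial functional redundancy."""
    return REPORTED_SUBSTRATES["MBD4"] & REPORTED_SUBSTRATES["TDG"]

def tdg_only_substrates():
    """Bases excised by TDG but not by MBD4: the functional difference."""
    return REPORTED_SUBSTRATES["TDG"] - REPORTED_SUBSTRATES["MBD4"]

for base in ["T", "hmU", "5mC", "hmC", "foC", "caC"]:
    print(base, glycosylases_for(base))
# T ['MBD4', 'TDG']
# hmU ['MBD4', 'TDG']
# 5mC []
# hmC []
# foC ['TDG']
# caC ['TDG']
```

Set intersection and difference map directly onto the partial redundancy (T, hm U) and the possible functional difference (fo C, ca C) between the two glycosylases.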
Intriguingly, the active DNA demethylation of the p15INK4b tumor suppressor gene triggered by the TGF-β/SMAD signaling pathway is accompanied by the accumulation of hm C bases, MBD4, TDG, and downstream base excision repair proteins (19). The versatile base recognition ability of MBD MBD4 demonstrated in our study may contribute to the stimulus-dependent accumulation of MBD4 at hydroxymethylated regions, which would lead to erasure of DNA methylation marks. Further investigation of MBD4 protein complexes colocalized to hm C-rich regions will be crucial for fully understanding the functional roles of MBD4 in DNA demethylation pathways. Furthermore, recent studies have indicated that the hm C, fo C, and ca C bases have long lifetimes during preimplantation development (41,42); thus, they may function as bona fide epigenetic marks antagonistic to 5m C bases in vivo. MBD4 may recognize these bases independently of its glycosylase activity and act as a mediator via its multifunctional capabilities, although further investigation is necessary to fully understand the role of MBD4 in the biology of oxidized cytosine bases.