DNA Recognition by the Methyl-CpG Binding Domain of MeCP2

The methyl-CpG binding domain (MBD) of the transcriptional repressor MeCP2 has been proposed to recognize a single symmetrically methylated CpG base pair via hydrophobic patches on an otherwise positively charged DNA binding surface. We have tested this binding model by analysis of mutant derivatives of the MeCP2 MBD in electrophoretic mobility shift assays complemented by NMR structural analysis. Exposed arginine side chains on the binding face, in particular Arg-111, were found to be critical for binding. Arg-111 was found to interact with the conserved aspartate side chain Asp-121, which is proposed to orientate the arginine side chain to allow specific contacts with the DNA. The conformational flexibility of the disordered B-C loop region, which forms part of the binding face, was also shown to be important. In contrast, mutation of the exposed hydrophobic side chains had a less severe effect on DNA binding. This suggests that the Arg-111 side chain may contribute to sequence-specific recognition of the CpG site rather than simply making nonspecific contacts with the phosphate backbone. The majority of missense mutations within the MBD found in the human genetic disorder Rett syndrome were shown or predicted to affect folding of the domain rather than the DNA recognition event directly.

MeCP2 is necessary and sufficient for DNA binding in vitro and allows MeCP2 to recognize a single symmetrically methylated CpG dinucleotide in diverse sequence contexts (2). Of the other MBD-containing proteins, MBD1 and MBD2 also bind to single symmetrically methylated CpG dinucleotides in vitro and act as transcriptional repressors (3,4). Repression by MeCP2 can act at a distance from a promoter and involves the targeting of histone deacetylases via a C-terminal transcription repression domain, which interacts with the corepressor mSin3 (5), although a deacetylase-independent mechanism may also contribute to repression (5-7). Targeting of histone deacetylases is probably also important in transcriptional repression by MBD1 and MBD2 (3,4). Mammalian MBD3 is found in a histone deacetylase complex, the Mi-2 or NuRD complex (8), but does not bind specifically to methylated DNA in vitro and is not localized to methylated chromosomal regions in vivo (1). MBD4 is probably a methyl-CpG-TpG mismatch-specific glycosylase (9).
Recently, an indication of the role of MeCP2 in human development has been provided by the linking of mutations in the MeCP2 gene to the human neurodevelopmental disorder Rett syndrome (10). This disorder is a childhood-onset regressive disease that causes loss of speech and hand movement, coupled with autistic behavior, microencephaly, and growth retardation, and it affects one in every 10,000-15,000 live female births (11,12). Mutations found in the DNA of Rett syndrome patients include frameshift mutations generating truncations within the MBD, the transcription repression domain, or the C terminus of MeCP2; nonsense mutations preceding and immediately downstream of the MBD or within the transcription repression domain; and, most interestingly, missense mutations within the MBD and the transcription repression domain (10,13-18). It is of particular interest to determine what effects this latter class of mutations might have on the ability of the MeCP2 protein to carry out its functions of methylated DNA binding and transcriptional repression.
We have recently solved the solution structure of the MeCP2 MBD by NMR spectroscopy (19). The domain has a novel fold that forms a wedge-shaped structure, comprising an N-terminal four-stranded anti-parallel β-sheet on one face of the wedge and a C-terminal α-helix and small turn on the other face. At the thin end of the wedge, a large, flexible loop region (the B-C loop) between the second and third β-strands appears, together with the exposed faces of the second, third, and fourth β-strands (strands B, C, and D), to form a binding surface for methylated DNA. Based on this information and the surface of the DNA binding region mapped by NMR studies, a model was proposed in which a conserved solvent-accessible hydrophobic patch composed of the side chains of Tyr-123, Ile-125, and possibly Ala-131 might recognize the methyl groups of the methylated cytosine residues in the major groove (19). A second NMR study on the MBD of the related repressor MBD1 (20) revealed a highly similar fold, suggesting that all the MBD-containing proteins are likely to recognize DNA in an equivalent manner. These authors also proposed a similar model for DNA recognition and confirmed that mutations in the conserved tyrosine and charged side chains within the binding face affected methyl-DNA recognition in mobility shift assays.
In this study, we have employed rationally designed site-directed mutant proteins in quantitative mobility shifts to test more thoroughly the model for DNA recognition by the MBD domain of MeCP2. In particular, we have defined the critical role of surface-exposed residues and the large flexible loop on the DNA binding face of the protein in recognition of methylated DNA. In contrast, we find that the conserved hydrophobic side chains in the binding face play a less crucial role in the interaction. We have also investigated the effects that certain missense mutations within the MeCP2 MBD associated with Rett syndrome have on the folding of the domain and on DNA binding and have used our structural information on the MBD to account for the phenotypic effects of a number of such mutations. This leads us to propose that the majority of missense mutations within the MeCP2 MBD associated with Rett syndrome affect the ability of the domain to interact with methylated CpG dinucleotides via a disruption of the domain fold rather than direct mutation of side chains involved in DNA recognition.

EXPERIMENTAL PROCEDURES
Cloning and Site-directed Mutagenesis-The MeCP2 MBD was amplified from the plasmid pET6HMBD (21) using the primers 5′-CTAGCAGCCATGGCCTCTGCTTCTCCCAAACA-3′ and 5′-CTAGCAGGATCCTTACCCTCTCCCAGTTACAGTGA-3′, and the resulting amplification product was digested with NcoI and BamHI and cloned into NcoI-BamHI-digested pET6H to generate pAFB105. This plasmid therefore encodes residues 77-164 of MeCP2 with the heterologous N-terminal sequence MHHHHHHAM. The portion of MeCP2 encoded is that previously defined as the minimum required for recognition of methyl-CpG (2) and is identical to that used for determination of the NMR structure of the MBD (19) except that it lacks the heterologous, unstructured C-terminal residues GSGC. To generate site-directed mutants of the MBD, the QuikChange kit (Stratagene) was employed with 20-25-mer primers and pAFB105 as a template, according to the manufacturer's instructions.
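As a quick consistency check on the two primer sequences above, the following Python sketch locates the NcoI (CCATGG) and BamHI (GGATCC) recognition sites used for cloning. The helper name `find_site` and the script itself are illustrative only and are not part of the original protocol.

```python
# Illustrative check (not from the original protocol): locate the cloning
# sites in the two PCR primers quoted in the text, written 5'->3'.
FORWARD_PRIMER = "CTAGCAGCCATGGCCTCTGCTTCTCCCAAACA"
REVERSE_PRIMER = "CTAGCAGGATCCTTACCCTCTCCCAGTTACAGTGA"

# Recognition sequences of the two restriction enzymes named in the text.
SITES = {"NcoI": "CCATGG", "BamHI": "GGATCC"}


def find_site(primer: str, enzyme: str) -> int:
    """Return the 0-based start of the enzyme's site in the primer, or -1."""
    return primer.find(SITES[enzyme])


if __name__ == "__main__":
    print("NcoI site in forward primer starts at", find_site(FORWARD_PRIMER, "NcoI"))
    print("BamHI site in reverse primer starts at", find_site(REVERSE_PRIMER, "BamHI"))
```

Each site sits just downstream of the shared 5′ CTAGCAG extension, which leaves a few bases beyond the recognition sequence for the enzyme to cut near the end of the PCR product.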
Protein Expression-An overnight culture of a fresh transformant of Escherichia coli BL21(DE3)/pLysS was diluted 1:100 into 1 liter of LB medium and grown at 37°C to an A₆₀₀ of 0.5, whereupon the cells were induced with 0.5 mM isopropyl-β-D-thiogalactoside and grown for 3 h more. Cells were harvested, resuspended in 50 mM HEPES, 0.1 M NaCl, harvested again, and resuspended in binding buffer (5 mM imidazole, 20 mM Tris-HCl, pH 8.0, 0.25 M NaCl, 10% glycerol, 0.1% Triton X-100, 10 mM β-mercaptoethanol). Cells were lysed by sonication, and debris was removed by centrifugation at 16,000 rpm for 20 min. Clarified supernatants were loaded onto 10-ml nickel-nitrilotriacetic acid Superflow agarose columns (Qiagen) equilibrated with binding buffer, washed with 2 × 10 ml of binding buffer plus 30 mM imidazole, and eluted with 5 ml of binding buffer plus 0.5 M imidazole. Elution fractions were loaded onto 10-ml Fractogel EMD SO₃²⁻ 650(M) columns (Merck), washed with 2 × 10 ml of binding buffer, and step-eluted with 5 ml of binding buffer plus 0.25 M NaCl followed by 5 ml of binding buffer plus 0.5 M NaCl; the majority of the protein eluted in the first of these fractions. The pooled eluates were dialyzed against 2 liters of 100 mM NaCl, 10% glycerol, 0.1% Triton X-100, and the resulting material was >95% pure as judged by SDS-polyacrylamide gel electrophoresis. Proteins were quantified with the Bradford assay kit (Bio-Rad). For the production of labeled protein for NMR spectroscopy, cells were instead grown in M9 medium supplemented with 0.4% (w/v) glucose and 1.5 g/liter [¹⁵N](NH₄)₂SO₄ and induced as above. Protein from the harvested cells was purified as above, except that binding buffer lacking both Triton X-100 and imidazole was used. The eluates from the Fractogel column were buffer-exchanged into 50 mM sodium phosphate, pH 6.0, 50 mM NaCl and concentrated to a final concentration of ≈1 mM.
Electrophoretic Mobility Shift Assay-A double-stranded 27-mer oligonucleotide pair, symmetrically methylated (mC) at a single site on each strand (5′-TCAGATTCGCGCmCGGCTGCGATAAGCT-3′ and its reverse complement; Ref. 1), was end-labeled with digoxigenin-dideoxy-UTP using an oligonucleotide end-labeling kit (Roche Molecular Biochemicals) according to the manufacturer's instructions. An unmethylated version of the same oligonucleotide pair was labeled similarly. Purified proteins were incubated with 2 nM labeled duplex oligonucleotide plus 50 ng/µl poly[d(A-T)] (Roche Molecular Biochemicals) as a nonspecific competitor in 20 mM HEPES, pH 7.6, 1 mM EDTA, 10 mM (NH₄)₂SO₄, 1 mM dithiothreitol, 0.2% Tween® 20, 30 mM KCl in a final volume of 30 µl at room temperature for 15 min. Loading buffer (60% 0.25× Tris-borate-EDTA (TBE), 40% glycerol; 7.5 µl) was added, and the samples were electrophoresed on pre-run 0.25× TBE, 6% polyacrylamide (37.5:1 acrylamide:bisacrylamide) gels at 250 V for 3.5 h in 0.25× TBE buffer at 4°C. Gels were blotted onto Hybond-N+ (Amersham Pharmacia Biotech) in 0.25× TBE buffer overnight, and the membranes were fixed by UV cross-linking followed by baking at 80°C for 10 min. Digoxigenin-labeled probes were detected using a nonradioactive DNA detection kit (Roche Molecular Biochemicals) according to the manufacturer's instructions, followed by exposure to Hyperfilm ECL (Amersham Pharmacia Biotech). Scanned images were quantified using ImageQuant software (Molecular Dynamics).
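To make "symmetrically methylated at a single site" concrete, the following Python sketch parses the probe sequence given above, lists its CpG dinucleotides, and builds the bottom strand with the methyl mark mirrored onto the opposite CpG. All function names are hypothetical, and the "mC" notation is simply the one used in the text.

```python
# Illustrative model of the 27-mer probe; "mC" marks 5-methylcytosine.
PROBE_TOP = "TCAGATTCGCGCmCGGCTGCGATAAGCT"  # top strand, 5'->3', from the text

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def parse(seq):
    """Strip 'm' markers; return (bases, set of 0-based methylated C indices)."""
    bases, methylated, pending = [], set(), False
    for ch in seq:
        if ch == "m":
            pending = True  # the next base carries the methyl group
            continue
        if pending:
            methylated.add(len(bases))
            pending = False
        bases.append(ch)
    return "".join(bases), methylated


def cpg_sites(bases):
    """0-based positions of the C in every CpG dinucleotide."""
    return [i for i in range(len(bases) - 1) if bases[i:i + 2] == "CG"]


def reverse_complement(bases, methylated):
    """Bottom strand 5'->3', with mC placed symmetrically on methylated CpGs."""
    n = len(bases)
    rc = "".join(COMPLEMENT[b] for b in reversed(bases))
    # The G at i+1 pairs with a C at index n-2-i on the other strand, so a
    # methylated CpG at i is mirrored by a methylated CpG at n-2-i.
    rc_methylated = {n - 2 - i for i in methylated if bases[i:i + 2] == "CG"}
    return rc, rc_methylated


def render(bases, methylated):
    """Re-insert 'm' markers before methylated cytosines."""
    return "".join(("m" + b) if i in methylated else b for i, b in enumerate(bases))
```

For the probe, `cpg_sites` finds four CpG dinucleotides, only one of which carries mC. The unmethylated control duplex is therefore identical except for the two methyl groups on that central CpG.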
NMR Spectroscopy-Spectra were collected on a 1 mM sample in 90% H₂O, 10% D₂O at 18°C in 50 mM sodium phosphate buffer adjusted to pH 6.0. Two-dimensional ¹⁵N-HSQC and three-dimensional ¹⁵N-edited nuclear Overhauser effect spectroscopy (NOESY) spectra were recorded on a Varian INOVA 600 MHz spectrometer using 5-mm probes. All NMR data were processed with the AZARA package (see the CCPN website at the University of Cambridge) with the application of an 80° sine-bell-squared window function. ANSIG (22,23) was used to display spectra and maintain cross-peak lists for resonance assignment. The resonance assignment for the wild-type MBD of MeCP2 was described previously (19). By overlaying two-dimensional ¹⁵N-HSQC spectra collected on mutant MeCP2 MBD proteins, it was possible to assign the majority of peaks. Backbone amide ¹H and ¹⁵N atoms of the G114P mutant were sequence-specifically assigned from three-dimensional ¹⁵N-edited NOESY experiments (75-ms mixing time) on the mutant. The chemical shift perturbations for assigned peaks were quantified as the weighted average shift difference $\delta_{\mathrm{Av}} = \left[(\delta_{\mathrm{H^N}})^2 + (\delta_{\mathrm{N}}/5)^2\right]^{1/2}$, where $\delta_{\mathrm{H^N}}$ and $\delta_{\mathrm{N}}$ are the differences in ppm between the wild-type and mutant amide ¹H and ¹⁵N chemical shifts, respectively. The ¹⁵N heteronuclear NOE, measured as the ratio of peak intensities in experiments recorded with and without a 3.01-ms saturation pulse, was used to probe the backbone dynamics of the G114P MBD mutant (24).
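The two quantities defined above are simple to compute once peaks have been assigned. The Python sketch below implements the weighted average shift difference and the heteronuclear NOE intensity ratio. The function names and the example numbers are hypothetical; peak picking and intensity extraction (done here with AZARA/ANSIG) are not modeled.

```python
import math


def weighted_shift(d_hn_ppm: float, d_n_ppm: float) -> float:
    """Weighted average amide shift difference, sqrt(dHN^2 + (dN/5)^2), in ppm.

    d_hn_ppm and d_n_ppm are wild-type minus mutant differences for the amide
    proton and nitrogen; dividing the 15N term by 5 down-weights it relative
    to the 1H term. The sign of each difference does not matter.
    """
    return math.sqrt(d_hn_ppm ** 2 + (d_n_ppm / 5.0) ** 2)


def heteronuclear_noe(intensity_saturated: float, intensity_reference: float) -> float:
    """Ratio of a 15N-HSQC peak intensity with / without 1H saturation."""
    if intensity_reference == 0:
        raise ValueError("reference intensity must be non-zero")
    return intensity_saturated / intensity_reference
```

For example, a hypothetical amide that moves by 0.03 ppm in ¹H and 0.2 ppm in ¹⁵N gives `weighted_shift(0.03, 0.2)` of about 0.05 ppm. For heteronuclear NOE ratios, well-ordered backbone amides typically approach about 0.8, whereas flexible segments such as disordered loops give lower or even negative values.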
a Amino acids mutated in the initial screen are indicated by their single letter codes with numbering for full-length MeCP2. Substituted amino acids are indicated in parentheses, and those residues mutated in cases of Rett syndrome are indicated by asterisks.
b The location of the residues within the NMR-derived structure of the MBD is indicated; side chains that contribute to the hydrophobic core are denoted by "core." c Conservation of the amino acid within the MBD family is indicated. d The chemical shift change of the corresponding backbone amide in the MeCP2 MBD upon interaction with DNA (19). e DNA binding is quantified as the percentage of bound DNA at a protein concentration of 30 nM as determined by EMSA.

RESULTS
EMSA Analysis of DNA Binding by MeCP2 MBD Mutants-We generated a set of point mutations within the MBD that included solvent-exposed charged and hydrophobic residues, core residues, and side chains predicted to be important for the conformation of the peptide backbone within the domain (Table I). These mutant proteins were assayed for binding to a methylated 27-mer duplex oligonucleotide containing a single, symmetrically methylated CpG dinucleotide (see "Experimental Procedures") at fixed protein and DNA concentrations of 30 and 2 nM, respectively. At this protein concentration, the wild-type MBD is >80% bound. As quantified in Table I, the mutant proteins exhibited a range of binding activities from severely compromised to wild-type affinity; neither the wild type nor any of the mutants showed any affinity for the unmethylated oligonucleotide under these conditions (data not shown). Of the mutations causing a significant reduction in DNA binding, the G114P and D121A or D121E mutants were of particular interest. Gly-114 is located at the tip of the disordered B-C loop, which otherwise seems fairly amenable to substitution (see K112M and S113A, Table I). Residue Asp-121, which we had previously predicted to be involved in bridging interactions with arginine side chains on the DNA binding face (19), was shown to be critical for binding, as its mutation to alanine or even glutamate severely affected the DNA interaction. The mutation S134A was also seen to cause a moderate reduction in binding affinity. Ser-134 was found to be mutated to cysteine in cases of Rett syndrome (14,16), and the reduction in binding affinity caused by mutating this residue may thus contribute to the pathology.
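The percentages in Table I are reported as the fraction of probe shifted into the protein-bound band. The text does not state exactly how this was derived from the ImageQuant scans. The Python sketch below assumes the common approach of dividing the background-corrected shifted-band intensity by the summed intensity of the shifted and free bands. The function name and the intensity values are hypothetical.

```python
def percent_bound(shifted: float, free: float, background: float = 0.0) -> float:
    """Percentage of labeled probe in the shifted (protein-bound) band.

    `shifted` and `free` are integrated band intensities from one lane;
    `background` is an optional per-band background subtracted from each
    (clamped at zero). Assumed quantification scheme, not the authors'.
    """
    s = max(shifted - background, 0.0)
    f = max(free - background, 0.0)
    total = s + f
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return 100.0 * s / total
```

With made-up intensities of 820 (shifted) and 180 (free), the lane would be scored as 82% bound, comparable to the ">80%" quoted above for the wild-type MBD at 30 nM.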
Constraining the B-C Loop by a G114P Mutation Severely Compromises Binding-As noted above, a mutation of residue Gly-114 within the disordered B-C loop region to proline had a major effect on DNA binding (Table I). This is in contrast to the mild effects of other mutations within this loop examined, including the adjacent S113A. The replacement of the flexible residue glycine with a structurally constrained proline is predicted to preclude certain conformations of the B-C loop when the MBD domain binds to DNA. To quantify the magnitude of the effect of this change on binding, we carried out a quantitative EMSA analysis with the G114P mutant (Fig. 1A). Minimal binding (10%) was detectable at the highest protein concentration used (128 nM), indicating that binding was ≈25-fold weaker than that of the wild-type MBD, which showed 10% binding at ≈5 nM protein under these conditions. Thus it appears that flexibility in the B-C loop region is a critical component of DNA recognition by the MBD, possibly reflecting the ability of this region to fit into the major groove of DNA and make multiple contacts with the phosphate backbone (19). Two-dimensional ¹⁵N-HSQC NMR analysis of the structure of the G114P mutant indicated major changes in the loop region from residues 110-121, consistent with the structural constraint of this loop as a result of the mutation (Fig. 1B). In contrast, the structure of the rest of the MBD remained unaffected, indicating that the overall fold remained intact as predicted. However, analysis of the backbone dynamics of the loop region in the G114P mutant revealed that it is still relatively disordered (Fig. 1C), suggesting that the proline substitution precludes specific conformations rather than imposing a general reduction in flexibility. It is also possible that the proline substitution introduces a steric clash with the DNA in the mutant.
Either possibility is, however, inconsistent with "docking models" that show a lack of close contact between the B-C loop and the target DNA (20,25). Instead, we suggest that an induced conformational change in this region of the protein upon DNA interaction contributes to the binding affinity.
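The "≈25-fold" estimate follows from comparing the protein concentrations that give the same fractional binding. Assume simple 1:1 binding with protein in excess over the 2 nM probe, so that free protein is approximately total protein. The fraction bound is then f = [P]/(K_d + [P]), so K_d = [P](1 - f)/f. At equal f, the ratio of apparent K_d values reduces to the ratio of protein concentrations. The Python sketch below applies this under those stated assumptions; the function name is hypothetical.

```python
def apparent_kd(protein_nm: float, fraction_bound: float) -> float:
    """Apparent Kd (nM) for 1:1 binding, assuming free protein ~ total protein.

    From f = [P] / (Kd + [P])  =>  Kd = [P] * (1 - f) / f.
    """
    if not 0.0 < fraction_bound < 1.0:
        raise ValueError("fraction_bound must lie strictly between 0 and 1")
    return protein_nm * (1.0 - fraction_bound) / fraction_bound


# Concentrations giving ~10% binding, as reported in the text.
kd_wild_type = apparent_kd(5.0, 0.10)
kd_g114p = apparent_kd(128.0, 0.10)
fold_weaker = kd_g114p / kd_wild_type  # the (1 - f)/f term cancels: 128 / 5
```

This gives a ratio of 25.6, in line with the figure above. The wild-type concentration is itself approximate (≈5 nM), and at 10% binding only about 0.2 nM of protein is sequestered by the probe, so the protein-excess assumption is reasonable but not exact.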
Structural Analysis of the D121A Mutant-Because the mutagenesis screen suggested that residue Asp-121 was particularly critical for DNA binding and it had previously been suggested that this side chain might form an important bridging interaction with the Arg-111 or Arg-133 side chains (19), the D121A mutant was compared with the wild-type domain by two-dimensional ¹⁵N-HSQC NMR spectroscopy. An analysis of the backbone amide chemical shift differences between the wild-type and mutant proteins for residues throughout the domain (Fig. 2A) showed that this mutation causes significant changes in the environment of the B-C loop region and toward the C terminus of the domain, which folds on top of strand C (the strand containing Asp-121). Moreover, the Arg-111 backbone amide could not be assigned in the mutant by two-dimensional ¹⁵N-HSQC, whereas the Arg-133 chemical shift was not greatly affected. Likewise, the Arg-111 side chain Hε atoms were significantly perturbed by the D121A mutation and moved from an abnormal chemical shift to a position in the spectrum occupied by the majority of arginine side chain Hε atoms (Fig. 2B). This suggests that the primary interaction of Asp-121 is with Arg-111 and that the effect of the D121A and D121E mutations on DNA binding may largely be due to the abolition of this interaction, given the critical role of Arg-111 in DNA binding (see below). It seems probable that the Asp-121-Arg-111 interaction positions the basic arginine side chain so that it can interact correctly with the DNA target.
Crucial Role of the Arg-111 Side Chain in DNA Binding-We mutated the Arg-111 residue of the MeCP2 MBD to glycine (Table II) and examined the effect of the mutation on DNA binding. The mutant protein exhibited no affinity for methylated DNA at the concentrations tested (Fig. 3A), although the mutated domain remained structured with only local perturbations in the vicinity of Arg-111 and the B-C loop (Fig. 3B). We also observed that the mutant protein, unlike all other MBD derivatives tested, bound very poorly to Fractogel SO₃²⁻ resin during purification (data not shown). These are the most severe phenotypes seen for any single MBD missense mutation and indicate that the positive charge provided by the Arg-111 side chain is crucial for the interaction of the domain with both DNA and the negatively charged resin. It also suggests that the reduced binding affinities seen for the D121A and G114P mutants (above) may in part be due to their disruptive effects on the environment of this arginine side chain. A second arginine side chain, that of Arg-133, also projects from the DNA binding face of the domain, is conserved in the MBD family, and undergoes a large chemical shift change upon DNA binding (19). This latter residue is found mutated (to cysteine) in a case of Rett syndrome (10). We made a mutant MBD containing the R133C mutation (Table II) and compared its effect on binding to that of R111G. Unlike the Arg-111 mutant, R133C showed weak binding to methylated DNA at a concentration of 2 µM (Fig. 3A) and had minimal effects on the overall structure of the domain (Fig. 3C). Also unlike the R111G mutation, the R133C mutation did not affect the chemical shift of the Asp-121 backbone amide, confirming that Asp-121 interacts with Arg-111 rather than Arg-133. We also observed that mutation of Arg-133 to glycine had a less significant effect on binding than the cysteine substitution at this position (Table I), suggesting that the positive charge of this arginine side chain does not contribute substantially to the DNA interaction. Thus, although the reduced binding affinity of the R133C mutant probably accounts for its phenotype as manifested in Rett syndrome, this side chain is not of comparable importance to Arg-111 in binding DNA.

FIG. 2. Structural effects of the D121A mutation. A, chemical shift perturbation for the backbone amides of the D121A mutant compared with wild type as measured by two-dimensional ¹⁵N-HSQC NMR spectroscopy. Significant perturbations are those greater than 0.04 ppm, as indicated by the horizontal dotted line. Amides that could not be unambiguously assigned in the spectrum of the mutant protein are indicated by asterisks, and the D121A mutation is indicated by the vertical dotted line. A schematic of the regular secondary structural features of the MBD is shown above the graph (arrow, β-strand; coil, α-helix). B, a portion of the two-dimensional ¹⁵N-HSQC NMR spectrum for the D121A mutant overlaid with that of the wild-type MBD. Backbone amides are indicated for the wild type (blue) and mutant (red), and arginine Hε atoms are indicated for the wild type (magenta) and mutant (green). The large chemical shift change for the Arg-111 Hε atoms caused by the D121A mutation is indicated.
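The perturbation plots in the figure legends apply a fixed significance cutoff (0.04 ppm) and mark unassignable amides separately. The Python sketch below reproduces that classification for a table of weighted shift differences. The function name, the residue numbers, and the values in the example are all made up for illustration and do not correspond to measured data.

```python
from typing import Dict, List, Optional, Tuple

SIGNIFICANCE_PPM = 0.04  # threshold stated in the figure legends


def classify_perturbations(
    shifts: Dict[int, Optional[float]],
) -> Tuple[List[int], List[int]]:
    """Split residues into (significantly perturbed, unassigned).

    `shifts` maps residue number -> weighted shift difference in ppm, or None
    when the amide could not be unambiguously assigned in the mutant spectrum.
    """
    significant: List[int] = []
    unassigned: List[int] = []
    for residue in sorted(shifts):
        value = shifts[residue]
        if value is None:
            unassigned.append(residue)   # plotted with an asterisk
        elif value > SIGNIFICANCE_PPM:
            significant.append(residue)  # above the horizontal dotted line
    return significant, unassigned


# Hypothetical residues and values, for illustration only.
example = {201: 0.01, 202: 0.09, 203: None, 204: 0.04, 205: 0.05}
```

For the hypothetical table, residues 202 and 205 are flagged and 203 is listed as unassigned. Residue 204 sits exactly at 0.04 ppm and is not flagged, because the legends define significance as strictly greater than the cutoff.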
Effects of Mutations in the "Hydrophobic Patches" Tyr-123 and Ile-125-It is interesting to note that the only significant change in a backbone amide chemical shift remote from the mutated residue in the R133C mutant was manifested in the strand C residue Tyr-123 (indicated by an asterisk in Fig. 3C). In the HSQC spectrum of R133C, the cross-peak due to the Tyr-123 amide was not detectable at or near its position in the spectrum of the wild type. Therefore it had either shifted significantly or become broadened due to a substantial change in local relaxation properties. In either case, it is apparent that changing Arg-133 to Cys significantly perturbed residue 123. This (with Ile-125) is one of two semi-conserved hydrophobic side chains within the putative DNA binding face of MBD proteins that we proposed earlier to form an interaction site for the methyl groups of a methylated CpG dinucleotide (19). Mutation of Tyr-34, the equivalent of Tyr-123 in MBD1, to alanine reduces methyl-DNA binding (20), although Ile-125 is replaced by a glutamine in the latter protein. The chemical shift of Tyr-123 may be affected by the R133C mutation due to the spatial proximity of the two side chains. In MeCP2 only, the Ala-131 side chain also contributes to the second hydrophobic patch (19), and we observed that mutation of this side chain to glutamate reduced the DNA binding affinity (Table I). We therefore mutated Tyr-123 and Ile-125 of MeCP2 to alanine (Table II) and analyzed the binding of the mutant proteins to methylated DNA (Fig. 3A). Both mutant proteins exhibited reduced binding affinity, although in each case this was only ~10-fold lower than the wild-type affinity, and the majority of the probe was bound at the higher concentrations used (200 nM and 2 µM). Neither mutant had as severe an effect on binding as either of the arginine mutants R111G or R133C.
Mutating Tyr-123 to a negatively charged residue, aspartate, instead of the hydrophobic alanine caused a more severe effect on binding, but interaction with methylated DNA was still observed (Table II). The Y123A mutation had no significant structural effects on the domain according to two-dimensional ¹⁵N-HSQC NMR (data not shown), consistent with its solvent-exposed location and relatively high binding affinity. Thus, it seems reasonable to conclude that, although the Tyr-123, Ile-125, and Ala-131 side chains contribute to the interaction with methylated DNA, there is redundancy in this component of the recognition event. The role of basic side chains such as Arg-111 and, to a lesser extent, Arg-133, together with the flexibility of the B-C loop, is more important in contributing binding energy to the reaction. How exactly the methylated cytosines interact with the residues in the DNA binding face and determine the specificity for methylated DNA is a question to be addressed by the structure of the MBD·DNA complex.
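To compare the contributions discussed above on an energy scale, a fold loss in affinity can be converted to a binding free-energy penalty with the standard relation ΔΔG = RT ln(K_d,mutant / K_d,wild type). The Python sketch below makes two assumptions that the data here do not establish. First, the EMSA fold changes are treated as approximating K_d ratios. Second, the temperature is taken as 298 K, because the binding reactions were incubated at room temperature. The function name is hypothetical.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal mol^-1 K^-1


def ddg_from_fold(fold_weaker: float, temperature_k: float = 298.15) -> float:
    """Binding free-energy penalty (kcal/mol) for a given fold loss in affinity.

    ddG = RT ln(Kd_mutant / Kd_wild_type); positive values mean weaker binding.
    """
    if fold_weaker <= 0:
        raise ValueError("fold_weaker must be positive")
    return GAS_CONSTANT_KCAL * temperature_k * math.log(fold_weaker)
```

Under these assumptions, the ~10-fold losses for Y123A and I125A correspond to roughly 1.4 kcal/mol each, and the ≈25-fold loss for G114P to roughly 1.9 kcal/mol. If contributions were independent, their ΔΔG values would add, so fold effects would multiply. For example, two independent 10-fold effects would give a 100-fold loss.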
Missense Mutations Found in Rett Syndrome Cases Affect the Structure of the MBD-As noted above, the R133C mutation, which has a significant effect on DNA binding affinity, is found in cases of Rett syndrome. We examined three other missense mutations within the MBD identified in the initial study of MeCP2 mutations in Rett syndrome (Ref. 10; Table II) to see if they also affected DNA binding. Amino acids Arg-106 and Phe-155 have side chains that both contribute to the hydrophobic core of the MBD domain (19) and are mutated to tryptophan and serine, respectively, in specific Rett syndrome cases. Further occurrences of the R106W mutation have since been identified (13,16,18), and R106Q and F155I mutations have also recently been found in Rett syndrome patients (14,18). The R106W protein was found to bind methylated DNA very poorly, whereas the F155S mutant, although forming a complex at fairly low protein concentrations, did not shift the probe to completion (Fig. 4). The latter result suggested that a proportion of the F155S protein added to the reaction may have been unfolded, and indeed two-dimensional ¹⁵N-HSQC NMR spectroscopy of both of these mutant proteins showed the domain structure to be unfolded, preventing re-assignment of the backbone amides. It is likely that the F155S mutant in particular may be stabilized in the EMSA assay by the presence of its target DNA and of glycerol, by the low temperature at which the electrophoresis is carried out, and by the caging effect of the polyacrylamide gel. Indeed, a recent study of the F155S mutation in the context of the intact Xenopus MeCP2 protein in which mobility shift reactions were incubated at 37°C failed to detect any binding activity of the mutant protein, presumably because the MBD became unfolded (25). The presumptive instability of the R106W and F155S proteins in vivo at 37°C is likely to make them nonfunctional in DNA binding, accounting for the observed phenotype.
In contrast, a third mutation, T158M, which has been found repeatedly in Rett syndrome cases (10,13,14,16,18), had near-wild-type affinity for the methylated oligonucleotide (Fig. 4; Table II). This has also been shown in the context of Xenopus MeCP2 (25) and is perhaps unsurprising, as the Thr-158 side chain is on the opposite side of the domain from the DNA binding face and has no obvious role in either structure or function. Instead, Thr-158 may be involved in interactions between the MBD and other domains of MeCP2 or other proteins, which could affect the function of the intact protein.

a Amino acids mutated are indicated by their single letter codes with numbering for full-length MeCP2. Substituted amino acids are indicated in parentheses, and those residues mutated in cases of Rett syndrome are indicated by asterisks.
b The location of the residues within the NMR-derived structure of the MBD is indicated; side chains that contribute to the hydrophobic core are denoted by "core." c Conservation of the amino acid within the MBD family is indicated. d The chemical shift change of the corresponding backbone amide in the MeCP2 MBD upon interaction with DNA (19). e DNA binding is quantified as the percentage of bound DNA at the protein concentrations indicated, as determined by EMSA.

DISCUSSION
In this study we have investigated those amino acid side chains within the previously suggested DNA binding face of the MeCP2 MBD that could make an important contribution to the recognition of methylated DNA. It is apparent from our EMSA assays that Arg-111, which is absolutely conserved in the MBD family, plays a critical role in DNA binding, as its mutation results in an MBD which, although still structured, has no detectable affinity for DNA. Arg-111 appears to be involved in an interaction with the conserved aspartate side chain Asp-121, which is also very sensitive to mutation, and as was previously speculated (19), this interaction may orientate the arginine functional group so that it is in the correct position to bind DNA. Given the severe effect of mutating Arg-111, it is likely that its DNA interaction is a specific one involving a guanine in the methylated CpG recognition sequence rather than a nonspecific one with the phosphate backbone. Recognition of guanine bases via a "buttressed arginine" of this type is an important component of DNA binding by the zinc fingers of the Zif268 protein (26).
In contrast, mutation of Arg-133 to cysteine, found to occur in cases of Rett syndrome, produced an MBD with low but detectable affinity for methylated DNA, whereas a glycine substitution at position 133 had only a mild effect on DNA binding.
We have also defined an important role for flexibility in the disordered B-C loop region in DNA binding. Although several mutations within the loop region do not affect binding, a G114P mutation, which is predicted to restrict the conformational flexibility in this region of the protein, reduces its affinity for DNA substantially. The mutation is seen by NMR spectroscopy to affect the environment of the loop as expected but to leave the overall structure of the MBD unaltered. This is consistent with the idea that the B-C loop may move to fit into the major groove of the DNA as the protein binds. One residue that may be brought into position by this movement is the critical side chain of Arg-111, although the restriction of movement in the loop does not have such a severe effect as the removal of the arginine, since the G114P mutant still has affinity for DNA at protein concentrations of greater than 100 nM. It is more likely that a general nonspecific interaction is made with DNA by positive charges in the loop region such as Arg-115 and Lys-119, whereas Arg-111 has a more specific role to play.

Amides that could not be unambiguously assigned in the spectrum of the mutant protein are indicated by asterisks, and the R111G mutation is indicated by the vertical dotted line. A schematic of the regular secondary structural features of the MBD is shown above the graph (arrow, β-strand; coil, α-helix). C, chemical shift perturbation for the backbone amides of the R133C mutant compared with wild type as measured by two-dimensional ¹⁵N-HSQC NMR spectroscopy. Significant perturbations are those greater than 0.04 ppm, as indicated by the horizontal dotted line. Amides that could not be unambiguously assigned in the spectrum of the mutant protein are indicated by asterisks, and the R133C mutation is indicated by the vertical dotted line. A schematic of the regular secondary structural features of the MBD is shown above the graph (arrow, β-strand; coil, α-helix).
The hydrophobic residues exposed on the DNA binding face, Tyr-123, Ile-125, and Ala-131 in MeCP2, have been shown to contribute to the affinity of binding to methylated DNA. However, in contrast to predictions from the original MBD-DNA interaction model (19,20), their individual contributions appear to be fairly weak, and specificity for methylated DNA is retained when they are removed. More severe effects on binding can be obtained when negative charges, which will repel the phosphate backbone, are introduced at these positions (the Y123D and A131E mutations), although binding is still not completely abolished. This suggests that specificity for the exposed methyl groups in the major groove of the DNA may be a result of additive hydrophobic interactions within the protein-DNA interface. Given this and the observation that a charged residue, Arg-111, seems to be more critical for DNA binding, it is also possible that the aliphatic portion of the arginine side chain may contribute directly to recognition of the methylated base.
Finally, we have considered the effects that Rett syndrome-associated missense mutations within the MBD have on its DNA binding ability. In general, these were found to lead to a reduction in binding affinity, but in only one case was this a direct effect of mutating a residue implicated in protein-DNA interactions. This mutation, R133C, is unique among the Rett syndrome mutations tested (Figs. 7 and 8) in affecting binding without any structural effect on the domain. A study of the R133C mutation in the context of Xenopus MeCP2 has suggested that it may cause an altered protein structure (25), but the circular dichroism analysis used in that study does not permit a precise determination of structural changes. In contrast, our NMR analysis of the R133C mutant MBD shows very limited changes in the chemical shifts of backbone amides throughout the domain, indicating that it is structurally virtually identical to the wild-type protein. Unlike R133C, the R106W and F155S mutations both affect the structure and stability of the domain, causing it to unfold under NMR conditions, as expected for side chains that contribute to the hydrophobic core. However, F155S in particular retained DNA binding activity in the EMSA assay, indicating that the domain is not completely unfolded under these conditions. The S134A mutation, analogous to S134C found in a Rett syndrome case, also caused a reduction in DNA binding affinity, but a fifth mutation, T158M, which has been found in several Rett patients, did not seem to affect DNA binding affinity significantly. The most likely explanation for the Rett syndrome phenotype of this mutation is that the Thr-158 side chain, which is on the opposite side of the MBD domain from the DNA binding surface, is important for interactions between this domain and the rest of the intact MeCP2 protein.
Since this study commenced, several other Rett syndrome-associated mutations have been found within the MBD. These include changes at two proline residues, 101 (to Thr, His, or Leu) and 152 (to Arg) (13,15,16), which probably have structural effects on the domain given their locations between the A and B strands and at the terminus of the α-helix, respectively. Subtle aspartate to glutamate mutations at positions 97 (within strand A) and 156 (at the C terminus of the domain) have also been found in Rett patients (13,18); although the latter residue is well conserved in the MBD family, neither mutation is predicted to have a particularly severe effect on the properties of the domain itself. In contrast to previous statements (13), Asp-156 is not implicated in any way in DNA binding. Instead, the interfaces between the MBD and other domains of the protein (or other proteins) could be affected by the D97E and D156E mutations. It seems that in general, the missense mutations found in Rett syndrome may result in structural changes in the domain rather than specifically targeting side chains involved in DNA recognition, the exception being the R133C mutation. However, it must be remembered that the mutation frequency at different sites within the DNA encoding the MBD probably varies widely, with many of the mutations identified so far being at highly mutable CpG sites. It would be interesting to determine the phenotypes caused by some of the DNA-binding site mutations constructed in this study, such as the loop mutation G114P and the severe R111G change, in an in vivo model system.