Identification and characterization of the DNA binding domain of CpG-binding protein.

CpG-binding protein is a transcriptional activator that exhibits a unique DNA binding specificity for unmethylated CpG motifs. CpG-binding protein contains a cysteine-rich CXXC domain that is conserved in DNA methyltransferase 1, methyl binding domain protein 1, and human trithorax. In vitro DNA binding assays reveal that CpG-binding protein contains a single DNA binding domain composed of the CXXC domain and a short carboxyl extension. Mutation to alanine of individual conserved cysteine residues within the CXXC domain abolishes DNA binding activity. Denaturation/renaturation experiments in the presence of various metal cations demonstrate that the CXXC domain requires zinc for efficient DNA binding activity. Ligand selection of high affinity binding sites from a pool of degenerate oligonucleotides reveals that CpG-binding protein interacts with a variety of sequences that contain the CpG dinucleotide, with a consensus binding site of (A/C)CpG(A/C). Mutation of the CpG motif(s) present within ligand-selected oligonucleotides ablates the interaction with CpG-binding protein, and mutation to thymine of the nucleotides flanking the CpG motifs reduces the affinity of CpG-binding protein. Hence, a CpG motif is necessary and sufficient to constitute a binding site for CpG-binding protein, although the immediate flanking sequence affects binding affinity.

The CpG dinucleotide represents an important regulatory component of mammalian genomes. The cytosine of this dinucleotide serves as the target for methylation by DNA methyltransferases. Methylated DNA is correlated with transcriptionally inactive genes, whereas actively expressed genes are generally hypomethylated (1). It has also been suggested that cytosine methylation represents a defense mechanism to silence parasitic repetitive DNA elements present in mammalian genomes (2). Methylation patterns inherited from the gametes are generally erased during early embryogenesis (morula), followed by a wave of de novo DNA methylation in the blastocyst upon implantation (3). The CpG dinucleotide is underrepresented in mammalian genomes (5-10% of the expected frequency), presumably because 5-methylcytosine is prone to spontaneous deamination to thymine. Approximately 50% of human genes are associated with CpG islands (1), which contain the statistically expected frequency of CpG dinucleotides. This may reflect the fact that CpG motifs near widely expressed genes are generally hypomethylated and therefore escape loss through deamination of 5-methylcytosine.
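The depletion described above is usually expressed as an observed/expected CpG ratio. The sketch below uses the widely cited Gardiner-Garden and Frommer definition, (number of CpG × sequence length) / (number of C × number of G); that this is the metric behind the 5-10% figure is an assumption, since the text does not name one. A ratio far below 1 indicates CpG depletion, whereas a sequence at the statistically expected frequency gives a ratio near 1.

```python
def cpg_observed_expected(seq: str) -> float:
    """Observed/expected CpG ratio: (CpG count * length) / (C count * G count)."""
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0  # without both C and G no CpG can occur
    # "CG" cannot overlap itself, so str.count gives the full dinucleotide count.
    cpg_count = seq.count("CG")
    return cpg_count * len(seq) / (c_count * g_count)


print(cpg_observed_expected("CCGG"))       # 1.0
print(cpg_observed_expected("CAGT" * 25))  # 0.0
```

The two toy inputs are hypothetical: CCGG carries exactly the number of CpG dinucleotides expected from its C and G content, while the repeated CAGT has C and G but never in the CpG order.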
Cytosine methylation also plays an important role in the process of genomic imprinting, in which paternal and maternal alleles of a gene exhibit distinct patterns of methylation and expression (4), and in X-chromosome inactivation, in which one X-chromosome in each cell of a female becomes transcriptionally inactivated during early development (5). Appropriate cytosine methylation is essential for normal mammalian development. Individual ablation of the DNA methyltransferase genes DNMT1, DNMT3a, or DNMT3b disrupts murine embryonic development (6,7). Furthermore, mutations in Dnmt3b that are predicted to partially inhibit function are associated with the ICF (immunodeficiency, centromere instability, and facial anomalies) syndrome in humans (8). Also, mutations in the methyl-CpG-binding protein MeCP2 lead to Rett's syndrome, a progressive neurodegenerative disorder (9). Finally, hypermethylation of tumor suppressor genes is commonly observed in human cancer (10).
A number of DNA binding factors have been reported that bind to methylated CpG motifs and function as transcriptional repressors (11). These include MeCP2, methyl binding domain (MBD) protein 1, MBD2, and MBD4. Each of these factors contains a conserved methyl-CpG binding domain (MBD) but otherwise exhibits little sequence similarity. A fifth protein, MBD3, also contains the MBD domain but has not been shown to bind to methylated DNA. MBD2 and MBD3 are components of the histone deacetylase complexes MeCP1 and Mi-2, respectively (12,13), thus linking cytosine methylation with histone acetylation and providing a unifying framework for the control of chromatin structure and gene regulation.
A major unanswered question is how specific CpG motifs become methylated during development and how CpG islands remain hypomethylated despite an open chromatin configuration and free access to DNA methyltransferases. Others have noted that the ubiquitous transcription factor Sp1 binds to sites within some CpG islands (its consensus recognition site is GGGGCGGGG) and protects the CpG island of the adenine phosphoribosyltransferase gene from cytosine methylation (14,15). This has led to a model in which DNA binding proteins such as Sp1 may protect CpG islands from methylation. However, Sp1 is not required for maintaining hypomethylation of CpG islands because CpG islands remain hypomethylated after disruption of the Sp1 gene in mice (16).
We have previously described a novel transcriptional activating DNA binding protein termed CpG-binding protein (CGBP) that specifically binds to sequences that contain unmethylated CpG dinucleotides (17). Hence, CGBP may play a role both in the expression of genes associated with CpG islands as well as in the regulation of cytosine methylation. Inspection of the CGBP sequence reveals the presence of several domains, including two copies of the plant homeodomain (PHD), a coiled-coil domain, basic and acidic stretches, and a highly conserved cysteine-rich CXXC domain that is also found in human trithorax (HRX, also known as MLL or ALL-1), MLL-2, DNMT1, and MBD1 (17)(18)(19). In these studies we determine that the CXXC domain comprises the sole DNA binding domain of CGBP and identify the consensus DNA binding sequence for CGBP.

EXPERIMENTAL PROCEDURES
Materials-Oligonucleotides were purchased from Life Technologies Inc. [γ-32P]ATP was obtained from PerkinElmer Life Sciences, and deoxyribonucleotides and poly(dA-dT) were obtained from Amersham Pharmacia Biotech. The ECL Western blotting detection system was purchased from Amersham Life Science. Nitrocellulose membrane was obtained from Micron Separations Inc. (Westborough, MA). The Bradford protein assay reagent was obtained from Bio-Rad. The pQE9 6×His-tag expression vector was obtained from Qiagen Inc. (Valencia, CA). All other reagents were obtained from Sigma or Fisher.
Construction of Plasmids and Site-directed Mutagenesis-Various regions of the human CGBP cDNA were amplified by PCR and subcloned into the pQE9 6×His-tag expression vector to identify the DNA binding domain(s) of CGBP. The following primers were used to amplify the indicated regions of CGBP cDNA by PCR. Each primer name starts with 5′ or 3′ to indicate a forward or reverse primer, respectively. The letters in each primer designation indicate the CGBP amino acids encoded by the end of the amplified fragment. Recovered DNA fragments were subcloned into SalI- and HindIII-digested or BamHI- and HindIII-digested pQE9 6×His-tag expression vector. Amino acids 1-153: 5′-MEG (CATCACGTCGACATGGAGGGAGATGGTTCAGAC) and 3′-QSP (CTAATTAAGCTTCTGGCTGGGTGTGGCCACCAAG). Site-directed mutagenesis was performed on the CXXC domain of CGBP using primers that mutate cysteine residues to alanine. Each mutation was introduced into a fragment containing amino acids 142-250 of CGBP using the QuikChange site-directed mutagenesis kit (Stratagene, CA) according to the manufacturer's protocol. The mutagenesis oligonucleotides were CXXCA (CGGTCAGCCCGCATGGCTGGTGAGTGTGAGGCATG), which mutates cysteine 169 to alanine, and CXXCB (GCCGGCTGCGCCAGGCCCAGCTGCGGGCCC), which mutates cysteine 208 to alanine. The nucleotide sequences of truncated and mutated CGBP constructs were confirmed by automated DNA sequencing.
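The primer-naming convention can be checked by translating the coding portion of each primer. In this sketch the forward primer is read from the first base after its SalI site (GTCGAC), and the reverse primer is reverse-complemented and read in the frame whose last codon ends just before the HindIII site (AAGCTT); both frame choices are assumptions, and in-frame fusion to the pQE9 His tag is not checked. Under these assumptions 5′-MEG encodes M-E-G at the amino end, and 3′-QSP appears to list the carboxyl-terminal residues starting from the end of the fragment, which reads ...P-S-Q.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

SAL_I, HIND_III = "GTCGAC", "AAGCTT"
FORWARD_MEG = "CATCACGTCGACATGGAGGGAGATGGTTCAGAC"   # 5'-MEG
REVERSE_QSP = "CTAATTAAGCTTCTGGCTGGGTGTGGCCACCAAG"  # 3'-QSP


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate whole codons from the first base; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def forward_primer_peptide(primer: str, site: str) -> str:
    # Assumption: the coding sequence begins immediately after the restriction site.
    start = primer.index(site) + len(site)
    return translate(primer[start:])


def reverse_primer_peptide(primer: str, site: str) -> str:
    # Read the reverse primer on the sense strand; the frame is fixed so that
    # the last codon ends right before the restriction site.
    sense = reverse_complement(primer)
    end = sense.index(site)
    return translate(sense[end % 3:end])


print(forward_primer_peptide(FORWARD_MEG, SAL_I))     # MEGDGSD
print(reverse_primer_peptide(REVERSE_QSP, HIND_III))  # LVATPSQ
```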
Production of 6×His-tagged Proteins and Western Blot Analysis-Escherichia coli cells were grown at 37°C to an A600 of 1.0. Isopropyl-β-D-thiogalactoside was added to 0.2 mM, and cells were grown for an additional 4-6 h. Cells were harvested and resuspended in 50 µl of ice-cold phosphate-buffered saline per ml of culture. Cells were sonicated for 10 min on ice, Tween 20 was added to a final concentration of 0.6% to facilitate solubilization of protein, and samples were incubated on ice for 30 min. The lysate was then centrifuged at 15,000 rpm for 30 min. The supernatant was loaded onto a His-Trap purification column (Amersham Pharmacia Biotech), the column was washed with excess phosphate-buffered saline, and bound protein was eluted with 500 mM imidazole. The samples were dialyzed against phosphate-buffered saline containing 10% glycerol, and the dialysate was stored at -80°C. For Western blot analysis, 0.5 µg of partially purified protein was solubilized in Laemmli sample buffer (20). After electrophoresis on a 10-12% SDS-polyacrylamide gel, proteins were transferred onto nitrocellulose membrane (MSI, Westborough, MA). The membrane was then incubated with anti-6×His-tag monoclonal antibody (R&D Systems, Minneapolis, MN) followed by horseradish peroxidase-labeled anti-mouse antiserum, and bound antibody was detected using an ECL detection kit (Amersham Pharmacia Biotech) according to the manufacturer's instructions.
EMSA-EMSA was performed as described previously (21) with slight modifications. Briefly, 0.01-0.1 µg of partially purified CGBP protein was incubated with 0.5 µg of herring sperm DNA (Amresco) or 1 µg of poly(dA-dT) on ice for 15 min in a 40-µl reaction volume before the addition of competitor double-stranded oligonucleotides. After an additional 30-min incubation on ice, 40,000 cpm of end-labeled double-stranded oligonucleotide probe was added to the reaction and incubated for 30 min at room temperature. EMSA samples were loaded onto a 6-9% non-denaturing polyacrylamide gel in 0.5× TBE (45 mM Tris (pH 8.3), 45 mM borate, 1.25 mM EDTA), and electrophoresis was performed at 200 V at room temperature for 2 h. Oligonucleotide probes were radiolabeled with T4 polynucleotide kinase and [γ-32P]ATP and then annealed with an equimolar amount of the complementary strand oligonucleotide. Radiolabeled probes were resolved by electrophoresis on a 10% polyacrylamide gel and eluted by the crush and soak method (22). Oligonucleotides were methylated by incubating DNA with SssI methylase and S-adenosylmethionine as recommended by the manufacturer (New England Biolabs, Beverly, MA) and as previously described (17).
Denaturation and Renaturation of the CXXC Domain-Denaturation and renaturation of the minimal DNA binding domain of CGBP were performed as described (23). Briefly, 50 µg of partially purified protein (amino acids 162-221) was diluted to 2.5 ml in denaturation buffer (6 M guanidine hydrochloride, 20 mM HEPES, 100 µM KCl, 10% glycerol, 3 mM dithiothreitol, 100 µg/ml bovine serum albumin, 3 mM EDTA, and 3 mM EGTA) and dialyzed as 400-µl aliquots against two changes of the same buffer (without bovine serum albumin) for 1 h at 4°C. Samples were then dialyzed against denaturation buffer lacking EDTA and EGTA, and the dialysate was diluted 1:1 four successive times with renaturation buffer (denaturation buffer lacking guanidine hydrochloride, EDTA, and EGTA) supplemented with 10 µM CaCl2, CdCl2, MgCl2, NiCl2, or ZnCl2. Samples were recovered and stored at 4°C until analyzed by EMSA as described above.
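The four successive 1:1 dilutions lower the denaturant geometrically while bringing the added cation toward its concentration in the renaturation buffer. A minimal sketch, assuming each step mixes equal volumes of sample and renaturation buffer, that the dialysate contains none of the added cation, and that the renaturation buffer carries 10 µM of it:

```python
def successive_dilutions(start_conc: float, diluent_conc: float, steps: int) -> list[float]:
    """Concentration after each 1:1 dilution (equal volumes of sample and diluent)."""
    concentrations = []
    c = start_conc
    for _ in range(steps):
        # Mixing equal volumes averages the two concentrations.
        c = (c + diluent_conc) / 2
        concentrations.append(c)
    return concentrations


# Guanidine hydrochloride (M): 6 M in the dialysate, absent from renaturation buffer.
guanidine_m = successive_dilutions(6.0, 0.0, 4)
# Added divalent cation (uM): assumed absent from the dialysate, 10 uM in renaturation buffer.
metal_um = successive_dilutions(0.0, 10.0, 4)
print(guanidine_m)  # [3.0, 1.5, 0.75, 0.375]
print(metal_um)     # [5.0, 7.5, 8.75, 9.375]
```

Under these assumptions the final mixture holds 0.375 M guanidine hydrochloride (1/16 of the starting 6 M) and 9.375 µM of the added cation.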
Ligand Selection of the CGBP Consensus Binding Site-A 6×His-tagged CGBP fragment (amino acids 106-345) that contains the CXXC domain, the adjacent acidic domain, and a portion of a basic domain (17) was partially purified using His-Trap purification columns as described above and used to select binding sequences from a pool of degenerate oligonucleotides. Repeated rounds of EMSA and PCR amplification of bound oligonucleotides were performed as described (24). The degenerate oligonucleotides were 57-mers containing a stretch of 20 degenerate nucleotides (N) flanked by anchor sequences: TCTGGGATCCTTCTGGTG-NNNNNNNNNNNNNNNNNNNN-GAGACACTGGGAATTCCAG. The BamHI (GGATCC) and EcoRI (GAATTC) sites in the upstream and downstream anchors, respectively, were included to facilitate subcloning of the selected fragments. The oligonucleotides were made double-stranded by hybridization with an end-labeled primer complementary to the downstream anchor sequence and elongation for 30 min at 30°C with deoxyribonucleotide triphosphates and Klenow DNA polymerase. These double-stranded oligonucleotides (0.3 µg) were incubated with 5 µg of partially purified 6×His-tagged CGBP protein in the presence of 2.5 µg of herring sperm DNA in 60 µl of EMSA reaction buffer. Protein-DNA complexes were resolved on a 6% polyacrylamide gel, and bound oligonucleotides were excised and eluted as above, extracted with phenol/chloroform, ethanol-precipitated, and resuspended in 50 µl of water. PCR reactions were performed in a 100-µl volume using 5 µl of the DNA recovered from the retarded protein-DNA complex and 20 pmol of primers complementary to the flanking anchor sequences. PCR consisted of 30 cycles of 1 min at 94°C, 1 min at 50°C, and 1 min at 72°C in a DNA thermal cycler. The amplified DNA was subjected to eight additional cycles of binding selection and PCR amplification. After the first round of selection, the amount of 6×His-tagged CGBP fusion protein was reduced to 2.5 µg. The competitor DNA was changed to 1.0 µg of poly(dA-dT) after the 5th round of selection.
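The layout of the selection template can be checked directly. This sketch takes the anchors as TCTGGGATCCTTCTGGTG (18 nt) and GAGACACTGGGAATTCCAG (19 nt), which together with the 20-nt degenerate region give the stated 57-mer; drawing library members with equal base frequencies is an assumption, and the helper names are hypothetical. It also shows that a C at the last degenerate position forms a CpG with the G that begins the downstream anchor.

```python
import random

UPSTREAM_ANCHOR = "TCTGGGATCCTTCTGGTG"     # contains the BamHI site GGATCC
DOWNSTREAM_ANCHOR = "GAGACACTGGGAATTCCAG"  # contains the EcoRI site GAATTC
N_DEGENERATE = 20


def make_template(degenerate: str) -> str:
    """Join one 20-nt degenerate region to the fixed anchors."""
    if len(degenerate) != N_DEGENERATE:
        raise ValueError("degenerate region must be 20 nt")
    return UPSTREAM_ANCHOR + degenerate + DOWNSTREAM_ANCHOR


def random_member(rng: random.Random) -> str:
    """Draw one hypothetical library member, assuming equal base frequencies."""
    return make_template("".join(rng.choice("ACGT") for _ in range(N_DEGENERATE)))


rng = random.Random(0)
member = random_member(rng)
print(len(member))        # 57
print(4 ** N_DEGENERATE)  # 1099511627776 possible degenerate regions
# A C at the 3'-most degenerate position pairs with the anchor's leading G.
print(make_template("A" * 19 + "C")[37:40])  # CGA
```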
The recovered fragments were digested with EcoRI and BamHI, purified on an 8% polyacrylamide gel, ligated into EcoRI- and BamHI-digested pBluescript KS (Stratagene), and transformed into E. coli DH5α cells. Individual clones were isolated, and the nucleotide sequence was determined by the dideoxy method (Sequenase 2.0, U.S. Biochemical Corp.).

RESULTS
Identification of the CGBP DNA Binding Domain-Experiments were performed to identify the domain(s) of CGBP responsible for its unique DNA binding specificity for sequences containing unmethylated CpG dinucleotides. Overlapping fragments of human CGBP were expressed as 6×His-tagged proteins and tested for DNA binding activity in EMSA using the previously described CpG-pos binding site of CGBP as a probe (17). The CpG-pos oligonucleotide corresponds to an element derived from the human gp91phox promoter (21). This sequence had been mutated to prevent binding of CCAAT box binding factors (17); mutation of the CCAAT box to CCGGT introduced a CpG motif into the sequence. The CGBP fragment containing amino acids 106-345 includes the CXXC domain, an adjacent acidic domain, and a portion of a basic domain (Fig. 1A). CGBP amino acids 106-285 include the CXXC domain and a portion of the acidic domain, amino acids 1-153 include the PHD1 domain, and amino acids 231-656 contain the acidic, basic, coiled-coil, and PHD2 domains.
CGBP fragments containing either amino acids 106-345 or 106-285 exhibit DNA binding activity (Fig. 1B, lanes 2 and 3), whereas protein fragments containing amino acids 1-153 or 231-656 fail to bind the DNA probe (Fig. 1B, lanes 4 and 5). Each of the 6×His-tagged CGBP protein fragments was successfully expressed and recovered from E. coli, as demonstrated by Western blot analysis of each protein sample (Fig. 1C). Multiple bands in some samples presumably reflect partial proteolysis. Hence, the DNA binding domain of CGBP resides within amino acids 106-285. Furthermore, the absence of DNA binding activity in the overlapping fragments 1-153 and 231-656 suggests that CGBP contains a single DNA binding domain that lies within amino acids 153-231, a region that contains the CXXC domain.
Additional studies were conducted on smaller fragments of CGBP to more precisely define the DNA binding domain (Fig. 2A). As predicted from the results presented in Fig. 1, the 142-250-amino acid region of CGBP exhibits DNA binding activity (Fig. 2B, lane 2). The 162-250-amino acid fragment of CGBP, which is truncated to within 2 residues of the amino end of the CXXC domain, continues to exhibit DNA binding activity (Fig. 2B, lane 3). Successive truncations of the carboxyl terminus reveal that a fragment as small as amino acids 162-221 exhibits DNA binding activity (Fig. 2B, lanes 4 and 5). This 60-amino acid fragment contains the CXXC domain as well as 15 residues of the carboxyl flanking region. Further truncation of 9 residues from the carboxyl tail (leaving amino acids 162-212) ablates DNA binding activity (Fig. 2B, lane 6). Each 6×His-tagged protein was successfully recovered from E. coli, as determined by the Western blot analysis presented below the EMSA result (Fig. 2B). Hence, a 60-amino acid fragment of CGBP (amino acids 162-221) that contains the CXXC domain constitutes the DNA binding domain. Native CGBP binds specifically to DNA containing unmethylated CpG motifs (17). Additional studies were performed to determine whether the minimal DNA binding domain of CGBP retains this DNA binding specificity. Fig. 2C demonstrates that the EMSA complex produced by the 162-221-amino acid fragment of CGBP is efficiently disrupted by competition with oligonucleotide homologous to the probe (Fig. 2C, lanes 3-5) but not by a similar sequence (CpG-neg) that lacks the CpG motif (Fig. 2C, lanes 6-8). Importantly, the complex is also not disrupted by the CpG-pos oligonucleotide after methylation in vitro (Fig. 2C, lanes 9-11). Hence, the 162-221-amino acid fragment of CGBP exhibits a DNA binding specificity indistinguishable from that previously described for CGBP (17).
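The deletion mapping in the two preceding paragraphs reduces to interval arithmetic. As a sketch that assumes a single contiguous domain and treats every construct as properly expressed (as the Western blots indicate), the residues shared by all active fragments define the candidate domain, and each inactive fragment can be checked for which of those residues it lacks. Fragment boundaries are taken from the text; the function names are illustrative.

```python
# CGBP fragments tested by EMSA, as (first residue, last residue).
ACTIVE = [(106, 345), (106, 285), (142, 250), (162, 250), (162, 221)]
INACTIVE = [(1, 153), (231, 656), (162, 212)]


def common_region(fragments):
    """Residues shared by every fragment (the intersection of the intervals)."""
    start = max(s for s, _ in fragments)
    end = min(e for _, e in fragments)
    if start > end:
        raise ValueError("fragments share no residues")
    return start, end


def residues_missing(fragment, region):
    """Residues of `region` that `fragment` does not contain."""
    f_start, f_end = fragment
    r_start, r_end = region
    return [r for r in range(r_start, r_end + 1) if not f_start <= r <= f_end]


region = common_region(ACTIVE)
print(region, region[1] - region[0] + 1)  # (162, 221) 60
for frag in INACTIVE:
    missing = residues_missing(frag, region)
    print(frag, "lacks", len(missing), "residues, from", missing[0], "to", missing[-1])
```

The intersection reproduces the 60-residue minimal domain, and the inactive 162-212 construct lacks exactly the 9 carboxyl-terminal residues 213-221.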
Site-specific mutagenesis was performed to determine whether an intact CXXC domain is required for the DNA binding activity exhibited by the 142-250-amino acid fragment of CGBP. Fig. 3A presents a sequence alignment of eight reported CXXC domains (17)(18)(19). Eight of the 10 perfectly conserved residues are cysteines. We chose to mutate either the first or the last conserved cysteine residue to alanine. Each mutated form of CGBP was purified from E. coli as a 6×His-tagged protein, and DNA binding activity to the CpG-pos probe was analyzed by EMSA (Fig. 3B). Mutation to alanine of either cysteine 169 or cysteine 208 completely abolished the DNA binding activity of this fragment of CGBP (Fig. 3B, lanes 2 and 3). Successful recovery of each 6×His-tagged protein was confirmed by Western blot analysis, as shown below the EMSA result. We conclude that an intact CXXC domain is required for the DNA binding activity of CGBP.
The CXXC domain is distantly related to the zinc finger, although it does not exactly fit the consensus arrangement of cysteines and histidines. Furthermore, the CXXC domain from DNMT1 has been shown to bind zinc (25). We therefore examined whether zinc is required for the observed DNA binding activity of the CXXC domain of CGBP. The purified 6×His-tagged CGBP fragment containing amino acids 162-221 was denatured in the presence of EDTA and EGTA and then renatured in the presence of various divalent metal cations. Fig. 4 demonstrates that the addition of zinc to the renaturation buffer (lane 7) is required to reconstitute efficient CGBP DNA binding activity to the CpG-pos probe.
Identification of the Consensus DNA Binding Site of CGBP-Previous work demonstrated that CGBP binds to DNA elements containing an unmethylated CpG dinucleotide (17). Experiments were performed to determine the consensus DNA binding site of CGBP. The 6×His-tagged CGBP protein fragment containing amino acids 106-345 was used in a ligand-selection assay to recover high affinity CGBP binding sites from a pool of degenerate oligonucleotides. After nine cycles of binding and PCR amplification, the nucleotide sequences of recovered oligonucleotides were determined (Fig. 5A). Interestingly, all 50 sequences contained a cytosine residue at the 3′-most degenerate position, thus creating a CpG motif because the adjacent nucleotide contributed by the flanking PCR anchor sequence is guanine. Alignment of the 50 sequences at this 3′ CpG motif reveals that 45 of the 50 sequences contain an adenine or cytosine at the position immediately upstream of the CpG motif. Furthermore, the anchor PCR primer contains an adenine at the position immediately downstream of the CpG dinucleotide. No consensus sequence was otherwise apparent from this alignment, although adenine nucleotides are overrepresented, accounting for 46% of the total nucleotides present in the degenerate region of the oligonucleotides.
All but one of the recovered sequences (oligonucleotide 38) contain additional CpG motifs elsewhere within the degenerate region of the oligonucleotide. Thirty-five of the recovered oligonucleotides carry a single additional CpG motif within the body of the degenerate sequence. Alignment of these sequences at the internal CpG reveals a similar preference for adenine or cytosine nucleotides at the positions immediately adjacent to the CpG motif and a general preference for adenine-rich DNA elements (Fig. 5B). Hence, the consensus binding site for CGBP appears to be (A/C)CpG(A/C).
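Scanning recovered sequences for CpG dinucleotides and for the (A/C)CpG(A/C) consensus is a simple pattern search. In this minimal sketch, the regular expressions use a lookahead so that sites sharing a flanking base, as in ACGACGA, are both reported; the adenine fraction corresponds to the composition measure quoted for the degenerate region. The test sequence TTACGATTTCCGGA is hypothetical, chosen to hold one consensus site (ACGA) and one CpG in the non-consensus CCpGG context of the CpG-pos probe.

```python
import re

CPG = re.compile(r"(?=CG)")
# (A/C)CpG(A/C); the lookahead allows overlapping matches.
CONSENSUS = re.compile(r"(?=([AC]CG[AC]))")


def cpg_positions(seq: str) -> list[int]:
    """0-based index of the C in every CpG dinucleotide."""
    return [m.start() for m in CPG.finditer(seq.upper())]


def consensus_sites(seq: str) -> list[tuple[int, str]]:
    """0-based start and sequence of every (A/C)CG(A/C) match."""
    return [(m.start(), m.group(1)) for m in CONSENSUS.finditer(seq.upper())]


def adenine_fraction(seq: str) -> float:
    """Fraction of positions occupied by adenine."""
    return seq.upper().count("A") / len(seq)


example = "TTACGATTTCCGGA"  # hypothetical test sequence
print(cpg_positions(example))       # [3, 10]
print(consensus_sites(example))     # [(2, 'ACGA')]
print(consensus_sites("ACGACGA"))   # [(0, 'ACGA'), (3, 'ACGA')]
```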
Additional experiments were performed to directly assess whether ligand selection resulted in the recovery of high affinity binding sites for CGBP from the pool of degenerate oligonucleotides. Sequence 48 (Fig. 5A) contains two CpG motifs, both of which are present within the context of the consensus flanking sequence CCpGA. This DNA sequence disrupts the CGBP/CpG-pos probe EMSA complex more efficiently than does the homologous CpG-pos competitor, which carries a single CpG motif in the non-consensus context of CCpGG (Fig. 6, compare lanes 3-5 to lanes 6-8). As expected, the CpG-neg oligonucleotide competitor fails to disrupt the CGBP complex (Fig. 6, lanes 9-11). These results indicate that the ligand-selection process recovered high affinity CGBP binding sites.
Studies were performed to test the hypothesis that CGBP requires CpG dinucleotides for binding to the ligand-selected oligonucleotides and that adjacent adenine or cytosine nucleotides contribute to optimal binding affinity. Three ligand-selected sequences were chosen for analysis (Fig. 7A): sequence 48, which was described above and contains two CpG motifs; sequence 38, which contains a single CpG motif flanked by cytosine and adenine nucleotides (CCpGA); and sequence 32, which contains two CpG motifs in the context of non-consensus flanking sequences (GCpGA and TCpGA). Mutation of either CpG motif within sequence 48 (48-Mut1 and 48-Mut2) results in a slight decrease in CGBP binding affinity (Fig. 7B, compare lanes 3-5 to lanes 6-8 and 9-11). However, simultaneous mutation of both CpG motifs (48-Mut3) drastically reduces CGBP binding affinity (Fig. 7B, lanes 12-14). Interestingly, mutation to thymine of the nucleotides flanking each CpG motif (48-Mut4) also decreases CGBP binding affinity (Fig. 7B, compare lanes 3-5 with 15-17). In fact, this alteration has an effect on CGBP binding similar to that of mutating a single CpG motif (compare 48-Mut4 to 48-Mut1 and 48-Mut2). Mutation of the single CpG motif within sequence 38 (38-Mut) abolishes CGBP binding, as no disruption of the complex is observed with increasing concentrations of competitor (Fig. 7B, compare lanes 18-20 to lanes 21-23). CGBP exhibits a weaker affinity for sequence 32, which contains two CpG motifs in the context of non-consensus flanking sequences (GCpGA and TCpGA) (Fig. 7B, lanes 24-26). We conclude that the presence of a CpG dinucleotide is required for CGBP binding and that binding affinity is strengthened by the presence of multiple CpG motifs and of adenine or cytosine nucleotides immediately adjacent to the CpG dinucleotide.
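The flanking-sequence mutant 48-Mut4 converts each consensus site to TCpGT while keeping both CpG motifs. The hypothetical helper below sketches that design rule: every base immediately adjacent to a CpG becomes T, except that a base belonging to another CpG is left alone (a design choice made here so the CpG count never changes). The actual competitor sequences are those shown in Fig. 7A and are not reproduced here.

```python
def flanks_to_thymine(seq: str) -> str:
    """Replace the base on each side of every CpG with T, keeping all CpGs intact."""
    seq = seq.upper()
    cpg_starts = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]
    # Positions occupied by any CpG are protected from substitution.
    protected = {p for i in cpg_starts for p in (i, i + 1)}
    bases = list(seq)
    for i in cpg_starts:
        for flank in (i - 1, i + 2):
            if 0 <= flank < len(bases) and flank not in protected:
                bases[flank] = "T"
    return "".join(bases)


print(flanks_to_thymine("GGACGAGG"))  # GGTCGTGG
print(flanks_to_thymine("ACGCGA"))    # TCGCGT
```

Because T can never pair with a neighboring base to form a new CpG, the substitution alters only the context of the existing motifs, mirroring the intent of 48-Mut4.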

DISCUSSION
We previously utilized a ligand-screening approach to clone CGBP, a novel transcriptional activator that binds specifically to DNA elements containing an unmethylated CpG motif(s) (17). The DNA binding activity of CGBP was demonstrated to reside within amino acids 106-345, a protein fragment that contains a CXXC domain. This domain is found in several DNA-binding proteins including HRX, MLL-2, DNMT1, and MBD1 (17)(18)(19). The studies described here establish that the CXXC domain is responsible for the unique DNA binding specificity of CGBP and define the preferred DNA binding site of this transcriptional activator.
The presence of the CXXC domain in a number of proteins that bind to CpG motifs is intriguing, although the function of this domain had not been clearly established. The CXXC domain resides within a fragment of DNMT1 that inhibits de novo methylation activity (25) and interacts with HDAC1 (26), and it has been reported to function as a transcriptional repression domain within the HRX protein (27,28). With the exception of CGBP, proteins that contain a CXXC domain additionally contain other distinct DNA binding domains such as the methyl binding domain, AT-hooks, and zinc fingers (19, 29-32). Furthermore, the CXXC domain is dispensable for the DNA binding activity exhibited by DNMT1 and MBD1 (19,25).
The studies reported here demonstrate that the minimal DNA binding domain of CGBP resides within a 60-amino acid domain (amino acids 162-221) that includes the CXXC domain. An intact CXXC domain is essential for DNA binding activity, as mutation to alanine of either of two conserved cysteine residues within the CXXC domain ablates DNA binding activity. Consistent with a requirement for a defined arrangement of highly conserved cysteine residues, CGBP requires the presence of Zn2+ for DNA binding activity. The minimal CXXC DNA binding domain of CGBP exhibits a DNA binding specificity for elements containing an unmethylated CpG dinucleotide.
Shortly after our description of the cloning of CGBP, Fujino et al. (33) independently reported the cloning of this factor (denoted PCCX1 in that report). They identified this factor as a regulator of the interstitial collagenase gene, MMP-1, and determined that a fragment containing amino acids 144-231, which includes the CXXC domain, exhibits DNA binding activity. Surprisingly, the MMP-1 promoter element used in this one-hybrid screen lacks a CpG motif. However, linkers added to the ends of the promoter fragment introduced CpG dinucleotides, which likely explains the paradoxical binding affinity of CGBP for this promoter element. Consistent with the determined binding specificity of CGBP (17), the binding of CGBP to the MMP-1 promoter element was not disrupted by an unrelated oligonucleotide that lacked a CpG motif but was disrupted by a mutated version of the MMP-1 promoter element that retained the linker CpG dinucleotides (33). Hence, CGBP does not appear to be an authentic regulator of the MMP-1 promoter.
FIG. 7. CGBP requires a CpG motif in the context of appropriate flanking sequence for optimal binding affinity. A, nucleotide sequences of oligonucleotides used as competitors. CpG dinucleotides and flanking sequences are underlined. B, EMSA was performed as described under "Experimental Procedures" using 0.05 µg of the 6×His-tagged CGBP fragment containing amino acids 142-250 and the CpG-pos probe. Increasing concentrations of the competitor oligonucleotides (1000-, 2000-, or 4000-fold molar excess) were added to the indicated lanes. Lane 1, probe alone; lane 2, no competitor; lanes 3-5, competition with sequence 48, which contains two consensus CpG motifs; lanes 6-8, competition with sequence 48-Mut1, which contains one consensus CpG motif; lanes 9-11, competition with sequence 48-Mut2, which contains one consensus CpG motif; lanes 12-14, competition with sequence 48-Mut3, which lacks a CpG motif; lanes 15-17, competition with sequence 48-Mut4, which contains two CpG motifs, each in the context of a non-consensus flanking sequence (TCpGT); lanes 18-20, competition with sequence 38, which contains one consensus CpG motif; lanes 21-23, competition with sequence 38-Mut, which lacks a CpG motif; lanes 24-26, competition with sequence 32, which contains two CpG motifs, neither of which resides within a consensus flanking sequence. The arrow indicates the position of the protein-DNA complex.
Given our finding that the CXXC domain of CGBP exhibits affinity for unmethylated CpG motifs, it will be of interest to determine in more detail the structural requirements for this unique DNA binding specificity. These results also highlight the need for additional studies on the contribution of the CXXC domain to the DNA binding activity of factors such as DNMT1, MBD1, HRX, and MLL-2 that contain additional distinct DNA binding domains (17)(18)(19). Interestingly, Fujita et al. (34) recently reported that an alternatively spliced form of MBD1 binds either methylated or unmethylated promoters and attributed the ability to bind unmethylated DNA to the presence of a third copy of the CXXC domain. Notably, the CXXC domain of CGBP exhibits the highest degree of sequence identity (60%) to this alternatively utilized CXXC domain of MBD1.
Inspection of ligand-selected CGBP binding sites reveals a consensus sequence of (A/C)CpG(A/C). As expected, all recovered oligonucleotides contain at least one CpG dinucleotide, and mutations that remove the CpG motif(s) abolish the affinity of CGBP for these sequences. Mutations that either alter the sequence flanking the CpG dinucleotide or reduce the number of CpG motifs lower the affinity of CGBP. These results support the consensus CGBP binding site deduced from inspection of ligand-selected sequences recovered from a pool of degenerate oligonucleotides. Interestingly, in 20 of the 35 recovered sequences containing two CpG dinucleotides, these motifs are spaced 12-16 base pairs apart, and mutation of individual CpG motifs reduces the affinity of CGBP. These results suggest that CGBP exhibits greater affinity for clustered CpG motifs, as found in CpG islands.
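Spacing between CpG motifs can be tabulated with the same kind of scan. The text does not define how spacing was measured, so this sketch assumes the distance between the C residues of two CpG dinucleotides; the test sequence and the 12-16 window helper are illustrative.

```python
from itertools import combinations


def cpg_spacings(seq: str) -> list[int]:
    """Distances between the C residues of every pair of CpG dinucleotides."""
    seq = seq.upper()
    positions = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]
    return [b - a for a, b in combinations(positions, 2)]


def has_spacing_in(seq: str, low: int = 12, high: int = 16) -> bool:
    """True if any pair of CpG motifs lies low..high bases apart (inclusive)."""
    return any(low <= d <= high for d in cpg_spacings(seq))


example = "ACGA" + "A" * 10 + "ACGA"  # hypothetical: CpGs at positions 1 and 15
print(cpg_spacings(example))    # [14]
print(has_spacing_in(example))  # True
```

For recovered oligonucleotides, the scan would need to include the first base of the downstream anchor, since the 3′ CpG is completed by that anchor G.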
The studies reported here establish that the CXXC domain is responsible for the observed binding specificity of CGBP for unmethylated CpG motifs and provide fundamental information regarding the structural basis for the unique DNA binding specificity of CGBP. These data should aid future efforts to identify authentic target genes of CGBP action in vivo as well as more detailed analyses of the function of the CXXC domain within other DNA-binding proteins.