The Multi-zinc Finger Protein ZNF217 Contacts DNA through a Two-finger Domain*

Background: Classical C2H2 zinc finger proteins generally bind DNA via a three-finger motif.
Results: We have identified the DNA site recognized by ZNF217 and defined its mechanism of binding.
Conclusion: Two classical C2H2 zinc fingers, rather than the typical three, are sufficient to bind an eight-base pair sequence.
Significance: This work broadens our understanding of DNA binding by classical zinc fingers.

Classical C2H2 zinc finger proteins are among the most abundant transcription factors found in eukaryotes, and the mechanisms through which they recognize their target genes have been extensively investigated. In general, a tandem array of three fingers separated by characteristic TGERP linkers is required for sequence-specific DNA recognition. Nevertheless, a significant number of zinc finger proteins do not contain a hallmark three-finger array of this type, raising the question of whether and how they contact DNA. We have examined the multi-finger protein ZNF217, which contains eight classical zinc fingers. ZNF217 is implicated as an oncogene and in repressing the E-cadherin gene. We show that two of its zinc fingers, 6 and 7, can mediate contacts with DNA. We examine its putative recognition site in the E-cadherin promoter and demonstrate that this is a suboptimal site. NMR analysis and mutagenesis are used to define the DNA binding surface of ZNF217, and we examine the specificity of the DNA binding activity using fluorescence anisotropy titrations. Finally, sequence analysis reveals that a variety of multi-finger proteins also contain two-finger units, and our data support the idea that these may constitute a distinct subclass of DNA recognition motif.

Transcription factors are sequence-specific DNA-binding proteins that localize to promoters and enhancers/silencers and recruit co-regulatory factors, such as histone modifying enzymes, to turn genes on or off (1). Sequence-specific DNA binding can be achieved through a number of different structural domains, and most known transcription factors are categorized by the nature of their DNA recognition domains. The most prominent classes of transcription factors are classical (or C2H2) zinc fingers (2), homeodomains (3), basic leucine zipper (4), basic helix-loop-helix (5), nuclear receptor domains (6), and MADS boxes (7). The relative abundance of the different domains varies somewhat between organisms, with nuclear receptors, for instance, being more abundant in the worm Caenorhabditis elegans than in other organisms (8) and MADS boxes being particularly common in plants (9), but overall classical zinc fingers are the major class of sequence-specific DNA binding domains across Eukarya.
There are several reasons why zinc fingers might have become so abundant during evolutionary history. First, they are small structures that tend to be thermodynamically stable (in part due, no doubt, to the cross-linking effect of the zinc coordination) and are dependent on only a small number of residues for proper folding, perhaps allowing rapid evolution. Indeed, a zinc binding module has been recorded to be among the first domains to arise during in vitro evolution experiments (10,11). Second, these domains bind DNA in a modular fashion, which allows mixing and matching to create proteins with novel DNA binding specificities through the addition of extra zinc fingers. It is also for these reasons that zinc fingers have been successfully used for the generation of artificial transcription factors and nucleases with novel specificities (12-14). Finally, zinc fingers can mediate interactions with RNA or with other proteins (15-17), and this functional diversity, although currently not well understood, may also have led to their proliferation in the genome.
Although classical zinc finger proteins have been extensively studied, the vast majority of the work has been centered on arrays of three or four classical zinc fingers separated by canonical TGE(K/R)P linkers. The three-dimensional structures of several such arrays bound to DNA target sites have been determined (for example, Zif268 (18), Gli5 (19), TFIIIA (20), and the designed zinc finger protein Aart (21)), revealing a shared recognition mode in which residues on one surface of the α-helix of each finger make base-specific contacts in the major groove of the DNA. Additional stabilizing interactions involving other residues in the helix and residues in the TGE(K/R)P linkers are also commonly observed.
It is notable, however, that a large number of zinc finger proteins do not have three or four tandem-arranged zinc fingers separated by the canonical linker sequences, raising the question of how these proteins identify their target elements and indeed whether or not they are DNA-binding proteins at all.
We have studied one large multi-zinc finger protein, namely ZNF217, and investigated the mechanisms by which it regulates gene expression (22). ZNF217 has been recognized as an important oncogene, with overexpression of ZNF217 associated with breast, ovarian, and numerous other cancers (23-25). The mechanisms through which it operates are complex (26-28), but one proposal has been that ZNF217 directly binds to and represses the E-cadherin gene promoter via a two-zinc finger domain that recognizes the consensus sequence CAGAAY (29). E-cadherin is an important cell adhesion molecule, and repression of E-cadherin, which presumably reduces cell-cell contacts, has been associated with cancers with increased metastatic potential (30,31).
We began by investigating how ZNF217 might localize to the E-cadherin promoter and confirmed the previous observation that only fingers 6 and 7 detectably contact DNA (29). However, our results suggested that the E-cadherin site was bound with low affinity, prompting further mapping and mutagenesis experiments to search for higher affinity recognition sites. This work suggested a preferred site of (T/A)(G/A)CAGAA(T/G/C), which is related to but distinct from the previously proposed CAGAAY site. We then showed that the integrity of the double-finger domain is required for functional regulation through this DNA element. We went on to examine the affinity with which the two-finger domain contacted the preferred site and to determine whether it used similar molecular contacts to those found in three and four zinc finger units. Our results demonstrate that a single two-finger motif rather than the usual three or four finger units is sufficient to allow ZNF217 and potentially other multi-zinc finger proteins to contact DNA.

EXPERIMENTAL PROCEDURES
Protein Expression and Purification-Residues 469-525 of human ZNF217 were subcloned into the pGEX-2T vector, and the construct was overexpressed in E. coli BL21 cells overnight at 18°C by the addition of 0.4 mM isopropyl 1-thio-β-D-galactopyranoside to a log phase culture. Expressed proteins were purified by glutathione affinity chromatography. For NMR analysis, ZNF217-F67 was cleaved from GST with thrombin, and the resulting protein was purified by gel filtration chromatography (S75) carried out in 50 mM Tris (pH 7.4) containing 300 mM NaCl and 1 mM DTT. Point mutants were handled similarly, except that gel filtration was not carried out.
Luciferase Repression Assays-HEK-293 cells were transfected using FuGENE 6 (Roche Applied Science) in 6-well plates with between 0.25 and 2 μg of pMT3-FLAG-ZNF217 or point mutants of pMT3-FLAG-ZNF217. Also included was 3 μg of reporter plasmid (either pGL2-(TGCAGAAT)3-LexA-Luc or pGL2-(CAGAAT)3-LexA-Luc or pGL2-(CTGGAGTA)3-LexA-Luc) and 1 μg of LexA-VP16 expression plasmid. pMT3-FLAG empty vector was added to make the amount of DNA in each transfection equal. 10 μg of a plasmid expressing Renilla luciferase (pRL-Luc, Promega) was used in each transfection to control for transfection efficiency. Cells were incubated for 48 h after transfection, and luciferase assays were then performed using the Promega Luciferase Assay System.
Data-driven Three-dimensional Structure Prediction Using CS-Rosetta-The 13Cα, 13Cβ, 13C′, 15N, Hα, and HN chemical shift assignments of ZNF217-F67 were used as input for the CS-Rosetta software (33) to generate data-driven models of ZNF217-F6 (residues 469-494) and ZNF217-F7 (residues 495-525). For each domain, 10,000 models were generated, and the top 500 models ranked by energy were chosen for further analysis. The 10 lowest-energy structures for ZNF217-F6 and ZNF217-F7 were accepted according to the published criteria (33): 1) a "funneling" distribution, indicating a convergence of the structure prediction, was observed in the plot of Rosetta all-atom energy against Cα root mean square deviation relative to the model with the lowest energy, and 2) the low energy models clustered within 2 Å from the model with the lowest energy. The structures were visualized by using PyMOL.
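The second acceptance criterion above (low-energy models clustering within 2 Å of the lowest-energy model) can be illustrated with a minimal Python sketch. The `Model` record, the `select_models` helper, and the example values are hypothetical and are not part of CS-Rosetta; the sketch assumes that the energy and the Cα root mean square deviation to the lowest-energy model have already been computed for each model. The funneling criterion is judged from the energy versus root mean square deviation plot and is not automated here.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    energy: float        # Rosetta all-atom energy (arbitrary units)
    rmsd_to_best: float  # C-alpha RMSD (in angstroms) to the lowest-energy model


def select_models(models, n_top=500, n_final=10, rmsd_cutoff=2.0):
    """Rank models by energy and test the clustering criterion.

    Returns (converged, final), where `final` holds the n_final lowest-energy
    models among the n_top best, and `converged` is True when all of them lie
    within rmsd_cutoff of the lowest-energy model.
    """
    ranked = sorted(models, key=lambda m: m.energy)
    final = ranked[:n_top][:n_final]
    converged = all(m.rmsd_to_best <= rmsd_cutoff for m in final)
    return converged, final


# Illustrative values only (not data from this study).
example = [
    Model("m4", -30.0, 6.2),
    Model("m1", -50.0, 0.0),
    Model("m3", -47.0, 1.5),
    Model("m2", -48.5, 0.8),
]
ok_three, best_three = select_models(example, n_top=3, n_final=3)
ok_four, best_four = select_models(example, n_top=4, n_final=4)
```

In this toy set the three lowest-energy models satisfy the 2 Å cutoff, whereas including the fourth, higher-energy model does not.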
Fluorescence Anisotropy Titrations-GST-tagged ZNF217-F67 and fluorescein-labeled dsDNA oligonucleotides (WT sequence, 5′-fluorescein-TCCATTGCAGAATTGTGG-3′; mutated sequence, 5′-fluorescein-TCCATCTGGAGTATGTGG-3′) were dialyzed into a 10 mM phosphate buffer (pH 6.5) containing 50 mM NaCl and 1 mM DTT. Fluorescence anisotropy titrations were performed at 25°C on a Cary Eclipse fluorescence spectrophotometer with a slit width of 10 nm, and data were averaged over 15 s. The excitation and detection wavelengths were 495 and 520 nm, respectively. In each titration the fluorescence anisotropy of a solution of 50 nM fluorescein-tagged dsDNA was measured as a function of the added protein concentration. Binding data were fitted to a simple 1:1 binding model by nonlinear least squares regression. Each titration was performed three times, and the final affinity was taken as the mean of these measurements.
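A minimal sketch of how a 1:1 binding model can be fitted to an anisotropy titration is given below. It is an illustration under stated assumptions, not the fitting procedure used in this study: it assumes that the observed anisotropy is a linear combination of free and bound DNA signals, that depletion of free protein by the 50 nM labeled DNA is handled with the standard quadratic solution, and that the dissociation constant is found by a simple bracketed search. The function names and the synthetic data are hypothetical.

```python
import math


def fraction_bound(p_total, d_total, kd):
    """Fraction of DNA in complex for 1:1 binding (all concentrations in nM).

    Solves the quadratic for the complex concentration, so depletion of free
    protein by the labeled DNA is taken into account.
    """
    s = p_total + d_total + kd
    return (s - math.sqrt(s * s - 4.0 * p_total * d_total)) / (2.0 * d_total)


def _linear_fit(kd, protein, aniso, d_total):
    """At fixed Kd, fit r = r_free + (r_bound - r_free) * f by linear least squares."""
    f = [fraction_bound(p, d_total, kd) for p in protein]
    n = len(f)
    mean_f, mean_r = sum(f) / n, sum(aniso) / n
    sff = sum((x - mean_f) ** 2 for x in f)
    sfr = sum((x - mean_f) * (y - mean_r) for x, y in zip(f, aniso))
    slope = sfr / sff
    r_free = mean_r - slope * mean_f
    sse = sum((r_free + slope * x - y) ** 2 for x, y in zip(f, aniso))
    return sse, r_free, r_free + slope


def fit_kd(protein, aniso, d_total=50.0, lo=0.1, hi=1e5, steps=60):
    """Least-squares estimate of (Kd, r_free, r_bound); Kd is searched between lo and hi (nM)."""
    def sse(log_kd):
        return _linear_fit(math.exp(log_kd), protein, aniso, d_total)[0]

    # Coarse scan on a logarithmic grid to bracket the minimum.
    width = (math.log(hi) - math.log(lo)) / steps
    grid = [math.log(lo) + i * width for i in range(steps + 1)]
    best = min(range(steps + 1), key=lambda i: sse(grid[i]))
    a, b = grid[max(best - 1, 0)], grid[min(best + 1, steps)]

    # Golden-section refinement inside the bracket.
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(60):
        c, d = b - phi * (b - a), a + phi * (b - a)
        if sse(c) < sse(d):
            b = d
        else:
            a = c
    kd = math.exp(0.5 * (a + b))
    _, r_free, r_bound = _linear_fit(kd, protein, aniso, d_total)
    return kd, r_free, r_bound


# Synthetic, noise-free titration (illustrative values, not data from this study).
protein_nm = [0, 10, 25, 50, 100, 200, 400, 800, 1600, 3200]
aniso = [0.05 + 0.10 * fraction_bound(p, 50.0, 80.0) for p in protein_nm]
kd_fit, r_free_fit, r_bound_fit = fit_kd(protein_nm, aniso)
```

Because the two anisotropy end points enter the model linearly, they are solved in closed form at each trial dissociation constant, which leaves a one-dimensional search. With real, noisy data a general nonlinear least squares routine with error estimates would normally be preferred.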

RESULTS
Zinc Fingers 6 and 7 Are Capable of Binding DNA-ZNF217 contains eight classical C2H2 zinc fingers arranged in two major clusters (Fig. 1). To identify the domain(s) responsible for DNA recognition, we first expressed the various zinc finger clusters as GST fusion proteins in bacteria and purified them by glutathione affinity chromatography. Recent work has suggested that ZNF217 binds a particular cis-element, CAGAAC, in the human E-cadherin promoter (29). We tested various zinc finger combinations of both mouse and human ZNF217 with this element in an EMSA. We detected weak binding with murine fingers 6 and 7 (mZNF217-F67, Fig. 2A, lanes 8 and 9). Supershift experiments with an anti-GST serum eliminated the retarded band, consistent with the view that the binding was due to the GST-mZNF217-F67 recombinant protein. However, we were not able to detect any binding of the human protein (ZNF217-F67) to the E-cadherin promoter (Fig. 2A, lanes 6 and 7). Furthermore, no binding was observed with either mouse or human constructs encompassing the other zinc fingers (data not shown).
We also tested the DNA binding ability of various ZNF217 zinc finger constructs with Pentaprobe, a collection of DNA fragments that together contain all possible five-base pair sequences (34). Again, no binding was detected with individual fingers 1, 5, and 8 and a domain containing fingers 1-4, whereas fingers 6 and 7 were able to bind to the Pentaprobe sequence (data not shown). These data suggest that fingers 6 and 7 are the main determinants of DNA recognition by ZNF217.

ZNF217 Recognizes an 8-bp Sequence with the Consensus (T/A)(G/A)CAGAA(T/G/C)-While carrying out the DNA binding experiments described above, we noted that binding of ZNF217 to the E-cadherin promoter sequence, although reproducible, was barely detectable. We then repeated recently published EMSA experiments using a consensus ZNF217 binding sequence identified by site selection (CASTing) (29). This site, CAGAAT, is related to but distinct from the actual site identified in the E-cadherin promoter. This sequence gave rise to robust DNA binding with both the murine and human GST fusion proteins encompassing zinc fingers 6 and 7 of ZNF217 (Fig. 2B). To confirm that the observed bands arose from sequence-specific binding to the intact zinc finger domain, we mutated a key residue that is required for zinc finger structure. The first cysteine residue in zinc finger 6 of ZNF217 (Cys-473) was mutated to alanine (ZNF217-F67 C473A); this change eliminated binding to both the E-cadherin promoter and the CAST-derived oligonucleotide, confirming that DNA recognition was dependent on zinc finger integrity (Fig. 2).
To investigate why the site-selected probe was robustly recognized by mouse and human GST-ZNF217-F67, whereas the E-cadherin promoter site was not, we carried out a series of mutagenesis experiments on the CAST-selected probe to define the site most strongly bound by ZNF217-F67 in EMSAs. We generated and tested both 5′ and 3′ deletions and site-specific mutations of the probe and identified a core sequence of TGCAGAAT (Fig. 3A, Table 1). This site is related to the previously identified essential CAGAAC core present in the human E-cadherin promoter. However, it also includes two additional 5′-flanking residues, TG, that are not found in the E-cadherin site but that appear to be important for binding.
To assess the site in more detail, we carried out EMSAs using a series of mutant probes in which each nucleotide in the extended core TGCAGAAT was mutated to every other residue (Fig. 3B, Table 2). The results confirmed that all eight residues can influence DNA binding and identified a consensus site of (T/A)(G/A)CAGAA(T/G/C). This site differs from the E-cadherin promoter site (ATCAGAAC) at two critical residues, explaining the difference in binding properties observed between Fig. 2, A and B.
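As an illustration of how the degenerate consensus can be applied to a sequence, the following Python sketch encodes (T/A)(G/A)CAGAA(T/G/C) as a regular expression and scans a single strand for matches. The `find_sites` helper is hypothetical, and the sketch only tests whether a sequence satisfies the consensus; it does not model the graded binding strengths observed in the EMSAs, and it does not scan the reverse complement.

```python
import re

# (T/A)(G/A)CAGAA(T/G/C) written as a regular expression character pattern.
CONSENSUS = "[TA][GA]CAGAA[TGC]"


def find_sites(seq):
    """Return (0-based position, site) for every consensus match on the given strand.

    A lookahead is used so that overlapping matches would also be reported.
    """
    pattern = re.compile(f"(?=({CONSENSUS}))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


wt_oligo = "TCCATTGCAGAATTGTGG"       # WT probe used in the anisotropy titrations
mutant_oligo = "TCCATCTGGAGTATGTGG"   # mutated probe
ecadherin_site = "ATCAGAAC"           # E-cadherin promoter site

wt_hits = find_sites(wt_oligo)
mutant_hits = find_sites(mutant_oligo)
ecad_hits = find_sites(ecadherin_site)
```

Under this encoding the WT probe contains a single match (TGCAGAAT), whereas neither the mutated probe nor the E-cadherin promoter site satisfies the consensus.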
ZNF217 Can Repress Transcription through Its Consensus DNA-binding Site in Cellular Assays-ZNF217 has been shown to function as a transcriptional repressor. It recruits co-repressors of the C-terminal-binding protein (CtBP)³ family via Pro-Ile-Asp-Leu-Ser (PIDLS) and Arg-Arg-Thr (RRT) motifs (22,35). CtBPs then recruit other co-regulatory molecules such as histone deacetylases, histone methyltransferases, and histone demethylases to repress gene expression (36-38). To determine whether ZNF217 was able to repress transcription in cellular assays via the consensus target site we had identified, we constructed a simple reporter system. We used a luciferase reporter gene, driven by a minimal adenovirus 1B promoter containing a LexA binding site and three copies of the ZNF217 consensus element. Transcription was activated by the expression of a LexA-VP16 fusion protein, and repression was measured when full-length ZNF217 was co-expressed in HEK-293 cells. As shown in Fig. 4A, dose-dependent repression was observed when increasing amounts of a ZNF217-encoding expression vector were added. In contrast, a similar vector encoding the ZNF217 C473A mutant produced only minimal repression. These results confirm the ability of ZNF217 to function in gene repression and indicate that an intact ZNF217-F67 domain is required for binding to target promoters containing the (T/A)(G/A)CAGAA(T/G/C) consensus. As a further control for the specificity of the sequence element recognized by ZNF217, we constructed two additional reporter genes. One carried three copies of the CAGAAT element related to that found in the E-cadherin promoter instead of the 8-bp core we identified above, whereas the other contained mutations chosen to severely disrupt our core element. We again observed strong dose-dependent repression by ZNF217 with the new core element (TGCAGAAT) and observed minor but detectable repression using the shorter sequence related to the E-cadherin core (CAGAAT) (Fig. 4B), consistent with the weak binding observed in vitro to this latter site (Fig. 2A). In contrast, we observed no significant repression with the mutated version of the new core (CTGGAGTA) (Fig. 4B). Finally, mutation of the critical cysteine 473 to alanine in finger 6 essentially abrogated repression (Fig. 4B). Taken together, these results confirm that ZNF217 can function as a transcriptional repressor and demonstrate that it can be localized to its target promoter in a sequence-specific manner via a DNA binding domain that comprises zinc fingers 6 and 7.

FIGURE 3. ZNF217-F67 binds to a consensus sequence of (T/A)(G/A)CAGAA(T/G/C). A, 2 μg of recombinant GST, GST-ZNF217-F67, and GST-ZNF217-F67 C473A were tested in EMSA for their ability to bind to probes containing either mutations or deletions in and around the core binding consensus of TGCAGAAT. Mutated bases were altered to G residues. The sequences of the probes and the relative strength of binding are given in Table 1. B, 2 μg of recombinant GST, GST-ZNF217-F67, and GST-ZNF217-F67 C473A were tested in EMSA for their ability to bind to probes containing individual site-specific mutations of the TGCAGAAT core DNA binding sequence. The sequences of the probes and the relative strengths of binding are given in Table 2.

TABLE 1 ZNF217-F67 binds to a core sequence of TGCAGAAT
Shown are the probe sequences used in Fig. 3A to determine the core binding sequence for ZNF217-F67. Sequences are aligned to indicate deletions, and mutated bases are shown in bold. The eight-base pair core is underlined in the probe for gel 1. Also shown is the relative strength of binding of ZNF217-F67 to each of the probes.

TABLE 2 ZNF217-F67 binds to a consensus sequence of (T/A)(G/A)CAGAA(T/G/C)
Shown are the probe sequences used in Fig. 3B to define the ZNF217-F67 binding consensus sequence. Mutated bases are shown in bold. Also shown is the relative strength of binding of ZNF217-F67 to each of the probes.

Probe sequence	Interaction strength
TCCATTGCAGAACTGTGG	+++
Identification of Residues That Are Important for DNA Recognition-Given that the binding of two zinc fingers to an eight-base recognition site was somewhat unexpected (typically three zinc fingers are required for binding to sites of around nine bases), we next probed the nature of the ZNF217-DNA interaction at a molecular level using NMR spectroscopy. A 15N HSQC spectrum of uniformly 15N-labeled ZNF217-F67 (residues 469-525) showed excellent dispersion, indicating that the construct formed a stable structure in solution (Fig. 5). We used standard triple resonance approaches to assign the signals in the 15N HSQC to specific residues in the protein and then used the program CS-Rosetta (33) to calculate three-dimensional structures for each domain based on the chemical shifts. These structures (Fig. 6, A and B) have folds that closely resemble each other (the backbone root mean square deviation over the structured regions of the lowest energy models of F6 and F7 is 0.9 Å) and other classical zinc fingers (the backbone root mean square deviation of finger 3 of Zif268 to F6 is 1.1 Å and to F7 is 1.0 Å). Fig. 6C shows an overlay of F6 and F7 with fingers 2 and 3 from Zif268, showing the high structural similarity.
We assessed DNA binding by ZNF217-F67 by titrating into a sample of 15N-labeled ZNF217-F67 a 14-bp double-stranded oligonucleotide bearing the extended recognition sequence deduced above (5′-CATTGCAGAATTGT-3′). As shown in Fig. 7A, many signals shifted after the addition of DNA, and the good quality of the 15N HSQC spectrum after saturation with DNA indicates the formation of a well defined and well ordered complex. We recorded triple resonance data for the protein-DNA complex and again made assignments of the backbone atoms. Fig. 7B shows the magnitude of the chemical shift changes for the backbone nuclei (HN, N) of each residue after the addition of DNA. A larger number of significant changes was observed in finger 6 compared with finger 7. These changes are shown mapped onto the structural models of each zinc finger in Fig. 7C.
Chemical shift changes may be indicative of direct DNA contacts or may result from local perturbations in the structure arising from the binding event. To assist in distinguishing between these two possibilities and to corroborate the NMR data, we mutated finger 6 and 7 residues to alanine or, where alanines were present, to glutamine. We then tested the ability of these mutants to bind to DNA in EMSAs (Fig. 8 and supplemental Table 1). One-dimensional 1H NMR spectroscopy was used to confirm that all mutants, apart from Y506A, folded correctly. DNA binding was found to be severely compromised for a number of mutants in both zinc fingers. As expected, mutations that interfered with zinc binding, such as C473A, H489A, and C504A, disrupted DNA binding, and it was also notable that mutations in the TGEKP linker between the two zinc fingers abrogated binding.
Affinity and Specificity-To quantify the DNA binding activity of ZNF217, we used fluorescence anisotropy to measure the affinity of the interaction between ZNF217-F67 and a dsDNA oligonucleotide containing the TGCAGAAT sequence (Fig. 9). The data fit well to a simple 1:1 binding model and gave a dissociation constant of 80 ± 25 nM. To assess the specificity of binding, we carried out a fluorescence anisotropy titration using an oligonucleotide in which the core recognition sequence was mutated to CTGGAGTA. The affinity we measured was ~2.5 times lower (190 ± 50 nM) than the affinity measured for the optimized sequence.
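To put the two dissociation constants on an energy scale, the short Python sketch below converts their ratio into a difference in binding free energy using the standard relation ΔΔG = RT ln(Kd,mutant/Kd,consensus). This calculation is an added illustration and was not reported in the study; it uses the 25°C titration temperature and the mean values only, ignoring the quoted uncertainties. The function name is hypothetical.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol K)


def ddg_kj_per_mol(kd_ref_nm, kd_alt_nm, temp_c=25.0):
    """Free energy difference (kJ/mol) between two sites from their Kd values.

    Positive values mean that the alternative site is bound more weakly than
    the reference site.
    """
    temp_k = temp_c + 273.15
    return R * temp_k * math.log(kd_alt_nm / kd_ref_nm) / 1000.0


fold_change = 190.0 / 80.0               # mutated site versus TGCAGAAT site
ddg = ddg_kj_per_mol(80.0, 190.0)        # about 2.1 kJ/mol
ddg_kcal = ddg / 4.184                   # about 0.5 kcal/mol
```

On this scale the preference for the optimized site over the mutated site corresponds to roughly 2 kJ/mol, comparable to thermal energy (RT is about 2.5 kJ/mol at 25°C).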

DISCUSSION
DNA Recognition by ZNF217-The mechanisms by which proteins can be targeted to particular DNA sequences in the genome are of considerable interest. Zinc finger proteins are the most abundant DNA-binding proteins found in eukaryotic genomes, and there have been recent successes with targeting artificial zinc finger proteins to chosen genomic sites (12-14). Although the mechanisms by which some naturally occurring zinc finger proteins bind DNA have been identified, the vast majority of work has focused on constructs comprising three tandem-arranged classical zinc fingers. However, an examination of the human genome reveals that many hundreds of classical zinc finger domains do not lie within closely spaced three-finger modules of this type, and although several have been shown to function as protein recognition modules (39-42), the functions of the vast majority of these domains remain undefined.
Here we have investigated the eight-zinc finger protein ZNF217 and shown that a construct comprising fingers 6 and 7 of this protein is able to bind an eight base pair double-stranded DNA site in a sequence-specific manner. Furthermore, this double finger domain is able, in the context of the full-length ZNF217 protein, to mediate transcriptional repression in a cellular reporter gene assay. The conventional view has been that three classical zinc fingers are required for physiologically relevant DNA binding, with a few exceptions such as the single GAGA zinc finger (43) and the Tramtrack pair of zinc fingers (44); our work strengthens the argument that shorter two-finger units can be functionally relevant for DNA recognition.
Typically, classical zinc fingers use residues at positions −1, 2, 3, and 6 of their α-helix to make sequence-specific contacts with DNA (Fig. 10A) and can make additional interactions with the phosphodiester backbone via other residues in the α-helix. Our NMR data show that the helix of F6 is strongly involved in binding DNA. Fewer large chemical shift changes were observed in F7, although a substantial change is observed for Gln-510, which lies at the −1 position in the F7 helix. Because 15N HSQC titration data are most sensitive to changes in the conformation or environment of nuclei in the backbone of the protein, we also carried out alanine-scanning mutagenesis across the F67 polypeptide to more directly assess the involvement of side chains in contacting the DNA. In Fig. 10, B and C, the structures of F6 and F7 are overlaid in the positions of F2 and F3 of Zif268 (as shown in Fig. 6) with the residues that were mutated shown in space-filling representation. Red indicates residues that were essential or very important for DNA binding, pink indicates those that make some contribution to binding, and white designates residues that do not contribute measurably to binding.

FIGURE 8. Residues were mutated to alanine (or glutamine where alanines were already present) and were expressed as GST fusion proteins. 1 μg of each mutant was used in EMSA to assess binding to an oligonucleotide containing the consensus sequence TGCAGAAT (see also supplemental Table 1).

FIGURE 10. Residues important for ZNF217-F67 DNA recognition. A, the amino acid sequences of F6 and F7 of ZNF217 are shown together with the sequence of finger 1 from the prototypical DNA binding zinc finger protein Zif268. Residues that typically make sequence-specific contacts with DNA in classical zinc fingers are boxed with solid lines, residues that often make nonspecific interactions with the DNA backbone are shown in dashed boxes, residues that underwent substantial chemical shift changes upon the addition of DNA to ZNF217-F67 are underlined, and zinc-ligating residues are indicated with asterisks. Those residues shown by site-directed mutagenesis to mostly or completely eliminate DNA binding are in bold and colored red; residues that reduced but did not abolish DNA binding are in bold and colored purple. Numbering of the α-helix is that typically used for classical zinc fingers. The secondary structure for F6 and F7, as predicted from an analysis of F67 chemical shifts, is shown above the sequences. B and C, the structures of F6 and F7 were overlaid onto F2 and F3 of Zif268 in the x-ray crystal structure of this protein bound to DNA (PDB 1ZAA). Residues that were mutated in Fig. 8 are shown in space-filling representation. Those residues shown by site-directed mutagenesis to mostly or completely eliminate DNA binding are colored red, residues that reduced but did not abolish DNA binding are colored pink, and residues that had little or no effect on DNA binding are shown in white. Red residues are more prominent on the surface of the domains used in Zif268 to contact DNA, whereas white residues are concentrated on the opposite face of each domain. B shows residues in F6, whereas C shows residues in F7. The right-hand panels show F6 or F7 in the same orientation as the corresponding left-hand panel for reference.
It is clear from Fig. 10 that the majority of residues that make substantial contributions to DNA binding lie on a surface that, in Zif268, forms the DNA contact surface. Many of these residues are in canonical DNA binding positions in the α-helix (as indicated in Fig. 10A), and it is also notable that Lys-478 was identified as making contact with DNA. A basic residue in this position has been shown to contact the DNA backbone in other zinc finger-DNA complexes, such as Zif268 (18) and TFIIIA (45).
Two of the residues for which significant chemical shift changes were observed, namely Leu-486 and Leu-490, proved to be dispensable for binding, according to our mutagenesis data, suggesting that the chemical shift changes observed for these residues might be caused by local conformational rearrangements; consistent with this idea, we have observed significant changes in the dynamics of the helices of classical zinc fingers from other proteins after DNA binding.⁴ Two further residues that were identified in our NMR analysis are zinc binding residues (His-489 and Cys-504), and their direct involvement in DNA binding is difficult to ascertain via mutagenesis because of their structural importance. The equivalent histidine in other zinc finger-DNA complexes does lie near the DNA backbone and commonly makes an electrostatic interaction with a phosphate group (2). Finally, mutation of any residue in the linker between F6 and F7 abrogated binding, indicating that this sequence is important for DNA recognition, as observed for other zinc finger-DNA complexes. These data together point to the conclusion that the overall topology of the ZNF217-DNA complex resembles previously characterized complexes containing more than two zinc fingers.
A notable feature of the ZNF217-DNA interaction is the observation that a two-zinc finger unit appears to recognize an eight-base pair DNA sequence. Although classical zinc fingers are sometimes thought of as contacting a three-base pair unit, there are a number of examples of contacts being made by residues in the α-helix to bases that lie 3′ to the canonical recognition sequence (Fig. 11). Interactions between residues at the +2 position in the helix and the base pair immediately 3′ to the core three-base pair target site are common (46), and interactions to bases as far as three nucleotides away have also been observed. Furthermore, an arginine at the +10 position in finger 3 of TFIIIA makes a base-specific contact with a guanine (47) that lies outside the three-base recognition site in the 5′ direction. Thus, it is possible in principle for a two-finger unit to exhibit sequence preferences across a recognition site of up to 12-14 base pairs, and mechanisms of this type are most likely used by ZNF217 to confer selectivity across its 8-base pair target site.
DNA Recognition by ZNF217 in Vivo-The consensus sequence (T/A)(G/A)CAGAA(T/G/C) that we have identified differs from two previously proposed recognition sequences. It is highly related to the core sequence CAGAAY identified by PCR site selection experiments (29) but differs in that it contains two additional 5′-flanking residues. Our data show that these two residues are important for binding. Surprisingly, a quite different consensus (ATTCC(G/A)AC) was proposed after a bioinformatics analysis of ZNF217-regulated genes identified from ChIP-chip data (27). This may reflect the ability of in vivo assays to identify biological sites where binding is weaker than those selected by in vitro assays. Alternatively, the bioinformatics-derived consensus sequence may represent the site of binding of a different transcription factor onto which ZNF217 associates via protein-protein interactions (piggy-backing).
The question of how ZNF217 in particular localizes to the E-cadherin promoter remains unresolved. The previously identified CAGAAY core element in the E-cadherin promoter may be involved, but our work suggests that this is a low affinity site, because it lacks the critical 5Ј-flanking residues that are present in the extended core element. There is some good evidence that ZNF217 does localize to and repress E-cadherin, including chromatin immunoprecipitation experiments (29), but on the other hand, E-cadherin did not emerge from the microarray experiments (27), perhaps suggesting it is a ZNF217 target in some but not all tissues. Recent experiments have indicated very clearly that transcription factors identify different subsets of target genes in different cell types either as a result of chromatin accessibility or the availability of different partner proteins (48). Our work showing relatively weak binding to the E-cadherin promoter suggests that ZNF217 might not localize to this sequence solely via the zinc finger 6 and 7 domain. It is possible that it uses additional mechanisms such as proteinprotein interactions. ZNF217 binds to CtBP (22), as do several other proteins implicated in the regulation of E-cadherin, such as Zeb (49) and Klf8 (50,51). The fact that CtBP can multim-4 J. Font and J. P. Mackay, unpublished data.  46)). Rectangles represent nucleotides in dsDNA. The four nucleotides that have been shown most frequently to be contacted by residues in classical zinc fingers are shaded gray. Rounded rectangles show the position of residues in the recognition helix that have been shown to contact the indicated bases, and an example of a zinc finger that displays each type of interaction is given (Zif268 -1 indicates finger 1 of Zif268) (18,19,45,60). erize (37,52) raises the possibility that it forms a nucleus that helps recruit multiple transcription factors and in doing so stabilizes the interaction of these factors at the target promoter. 
We have also tested for direct protein-protein interactions between ZNF217 and Klf8 and between ZNF217 and Zeb but have found no clear evidence for direct contact.⁵

How Much Affinity and Specificity Are Enough?-The majority of sequence-specific DNA-binding proteins for which quantitative studies have been carried out exhibit dissociation constants in the nanomolar range or lower. For example, Zif268 has been shown to bind a target sequence with a dissociation constant of 6 nM (18), and the HoxD9 and Antp homeodomains bind to their target sites with a KD of ~1.5 nM (53,54). It has consequently become commonly accepted that affinities of this magnitude represent "biologically significant" interactions and that weaker interactions are less likely to be relevant. However, a number of weaker interactions have been measured that have clear biological relevance. The DNA binding domain of THAP1 contacts its target site with an affinity of only 8 μM (55), and the double zinc finger domain of Tramtrack recognizes DNA with a dissociation constant of 400 nM (56). The affinity measured in the current study for the ZNF217-DNA interaction (80 nM) thus falls well within the range of known affinities and further emphasizes the idea that biologically relevant protein-DNA interactions are not exclusively associated with nanomolar affinities.
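To give a feel for what these dissociation constants mean in terms of site occupancy, the following minimal sketch applies the standard single-site binding isotherm, fraction bound = [P] / ([P] + KD). The KD values are those quoted above; the free protein concentration of 100 nM is an arbitrary assumption chosen purely for illustration and is not taken from this study, and the function name is hypothetical.

```python
def fraction_bound(free_protein_nM, kd_nM):
    """Fraction of sites occupied for simple 1:1 binding at equilibrium."""
    return free_protein_nM / (free_protein_nM + kd_nM)


# Dissociation constants (nM) quoted in the text
KD_NM = {
    "HoxD9/Antp homeodomains": 1.5,
    "Zif268": 6.0,
    "ZNF217 (this study)": 80.0,
    "Tramtrack": 400.0,
}

# ASSUMPTION: illustrative free protein concentration, not a measured value
FREE_PROTEIN_NM = 100.0

if __name__ == "__main__":
    for name, kd in KD_NM.items():
        occupancy = fraction_bound(FREE_PROTEIN_NM, kd)
        print(f"{name:26s} KD = {kd:6.1f} nM  fraction bound = {occupancy:.2f}")
```

Under this simple model a site is half occupied when the free protein concentration equals the KD, so the practical consequence of a weaker affinity depends strongly on the local protein concentration, which is not known here.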
These data lead to the question of whether weaker interactions hint at a mechanism of action that is distinct from that employed by "strong binders." Weaker interactions will almost certainly be associated with faster off-rates for the protein-DNA complexes, and so one possibility is that regulatory complexes involving such proteins are found more commonly in systems that must respond more rapidly to signals that are intended to alter the expression of target genes. It is also possible that weak interactors are generally found in large multiprotein complexes where DNA contacts are made by more than one protein simultaneously and that additional affinity is provided in that way. A system of this type could have the advantage of being able to "mix and match" DNA binding domains from different proteins and thereby target different genes depending on which regulatory proteins were available.
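The link between weaker binding and faster off-rates follows from the relation KD = k_off / k_on: if association rates are similar, a larger KD implies a proportionally larger k_off. The sketch below illustrates this under a loudly stated assumption, namely a common association rate constant of 1e8 M^-1 s^-1 for all proteins. That value is hypothetical and was not measured in this work, so the absolute numbers are indicative only; the function names are likewise introduced for illustration.

```python
import math

# ASSUMPTION: the same association rate constant for every protein
# (hypothetical value, not measured in the study)
ASSUMED_KON_PER_M_PER_S = 1e8


def off_rate(kd_nM, kon=ASSUMED_KON_PER_M_PER_S):
    """Dissociation rate constant k_off (s^-1) from KD = k_off / k_on."""
    return kd_nM * 1e-9 * kon


def complex_half_life(kd_nM, kon=ASSUMED_KON_PER_M_PER_S):
    """Half-life (s) of the protein-DNA complex for first-order dissociation."""
    return math.log(2) / off_rate(kd_nM, kon)


if __name__ == "__main__":
    # KD values (nM) quoted in the Discussion
    for name, kd in (("Zif268", 6.0), ("ZNF217", 80.0), ("Tramtrack", 400.0)):
        print(f"{name:10s} k_off = {off_rate(kd):6.2f} s^-1  "
              f"half-life = {complex_half_life(kd):.3f} s")
```

Whatever the true association rate, the ratio of half-lives between two proteins equals the inverse ratio of their KD values as long as k_on is the same for both.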
The other question raised by the DNA binding data for ZNF217 is that of what constitutes specificity for a sequence-specific DNA-binding protein. Our data showed that ZNF217-F67 did display a higher affinity for its selected target sequence than for an unrelated sequence, but the degree of selectivity was only a factor of 2.5. In contrast, the HoxD9 and Antp homeodomains bind to unrelated DNA sequences with KD values of 100-300 nM (53,54), representing 60-200-fold selectivity for their cognate targets, and the specificity of the GATA1 C-terminal zinc finger for its target site has been estimated to be ~1000-fold (57). However, quantitative assessments of DNA binding specificity are relatively rare, and it is possible that many other DNA-binding proteins do not display this high level of selectivity.
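The selectivity figures above are ratios of dissociation constants, and they can be converted to a free energy difference with the standard relation ΔΔG = RT ln(selectivity). The following sketch reproduces the homeodomain arithmetic from the quoted KD values and expresses each fold-selectivity as an energy; the temperature of 298.15 K is an assumption made for illustration, and the function names are hypothetical.

```python
import math

R_KCAL_PER_MOL_K = 1.987e-3  # gas constant in kcal mol^-1 K^-1
ASSUMED_TEMP_K = 298.15      # ASSUMPTION: 25 degrees C, chosen for illustration


def selectivity(kd_nonspecific_nM, kd_specific_nM):
    """Fold preference for the cognate site over an unrelated site."""
    return kd_nonspecific_nM / kd_specific_nM


def ddg_kcal_per_mol(fold, temp_k=ASSUMED_TEMP_K):
    """Free energy difference corresponding to a fold-selectivity."""
    return R_KCAL_PER_MOL_K * temp_k * math.log(fold)


if __name__ == "__main__":
    # Homeodomains: cognate KD ~1.5 nM, unrelated DNA 100-300 nM (from the text)
    low = selectivity(100.0, 1.5)
    high = selectivity(300.0, 1.5)
    print(f"homeodomain selectivity: {low:.0f}- to {high:.0f}-fold")

    # Fold-selectivities quoted in the text
    for name, fold in (("ZNF217-F67", 2.5), ("homeodomains (upper)", 200.0),
                       ("GATA1 C-finger", 1000.0)):
        print(f"{name:22s} {fold:7.1f}-fold  ddG = {ddg_kcal_per_mol(fold):.2f} kcal/mol")
```

On this scale a 2.5-fold preference corresponds to roughly half a kcal/mol at the assumed temperature, which helps explain why such a difference appears small in vitro.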
The question of how a DNA-binding protein finds its cognate targets in the "noise" of three billion base pairs of genomic DNA is largely an unanswered one, but it is often thought that such a protein must have high specificity to act effectively. Surprisingly, despite the selectivity of ZNF217 being apparently very low, our transient transfection data demonstrate that the protein can effectively find targets and generate functional outcomes in a cellular context (albeit on naked transfected DNA rather than a chromatinized target). This ability is significantly compromised when even small changes are made in the DNA recognition site (Fig. 4). The selectivity of ZNF217 is thus apparently substantially greater in vivo than in vitro. This phenomenon might reflect the recruitment of partner proteins that enhance specificity, although it is notable that the sequence preferences of ZNF217 in transient transfections closely match those observed in vitro with recombinant protein. That is, binding is strongest to the extended TGCAGAAT sequence and weaker to CAGAAT or CTGGAGTA oligonucleotides; it is simply the magnitude of the difference in specificity that is apparently magnified in cells. We have recently observed similar results during the analysis of an unrelated DNA-binding domain,⁶ and so we speculate that small apparent specificities might well be functionally significant, although the mechanism underlying this phenomenon remains to be elucidated.
Implications for Other Zinc Finger Proteins-The structure of the two-zinc finger domain of Tramtrack bound to DNA (44) established the idea that two tandem classical zinc finger domains could mediate sequence-specific DNA binding, and it was later noted that a number of zinc finger proteins contained two-finger units (58), although at that time little was known about the properties of proteins other than Tramtrack. GATA-family transcription factors also use two zinc fingers to bind to DNA, but these domains are structurally distinct from the classical zinc fingers (59). Likewise, Drosophila GAGA factor (GAF) interacts with DNA via a single classical C2H2 finger, but in this case additional contacts to the minor groove are made by an N-terminal basic region (60). Despite these reports, and although yeast zinc finger proteins have been predicted to bind DNA using two-finger motifs (61), it is generally accepted that in mammals classical zinc finger proteins tend to use three zinc fingers to contact their target DNA sequences (2).
An inspection of the UniProtKB protein sequence database revealed that of the 838 confirmed human C2H2 zinc finger proteins, 18 contain only a single two-zinc finger unit. There are also >100 additional proteins containing multiple zinc fingers in which a discrete double-finger unit is present, and it is possible that these proteins can interact with DNA via a mechanism similar to that of ZNF217. Indeed, certain mammalian transcriptional regulators such as BCL11a and ZNF219 contain double zinc finger domains that are known to contact DNA (62,63). There are a number of other proteins, such as ZNF536, that may also have DNA binding activity via their double zinc finger domains (64). ZNF536, like ZNF217, also binds CtBP. On the other hand, zinc finger proteins such as FOG, which also contain double finger domains but lack the typical TGERP linker and appropriate spacing, may well not be capable of sequence-specific DNA binding.