Structural Analysis of MED-1 Reveals Unexpected Diversity in the Mechanism of DNA Recognition by GATA-type Zinc Finger Domains*

MED-1 is a member of a group of divergent GATA-type zinc finger proteins recently identified in several species of Caenorhabditis. The med genes are transcriptional regulators that are involved in the specification of the mesoderm and endoderm precursor cells in nematodes. Unlike other GATA-type zinc fingers that recognize the consensus sequence (A/C/T)GATA(A/G), the MED-1 zinc finger (MED1zf) binds the larger and atypical site GTATACT(T/C)3. We have examined the basis for this unusual DNA specificity using a range of biochemical and biophysical approaches. Most strikingly, we show that although the core of the MED1zf structure is similar to that of GATA-1, the basic tail C-terminal to the zinc finger unexpectedly adopts an α-helical structure upon binding DNA. This additional helix appears to contact the major groove of the DNA, making contacts that explain the extended DNA consensus sequence observed for MED1zf. Our data expand the versatility of DNA recognition by GATA-type zinc fingers and perhaps shed new light on the DNA-binding properties of mammalian GATA factors.

Members of the GATA family of transcription factors contain either one or two type IV zinc fingers (1). In mammals, six of these proteins, namely GATA-1-6, have been studied in some detail (2-7), and functional data reveal that they are essential for the development of a range of different tissues. These six proteins are highly conserved in vertebrates, and functional GATA family transcriptional regulators have also been described in more distant organisms: for example, ELT-1 is essential for epidermal specification in Caenorhabditis elegans (8), whereas AreA and Nit2 regulate nitrogen metabolism in fungi (9,10).
It is generally recognized that GATA-type zinc fingers bind to double-stranded DNA that contains the consensus sequence (A/C/T)GATA(A/G) (1, 11-13). To date, the three-dimensional structures of three GATA-type zinc fingers in complex with DNA have been determined: the C-terminal zinc finger (C-finger) of chicken GATA-1 (14), the C-finger of murine GATA-3 (15), and the zinc finger of Aspergillus nidulans AreA (16). Each has revealed a protein core that consists of two N-terminal β-hairpins and an α-helix; both the helix and the loop connecting the two hairpins make multiple contacts with the major groove of the DNA. In addition, each finger contains a C-terminal tail of ~20 residues that is rich in basic residues. These tails take up similar conformations in the three structures, "folding back" and wrapping around the DNA in an extended fashion. In each case, the tail makes a number of mostly nonspecific contacts with the minor groove and/or the sugar phosphate backbone.
MED-1 is a C. elegans GATA family transcription factor that plays a central role in the specification of the mesoderm and endoderm tissues by activating the expression of a range of downstream target genes (17). The protein contains a single zinc finger that shares ~53% sequence similarity with the chicken GATA-1 C-finger over the core zinc finger structure and 46% similarity when the basic tail regions are included (see Fig. 1A). Although it was anticipated that MED-1 would bind the consensus GATA sequence, Broitman-Maduro et al. (17) demonstrated that the MED-1 zinc finger (MED1zf) specifically recognizes a larger and somewhat divergent site, GTATACT(T/C)3.
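The difference between the two consensus sequences can be made concrete with a short pattern-matching sketch. The code below is illustrative only and is not part of the original study: it assumes that GTATACT(T/C)3 means GTATACT followed by three positions that are each T or C, and the names MED1_SITE, GATA_SITE, and find_sites are hypothetical. Positions are 0-based, and hits on the "-" strand are reported in the coordinates of the reverse-complemented sequence.

```python
import re

# Assumed reading of the consensus sites quoted in the text:
#   MED-1: GTATACT followed by three pyrimidines that are each T or C
#   GATA : (A/C/T)GATA(A/G)
# Lookaheads are used so that overlapping sites would also be reported.
MED1_SITE = re.compile(r"(?=(GTATACT[TC]{3}))")
GATA_SITE = re.compile(r"(?=([ACT]GATA[AG]))")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_sites(seq, pattern):
    """Scan both strands; return (strand, 0-based start, matched site) tuples."""
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(s):
            hits.append((strand, match.start(), match.group(1)))
    return hits


# 20-bp oligonucleotide quoted under "Structure Calculations".
probe = "CGGAAAAGTATACTTTTCCG"
med1_hits = find_sites(probe, MED1_SITE)
gata_hits = find_sites(probe, GATA_SITE)
```

Because this oligonucleotide is its own reverse complement, the MED-1 pattern is found once on each strand, whereas the canonical GATA pattern is not found on either.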
We have examined the molecular basis for this unusual DNA-binding specificity. Our data indicate that a single arginine residue in the core zinc finger domain drives a specificity switch from AGATA to GTATA and that, unexpectedly, the C-terminal basic tail of MED1zf forms an additional well ordered α-helix that interacts with the 3′-pyrimidine-rich extension in the DNA consensus sequence. These results underscore the versatility of DNA recognition by zinc finger domains and have implications for the functions of the many GATA family proteins for which DNA-binding specificities have not yet been defined in detail.

EXPERIMENTAL PROCEDURES
Preparation of Proteins-Residues 108-174 of MED-1 were overexpressed as a His-tagged fusion protein in pET15b. The use of Escherichia coli Rosetta2 cells was essential for obtaining reliable overexpression. Cell pellets were lysed in buffer containing 50 mM sodium phosphate, pH 8.0, 300 mM NaCl, 10 mM imidazole, 1 mM dithiothreitol (DTT), and one complete EDTA-free protease inhibitor mixture tablet (per 50 ml of lysis buffer; Roche Applied Science). His6-MED1zf was recovered from the soluble fraction and purified on nickel-nitrilotriacetic acid-agarose (Qiagen). The fusion tag was cleaved using thrombin (3-4 h at 25 °C) in 50 mM Tris, pH 8.0, 150 mM NaCl, 2.5 mM CaCl2, and 1 mM DTT, and MED1zf was further purified by size exclusion chromatography (HiLoad 16/60 Superdex 75, Amersham Biosciences). The purified polypeptide contained an additional four N-terminal amino acids (GSHM) derived from the thrombin cleavage site. 15N- and 15N,13C-labeled MED1zf were prepared following the procedure of Cai et al. (18) and purified as described above. A range of MED1zf point mutants was created using site-directed mutagenesis and purified as described above, and their correct folding was confirmed by one-dimensional 1H NMR spectroscopy.
Isothermal Titration Calorimetry-Purified MED1zf and double-stranded DNA (dsDNA) were dialyzed against 20 mM sodium phosphate, 40 mM NaCl, and 1 mM DTT, pH 6.5. The MED1zf concentration was determined by absorbance at 280 nm using the calculated extinction coefficient (ε = 9770 M−1 cm−1). The DNA concentration was determined by absorbance at 260 nm using ε = 219,410 M−1 cm−1. The dsDNA (90 μM) was titrated into MED1zf (14 μM, 1.4 ml) at 25 °C in a series of 10-μl injections, with a 4-min interval between injections, using a MicroCal VP-ITC microcalorimeter. The evolved heats were integrated and normalized for protein concentration. After base-line correction (using data from a titration of dsDNA into buffer), the data were fitted to a single-site model using Origin 5.0 (MicroCal).
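As a rough illustration of the arithmetic behind this setup, the sketch below applies the Beer-Lambert relation to the quoted extinction coefficients and estimates how many injections are needed to reach a 1:1 molar ratio. It is not taken from the original study. It reads the quoted concentrations as micromolar and the injection volume as microliters, assumes a 1-cm path length, ignores the volume displaced from the cell, and uses a hypothetical absorbance reading of 0.137; the function names are also hypothetical.

```python
import math

EPS_MED1ZF_280 = 9770.0    # M^-1 cm^-1, extinction coefficient quoted in the text
EPS_DSDNA_260 = 219410.0   # M^-1 cm^-1, extinction coefficient quoted in the text


def molar_concentration(absorbance, epsilon, path_cm=1.0):
    """Beer-Lambert law: c = A / (epsilon * l), returned in mol/l."""
    return absorbance / (epsilon * path_cm)


def injections_to_equivalence(cell_conc_um, cell_vol_ml, syringe_conc_um, inj_vol_ul):
    """Number of injections needed to add one molar equivalent of titrant.

    Dilution and displaced cell volume are ignored (a simplifying assumption).
    """
    cell_nmol = cell_conc_um * cell_vol_ml                   # uM * ml = nmol
    per_injection_nmol = syringe_conc_um * inj_vol_ul / 1000.0  # uM * ul = pmol
    return math.ceil(cell_nmol / per_injection_nmol)


# Hypothetical A280 reading that corresponds to roughly 14 uM protein.
protein_um = molar_concentration(0.137, EPS_MED1ZF_280) * 1e6

# 14 uM protein in 1.4 ml, titrated with 90 uM dsDNA in 10-ul steps.
n_injections = injections_to_equivalence(14.0, 1.4, 90.0, 10.0)
```

Under these assumptions the cell holds about 19.6 nmol of protein and each injection delivers 0.9 nmol of dsDNA, so the 1:1 point is reached at about the 22nd injection.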
Surface Plasmon Resonance-Kinetic analysis was performed on a Biacore 3000 surface plasmon resonance instrument (Biacore AB, Uppsala, Sweden). Biotinylated dsDNA was immobilized on a streptavidin-coated SA sensor chip (Biacore AB). The buffer used for all experiments was 30 mM sodium phosphate, 200 mM NaCl, 1 mM DTT, and 0.01% surfactant P20. The chip was pretreated according to the manufacturer's instructions with conditioning solution (3 × 100-μl injections at 50 μl/min with 50 mM NaOH and 1 M NaCl). The biotinylated dsDNA was diluted to 10 nM and injected onto one of the sensor chip channels (Fc-2) at a flow rate of 5 μl/min for 5 min, resulting in an immobilization level of ~120 response units. The sensor chip was then washed with running buffer. Upstream unmodified channel surfaces were used for reference subtraction. Kinetic measurements at protein concentrations across the range 1 nM to 1 mM (40 μl) were performed at 25 °C with a KINJECT protocol and a flow rate of 20 μl/min. Wild type and mutant protein samples were sampled alternately, zero concentration samples were included for double-referencing, and three cycles were performed. The chip surface was regenerated with 10 mM Tris, pH 7.5, 500 mM NaCl, 1 mM EDTA, and 0.005% SDS between each set of protein samples. Data analysis was performed with BIAevaluation software (Biacore AB).
NMR Spectroscopy-Purified MED1zf was dialyzed into buffer containing 20 mM sodium phosphate, 40 mM NaCl, and 1 mM DTT, pH 6.5, to final concentrations of up to ~0.8 mM. NMR samples also contained 10 μM 2,2-dimethyl-2-silapentane-5-sulfonic acid as a chemical shift reference and 5-10% (v/v) D2O. Spectra were recorded at 298 K on Bruker 600- and 800-MHz spectrometers equipped with cryoprobes. Samples of MED1zf mutants were prepared similarly. All homonuclear two-dimensional data were collected and analyzed as described (19). Mixing times were 70 and 100 ms for total correlation and nuclear Overhauser effect spectra, respectively. 15N and 13C chemical shift assignments were made from the standard suite of triple resonance experiments as described previously (20). NOE-derived distance restraints were obtained from two-dimensional NOESY and three-dimensional 15N-separated NOESY. Dihedral angle restraints were included on the basis of an analysis of backbone chemical shifts in the program TALOS (21) and from 3J(HNHA) scalar couplings measured in HNHA (22). One-bond HN residual dipolar couplings (RDCs) were recorded for the MED1zf·DNA complex in 6 mg of Pf1 phage (ASLA Biotech) using the in-phase/anti-phase pulse sequence (23). Correct alignment of the complex was checked by measuring the D2O splitting (12)(13)(14)(15). The program PALES (24) was used for the calculation of the magnitude and orientation of the sterically induced alignment tensor (see below for details). All NMR data were processed using TOPSPIN (Bruker, Karlsruhe, Germany) and analyzed with SPARKY 3 (25).
Structure Calculations-The program HADDOCK (26, 27) was used to calculate a data-driven model of the MED1zf·DNA complex. Two starting structures were used for the docking. The first was a B-form model of the DNA fragment (5′-CGGAAAAGTATACTTTTCCG-3′) constructed using the program 3DNA (28) running on the 3D-DART server located at the Centre for Biomolecular Research at Utrecht University (The Netherlands). The second was a model of the MED1zf structure (residues 111-166, excluding the unstructured N and C termini) constructed based on the structure of chicken GATA-1. Residues 152-164, which are largely extended in the GATA-1 C-finger, were formed into an additional helix based on TALOS predictions, NOE data, and 3J(HNHA) coupling constants. This structure was energy-minimized in CNS (three rounds of 200 steps with the backbone fixed) (29) and used as input for HADDOCK. During HADDOCK runs, MED1zf residues 114-146 (zinc finger core) and 152-162 (helix 2) were defined as semiflexible, and residues 111-113 (N terminus), 147-151 (linker between the two helices), and 163-166 (C terminus) were defined as flexible. Within the semiflexible regions, the following residues were selected as "active" for the definition of ambiguous interaction restraints: 120-125, 138-146 (with the exception of Cys139, a zinc-ligating residue), and 152-162. These definitions were based on chemical shift perturbation data (see Fig. 4, A and B). For the DNA, ambiguous interaction restraints were defined solely from the unique base atoms of bases 7-16 and 25-34 of the core region and the 3′-flanking thymine stretch to suitable atoms (i.e. from hydrogen to nitrogen or oxygen and vice versa) (30, 31) of the above-mentioned residues of MED1zf based on the gel-shift experiments (see Fig. 3) and the chemical shift changes observed (see Fig. 4, C and D).
Because the DNA sequence used is palindromic, it is not immediately clear on which strand the binding occurs; however, a few intermolecular NOEs (supplemental Fig. S1A) allowed us to unambiguously define the binding site on the DNA (3′-flanking thymine stretch of bases 14-16 and 25-27 rather than 5-7 and 34-37). In addition to the ambiguous interaction restraints, dihedral angle restraints, all unambiguously assigned intra- and intermolecular NOEs (upper distance limit of 6 Å), restraints for the tetrahedral coordination of the zinc atom (32), and restraints to maintain the conformation of the DNA were used as inputs into HADDOCK (supplemental Table 1). After rigid-body minimization, semiflexible annealing, and water refinement, the best 10 structures within the lowest energy cluster (cutoff = 7.5 Å) (26,27) were used as input for another semiflexible annealing and water refinement step. The resulting best 10 structures (cutoff = 1.5 Å) of the lowest energy cluster overlaid with a root mean square deviation of 0.72 Å (backbone of MED1zf residues 114-162 and DNA). At this stage, HN-N RDCs were introduced as direct restraints (using the SANI statement); axial and rhombic components of the alignment tensor (Da and Dr) were calculated using the ensemble of 10 structures and the software PALES (24). Semiflexible annealing and water refinement, as described above, were then carried out once using the RDCs as well as all other mentioned restraints. The alignment tensor was recalculated based on the resulting best 10 structures (cutoff = 1.5 Å), and HADDOCK was run again (see above) using these new values. After this run, the tensor components of the final 10 structures (cutoff = 1.0 Å) were not significantly different from those of the previous stage. The root mean square deviation dropped from 0.72 Å to a final value of 0.40 Å.
The final family of 10 structures was analyzed using standard HADDOCK protocols (supplemental Table 2); no major violations were observed. The structures have been deposited in the Protein Data Bank with accession code 2KAE.

RESULTS
The MED-1 Zinc Finger Binds DNA with 1:1 Stoichiometry-To examine the DNA-binding properties of MED1zf, we overexpressed and purified a peptide corresponding to residues 108-174 of C. elegans MED-1 (Fig. 1A), which includes the core zinc finger region and the basic tail. Given the large predicted size of the recognition site for MED1zf compared with GATA-1, as well as its partially palindromic nature, we first sought to establish the stoichiometry of the MED1zf·DNA complex. We therefore subjected the purified protein to size exclusion chromatography incorporating a multiangle laser light-scattering detector. The observed peak (Fig
Having established the stoichiometry of the interaction, we used ITC to determine the affinity of the complex. The interaction is exothermic (ΔH = −11 kcal/mol), and the data fit well to a 1:1 binding model with a dissociation constant of 13 nM (Fig. 2B). This affinity is very similar to the dissociation constant measured for the GATA-1·DNA complex (8 nM) (33).
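The measured dissociation constant can be converted into a binding free energy with the standard relation ΔG = RT ln(K_D), taking 1 M as the standard state. The sketch below is an illustrative calculation only, not an analysis from the original study; the function name is hypothetical, and the entropic term is a rough estimate obtained by subtracting the reported ΔH from the computed ΔG.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def binding_free_energy(kd_molar, temp_k=298.15):
    """Standard binding free energy, dG = RT ln(K_D), with a 1 M standard state."""
    return R_KCAL * temp_k * math.log(kd_molar)


dG_med1 = binding_free_energy(13e-9)   # MED1zf·DNA, K_D = 13 nM at 25 °C
dG_gata1 = binding_free_energy(8e-9)   # GATA-1·DNA, K_D = 8 nM (33)

dH_med1 = -11.0                        # kcal/mol, reported ITC enthalpy
minus_TdS_med1 = dG_med1 - dH_med1     # entropic term, from dG = dH - T*dS
```

With these inputs ΔG is close to −10.8 kcal/mol for MED1zf and −11.0 kcal/mol for GATA-1, which puts the similarity of the two affinities on an energy scale. The entropic term for MED1zf comes out small and slightly unfavorable, although this estimate inherits the uncertainty of both measured quantities.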

MED-type Zinc Fingers Recognize a Different Consensus Sequence-Although the MED-1 zinc finger has a substantial degree of sequence similarity to the mammalian GATA-type zinc fingers, we previously demonstrated that MED1zf specifically recognized the variant consensus sequence GTATACT(T/C)3 (17). To determine which bases were critical for the MED1zf-DNA interaction, we carried out electrophoretic mobility shift assays using 32P-labeled wild type DNA and mutant DNA oligonucleotides as competitors.
The first issue we addressed was whether the pyrimidine-rich region that flanks the core GTATAC sequence was important for binding. Because the core of the binding site is palindromic (GTATAC), there is the potential for MED1zf to bind the DNA site in two orientations. We therefore created two completely palindromic sequences in which the core GTATAC on both strands was flanked on the 3′ side by either TTTT (the preferred sequence from the published site selection data) or GGGG. As shown in Fig. 3A, only AAAAGTATACTTTT could compete with the labeled probe (lane 4), whereas the other unlabeled probe (CCCCGTATACGGGG; lane 5) failed to compete, even at a 150-fold molar excess. This indicated that MED1zf binds the labeled probe (CCCCGTATACTTTT) in only one orientation and that the 3′-pyrimidine-rich stretch is an important component of the MED1zf recognition site. This latter finding is consistent with the observation that MED1zf target genes could be predicted based on the DNA consensus sequence (17). It is also notable that probes containing a consensus GATA site (TTATCA, corresponding to TGATAA) were unable to compete with the MED1zf sequence (Fig. 3, A, lane 6; and B, lane 4).

FEBRUARY 27, 2009 • VOLUME 284 • NUMBER 9

Data-driven Model of a MED-1·DNA Complex
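The palindrome argument can be checked directly: a double-stranded sequence is palindromic when it equals its own reverse complement. The following sketch is illustrative and not part of the original study; the helper names are hypothetical, and the sequences are the probes quoted above.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_palindromic(seq):
    """True if the sequence reads the same 5' to 3' on both strands."""
    return seq.upper() == reverse_complement(seq)


probes = {
    "core": "GTATAC",
    "TTTT-flanked competitor": "AAAAGTATACTTTT",
    "GGGG-flanked competitor": "CCCCGTATACGGGG",
    "labeled probe": "CCCCGTATACTTTT",
}
palindromic = {name: is_palindromic(seq) for name, seq in probes.items()}
```

The core and both competitors are palindromic, so each competitor presents the same flank in either orientation. The labeled probe is not palindromic: it carries the TTTT flank on one strand and, by complementarity, a GGGG flank on the other, which is why the competition result distinguishes the two orientations.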
Further mutagenesis was carried out within the core and flanking regions, whereby we substituted at each position thymine for guanine and adenine for cytosine and vice versa (Fig. 3A, lanes 7-21). In a second experiment (Fig. 3B), an oligonucleotide with a slightly different sequence 5′ of the GTATA site (thymines instead of cytosines) was used, and each thymine was substituted by cytosine and each adenine by guanine and vice versa (lanes 5-16). Mutagenesis of each of the three 3′-flanking thymines resulted in slight to moderate reductions in binding affinity, whereas mutagenesis of the flanking 5′-cytosines (Fig. 3A) or 5′-thymines (Fig. 3B) gave rise to essentially no changes. All substitutions within the core palindrome, with the exception of the first thymine in Fig. 3A (lane 16), had a large impact on binding. Overall, these data suggest that the core region (GTATAC) as well as the 3′-flanking thymines constitute the binding site of MED1zf.
Mapping the MED1zf-DNA Interaction by NMR Spectroscopy-To gain insight into the structural basis of the MED1zf-DNA interaction, we analyzed the chemical shift changes that occur upon titration of 1 mol eq. of dsDNA into 15N-labeled MED1zf. The 20-bp DNA sequence shown in Fig. 1B was used for the titration. Comparison of the 15N HSQC spectrum of 15N-labeled MED1zf before and after the addition of dsDNA reveals numerous shift changes (Fig. 4A), and the quality of the data is consistent with the formation of a stable 1:1 complex. The interaction is in slow exchange on the chemical shift time scale, consistent with the measured binding affinity.
We used triple resonance NMR data to assign MED1zf in both the absence and presence of DNA, and a summary of the backbone chemical shift changes is shown in Fig. 4B. Fig. 4B also shows the secondary structure content of MED1zf, both in the free form and in the presence of DNA, based on TALOS (21) and medium-range NOE data. It is clear that the secondary structure of the free zinc finger closely matches that of other GATA-type zinc fingers, with four short β-strands and a single α-helix (helix 1). Within the zinc finger core (Ser112-Tyr146), the largest chemical shift changes following DNA binding are observed for the second and third β-strands and the loop connecting these strands. Some significant changes also occur in the center of the α-helix. However, the most striking effects occur across a 14-residue stretch of the basic tail, and an analysis of chemical shift and NOE data reveals that a substantial conformational change takes place in MED1zf upon DNA binding, with the formation of a second 11-residue α-helix within the formerly disordered basic C-terminal tail (helix 2, Val152-Lys162).
Assignments of DNA resonances were also made in both the absence and presence of MED1zf. Fig. 4C shows that significant changes were observed for the imino protons of Thy9 and Thy29 of the core GTATAC motif. An analysis of chemical shift changes of imino (thymines and guanines), amino (adenine and cytosines), and methyl (thymines) protons (Fig. 4D) reveals major changes in the core motif; however, no significant shifts were observed in the flanking regions.
Model of the MED1zf·DNA Complex-Attempts to crystallize a MED1zf·DNA complex were unsuccessful, and so we used our NMR and gel-retardation data to derive a structural model for the complex. Only 13 intermolecular NOEs could be assigned from heteronuclear edited/filtered NOESY spectra (supplemental Fig. S1A), as well as 458 unambiguous intramolecular NOEs within MED1zf, which is well below the limit of 8-10 NOEs per residue usually required for a full structure determination. In part, these difficulties were caused by the tendency of the complex to undergo degradation starting at 7 days after purification, as evident from the appearance of additional peaks in the 15N HSQC spectrum. The NMR and gel-retardation experiments described above, together with the limited number of intermolecular NOEs, do, however, provide a significant amount of data on the structure of the complex, and we supplemented these data with measurements of one-bond HN-N RDCs for the protein in the complex.
We first created a homology model of the zinc finger core structure using the GATA-1 C-finger structure (Protein Data Bank code 1GAT) (14) as a template. Val152-Lys164 in this model were further defined as being in a helical conformation, based on TALOS and medium-range NOE data, together with 3J(HNHA) coupling constants. We used HADDOCK (26,27) to dock this model onto an idealized B-form dsDNA oligonucleotide, having established the B-form nature of the DNA from medium-range NOE patterns (34). For the data-driven docking process, 25 residues in MED1zf were defined as active based on chemical shift perturbation data, together with specific DNA base atoms from both the GTATAC core region and the 3′-pyrimidine-rich stretch. Ambiguous interaction restraints, which were used to direct the docking calculations, were defined between these groups. A combination of other restraints, including inter- and intramolecular NOEs from two-dimensional and three-dimensional 15N NOESY spectra (see supplemental Fig. S1A), dihedral angles from TALOS predictions, and RDCs, was used in the HADDOCK calculations. All side chains were allowed to undergo free motion, together with the backbone of the protein in regions that were not deemed to be part of regular secondary structure (see "Experimental Procedures" for details).
Following several HADDOCK docking runs that included simulated annealing and water refinement protocols, the 10 conformers with the lowest energies were taken to represent a model for the MED1zf·DNA complex. The family of structures is well converged, overlaying with a root mean square deviation of 0.40 Å over all protein and DNA backbone atoms, excluding the flexible termini of MED1zf (Fig. 5A), and the correlation between the predicted and observed HN-N RDCs is very good (r = 0.98) (supplemental Fig. S1C). The zinc finger core packs against the major groove of the DNA at the GTATA site via the loop that connects β-strands 2 and 3 as well as residues in the α-helix (Fig. 5, B and C). In total, 27 hydrogen bonds and 34 hydrophobic contacts are observed in at least 50% of the structures (supplemental Table 2); of these, roughly one-third are base-specific. In the zinc finger core, Arg124 forms hydrogen bonds with Gua8, as well as hydrophobic contacts with Thy31. These interactions, together with contacts made by Ile123 and Arg126, define the sequence specificity of the protein at one end of the recognition site (the GTA in GTATAC). The next three bases are specified predominantly by hydrophobic interactions involving residues in the first helix of the zinc finger (namely Ile141, Tyr142, and Arg144), consistent with the high-thymine content of the DNA. Arg144 also forms hydrogen bonds with Thy29, explaining the significant chemical shift changes seen for this base (Fig. 4D). Helix 2 forms an extensive interface with the major groove of the pyrimidine-rich stretch, with Ala154, Tyr158, and Arg161 forming specific interactions with DNA bases (Fig. 5, blue). We also used the program 3DNA (28) to assess the conformation of the DNA in the complex (supplemental Fig. S1B).
Although some deviations from ideal geometry are observed, for example between Ade10 and Thy11 where Ile141 contacts the DNA, a B-form-like conformation is largely maintained across the sequence.

Validation of the Data-driven Model by Site-directed Mutagenesis-To assess the accuracy of our data-derived model of the MED1zf·DNA complex, we created a series of 22 MED1zf point mutants and used surface plasmon resonance to assess their ability to bind to DNA containing the MED1zf consensus sequence (Fig. 6 and supplemental Fig. S2). Biotinylated wild type DNA was bound to a streptavidin-derivatized Biacore chip and treated with solutions of either wild type MED1zf or MED1zf mutants. The KD for wild type MED1zf determined in this way (8 nM) (supplemental Fig. S2A) agrees well with the value measured by ITC (KD = 13 nM) (Fig. 2B). Approximately half of the mutations reduced binding by >10-fold (Fig. 6, gray bars), and in excellent agreement with our structural model, mutations that affected binding were largely confined to those residues that contact the DNA (Fig. 6, boxed residues). Of particular note, mutation of helix 2 residues Ala154, Tyr158, and Arg161, which contact the DNA specifically, causes significant reductions in binding affinity, whereas Asn156 and Gln159, which point away from the DNA, can be mutated without effect (Fig. 6). These results serve to confirm the orientation of helix 2 about its long axis with respect to the DNA.
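The grouping of mutants by their effect on DNA binding amounts to a simple fold-change calculation relative to the wild type KD. The sketch below is illustrative only: the function name and the example mutant KD values are hypothetical and are not measurements from this study. The bins follow the three classes used for the mutational data (less than 10-fold, 10-100-fold, and more than 100-fold), with exactly 10-fold placed in the middle class as an assumption.

```python
WT_KD_NM = 8.0  # wild type K_D from surface plasmon resonance, in nM


def classify_fold_change(kd_mutant_nm, kd_wt_nm=WT_KD_NM):
    """Bin a mutant by the ratio of its K_D to the wild type K_D."""
    fold = kd_mutant_nm / kd_wt_nm
    if fold < 10:
        return "<10-fold"
    if fold <= 100:
        return "10-100-fold"
    return ">100-fold"


# Hypothetical K_D values (nM), chosen only to exercise each bin.
example_kds = {"mutant_a": 12.0, "mutant_b": 80.0, "mutant_c": 1000.0}
classes = {name: classify_fold_change(kd) for name, kd in example_kds.items()}
```

A mutant whose KD rises from 8 to 80 nM, for example, shows a 10-fold loss of affinity and falls in the middle class.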

DISCUSSION
Structural Basis for the Unusual Sequence Specificity of MED1zf-MED-1 is the first GATA-type zinc finger protein to be shown to recognize a consensus DNA sequence in which the core element differs from GAT. Furthermore, the recognition site for MED-1 is substantially longer than the canonical GATA site. Our data reveal the structural basis for these differences, showing that the GTATA core recognized by MED-1 corresponds to the AGATA motif in terms of the recognition mechanism. In all cases, recognition of DNA by these domains can be divided into three distinct regions. First, the β2 loop-β3 region contacts bases at the 5′ end. Here, Arg124 makes base-specific hydrogen bonds with Gua8; indeed, arginine-guanine interactions are the most common of all base-specific interactions in protein·DNA complexes (35). This residue is replaced with a leucine in GATA proteins that recognize AGATA, and it appears that the specificity difference is driven in part by this substitution. Second, Ile123, together with residues in helix 1 of MED1zf, contacts the ATA sequence in GTATAC. These residues are highly conserved between MED1zf and other GATA zinc fingers (Fig. 1), explaining the observation that this part of the consensus sequence is completely conserved.
The greatest differences are observed in the C-terminal tail region. In structures of other GATA-type zinc fingers bound to DNA, this tail "doubles back" and wraps in an extended fashion around the DNA, contacting the backbone and/or the minor groove. However, in MED1zf, an α-helix is induced upon DNA binding, and this helix contacts the pyrimidine-rich sequence that is 3′ to the GTATAC site. It is notable that this helix is not predicted by secondary structure prediction tools such as JUFO (36) or PORTER (37), highlighting one of the difficulties that is presented in computational efforts to predict protein structure without a knowledge of cognate binding partners.
Residues in helix 2 make base-specific contacts with three of the four thymines: Thy14-16 (Fig. 5 and supplemental Table 2). This observation is consistent with our competition binding data (Fig. 3), which show that mutation of the most distal thymine has a much smaller effect on binding. Thus, our data cannot readily explain the selection of this last thymine in previous site selection experiments (17). We have recently observed a similar phenomenon in a separate study, in which a site selection experiment revealed a six-base consensus sequence for a single-stranded RNA-binding protein, and yet mutation of either of the two outer bases had no measurable effect on binding. It is possible that subtle sequence-dependent conformational effects within the oligonucleotide can affect binding without requiring direct protein-nucleic acid contacts.
Diversity in Nematode GATA Factors-Whereas C. elegans contains two med genes, it is notable that the med family has undergone significant expansion in several other nematode species. Caenorhabditis briggsae and Caenorhabditis remanei have four and seven predicted med genes, respectively, whereas the recently sequenced Caenorhabditis brenneri displays 25 med genes, although little is known about the functions of these genes. An alignment of the sequences of these proteins (supplemental Fig. S3) reveals substantial variation among, for example, the 25 putative C. brenneri genes, even including the residues that we have shown to be important for sequence-specific recognition of DNA in C. elegans MED1zf. This might point to the existence of different target sites for these proteins. However, it is notable in this context that Arg124, which drives recognition of the G in the GTATAC motif, is a glutamine in three of the C. briggsae MED proteins. Our data show that an R124A mutation, which removes almost the entire side chain, only reduces DNA binding by 10-fold, so it is likely that the R124Q substitution, which would still allow the formation of one hydrogen bond to Gua8, would not significantly compromise binding to a GTATAC site. Indeed, we have previously shown that the C. briggsae and C. remanei MED proteins can substitute for C. elegans MED-1 and MED-2 when introduced as transgenes in C. elegans med-1,2(−) strains (38).
Implications for the GATA Family of Transcription Factors-The accepted paradigm holds that GATA family proteins recognize the 6-bp consensus sequence (A/C/T)GATA(A/G). Site selection data have been published for GATA-1, -2, and -3 (11,13), and a close examination of the data reveals that additional sequence preferences might exist outside the limits routinely quoted.
For example, chicken GATA-3 shows an identifiable preference for adenine at the underlined positions in the sequence NNAGATANNN and a stronger preference for purines at the italicized position (11). Sequence selection at the first, second, and final positions cannot be explained by the GATA-1 and AreA structures. In the same study, GATA-2 displayed a similar preference for adenine at the final position in the motif above, and a site selection carried out on human GATA-3 showed a bias toward purines at this same position (13). It should be noted that these experiments were carried out with full-length proteins (which contain a second adjacent zinc finger), so it is possible that these additional preferences are induced by structures outside the C-finger.
In this regard, it is notable that the structures of GATA-1 and AreA bound to DNA were determined using 16- and 13-bp oligonucleotides, respectively. Even if the C-terminal tails of these domains had a propensity for helix formation, fraying of the oligonucleotide would compromise the docking site for a new helix and thereby inhibit such a conformational change. It is intriguing that the tails of these two domains nevertheless form a single turn of helix that starts in a region that is highly conserved with MED1zf (and which forms part of the extra helix of the latter protein). Furthermore, it has been shown that the tail sequence QTRNRK in the C-finger of GATA-1 is important for determining the sequence specificity of this domain (39). However, inspection of the structure of the GATA-1·DNA complex indicates that five of the six side chains do not appear to make any significant contacts with the minor groove. In contrast, the MED1zf·DNA complex places these residues, which form part of helix 2, in the major groove, where they are able to fully engage with the DNA bases.
It is possible that the extended DNA site observed for MED1zf will also be observed for other GATA-type DNA-binding domains, including members of the mammalian and fungal GATA protein families. Manfield et al. (40) have recently shown that an extended version of AreA can bind up to two nucleotides on either side of the GATA site, indicating the existence of contacts in addition to those observed in the three-dimensional structure of the complex of the minimal zinc finger protein with DNA. However, their additional C-terminal sequence extends well beyond the sequence used in our study. The formation of a helix at a subset of DNA target sites (ones that displayed the longer recognition sequence) could also potentially constitute a mechanism by which interactions with other cofactor proteins might be modulated. Such proteins might either recognize the tail only in a helical conformation or only in an alternative conformation, providing a mechanism by which GATA family proteins could have different activities at different DNA sites. It is also notable that the C-terminal tail of GATA family proteins can be post-translationally modified by, for example, serine phosphorylation and lysine acetylation (41,42). Such modifications could act as switches that regulate conformational changes in the tail region. Overall, the data in this study expand our understanding of the structural basis for DNA recognition by zinc finger domains and suggest new mechanisms through which these proteins might regulate gene expression in organisms ranging from fungi to mammals.

FIGURE 6. Mutational data corroborate the structural model. Surface plasmon resonance-derived DNA-binding affinities for MED1zf mutants are shown. Black, dark gray, and light gray bars show mutants with affinities that are altered by less than a factor of 10, 10-100-fold, or >100-fold, respectively. Boxes indicate residues that contact DNA in the structural model. WT, wild type.