The N-terminal Zinc Finger of the Erythroid Transcription Factor GATA-1 Binds GATC Motifs in DNA*

The mammalian transcription factor GATA-1 is required for normal erythroid and megakaryocytic development. GATA-1 contains two zinc fingers: the C-terminal finger (C-finger), which is known to bind (A/T)GATA(A/G) motifs in DNA, and the N-terminal finger (N-finger), which is important for interacting with co-regulatory proteins such as Friend of GATA (FOG). We now show that, like the C-finger, the N-finger of GATA-1 is also capable of binding DNA but recognizes distinct sequences with the core GATC. We demonstrate that the GATA-1 N-finger can bind these sequences in vitro and that, in cellular assays, GATA-1 can activate promoters containing GATC motifs. Experiments with mutant GATA-1 proteins confirm the importance of the N-finger, as the C-finger is not required for transactivation from GATC sites. Recently, four naturally occurring mutations in GATA-1 have been shown to be associated with familial blood disorders. These mutations all map to the N-finger domain. We have investigated the effect of these mutations on the recognition of GATC sites by the N-finger and show that one mutation, R216Q, abolishes DNA binding, whereas the others have only minor effects.

GATA-1 is the founding member of a family of proteins implicated in the regulation of gene expression in organisms from yeast to man (1,2). The defining feature of the family is the presence of one or two GATA-type zinc fingers. GATA-type fingers contain four cysteine residues that coordinate a single zinc ion (3). The cysteines are arranged with a characteristic CX₂CX₁₇CX₂C spacing, and the fingers share a number of additional residues, such that the consensus sequence is CXNCX₄TPLWRRX₇CNACGLYXK. GATA proteins from lower organisms (such as yeast) usually have a single GATA-type finger, whereas GATA proteins from higher organisms typically have two fingers, termed the N-terminal and the C-terminal zinc fingers.
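The cysteine spacing described above can be expressed as a simple pattern search. The following is a minimal sketch, not part of the original study: the function name and the toy sequence are hypothetical, and the pattern checks only the CX2CX17CX2C spacing, not the fuller consensus.

```python
import re

# Spacing pattern from the text: C-X2-C-X17-C-X2-C, where X is any residue.
# A lookahead is used so that overlapping candidates are also reported.
_SPACING = re.compile(r"(?=(C.{2}C.{17}C.{2}C))")


def find_gata_type_fingers(protein_seq):
    """Return (1-based start, matched segment) for each candidate finger.

    Hypothetical helper: it tests cysteine spacing only and says nothing
    about whether the segment actually folds or binds DNA.
    """
    seq = protein_seq.upper()
    return [(m.start() + 1, m.group(1)) for m in _SPACING.finditer(seq)]


if __name__ == "__main__":
    # Toy sequence invented for illustration (not a real GATA protein).
    toy = "MK" + "CAAC" + "G" * 17 + "CAAC" + "LL"
    for start, segment in find_gata_type_fingers(toy):
        print(start, segment)
```

A spacing match of this kind is only a first filter; the additional conserved residues in the consensus would need to be checked separately.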
Six highly related GATA family proteins have been identified in mammals, and all contain two zinc fingers (1). The most extensively studied member of the family is GATA-1 (4). This protein is present at high levels in hematopoietic cells, primarily in erythroid cells, mast cells, and megakaryocytes, and is known to bind recognition elements in the control regions of genes expressed in these lineages (1). It has been shown that the C-terminal zinc finger of GATA-1 recognizes sequences of the form (A/T)GATA(A/G), and mutation or deletion of the C-finger prevents binding to these sites (5). Experiments have demonstrated that the N-finger does not recognize these DNA elements; instead, it is instrumental in recruiting co-regulatory proteins, such as Friend of GATA (FOG) (6). Thus a general picture has emerged that regulation of gene expression by GATA-1 depends on DNA binding mediated by the C-finger of GATA-1 and recruitment of various cofactors by the N-finger (or additional domains within GATA-1, including the C-finger).
Recent evidence suggests that the situation is more complex. The solution structure of the chicken GATA-1 C-finger bound to an AGATAA site in DNA has been solved by NMR spectroscopy, and the residues that directly contact DNA have been identified (Ref. 7 and Figs. 1 and 9). Interestingly, many of these DNA-contact residues are conserved in the N-terminal finger of chicken GATA-1 and also in mammalian GATA-1 N-fingers (2,5). Moreover, although it has been reported that the chicken GATA-1 N-finger does not bind DNA in isolation, it does play a subsidiary role in DNA binding, particularly in stabilizing binding to a small class of promoters that contain double GATA motifs (8,9). These results suggested that GATA N-fingers from various organisms can contact DNA and raised the possibility that the N-fingers of the mammalian GATA-1 proteins might have a more important role in DNA recognition than had previously been demonstrated.
Accordingly, we have investigated whether the murine GATA-1 N-finger is capable of functioning as an independent, sequence-specific DNA-binding domain. We demonstrate here that the N-finger recognizes GATC motifs in vitro. A naturally occurring mutation in GATA-1, R216Q (10), abolishes this interaction. We also show that GATA-1 is capable of activating transcription from promoters carrying GATC elements. These results extend the range of genes that may be regulated by GATA-1. They also suggest that the protein may adopt different configurations to bind different target genes.

MATERIALS AND METHODS
* This work was supported by a grant from the Australian Research Council (to J. P. M. and M. C.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Plasmid Construction, Site-directed Mutagenesis, and Oligonucleotides-The plasmids pGEX2T/N-finger (200-254) and pGEX2T/C-finger (249-318) have been previously described (11). Mutant derivatives were generated by site-directed mutagenesis using standard methods, either single primer mismatch or overlap polymerase chain reaction, as previously described (11). Pfu polymerase (Stratagene) and mutant oligonucleotide primers (Life Technologies, Inc.) were used in the reactions. The reporter plasmids (TGATCT)-TATA-luciferase and (TGATAA)-luciferase were constructed by first inserting the β-globin TATA box (GATCTCGACCTTGGGCATAAAAGTAGGGCAGAGCCCTCTATTGTTACATTTGCTTA) between the BglII and HindIII sites of the luciferase reporter plasmid pGL3 (Promega) and then inserting upstream three copies of double-stranded oligonucleotides with the sequence GTACCTCTCCGGCAACTGATCTGGCAACTGATCTGGCAACTGATCTGGACTCCCTGC and GTACCTCTCCGGCAACTGATAAGGCAACTGATAAGGCAACTGATAAGGACTCCCTGC for the N-finger TGATCT site and the C-finger TGATAA sites, respectively. The 5-aminolevulinate synthase (ALAS2) and platelet factor 4 (PF4) promoters contained residues -300 to +37 (12) and -355 to +19 (13) cloned directly upstream of the luciferase reporter gene in pGL3. The oligonucleotides used in the initial gel retardation assays had the sequence TCTCCGGCAACTGATCTGGACTCCCTG with variations around the central GAT core as indicated in Fig. 3. The oligonucleotides corresponding to the L-type pyruvate kinase (L-PK) enhancer (14), the ALAS2 promoter, and the PF4 promoter had the sequences GGGAGCATGGAGATCATAGCACTCCG, AAGGATGGTCTGATCTCAAAATCGAA, and TGGCTGGCCAGATCTCAAGTACTGT, respectively.
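As a quick consistency check on the reporter design, the number of recognition sites in each reporter oligonucleotide can be counted directly. This is a minimal sketch, not part of the original methods: the function and constant names are hypothetical, and the sequences are the two reporter oligonucleotides given in the text, written without line breaks.

```python
def count_sites(seq, site):
    """Count occurrences of `site` in `seq` on the given strand,
    allowing overlaps (hypothetical helper)."""
    seq, site = seq.upper(), site.upper()
    width = len(site)
    return sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] == site)


# Reporter oligonucleotides as given in the text (top strand only).
N_SITE_OLIGO = "GTACCTCTCCGGCAACTGATCTGGCAACTGATCTGGCAACTGATCTGGACTCCCTGC"
C_SITE_OLIGO = "GTACCTCTCCGGCAACTGATAAGGCAACTGATAAGGCAACTGATAAGGACTCCCTGC"

if __name__ == "__main__":
    for name, oligo in (("N-site oligo", N_SITE_OLIGO), ("C-site oligo", C_SITE_OLIGO)):
        print(name,
              "TGATCT:", count_sites(oligo, "TGATCT"),
              "TGATAA:", count_sites(oligo, "TGATAA"))
```

Each oligonucleotide carries three copies of its own site and none of the other, so any difference between the two reporters should reflect the site sequence rather than the site count.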
Recombinant Protein Production and Purification-pGEX2T/N-finger (200-254), pGEX2T/C-finger (249-318), and derivatives were transformed into Escherichia coli BL21, and the resulting bacteria were grown in Luria-Bertani broth at 37°C. Protein induction was initiated by the addition of isopropyl-β-D-thiogalactoside to a final concentration of 0.1 mM. After 4 h of induction, the bacteria were collected by centrifugation, and the pellet was subjected to sonication in buffer containing 50 mM Tris, pH 8.0, 50 mM NaCl, 1% Triton X-100, 1.4 mM phenylmethylsulfonyl fluoride, 1.4 mM β-mercaptoethanol. The soluble material was loaded onto a glutathione-agarose column. After washing with buffer containing 50 mM Tris, pH 8.0, 100 mM NaCl, 10% glycerol, 1.4 mM phenylmethylsulfonyl fluoride, 1.4 mM β-mercaptoethanol, the GST fusion protein was eluted with freshly prepared reduced glutathione (6 mg/ml) in 100 mM Tris, pH 7.5, 120 mM NaCl.
Electrophoretic Mobility Shift Assays-These experiments were performed as previously described (11). Reactions were set up in a total volume of 30 μl, comprising 0.1 pg of 32P-labeled probe, 10 mM Hepes, pH 7.8, 50 mM KCl, 5 mM MgCl2, 1 mM EDTA, 5% glycerol. After the addition of between 100 and 500 ng of recombinant protein, the reaction was kept on ice for 20 min and then loaded onto a 6% native polyacrylamide gel made up in 0.5× Tris borate-EDTA. The gel was then subjected to electrophoresis at 15 V/cm at 4°C for 2 h, dried, and analyzed using a PhosphorImager (Molecular Dynamics). The probes used in the experiments were end-labeled using polynucleotide kinase as previously described (15).
Cell Culture and Transient Transfection Assays-All cell culture manipulations were carried out using standard techniques. Briefly, NIH3T3 cells were co-transfected with 2 μg of the reporter construct and 100 ng or 250 ng of the wild-type and mutant GATA-1 expression vectors using the calcium phosphate method (15). The data are the result of six independent experiments and have been normalized to Renilla luciferase levels derived from co-transfection with the control vector pRL-CMV (Promega).
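The normalization step can be illustrated with a short calculation. This is a minimal sketch under stated assumptions, not the authors' analysis procedure: the text states only that the data were normalized to Renilla luciferase levels, so the fold-activation step, the function names, and all numbers below are hypothetical.

```python
from statistics import mean, stdev


def normalized_activity(firefly, renilla):
    """Reporter (firefly) signal divided by the Renilla control signal."""
    if renilla <= 0:
        raise ValueError("Renilla reading must be positive")
    return firefly / renilla


def fold_activation(sample_pairs, baseline_pairs):
    """Mean and sample standard deviation of fold activation.

    Each argument is a list of (firefly, renilla) readings, one pair per
    independent experiment. Expressing results relative to a reporter-only
    baseline is an assumption made for illustration.
    """
    baseline = mean(normalized_activity(f, r) for f, r in baseline_pairs)
    folds = [normalized_activity(f, r) / baseline for f, r in sample_pairs]
    spread = stdev(folds) if len(folds) > 1 else 0.0
    return mean(folds), spread


if __name__ == "__main__":
    # Invented readings for illustration only.
    reporter_only = [(100.0, 50.0), (210.0, 100.0), (95.0, 48.0)]
    plus_gata1 = [(420.0, 52.0), (800.0, 98.0), (390.0, 50.0)]
    print(fold_activation(plus_gata1, reporter_only))
```

Dividing by the co-transfected Renilla signal corrects each well for differences in transfection efficiency before conditions are compared.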

RESULTS
The Murine GATA-1 N-finger Is Capable of Sequence-specific DNA Recognition-We first prepared recombinant N-finger and C-finger protein as fusions with GST. We also prepared two negative control variants: mutN-finger, carrying a cysteine to alanine substitution at residue 204; and mutC-finger, carrying the equivalent substitution at residue 258. Cysteines 204 and 258 are required for chelation of the zinc ion (in the N- and C-finger, respectively), so these mutations prevent the proper folding around zinc. See Fig. 1 for the sequences of the two domains, details of numbering, and an illustration of residues involved in zinc coordination, DNA contact, the interaction with FOG, and substitutions associated with naturally occurring mutations in GATA-1.
The four proteins and purified GST alone were tested for their ability to bind a typical (A/T)GATA(A/G) motif, in this case the TGATAA motif from the mouse α-globin promoter (4,5). As shown in Fig. 2A, the C-finger protein recognizes the TGATAA site as expected, but the N-finger does not. The C258A mutation in the C-finger that interferes with zinc finger formation prevents binding. Given that the N-finger of chicken GATA-2 (but not chicken GATA-1) has been shown to bind variant GATA sequences (8), we tested a panel of GATA-related sequences and found binding to a TGATCT probe. Fig. 2B shows binding to this variant TGATCT probe. In this case, both the N-finger and the C-finger protein bind this site, but binding is not observed with the mutant N-finger or C-finger proteins, or with GST alone. Moreover, sequence specificity is demonstrated by the fact that the N-finger protein binds the TGATCT but not the TGATAA site (compare Fig. 2, A and B). The protein preparations were also examined by Coomassie staining to ensure that comparable amounts were loaded (Fig. 2C).
The Binding Specificity of the N-finger Is Distinct from That of the C-finger-We next explored the sequence specificity of the N-finger by testing variants of the original TGATCT site. Because the DNA-binding domains of GATA proteins from different organisms are highly conserved and appear to require the central GAT core for recognition (2,16,17), we kept these three bases constant but tested variant sites with each of the surrounding bases changed to each of the three other possible bases (Fig. 3). We tested these new sites for binding to the N-finger and, for comparison, to the C-finger protein. As can be seen in Fig. 3, the N-finger binds the TGATCT site as expected but can also bind related sites such as AGATCT, CGATCT, and GGATCT. GGATCG and GGATCA sites are also bound. The other sites are recognized weakly, if at all. Coomassie Blue staining indicates that equal amounts of protein were used in these experiments (Fig. 3B). In summary, the N-finger appears to bind sites with a GATC core. In contrast, the C-finger binds several sites but, as expected, binds preferentially to the canonical TGATAA site. When combined with previous data on DNA binding by the GATA-1 C-finger (5,7,16,17), our results demonstrate that the N- and C-fingers have different preferred sites, with the N-finger favoring a GATC core and the C-finger binding more tightly to probes with a GATA core.
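The variant-site strategy described above (GAT core held constant, each flanking base changed in turn) can be sketched as a small enumeration. This is an illustrative sketch only: the function name is hypothetical, it generates single-base substitutions of TGATCT, and the actual probe set is defined in Fig. 3 and may include sites that this simple scheme does not produce.

```python
BASES = "ACGT"


def single_flank_variants(site="TGATCT", core="GAT"):
    """List every site that differs from `site` at exactly one position
    outside the invariant `core` (hypothetical helper)."""
    core_start = site.index(core)
    core_positions = set(range(core_start, core_start + len(core)))
    variants = []
    for i, base in enumerate(site):
        if i in core_positions:
            continue  # the GAT core is kept constant
        for alt in BASES:
            if alt != base:
                variants.append(site[:i] + alt + site[i + 1:])
    return variants


if __name__ == "__main__":
    for variant in single_flank_variants():
        print(variant)
```

With three flanking positions and three alternative bases at each, the scheme yields nine single-substitution variants in addition to the starting site.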
GATA-1 Can Activate Transcription from Promoters Containing GATC Elements-The above results show that the isolated GATA-1 N-finger domain is capable of sequence-specific DNA binding in vitro. We next sought to test whether this domain could function in the context of the whole protein and whether it had the ability to bind to GATC motifs and activate transcription in living cells. GATA-1 has been shown to function as a potent transcriptional activator when tested in transient transfection experiments in NIH3T3 cells and is often assayed against synthetic promoters comprising one or several GATA sites upstream of the β-globin TATA box and a suitable reporter gene (5). We constructed two reporter constructs, the first containing tandem copies of the TGATCT (or N-finger recognition element) and the second containing tandem copies of the TGATAA (or C-finger recognition element) upstream of the β-globin TATA box and a luciferase reporter gene. We first tested the ability of wild-type GATA-1 to activate these promoters and observed activation of both promoters (Fig. 4, A and B). This result demonstrated that GATA-1 was being localized to both recognition sites and was activating transcription in vivo. We have also confirmed that the double finger domain, comprising intact N- and C-fingers, is capable of binding both the N-finger (TGATCT) site and the C-finger (TGATAA) site in vitro (data not shown). To test whether the ability of GATA-1 to bind the different recognition sites depended on the N-finger or the C-finger, we next tested the two mutant GATA-1 constructs. The first protein contained an intact N-finger but carried the C258A mutation that disrupts the C-finger, and the second contained an intact C-finger but a mutant N-finger (C204A). As shown by comparing panels A and B in Fig. 4, the protein with an intact N-finger can transactivate the promoter containing the TGATCT (N-finger recognition) site but not the promoter containing the typical TGATAA (C-finger) site. In contrast, the protein with the intact C-finger can activate both promoters. This result is consistent with the observation that the C-finger is capable of binding to both typical C-finger (TGATAA) and N-finger (TGATCT) sites (Fig. 3).
We examined the control regions of a number of putative GATA-1 target genes and identified GATC elements in the erythroid L-type pyruvate kinase (L-PK) enhancer, in the erythroid ALAS2 proximal promoter, and in the promoter of the megakaryocyte-expressed gene platelet factor 4 (PF4). We first used gel retardation assays to test whether the GST-N-finger protein could recognize these sites (Fig. 5). We also tested the ALAS2 and PF4 promoters in transient transfection assays. As shown in Fig. 6, both promoters were activated by GATA-1, and importantly both were activated by the GATA-1 mutant that contains a mutation disrupting the C-finger. Remarkably, these two promoters were preferentially activated by GATA-1 molecules that contained a mutation in the C-finger, so that binding by the N-finger was obligatory. Our previous results indicate that either the N-finger or the C-finger can bind GATC sites in DNA (Fig. 3). It appears that GATA-1 activation is more potent when the C-finger is disrupted and GATA-1 is tethered to the DNA exclusively by its N-finger.
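Locating GATC elements of this kind amounts to a simple sequence scan. The following is a minimal sketch, not the procedure used in the study: the function names are hypothetical, and the example inputs are the L-PK, ALAS2, and PF4 oligonucleotide sequences listed under "Materials and Methods." Because the GATC core is its own reverse complement, each core lies at the same place on both strands, but the flanking bases read differently, so the hexamer is reported for each strand.

```python
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def gatc_hexamers(seq):
    """Find each GATC core that has one flanking base on either side.

    Returns tuples of (1-based position of the G in GATC,
    hexamer read on the given strand, hexamer read on the other strand).
    """
    seq = seq.upper()
    hits = []
    start = seq.find("GATC")
    while start != -1:
        if start >= 1 and start + 5 <= len(seq):
            hexamer = seq[start - 1:start + 5]
            hits.append((start + 1, hexamer, reverse_complement(hexamer)))
        start = seq.find("GATC", start + 1)
    return hits


if __name__ == "__main__":
    oligos = {
        "L-PK": "GGGAGCATGGAGATCATAGCACTCCG",
        "ALAS2": "AAGGATGGTCTGATCTCAAAATCGAA",
        "PF4": "TGGCTGGCCAGATCTCAAGTACTGT",
    }
    for name, sequence in oligos.items():
        print(name, gatc_hexamers(sequence))
```

Reporting both strands matters when sites are compared with the TGATCT probe, since a site that reads AGATCA on one strand reads TGATCT on the other.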
We carried out an additional experiment to confirm that the N-and C-fingers were behaving as expected in these assays. It has previously been shown that the GATA-1 cofactor FOG can repress GATA-mediated transactivation but only if its contact domain, i.e. the N-finger, is intact (18). We therefore tested the effect of titrating in increasing amounts of FOG (Fig. 7). As expected FOG repressed wild type GATA-1. It also repressed the GATA-1 mutant with an intact N-finger, but importantly it did not repress the GATA-1 derivative with a mutant N-finger. Taken together these results indicate that FOG can bind the N-finger when it is itself bound to a GATC site on DNA.
Naturally Occurring Mutations in the N-finger of GATA-1 and the Effects on Its DNA Binding Specificity-The GATA-1 gene lies on the X chromosome, and recently several naturally occurring mutations in this gene have been shown to be associated with genetic disorders (10, 19-21). Four mutations have been described: V205M is associated with severe anemia and thrombocytopenia, G208S with thrombocytopenia, R216Q with the β-thalassemia trait and thrombocytopenia, and D218G with thrombocytopenia. We prepared GST-N-finger fusion proteins carrying these substitutions and tested them for their ability to bind GATC sites in DNA (Fig. 8A). The R216Q mutation abolished recognition of GATC sites by the N-finger domain, whereas the other mutations had only minor effects. A Coomassie-stained gel showed that equivalent amounts of protein were used in this experiment (Fig. 8B). Our results are consistent with the view that the V205M, G208S, and D218G mutations act primarily by disrupting protein-protein interactions with the cofactor FOG (19-21) and the conclusion that the R216Q substitution affects DNA binding by the N-finger (10).

DISCUSSION
We show here that the N-finger of GATA-1 can recognize GATC motifs in DNA both in vitro and in cellular assays. This represents a new role for the N-finger, a domain that has primarily been recognized as a region involved in binding to accessory proteins, such as FOG (6). It has, however, been reported that the N-finger can play a subsidiary role in enhancing the stability of DNA binding and in the recognition of particular double GATA motifs (5,9). The realization that the N-finger is a genuine DNA-binding domain is consistent with the view that it can significantly influence the overall DNA binding properties of GATA-1 (9).
In addition to the six known mammalian GATA family proteins, there are related proteins in other vertebrates, and these also contain both an N-finger and a C-finger domain. The chicken GATA-1 protein has been extensively studied, but independent DNA binding by its N-finger domain has not been detected (8,9). In contrast, the N-finger domains of the related proteins chicken GATA-2 and GATA-3 have been shown to bind DNA (8). In these instances, DNA binding activity depended on the presence of a stretch of basic residues immediately N-terminal to the N-finger. This stretch does not occur in the murine GATA-1 N-finger, indicating that it is not an essential feature in all instances. We have also studied the human GATA-1 N-finger and the Drosophila Pannier/dGATAa N-finger, both of which also lack the upstream basic stretch, and have detected DNA binding activity in both cases (data not shown). These results and the remarkable conservation of putative DNA-contact residues in the N-fingers (Refs. 2, 3, and 7 and Figs. 1 and 9) suggest that many GATA N-fingers may have the ability to bind DNA to some extent, but since the binding is relatively weak, detection of independent binding may depend on the biophysical properties of the particular proteins and the precise assay conditions used.
[Fig. 9 legend (beginning missing): ... (7), adapted to show the three residues Leu-268, Asn-280, and Thr-267 (pink, numbering relative to the murine protein), which make direct contact with the GAT core (yellow). B, a similarly orientated model of the N-finger is presented with the FOG contact residues shown in red and the residues altered by mutation in individuals affected by familial anaemia and thrombocytopenia (V205, G208, R216, and D218) labeled.]
Our results suggest that GATA-1 can bind DNA using either the N- or the C-finger. The in vitro binding experiments suggest that it can bind GATC sites using either the N-finger or the C-finger and that it can bind (A/T)GATA(A/G) motifs using the C-finger. GATA-1 also recognizes double GATA motifs in a small subset of promoters. In these cases GATA-1 presumably binds using both fingers (9). Thus the configuration of GATA-1 on different promoters may vary: it may bind single typical TGATAA sites with the C-finger, GATC sites with either the N- or C-finger, and double GATA sites with both fingers. This realization is interesting given that the activity of GATA-1 has been found to vary in different promoter contexts. In some cases, it can act as a strong activator of transcription, but on other promoters it has been found to have little activity, and on others it can repress gene expression (1,9,22,23). In our experiments GATA-1 functioned as an activator regardless of whether it was binding through the N-finger or the C-finger, but interestingly it activated GATC site-dependent promoters more strongly when the C-finger was mutated and hence binding by the N-finger was obligatory (Fig. 7).
It is possible that the different topology of GATA-1 at the promoter influences its interactions with cofactors and its ultimate transcriptional output. In this context it is notable that both zinc fingers of GATA-1 have been demonstrated to interact with a number of other transcription factors, including Sp1, PU.1, and GATA-1 itself, and transcriptional co-regulators such as FOG, CBP, and E-RC1 (6, 24-30). The binding of the two fingers to GATA and GATC motifs in DNA may limit the range of interactions that the protein can make with cofactors, and thus the precise promoter configuration will be critical to its activity.
The best characterized cofactor of GATA-1 is the zinc finger protein FOG (6), which recognizes a conserved face on the N-finger domain (Figs. 1 and 9). The key contact residues are not conserved in the C-finger, and accordingly FOG does not bind this domain (Refs. 6 and 11 and Fig. 1). The N-finger domains of several other mammalian GATA proteins, including GATA-2, -3, and -4, also contain these contact residues, and they have been shown to bind FOG (6) or a related, more broadly expressed protein, FOG-2 (31-34). Interestingly, a GATA protein from Drosophila, termed Pannier or dGATAa, also interacts with a FOG family protein, U-shaped (35). This result suggests that GATA and FOG proteins share a long evolutionary association.
The recently described V205M, G208S, R216Q, and D218G mutations in individuals suffering from genetic anemia and/or thrombocytopenia attest to the importance of the N-finger domain in vivo (10, 19-21). It has been shown that the V205M, G208S, and D218G mutations all interfere with the interaction between FOG and GATA-1. We have shown here that these mutations do not significantly affect DNA binding by the N-finger (although the gel retardation technique is only semiquantitative and minor effects on DNA binding cannot be ruled out). Nevertheless, this result supports the view that inhibition of cofactor binding rather than DNA binding is the major effect of these mutations. On the other hand, the R216Q mutation abolishes the recognition of GATC sites by this domain but does not significantly impair binding to FOG (data not shown). The pathology observed in the individual carrying the R216Q mutation (thrombocytopenia and β-thalassemia trait) supports the view that the DNA binding activity of the N-finger domain, either its direct recognition of GATC sites or its contribution to the recognition of double GATA sites (8-10), is required for normal hematopoiesis in vivo.
Given that the N-finger of GATA-1 can bind DNA, one can compare it to the C-finger on the basis of the known DNA binding surface on this finger (7). One can deduce that the V205M, G208S, and D218G mutations lie away from the face that binds DNA (Fig. 9). This result is consistent with the fact that these mutations have no major effect on DNA binding. Moreover, the result confirms previous suggestions that FOG associates with DNA-bound GATA-1 and binds the exposed surface of the N-finger that is not committed to contacting DNA (36-38). The residue Arg-216, on the other hand, corresponds to a residue in the C-finger that is involved in direct contact with DNA (Fig. 1); thus it is not unexpected that its substitution has a profound effect on the binding of the N-finger to GATC sites observed here.
In the same way that the different abilities of the N-finger and the C-finger to bind FOG can be attributed to minor but important differences in their sequences, the differing DNA binding specificities of the two zinc fingers must also reflect sequence differences. The structure of the C-finger bound to an AGATAA site reveals that particular residues, including Leu-268, Asn-280, and Thr-267 (numbering from Fig. 1), are important for recognizing the GAT core (Ref. 7 and Fig. 9). Further structural analysis of the N-finger of GATA-1 bound to DNA should illuminate the precise mechanisms underlying the DNA binding specificity of GATA fingers and possibly indicate whether the construction of novel derivatives with varied specificity is feasible.