Alternate Promoters and Developmental Modulation of Expression of the Chicken GATA-2 Gene in Hematopoietic Progenitor Cells*

We have isolated and characterized the chicken GATA-2 (cGATA-2) gene. We show that, as in the case of some other members of the GATA gene family, the gene is expressed from alternative first exons. One of the resulting mRNAs represents only a minor form of the GATA-2 mRNA in the cells and tissues we analyzed; the other is ubiquitously expressed. We have defined the minimal promoter that controls expression of this most abundant mRNA and that is necessary for full activity in hematopoietic progenitor cells. The activity of this promoter in transient assays is consistent with developmental differences of expression levels in these cells. We identify within the promoter a previously unrecognized extended CCAAT motif essential for its activity. The organization of the cGATA-2 gene, with alternative first exons and a CCAAT box in the proximal promoter, is similar to that recently described for mouse GATA-2, and the proximal promoter also resembles the only promoter so far described in Xenopus. Nonetheless, the roles of the promoters in development and tissue-specific expression are quite different in these organisms, most strikingly in the mouse, which assigns developmental roles to its proximal and distal promoters that are quite different from those in the chicken. We suggest that although the overall organization may remain the same, the role assigned to each promoter varies among organisms. We identify distant upstream regulatory elements in the cGATA-2 gene that modulate expression from the proximal promoter and that may be responsible for this variation.

The GATA family of transcription factors plays a wide range of roles in development. Three members of the family are absolutely required for normal hematopoietic development in the mouse: GATA-1 (1), GATA-2 (2), and GATA-3 (3). The hallmarks of these DNA-binding proteins, of which six have been identified so far in vertebrates, are a highly conserved (C4) zinc finger domain and recognition of the consensus motif (T/A)GATA(A/G) in the DNA (4). However, binding sites that differ from the strict consensus originally defined are also recognized in vivo by GATA proteins (5,6), as was already suspected from in vitro analysis (7-10).
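The consensus (T/A)GATA(A/G) can be searched for on both strands of a sequence with a short script. The following is only an illustrative sketch: the function names and the example sequence are hypothetical and are not taken from the cGATA-2 locus, and, as noted above, GATA proteins also bind sites that deviate from this strict consensus, which such a scan would miss.

```python
import re

# (T/A)GATA(A/G) written as a regular expression. The lookahead lets
# overlapping matches be reported as well.
GATA_CONSENSUS = re.compile(r"(?=([TA]GATA[AG]))")
SITE_LENGTH = 6

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_gata_sites(seq):
    """Return sorted (position, strand, site) tuples for consensus matches.

    position is the 0-based index, on the given (top) strand, of the
    leftmost base covered by the site; strand is "+" or "-".
    """
    seq = seq.upper()
    hits = []
    for match in GATA_CONSENSUS.finditer(seq):
        hits.append((match.start(), "+", match.group(1)))
    for match in GATA_CONSENSUS.finditer(reverse_complement(seq)):
        # Convert the coordinate on the bottom strand back to the top strand.
        top_start = len(seq) - match.start() - SITE_LENGTH
        hits.append((top_start, "-", match.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical test sequence, not genomic data.
    for hit in find_gata_sites("CCAGATAAGGCTTATCAG"):
        print(hit)
```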
The GATA transcription factors exhibit a distinct, although partially overlapping, tissue distribution and developmental expression profile. GATA-2 is expressed in a wide variety of tissues, which include hematopoietic progenitors, erythroid cells, mast cells, megakaryocytes, endothelial cells, the central nervous system, and the giant cells of the trophoblast (11-20). Several observations suggest that GATA-2 plays a fundamental role in hematopoietic development. In the mouse, disruption of the gene by gene targeting via homologous recombination in mouse embryonic stem cells leads to the death of embryos homozygous for the deletion. They die approximately at embryonic day 10-11 with severe anemia. In adult chimeric mice, GATA-2-deficient embryonic stem cells do not give rise to cells of any hematopoietic lineages (2). During erythroid maturation, GATA-2 mRNA is down-regulated (11,21), whereas ectopic overexpression in chicken erythroid progenitors of GATA-2, but not GATA-1 and GATA-3, promotes their proliferation at the expense of their differentiation (22). In Xenopus and zebrafish, GATA-2 is expressed within the presumptive blood island of the embryo (16,18,23). Taken together, these data indicate that GATA-2 has a critical role in early hematopoietic cells, possibly influencing the maintenance or the proliferation of the progenitors. Transcriptional regulation of the GATA-2 gene has been studied in Xenopus (18,24,25), the zebrafish (26), and the mouse (27). The genes have some regulatory features in common, but they also differ in important respects. In our laboratory, we have been interested in the regulation of erythroid-specific genes in the chicken, and particularly in the differential control and activity of individual GATA family members.
Because the solutions to regulatory problems found in the chicken often differ in illuminating ways from those found in other organisms, and because we study many GATA-dependent mechanisms in chicken, we cloned the chicken GATA-2 (cGATA-2) gene.
Earlier studies of cGATA-2 cDNA had suggested that the gene was transcribed from a single promoter far upstream of the coding region. We find, however, that cGATA-2 expression is controlled by both a proximal and a distal promoter. We show that only a minor transcript initiates from the upstream promoter, the proximal one accounting for most of the cGATA-2 transcription. The arrangement of the promoters is similar to that recently described for mouse GATA-2, but the promoter usage is entirely different, particularly in hematopoietic lineages. The proximal chicken promoter, responsible for expression of the principal transcript in most cells, shares with the other species a CCAAT box that is essential for expression. We identify a previously uncharacterized highly conserved sequence motif located downstream of the core CCAAT element that plays a role in the affinity of the CCAAT protein for its binding site. Thus similar motifs are employed for quite different developmental tasks in chicken than they are in mouse. We show that upstream elements in cGATA-2 inhibit expression in a cell type-specific manner and suggest that they may contribute to this difference.
* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EBI Data Bank with accession number(s) AF038592.
§ To whom correspondence should be addressed: Laboratory of Molecular Biology, Bldg. 5, NIDDK, National Institutes of Health, Bethesda, MD 20892-0540.

EXPERIMENTAL PROCEDURES
Isolation and Characterization of the Chicken GATA-2 Gene-An EMBL3 genomic library derived from chicken erythrocyte DNA (29) was screened with a cGATA-1 cDNA fragment spanning the finger region, using low stringency washing conditions. Two overlapping clones were purified to homogeneity through plaque hybridization screening. However, the inserts of these two clones hybridized only to 3′ fragments of the cGATA-2 cDNA (11). As further screens of this library using oligo probes failed to identify any clones likely to contain the missing 5′ sequences, a partial chicken genomic library was constructed in EMBL4 (Stratagene), with chicken erythrocyte DNA completely digested with EcoRI. Indeed, this restriction enzyme was shown by hybridization of a Southern blot with a cGATA-2 cDNA-specific probe to generate a 17-kb genomic fragment containing the 5′ part of the cDNA missing in the two phages purified from the first screening. The screening of this library allowed us to isolate several phages containing the expected insert. One of them, called G2.5, was characterized in greater detail. It overlaps over 6.5 kb with G2.611, the larger of the two clones isolated from the first screening. The position of each exon was assigned using data obtained from restriction mapping of the cDNA and the genomic clones and from exon-specific oligonucleotide hybridization. Delineation of exon-intron sequence boundaries was accomplished by DNA sequencing. The lengths of the intervening sequences were ascertained from the sizes of polymerase chain reaction products obtained using appropriate primers located in adjacent exons. Together, the clones G2.611 and G2.5 contained all the sequences present in the chicken GATA-2 cDNA.
In order to clone more genomic DNA upstream of exon 1a, which was originally supposed to contain the regulatory sequences of the gene, a new partial genomic library was constructed in the plasmid pBluescript SK+ (Stratagene), after a double digestion of the genomic DNA with XhoI and EcoRV. These enzymes generate a 3.6-kb fragment which overlaps over 1 kb with the 17-kb EcoRI fragment. Several clones containing the expected insert were isolated, and one of them, pG2.2, was analyzed in more detail.
DNA sequencing was performed using the Thermo Sequenase system (Amersham Pharmacia Biotech).
RNA Preparations and Analysis-Total RNA was prepared using RNA STAT-60 solution (TEL-TEST) according to the manufacturer's instructions and was treated with RNase-free DNase I (Boehringer Mannheim). The RNA integrity of each preparation was checked by the presence of undegraded 18S and 28S RNA species in ethidium bromide-stained 1% agarose/MOPS-formaldehyde gels (data not shown). RNA was prepared from HD24, 6C2, HD37, and DT40 chicken cell lines. RNA was also prepared from circulating primitive red blood cells from 5-day embryo (isolated according to Mahoney et al. (48)), circulating definitive red blood cells of 10-day embryo, chicken fibroblasts (prepared by trypsin digestion of 10-day embryo (49)), brain (cerebral hemisphere) and liver of 10-day embryo, as well as from whole decapitated 10-day embryos. Embryonated White Leghorn chicken eggs were obtained from Truslow Farms (Chestertown, MD).
RNA was analyzed by RNase protection assay using the Ambion RPA II kit. A 392-bp fragment encompassing nucleotides 1743-2135 of the cGATA-2 cDNA (11), which is located in exon 6, was inserted in pBluescript SK+ (Stratagene) to generate the exon 6 probe. Radiolabeled antisense RNA was generated by in vitro transcription using T3 polymerase (Stratagene) and [α-32P]CTP. A 175-bp fragment encompassing nucleotides 22-197 of the cGATA-2 cDNA, which map to exon 1a, was cloned in pCR II plasmid (Invitrogen) to generate the exon 1 probe. Radiolabeled antisense transcripts were generated using SP6 polymerase (Stratagene) and [α-32P]CTP. A 329-bp fragment spanning nucleotides −170 to +160 (see Fig. 5A) was cloned in pBluescript SK+ to generate the exon 1b probe. Radiolabeled antisense RNA was generated using T7 polymerase (Stratagene) and [α-32P]CTP. All RNA probes were acrylamide gel-purified prior to hybridization. 30 μg of RNA were denatured for 6 min in boiling water prior to hybridization to the exon 1 or exon 1b probes, and hybridizations were performed overnight at 65°C. 15 μg of RNA were used with the exon 6 probe. In this case, hybridization was performed at 45°C. In order to make the band intensities (shown in Figs. 2 and 3) approximately comparable, exposure times for the two autoradiograms were chosen to compensate for differences in probe size and specific activity.
Characterization of the DNase I Hypersensitive Sites-The nuclei preparations were performed on ice until DNase I digestion. Approximately 1 × 10⁸ cells were washed once in cold phosphate-buffered saline. Cells were lysed in 5 ml of Buffer A (10 mM Tris-HCl, pH 7.5, 10 mM NaCl, 3 mM MgCl2, 0.2% Nonidet P-40, 10 mM dithiothreitol, and 0.5 mM EGTA) for 5 min. Nuclei were pelleted for 5 min at 1000 × g and washed once in Buffer A without Nonidet P-40. The pellet was resuspended in 1 ml of Buffer B (10 mM Tris-HCl, pH 7.5, 10 mM NaCl, 3 mM MgCl2, 1 mM CaCl2) at room temperature, and 100 μl of the nuclei preparation were used for each DNase I (Worthington, catalog designation DPFF) digestion. The final DNase I concentrations were 0, 0.5, 1.5, 4.5, 13, 40, and 120 units/ml. Digestions were performed at room temperature for 5 min and stopped by addition of 400 μl of Buffer C (10 mM Tris, pH 8, 12 mM EDTA, 0.625% SDS) plus 10 μg of RNase A. After 30 min of incubation at 37°C, 150 μg of proteinase K were added and incubation was performed overnight at 55°C. The samples were next extracted with phenol-chloroform and chloroform and precipitated. For mapping of the hypersensitive (HS) 1 site, 10 μg of DNase I-treated DNA were digested with BamHI and then subjected to electrophoresis through 1.6% agarose. Southern blots were prepared on GeneScreen Plus membrane (NEN Life Science Products). Hybridization was performed using Quickhyb solution (Stratagene). The probe used was a SalI-BamHI restriction fragment of the cGATA-2 gene, labeled by random priming. For the mapping of HS2, 10 μg of DNase I-treated DNA were digested with BglII and SspI and then subjected to electrophoresis through 1.2% agarose. Southern blots and hybridization were performed as above, using in this case a BglII-XbaI fragment as a probe.
Cell Lines-HD24 cells are chicken multipotent erythroid-myeloid cells transformed by the E26 virus. They cannot differentiate as efficiently as primary transformants and express some markers for early erythroid progenitors. They were grown in blastoderm media (35). 6C2 and HD37 are CFU-E stage erythroid precursor cells. The HD37 line was generated by infection of 2-day embryonic blastoderm with a mutant of E26 (35), and the 6C2 line was obtained by transformation of bone marrow with wild type avian erythroblastosis virus. HD37 cells were grown in blastoderm media (35). 6C2 cells were grown in α-minimum Eagle's medium supplemented with 10% fetal bovine serum, 2% chicken serum, 1 mM Hepes, 50 mM β-mercaptoethanol, and a standard complement of antibiotics. DT40 cells, purchased through American Type Culture Collection (Manassas, VA), were grown in Dulbecco's modified Eagle's medium supplemented with 50 mM β-mercaptoethanol, 2 mM glutamine, 10% fetal bovine serum, 5% chicken serum, 10% tryptose phosphate broth, and antibiotics. All of the cells were maintained at 37°C in 5% CO2.
Transfections-Approximately 3.2 × 10⁷ cells per sample were washed twice in phosphate-buffered saline and resuspended in 2.1 ml of Opti-MEM I (Life Technologies, Inc.). 300 μl of Opti-MEM I containing 50 μl of LipofectAMINE (Life Technologies, Inc.) and plasmid DNA were added to the cells. 1 μg of RSV-CAT and 3 μg of test plasmid (or an equivalent copy number) containing the luciferase reporter gene were used in each transfection. Cells were incubated at 37°C for 5 h in the presence of the transfection mix, returned to normal media, and incubated for 48 h. For assays, cells were harvested and washed twice in phosphate-buffered saline and resuspended in 150 μl of reporter lysis buffer (Promega). The supernatants were assayed for luciferase activity using the Promega luciferase assay system according to the manufacturer's instructions, and for chloramphenicol acetyltransferase (CAT) activity, using a liquid scintillation method (50). The luciferase activity was normalized to the CAT activity detected in each sample. The values presented are the mean of at least three independent experiments, each performed in duplicate, using different preparations of plasmid DNA.
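The normalization described above (luciferase divided by CAT for each sample, duplicates averaged within an experiment, then a mean taken across independent experiments) can be sketched as follows. This is a minimal illustration under stated assumptions: expressing each construct relative to a reference construct within the same experiment is an assumption made here for the example, the function name and data layout are hypothetical, and the numbers are invented for demonstration and are not measurements from this work.

```python
from statistics import mean, stdev


def relative_activity(experiments, construct, reference):
    """Mean and SD, across experiments, of construct activity over reference.

    experiments: list of dicts mapping a construct name to a list of
    (luciferase, cat) readings, one tuple per duplicate transfection.
    Within each experiment the LUC/CAT ratios of the duplicates are
    averaged before the construct is divided by the reference.
    """
    ratios = []
    for exp in experiments:
        test = mean(luc / cat for luc, cat in exp[construct])
        ref = mean(luc / cat for luc, cat in exp[reference])
        ratios.append(test / ref)
    spread = stdev(ratios) if len(ratios) > 1 else 0.0
    return mean(ratios), spread


if __name__ == "__main__":
    # Invented readings (arbitrary units) for three experiments in duplicate.
    data = [
        {"-160 LUC": [(200.0, 10.0), (400.0, 20.0)],
         "promoterless": [(10.0, 10.0), (20.0, 20.0)]},
        {"-160 LUC": [(300.0, 10.0), (300.0, 10.0)],
         "promoterless": [(10.0, 10.0), (10.0, 10.0)]},
        {"-160 LUC": [(400.0, 10.0), (800.0, 20.0)],
         "promoterless": [(10.0, 10.0), (20.0, 20.0)]},
    ]
    print(relative_activity(data, "-160 LUC", "promoterless"))
```

Dividing by the co-transfected CAT signal corrects for differences in transfection efficiency between samples, which is why the ratio is formed per sample before any averaging.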
Construction of GATA-2 Gene Derivatives-A 5.3-kb XhoI-SalI insert derived from the genomic phage G2.5 DNA fragment and containing the 5′-end of exon 2, the entire exon 1b, and the sequence upstream of it was subcloned into the unique XhoI site of the pBluescript SK+ plasmid. This fragment was then deleted using Bal-31 nuclease (New England Biolabs) from a unique HindIII site located in intron 1b. After treatment with mung bean nuclease (New England Biolabs) and ligation of an adaptor to recreate a HindIII restriction site, plasmids were recircularized and sequenced. One resulting cGATA-2 fragment retaining the first 118 bp of exon 1b and the intact 4.9 kb of 5′-flanking sequence was selected. It was inserted between the XhoI and HindIII sites of the pGL3-basic (Promega) to give the −4900 LUC construct. The EcoRV, BamHI, and SmaI restriction sites contained in the fragment were also used to generate respectively the constructs −1900 LUC, −580 LUC, and −160 LUC, using SmaI or BglII plus HindIII sites in pGL3-basic for the cloning. The −1900/−580 fragment was obtained by an EcoRV-BamHI digestion of the parental fragment, which was then cloned in the pGL3-basic in SmaI-BglII.
The mCCAAT LUC, m9 LUC, and mCCAAT+m9 LUC mutants were obtained by replacing the SmaI-BssHII fragment (position −160 to −77 in the proximal promoter (see Fig. 5A)) of the −160 LUC construct by a double-strand oligonucleotide with the correct restriction sites at its extremities and the sequence of interest mutated. The same mutations of the CCAAT box and the conserved region as those used in the electrophoretic mobility shift assay were introduced in these oligonucleotides.
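Because the deletion constructs differ substantially in length, transfecting "an equivalent copy number" means scaling the mass of DNA with plasmid size. The sketch below illustrates that arithmetic only. Its assumptions are labeled: the vector length of 4818 bp for pGL3-basic, the approximate total lengths (vector plus 5′-flanking sequence plus 118 bp of exon 1b, ignoring polylinker changes), the choice of the longest construct as the 3-μg reference, and the average of 650 g/mol per base pair are all illustrative and are not stated in the text.

```python
AVERAGE_BP_MASS = 650.0  # g/mol per base pair; a commonly used approximation


def picomoles(mass_ug, length_bp):
    """Approximate pmol of double-stranded DNA in mass_ug micrograms."""
    return mass_ug * 1e6 / (length_bp * AVERAGE_BP_MASS)


def equimolar_mass(reference_mass_ug, reference_bp, test_bp):
    """Mass of the test plasmid with the same copy number as the reference."""
    return reference_mass_ug * test_bp / reference_bp


if __name__ == "__main__":
    vector_bp = 4818  # assumed length of pGL3-basic
    exon_1b_bp = 118  # retained portion of exon 1b
    # Approximate total lengths, for illustration only.
    constructs = {
        "-4900 LUC": vector_bp + 4900 + exon_1b_bp,
        "-1900 LUC": vector_bp + 1900 + exon_1b_bp,
        "-580 LUC": vector_bp + 580 + exon_1b_bp,
        "-160 LUC": vector_bp + 160 + exon_1b_bp,
    }
    reference_bp = constructs["-4900 LUC"]  # assumed 3-ug reference
    for name, length in constructs.items():
        mass = equimolar_mass(3.0, reference_bp, length)
        print(f"{name}: {mass:.2f} ug, {picomoles(mass, length):.3f} pmol")
```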
Protein Extracts-Nuclear extracts were prepared as described (46). Leupeptin (0.5 μg/ml), pepstatin A (0.7 μg/ml), and phenylmethylsulfonyl fluoride (0.1 M) were added in all the buffers. The integrity of the extracts was tested by mobility shift assays with oligonucleotides containing an Sp1 or a GATA-1 binding site (data not shown).
Gel Retardation Assay-All of the oligonucleotides used in gel retardation assay were synthesized on an Applied Biosystems Synthesizer and gel purified. Single-strand oligonucleotides were 5′-end-labeled with [γ-32P]ATP and then annealed to their complement present in 1.2-fold molar excess. 13 μg of nuclear protein extracts were incubated for 15 min on ice with 0.02 pmol of labeled DNA probe, 12 mM Hepes, pH 7.9, 80 mM KCl, 1.5 mM MgCl2, 2% glycerol, 0.6 mM dithiothreitol, and 2 μg of poly(dI-dC). Competitions were performed by adding a molar excess of cold double-stranded oligonucleotide in the reaction before addition of the nuclear extract. Samples were electrophoresed on a 6% acrylamide gel (ratio, 29:1), 0.25× TBE at 200 volts and room temperature. Gels were dried prior to autoradiography.

RESULTS
Structure of the Chicken GATA-2 Gene-We isolated two overlapping genomic clones (see under "Experimental Procedures" and Fig. 1A) together spanning the entire cGATA-2 cDNA previously isolated by Yamamoto et al. (11). A third clone (pG2.2) suspected to contain some of the regulatory sequences of the gene was also analyzed. The organization of the gene appeared at first very similar to that of the human and the Xenopus GATA-2 genes (17,24) and therefore to that of the GATA-1 and GATA-3 genes cloned from different species (13, 29-32). As shown in Fig. 1A, the cGATA-2 gene is composed of six exons distributed over 16 kb of DNA. All exon-intron boundaries conform to the GT-AG rule (Fig. 1B). The first exon is noncoding, and the methionine codon for translational start is found in exon 2. The two highly conserved zinc finger domains are encoded separately in exons 4 and 5, the intron/exon boundaries of which are highly conserved among all characterized family members (13, 17, 24, 29-32). In the case of the chicken gene, exons 1 and 2 are separated by a relatively large intron (6.5 kb) compared with what is found in the human (0.8 kb) (17) or the Xenopus (0.25 kb) (24) GATA-2 gene.
A GATA-2 cDNA variant, GATA-2′, containing an additional in-frame 33 nucleotides immediately 5′ to the N-terminal zinc finger motif, has been described in both chicken and Xenopus (14,21,24,34). This alternatively spliced form results in a protein with 11 extra amino acids, which are highly conserved between the two species. The additional sequence within GATA-2′ represents a 5′ extension of exon 4 that encodes the N-terminal zinc finger (Fig. 1B). Thus the differential use of two splice acceptor sites upstream of exon 4 leads to the expression of GATA-2 and GATA-2′.
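Two of the checks implied in this section are simple enough to express in code: whether an intron obeys the GT-AG rule (GT at its 5′ end, AG at its 3′ end), and whether an inserted stretch of coding sequence preserves the reading frame (33 nucleotides corresponding to 11 codons). The following is a minimal sketch; the function names are hypothetical and the intron string is a made-up example, not cGATA-2 sequence.

```python
def conforms_to_gt_ag(intron):
    """True if the intron starts with GT and ends with AG (case-insensitive)."""
    seq = intron.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


def extra_residues(inserted_nt):
    """Number of amino acids encoded by an in-frame insertion.

    Raises ValueError if the insertion length is not a multiple of three,
    since such an insertion would shift the reading frame.
    """
    if inserted_nt % 3 != 0:
        raise ValueError("insertion is not in frame")
    return inserted_nt // 3


if __name__ == "__main__":
    print(conforms_to_gt_ag("GTAAGTccccttttcAG"))  # made-up intron
    print(extra_residues(33))
```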
Exon 1 Is Differentially Transcribed-We next identified the promoter of the cGATA-2 gene in chicken hematopoietic progenitor cells. We first attempted to delineate the 5′-end of the cGATA-2 mRNA. Comparison of the length of the published cDNA with the mRNA detected by Northern blot (11) indicated that about 1.5 kb of transcribed sequences remained to be identified. These yet unknown sequences could be located 5′ and/or 3′ of the mRNA, as no consensus polyadenylation signal is present in the cDNA. We used an RNase protection assay to analyze this message. Two radiolabeled probes were generated: one of 395 bp containing part of exon 6 and one of 180 bp containing the entire known sequence of the first exon (Fig. 2 and see under "Experimental Procedures"). These probes were annealed with total RNA extracted from HD24, HD37, and 6C2 cells. HD24 cells are chicken multipotent erythroid-myeloid cells; HD37 and 6C2 are stable chicken CFU-E stage erythroid precursor cells. We also used RNA prepared from primitive and definitive red blood cells and a variety of other cells and tissues. Finally, we used RNA from DT40 cells, a chicken lymphoblastoid cell line, as a negative control for the transcription of the gene. As shown in Fig. 2, the exon 6 probe protected a fragment of the expected size with RNA of all the tissues and cells tested except from DT40 cells, in which the transcript was absent. These results are in agreement with the known tissue distribution of the cGATA-2 mRNA (11,14). The variable signal indicates that the proportion of the cGATA-2 transcript is different from one tissue to another, with a major abundance in the erythroid progenitor cells HD24, 6C2, and HD37 and in primitive red blood cells. On a cell number basis, the cGATA-2 mRNA is twice as abundant in 6C2 as in the HD24 cells and is almost undetectable in 10-day red blood cells (data not shown); this is in agreement with the down-regulation of the gene during erythroid differentiation.
FIG. 2. Differential transcription of the cGATA-2 first exon. An RNase protection experiment was performed using two riboprobes: one spanning the 180 bp of the first exon and one of 395 bp containing part of exon 6, as indicated. These probes were hybridized with total RNA extracted from 10-day decapitated total embryo, DT40 cells, 5-day primitive red blood cells (RBC), HD24 cells, 6C2 cells, HD37 cells, 10-day definitive red blood cells, 10-day brain, 10-day fibroblasts, and 10-day liver. The characteristics of the cell lines are described in the text. To correct for the incorporation of 2-fold less radioactive nucleotides in the exon 1 probe compared with exon 6, we used 30 μg of total RNA for hybridization with the exon 1 probe and 15 μg with exon 6. The autoradiogram presented with the exon 6 probe was obtained after 1 day of exposure at −80°C, whereas with the exon 1 probe, 8 days of exposure under the same conditions were necessary to obtain the results presented. Because of this low intensity of the exon 1-protected fragment, we cannot conclude anything about the transcription of the first exon in liver and fibroblasts, as the cGATA-2 mRNA is not very abundant in these tissues. The absence of the exon 1-protected fragment in 10-day red blood cells indicates that the small signal detected in the brain is specific and not due to a contamination of the cerebral tissue by blood.
FIG. 3. Identification of an alternative first exon in the cGATA-2 gene. An RNase protection experiment was performed using a 359-bp riboprobe containing 329 bp of the cGATA-2 gene and 30 bp of plasmid DNA. The riboprobe contains the sequence starting immediately 5′ of intron 1b and spans the 329 bp upstream. The same RNAs as described previously were used in this experiment. A major transcript giving rise to a protected fragment of around 160 bp was detected in the samples; we called it exon 1b. Larger protected fragments were also detected.
To our surprise, the results obtained with the upstream exon 1 probe were clearly different from those obtained with the exon 6 probe. Indeed, a protected fragment of a size consistent with the presence of exon 1 sequences was detected only in 6C2, HD37, primitive red blood cells, and 10-day brain, and then at much lower abundance than found for exon 6 (see legend of Fig. 2 and under "Experimental Procedures"). No exon 1-containing mRNA whatever was detected in HD24 and 10-day definitive red blood cells, although these cells clearly express the cGATA-2 gene.
These data show that the cGATA-2 mRNA corresponding to the cloned cDNA is not the major form transcribed in the cells and tissues expressing GATA-2 that we tested. Most importantly, transcripts containing this exon are specifically restricted to only some of these cells and tissues and are not well represented in the total GATA-2 transcript population in erythroid cells. This signaled the existence of cGATA-2 mRNA with an alternative first exon and, thus, the existence of more than one promoter driving the transcription of the chicken gene. We focused our attention on identifying this alternative first exon.
Identification of an Alternative First Exon for the cGATA-2 Gene-We first took advantage of a 5′-RACE clone (provided by Dr. Todd Evans) that had been made using poly(A)+ RNA isolated from primitive red blood cells and a cGATA-2-specific oligonucleotide hybridizing in exon 2. This clone contained not only the entire expected sequence of exon 2 but also a previously unidentified sequence that matched a portion of our genomic sequence (in G2.5). Although this 5′-RACE clone did not contain the entire message sequence, it allowed us to identify the 3′-end of a new exon, which we called exon 1b, lying 403 bp upstream of exon 2 (Fig. 3; see also Fig. 5A). The first exon present in the cDNA cloned by Yamamoto et al. (11) was therefore renamed exon 1a. The 403-bp interval represents intron 1b. This newly identified exon-intron boundary conforms to the GT-AG rule. Exon 1b, like exon 1a, is noncoding.
In order to locate the 5′-end of this new exon, as well as to analyze the tissue distribution and the abundance of the exon 1b-containing mRNA, we used a 329-bp probe, including the entire 3′-end of exon 1b and the sequence located upstream, and the same RNAs as described previously, in an RNase protection assay (Fig. 3). Several probe fragments were protected with all the RNA tested, with the exception, as expected, of the DT40 RNA. A major transcript giving rise to a fragment of about 160 bp was observed in erythroid precursor cells HD24, 6C2, and HD37, as well as in primitive red blood cells. The abundance of the fragment was similar to that detected with the exon 6 probe (Fig. 2, see under "Experimental Procedures"), suggesting that most of the cGATA-2 mRNA transcribed in these cells contained exon 1b. The 160-bp exon 1b was also detected in RNA from brain and definitive red blood cells as well as fibroblasts and liver of 10-day embryo (data not shown), indicating that, unlike exon 1a, its transcription is not restricted to some cells or tissues.
Exon 1b is observed in RNA from 10-day total embryo, with an intensity similar to that detected with exon 6 probe (Fig. 2), indicating that mRNA containing this exon is the most abundant form of the cGATA-2 mRNA population in a 10-day embryo.
FIG. 4. ... The parental BamHI band is 2250 bp. The appearance of a hypersensitive band of 1700 bp, marked by arrows on the right of the gels, indicates the presence of a hypersensitive site, HS1, which is located at approximately the transcription start site of exon 1b. HS1 is clearly detected in HD24 and appears weaker in DT40. B, the same purified DNAs as used above were digested with BglII and SspI and probed with a 500-bp BglII-XbaI fragment. The expected parental fragment is 1400 bp. The detection of shorter bands (indicated by arrows) with increasing amount of DNase I indicates the presence of a hypersensitive site, HS2, which is located 3 kb upstream of exon 1a. HS2 is clearly detected in both HD24 and DT40 cells and appears as a doublet. The two cutting sites were estimated to be separated by around 150 bp.
As noted above, the comparison of the length of the previously published cDNA with the approximate size of the mRNA detected by Northern blot (about 4.3 kb) indicated that about 1.5 kb of transcribed sequence remained to be identified. We have shown that HD24 cells, like most cells, transcribe the 160-bp exon 1b, not the 180-bp exon 1a. To reconcile our observations with the known message length, we carried out 3′-RACE experiments to determine the position of the 3′-polyadenylation signal of cGATA-2 in this cell (data not shown). We found that it is located about 1400 bp downstream of the end of the published cDNA clone. Taking into consideration the small difference between exon 1a and exon 1b, we account for about 1420 bp of the unexplained sequence, consistent within our limits of error with the observed discrepancy of 1.5 kb.
Small amounts of larger fragments corresponding to the fully protected probe were also observed, showing that minor amounts of transcription can be initiated upstream of the major initiation site corresponding to the 160-bp exon 1b. However, these mRNAs were low in abundance in progenitor cells and could not contribute appreciably to the population that contained the coding region represented by exon 6. For these reasons, we focused our attention on identifying the promoter sequence that directs the transcription of the most abundant cGATA-2 mRNA in the hematopoietic progenitor cells, the exon 1b-containing mRNA.
DNase I Hypersensitive Site Mapping-The promoters and enhancers of active genes are typically associated with nuclease-HS sites (36,37). To help locate such elements near exon 1b, we examined the DNase I sensitivity of chromatin of the HD24 cell line, which expresses abundant mRNA containing exon 1b but not exon 1a. We used as a control the DT40 cell line, which does not express any cGATA-2 mRNA. We obtained similar results with a variety of restriction enzymes and probes. Two representative sets of HS mapping data obtained using the same preparation of DNase I-treated nuclei are presented in Fig. 4. A prominent HS site, HS1 (Fig. 4A), was found in HD24 cells, located at approximately the transcription start site of exon 1b. Because of its position, it seemed likely that HS1 might mark the promoter required for the transcription of this exon. It was also detected in 10-day red blood cells and in DT40 cells, although reproducibly weaker, but not in naked DNA (data not shown), indicative of its specificity for chromatin structure. Another HS site, HS2 (Fig. 4B), which actually appears as a doublet, was observed in HD24 but also clearly in the DT40 cell line, in 10-day red blood cells and brain but not in naked DNA (data not shown). It is located 3 kb upstream of the 5′-limit of exon 1a. Interestingly, the sequence downstream of HS2 and extending until at least exon 2 is very GC-rich and the ratio CpG/GpC is close to 1, as expected in the case of a CpG island (38), whereas upstream of HS2, the CpG content drops to the expected ratio for a non-CpG island sequence. Using 6C2 cells in which some exon 1a is transcribed, we were not able to detect any HS site closer than HS2 to the 5′-limit of exon 1a.
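The CpG/GpC ratio used above to recognize a CpG island can be computed directly from dinucleotide counts. In bulk vertebrate DNA the CpG dinucleotide is depleted relative to GpC, so a ratio near 1 together with a high G+C content is the signature being described. The sketch below is illustrative only: the function names, the window size, and the example string are hypothetical choices, not the parameters used for the analysis reported here.

```python
def cpg_stats(seq):
    """Return (gc_fraction, cpg_count, gpc_count, cpg_over_gpc) for seq.

    cpg_over_gpc is None when the sequence contains no GpC dinucleotide.
    """
    s = seq.upper()
    if not s:
        raise ValueError("empty sequence")
    gc_fraction = (s.count("G") + s.count("C")) / len(s)
    cpg = s.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gpc = s.count("GC")
    ratio = cpg / gpc if gpc else None
    return gc_fraction, cpg, gpc, ratio


def sliding_cpg_stats(seq, window=200, step=100):
    """Yield (start, stats) for successive windows along the sequence."""
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        yield start, cpg_stats(seq[start:start + window])


if __name__ == "__main__":
    example = "ACGTGCATCGCGGCGC"  # made-up sequence
    print(cpg_stats(example))
```

Scanning with `sliding_cpg_stats` along a locus would show where the ratio falls away from 1, which is the kind of transition described at HS2.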
DNA Sequence of the cGATA-2 Promoter Region Upstream of Exon 1b-The sequence of the putative promoter located upstream of exon 1b (GenBank accession no. AF038592), which is clearly hypersensitive in cells expressing this exon (Fig. 4A), is presented in Fig. 5A. As in the other GATA genes described (13, 24, 28-32), no canonical TATA box or consensus transcription initiator element (39) is present. No homology to the binding site of the putative housekeeping initiator protein 1 (HIP1) described in the case of the GATA-3 genes (40) is apparent, nor is there the downstream promoter element motif observed in some TATA-less promoters (41). The putative promoter region lacks any strict consensus GATA site, and none has been detected in the 1800 bp upstream of exon 1b. However, a CCAAT box in inverted position is observed around position −100. Such an element has been shown by Brewer et al. (24) to be absolutely required for the activation of the zygotic gene in the Xenopus embryo, at the beginning of gastrulation. They showed that a maternal CCAAT-binding protein binds to this element. We note that 15 bp that include this CCAAT box are perfectly conserved between chicken and Xenopus and also appear highly conserved in the human (28) and mouse (27) gene proximal promoters (Fig. 5B). No function has been assigned to this extended sequence in the case of the Xenopus or any other GATA-2 gene.
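A CCAAT box "in inverted position" reads ATTGG on the strand shown, so a promoter scan has to look for both orientations and report positions in the promoter numbering, in which +1 is the first transcribed base and there is no position 0. The following sketch shows one way to do this. The function names and the test sequence are hypothetical; the sequence is not the cGATA-2 promoter, and the actual sequence is available under the accession number given above.

```python
def to_promoter_coordinate(index, plus_one_index):
    """Convert a 0-based string index to promoter numbering (no position 0)."""
    offset = index - plus_one_index
    return offset + 1 if offset >= 0 else offset


def find_ccaat_boxes(seq, plus_one_index):
    """Return sorted (coordinate, orientation) for CCAAT boxes on either strand.

    The coordinate is that of the leftmost base of the motif on the strand
    given; "inverted" means the box reads CCAAT on the opposite strand.
    """
    s = seq.upper()
    hits = []
    for motif, orientation in (("CCAAT", "forward"), ("ATTGG", "inverted")):
        start = s.find(motif)
        while start != -1:
            hits.append((to_promoter_coordinate(start, plus_one_index), orientation))
            start = s.find(motif, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    # Made-up sequence with the +1 base at index 10.
    print(find_ccaat_boxes("GGATTGGCTAACCAATGG", plus_one_index=10))
```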
Transcriptional Activity of cGATA-2 Promoter in Hematopoietic Progenitor Cells-In order to analyze the transcriptional activity of the sequence located upstream of exon 1b, we constructed chimeric reporter plasmids containing this sequence fused to the luciferase (LUC) reporter gene (Fig. 6) and transfected them into 6C2 and HD24 cells, both of which transcribe exon 1b (Fig. 3). We carried out successive 5′ deletions starting with a construct containing 4900 bp of 5′-flanking sequence. It should be noted that, relative to the promoterless construct, the activity of any given construct was always higher in 6C2 than in HD24, consistent with the observed difference in expression levels in these cells in vivo. In both kinds of cells, representing different developmental stages (HD24 is a multipotent erythroid-myeloid line, whereas 6C2 cells are arrested at the later CFU-E stage), deletion of the 120-bp region containing the CCAAT box (deletion from positions −160 to −40) resulted in a dramatic drop in activity, indicating that these 120 nucleotides contain regulatory elements important for the transcription of exon 1b in both cell types. In this respect, the cGATA-2 proximal promoter is quite similar to that of mouse, with strong stimulatory activity localized to a region between 100 and 200 bp upstream of the transcription start site.
FIG. 5. A, sequence of exon 1b and upstream. The +1 marks the estimated 5′ limit of exon 1b, which was estimated from the size of the protected band detected by RNase protection (Fig. 3). The sequence (GenBank accession number AF038592) is numbered with respect to this position. A CCAAT box in inverted position is located in the putative promoter. The underlined sequence, which includes the CCAAT box, is 100% identical (B) to that contained in the Xenopus GATA-2 promoter (24) and is also highly conserved in the human (28) and the mouse (27) GATA-2 gene. The numbers at the top of the chicken sequence refer to the position of the nucleotides relative to the estimated +1 of exon 1b.
The profiles of activity as a function of the size of the 5′ deletion were somewhat different in HD24 compared with 6C2 cells. Deletion between −4900 and −1900 resulted in an increase in activity in each type of cell, suggesting that repressive elements might be present in this region. However, this deletion had a much greater effect in HD24 than in 6C2, and in fact, all of the upstream region 5′ of −160 had a strong inhibitory effect in HD24. We note that a fragment containing the 5′ sequence −1900 to −580, when linked to the LUC gene, displayed considerable promoter activity in HD24, suggesting that specific DNA binding factors bound to this region may play a role in regulation. That role is inhibitory in HD24 when the region is in its normal upstream location. In any case, these upstream inhibitory regions have a much smaller effect in 6C2 cells, in which the largest amounts of exon 1b-containing transcript are observed. In these cells, deletion of the region between −1900 and −160 has little or no effect, and all of the activity appears to arise from the sequence −160 to −40, which contains the CCAAT domain. Results similar to those obtained in 6C2 cells were observed in HD37 cells, another stable CFU-E stage precursor cell line (data not shown). Our data suggest that a large part of the developmental regulation of expression may arise from the differential inhibition mediated by the upstream regions.
The addition of the hypersensitive site HS2 in some of the constructs described above leads to a dramatic reduction of the LUC activity (data not shown), suggesting that this hypersensitive site does not act as an enhancer in these transient transfection assays and is not directly involved in activation of transcription from this exon in these cells.

The CCAAT Element of the Minimal Promoter Includes a Previously Undetected Extended Motif Common to GATA-2 Genes-Previous studies of the GATA-2 gene in the Xenopus embryo (24) have shown the importance of the CCAAT box located in the minimal promoter for transcriptional activation of the zygotic gene at gastrulation. This element is perfectly conserved in the chicken gene, where it is also contained in the minimal promoter, and in the human gene (28) (Fig. 5B). In the human gene, however, its importance has not yet been studied. Because the region containing the CCAAT box includes all of the proximal elements necessary for high level expression in 6C2 and HD24 cells, we further explored its role in the transcription of the cGATA-2 gene. We noticed that the conservation is not limited to the CCAAT element itself: in the region 3′ of the element, several additional nucleotides are perfectly conserved among the chicken, Xenopus, human, and mouse genes (Fig. 5B).
We first explored the central CCAAT motif. A double-stranded oligonucleotide spanning the sequence −115 to −89, which contains the CCAAT box and the entire conserved region, was used in electrophoretic mobility shift assays with protein extracts prepared from HD24, 6C2, DT40, and QT6 cells (Fig. 7A). QT6 is a quail fibroblast line (42) in which we were not able to detect GATA-2 message by Northern blot analysis. Each extract gave rise to a single complex, and the bands showed identical mobilities. This complex is competed by a 100-fold molar excess of the unlabeled probe but not by the same molar excess of a double-stranded oligonucleotide in which the central base of the CCAAT motif has been mutated to give CCGAT (mCCAAT), a change known to be critical for CCAAT factor binding (43). The complex is also not competed by oligonucleotides corresponding to the binding sites for Sp1 or AP2. An oligonucleotide spanning the sequence −108 to −84 (oligonucleotide B), which contains only 4 bp upstream of the CCAAT box, also does not compete for the complex, even though it contains the downstream conserved sequence. When labeled, this oligonucleotide does not form any specific complex (data not shown), suggesting that the adjacent conserved sequence is not itself recognized by any specific proteins.
Although the central CCAAT motif is essential, it is not sufficient. Mutation of the nine nucleotides between −97 and −89 in the context of the probe used to detect the CCAAT binding factor (oligonucleotide m9) reduced the binding affinity of the specific complex observed. This is clearly shown by the range of competition presented in Fig. 7C: the mutated oligonucleotide m9 does not compete as well as the unlabeled wild type sequence. This suggests that the affinity of the CCAAT factor that we detected for its target sequence is affected by the surrounding sequence, a conclusion also suggested by results obtained in direct binding assays comparing the wild type and m9 probes (Fig. 7D). Thus, the extended sequence homology that we have identified among GATA-2 CCAAT motifs appears to be important for binding at that site.
FIG. 7. The minimal cGATA-2 promoter binds a CCAAT factor. A, 13 μg of nuclear protein extracts from 6C2, HD24, DT40, and QT6 cells were incubated with a 27-bp double-stranded radiolabeled oligonucleotide probe corresponding to sequence −115 to −89 within the cGATA-2 minimal promoter (Fig. 5A). This sequence spans the CCAAT box and contains the nucleotides conserved between the Xenopus, human, and chicken genes. B, competitions were performed using a 100-fold molar excess of unlabeled oligonucleotide corresponding to the probe itself (Self), the same oligonucleotide as used as the probe except that the CCAAT motif was mutated to CCGAT (mCCAAT), the same oligonucleotide with the nine nucleotides between −97 and −89 mutated (GGCCCGGCC to TTAAATTAA, m9), oligonucleotides corresponding to binding sites for transcription factors Sp1 or AP2, or an oligonucleotide spanning the sequence −108 to −84 (oligonucleotide B) that contains only 4 bp upstream of the CCAAT box and an extension of the cGATA-2 sequence in 3′. C, the labeled wild type oligonucleotide (sequence −115 to −89) was incubated with 6C2 nuclear extract in the presence of an increasing molar excess (5-100-fold) of unlabeled oligonucleotides corresponding to the probe itself (Self) or to the m9 oligonucleotide. Lanes 1 and 8 of the gel do not contain any nuclear extract. D, comparison of the complex detected with the labeled wild type or m9 oligonucleotide in the presence of 6C2 protein extract.
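The coordinate bookkeeping behind the m9 competitor (a 27-bp probe spanning −115 to −89 in which positions −97 to −89 are changed from GGCCCGGCC to TTAAATTAA) can be sketched as follows. This is an illustration only: the helper name `mutate_window` is hypothetical, and because the full probe sequence is not reproduced in the text, the 18 nucleotides from −115 to −98 are represented by the placeholder `N`.

```python
def mutate_window(probe: str, probe_start: int, first: int, last: int,
                  replacement: str) -> str:
    """Replace positions first..last (inclusive) of a probe.

    `probe_start` is the promoter coordinate of the first base of the probe
    (for example -115). All coordinates are assumed to be upstream of +1,
    so no correction for the missing position 0 is needed.
    """
    if len(replacement) != last - first + 1:
        raise ValueError("replacement length does not match the window")
    i = first - probe_start          # 0-based index of the first mutated base
    j = last - probe_start + 1       # index one past the last mutated base
    if i < 0 or j > len(probe):
        raise ValueError("window lies outside the probe")
    return probe[:i] + replacement + probe[j:]


# Placeholder wild type probe: 18 unspecified bases (-115 to -98) followed by
# the nine conserved bases (-97 to -89) quoted in the figure legend.
wild_type = "N" * 18 + "GGCCCGGCC"
m9 = mutate_window(wild_type, -115, -97, -89, "TTAAATTAA")
```

With these inputs, `m9` keeps the 27-bp length of the probe and differs from `wild_type` only in its last nine positions, which is the property the competition experiment relies on.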
We have tested the function of the CCAAT element by introducing mutations at the central residue of the CCAAT motif (CCGAT, the mutant mCCAAT LUC). We also measured the effect of the mutation of the nine conserved nucleotides we had tested in vitro (m9 LUC) in the same context, as well as the simultaneous mutation of the CCAAT box and the conserved nucleotides (mCCAAT + m9 LUC). Transient transfection of the mCCAAT LUC construct into both HD24 and 6C2 cells revealed a 3-fold decrease in activity relative to the wild type promoter (data not shown). Mutations in the 3′-flanking region of the site had a similar effect, consistent with our gel shift data, and showing the importance of the extended motif.
DISCUSSION
We have found that the chicken GATA-2 gene, like other members of the GATA family, has a complex series of regulatory elements controlling alternate first exons and can thus give rise to distinct transcripts in different cell types. The cGATA-2 gene has two distinct transcription start sites separated by about 6.2 kb, and we observed two transcripts carrying different untranslated first exons. Of those GATA-2 genes that have been cloned, the only other so far discovered to have this arrangement is that of the mouse, quite recently reported. However, although the mouse also uses two different first exons, they are employed in quite different ways in the two organisms. In the chicken, the predominant transcript in all cells and tissues, and most notably in erythroid cells, derives from the proximal promoter, whereas in the mouse, hematopoietic-specific expression is delegated to the distal promoter. The two regulatory systems are thus remarkably distinct.
The alternative exon structure of cGATA-2 might easily have escaped notice, because the cDNA isolated earlier from a total 10-day embryonic cDNA library (11) corresponded only to transcripts from the distal promoter, thus leading to the assumption that there was a single transcript in all cells. We were able to determine by examination of our genomic clone that this transcript initiated at a site about 6.7 kb upstream from the first coding exon. However, when we examined the RNA population in a variety of cells and tissues, it soon became apparent that this was not the principal transcript in any of them and that it was not detected at all in HD24 and 10-day definitive red blood cells, indicating that its expression is subject to development-specific, cell-specific, or inducible regulation. Instead, we found that the cGATA-2 gene can be transcribed from an alternative first exon (exon 1b), which is located 403 bp upstream of exon 2 in the genomic DNA. The resulting mRNA is the most abundant form detected in all of the cells and tissues we tested that express GATA-2. It is possible that the under-representation of this mRNA in the cDNA sample analyzed by Yamamoto et al. (11) arises from difficulty in reverse transcription of GC-rich sequences containing exon 1b, which we have observed.
The existence of alternative first exons has not been reported for the Xenopus (24) or the human (17, 28) GATA-2 gene. It has been described in the mouse, and as noted above, in that organism the upstream promoter appears to be utilized primarily in hematopoietic cells, whereas the promoter proximal to the gene is used in all tissues. The existence of alternative first exons has also been observed for other members of the GATA family (30,44,45). For example, the mouse GATA-1 gene is transcribed from alternative first exons, at least in mouse erythroleukemia and MC/9 mast cells (30). In the mouse testis, the gene is expressed from another first exon, which is different from the cell-specific ones (44). The chicken GATA-5 gene has also been reported to be transcribed from two alternative first exons (45).
The more proximal first exon of the cGATA-2 gene, exon 1b, appears analogous to the first exon described for the Xenopus and the human GATA-2 genes, suggesting that an alternative first exon located further upstream may exist for these two genes as well. Indeed, the reported first introns of the Xenopus (250 bp) (24) and human (800 bp) (17) genes are closer in size to the 403-bp chicken intron located between exon 1b and exon 2 than they are to the chicken intron between exon 1a and exon 2 (6.5 kb). Furthermore, an inverted CCAAT box positioned upstream of a highly conserved sequence is located in the minimal promoter of the Xenopus gene and just upstream of the chicken exon 1b. This motif is also present in the human gene, but its function has not yet been analyzed. In Xenopus, the CCAAT box is required for the activation of the zygotic gene at the onset of gastrulation during embryogenesis. In this study, we have shown that this element is also necessary for the full activity of the chicken minimal promoter driving the transcription of exon 1b-containing mRNA in hematopoietic progenitor cells. This minimal promoter is contained between the sequences −160 and −40 and maps to a strong hypersensitive site in vivo. Thus, promoter organization appears to be conserved across species, although the particular developmental tasks assigned to each promoter vary.
Our identification of the activation properties of the CCAAT box shows that the conservation between species is not restricted to the overall organization of the GATA-2 gene but extends also to some aspects of the regulatory mechanism. The CCAAT box appears to function regardless of the point in development or the cell type in which GATA-2 is expressed. It is possible that the particular CCAAT factor employed may vary. A multiplicity of CCAAT box binding activities have been identified, some of which are tissue-restricted, whereas others are expressed ubiquitously (47). However, our electrophoretic mobility shift assay experiments do not reveal the presence in different cell types of other CCAAT factors binding to the CCAAT motif of the cGATA-2 gene. We have shown that the binding affinity of this protein for its target depends on the highly conserved nucleotides located 3′ of the CCAAT box, suggesting that there may be additional factors that bind to the extended site. Our data show that if such a factor exists, it binds only when the CCAAT factor already occupies its own adjacent site.
The CCAAT factor, which binds in vitro to the CCAAT motif located in the cGATA-2 gene, is also detected in DT40, a chicken lymphoblastoid cell line in which the transcription of the gene is, as expected, not detected. This leads us to conclude that the CCAAT site is not sufficient for the restricted expression of the gene. This is consistent with results seen with mouse GATA-2, in which the region containing the CCAAT motif induces severalfold stimulation of reporter expression, even in cells that do not express GATA-2 (27). Previous work performed on the Xenopus embryo (18) showed that during gastrulation, the initial expression of the zygotic gene occurs in a broad domain, throughout the ventral and lateral regions of the embryo. In embryo explants, accumulation of xGATA-2 mRNA can be inhibited by co-culturing with either activin, dorsal marginal zones, or the dorsalizing and neural inducer noggin, suggesting that the localization of xGATA-2 expression to the ventral region is a consequence of negative control during dorsalization and neural induction (18).
These results suggest that in all GATA-2 genes, transcription of the exon 1b-containing mRNA may be controlled by ubiquitous transcription factors, including the CCAAT factor, and that the absence of transcription of the gene in some of the cells of the hematopoietic system, such as the lymphocytes, or in other tissues, is the consequence of negative control. We suggest that tissue-specific suppression of GATA-2 expression might therefore arise from inhibitory signals elsewhere in the neighborhood of the gene, but some distance away. Our transient transfection experiments are consistent with that point of view. They indicate that sequences located upstream of the minimal promoter have a strong inhibitory effect in HD24, a chicken multipotent erythroid-myeloid cell line, but not in 6C2 or HD37, two distinct CFU-E stage precursor cells. This is entirely consistent with the relative endogenous expression of GATA-2 in these cells, suggesting that the upstream elements confer developmentally specific regulation.
Such a model also may explain the plasticity of the GATA-2 regulatory pattern, which allows similar promoters to be used in quite different ways in different organisms. The isolated proximal promoter, common to many GATA-2 genes, is potentially active in a variety of cells, but its expression is restricted by more distant elements. By varying the choice of those elements and their activity during development, it is possible to alter the role of the proximal promoter from one organism to another.