Transcripts for functionally distinct isoforms of chicken GATA-5 are differentially expressed from alternative first exons.

Our analysis of cDNA and genomic clones unexpectedly revealed that the chicken gata-5 gene is differentially expressed from alternative first exons. Moreover, we show that the respective transcripts are differentially processed to yield mRNAs for two distinct isoforms of GATA-5. The major isoform, which we described previously, has two CXNCX17CNXC zinc fingers typical of a vertebrate GATA factor. The minor isoform, on the other hand, has only one such zinc finger. We show that this novel isoform localizes within the nuclei of transfected cells and can bind to a consensus GATA site. This truncated isoform of GATA-5 is compromised in its ability to transactivate a simple target gene, however, and thus is functionally distinct from the major isoform of GATA-5.

The identification of cis-acting elements in erythroid-specific promoters and enhancers that conform to the consensus WGATAR sequence led to the cloning of GATA-1, which proved to be the founding member of a family of transcription factors (1,2). Five additional members of this GATA factor family have thus far been identified from various vertebrate species using low stringency hybridization screens and other methods (3-6). These vertebrate GATA factors display the greatest degree of conservation over their DNA-binding domains, which are composed of two related zinc fingers of the general form CXNCX17CNXC. Factors with zinc fingers that conform to this consensus motif have also been found in lower eukaryotes. However, in contrast to the previously reported vertebrate GATA factors, some of the invertebrate GATA factors have only one zinc finger (7-10). These single fingers are most similar to the second zinc fingers of vertebrate GATA factors, and accordingly, the latter have been shown to be necessary and sufficient for most of the DNA binding specificity of vertebrate GATA factors (11,12).
Vertebrate GATA factors can be grouped into two subfamilies on the basis of sequence and expression pattern similarities. For example, GATA-1/2/3 are all expressed (albeit not exclusively) in hematopoietic cells, whereas GATA-4/5/6 are all expressed (albeit not exclusively) in the heart. Insights into the biological relevance of GATA factor expression have recently been obtained using a variety of approaches, including gene disruption assays in transgenic mice. In particular, mice that fail to express GATA-1 or GATA-2 exhibit lethal hematopoietic defects (13,14), whereas mice that fail to express GATA-3 display severe nervous system and liver hematopoietic defects (15). Similarly, it has been determined that GATA-4 expression is required for embryonal stem cells to differentiate either into cardiac myocytes or into primitive endoderm in cell culture (16,17).
As noted above, the gata-4/5/6 genes have overlapping, but not identical, expression patterns. Whereas all three of these genes are expressed in myocardial and endocardial cells, the gata-5/6 genes are also robustly expressed in gut epithelial cells, and the gata-6 gene is additionally expressed in the liver, lung, and ovary (6). As an initial step toward determining the molecular basis for the tissue-specific expression of GATA-5 in the heart and gut, we cloned and characterized the chicken gata-5 gene. Somewhat surprisingly, the structure of this gene was found to differ markedly from the otherwise conserved gata-1/2/3 genes, which serves to highlight yet another distinction between these two subgroups of vertebrate GATA factors.
Even more surprising, we found that the chicken gata-5 gene is differentially expressed from two alternative first exons. Moreover, the respective transcripts yield mRNAs for two distinct isoforms of GATA-5. One of these isoforms is novel for a vertebrate GATA factor in that it has a DNA-binding domain composed of a single zinc finger. The functional properties of this novel GATA-5 isoform were assayed in some detail and are compared with the properties of the major isoform of GATA-5 that we previously described (6).

EXPERIMENTAL PROCEDURES
Isolation and Characterization of gata-5 Genomic Clones-Two gata-5 genomic clones were isolated by screening a chicken genomic library with a GATA-5 cDNA probe (6) using standard protocols (18). Fragments harboring gata-5 coding exons were identified by Southern blot analysis, cloned into plasmids, and sequenced with primers directed against GATA-5 cDNA sequences.
RNase Mapping Experiments-A tissue lysate RNase protection kit (Amersham Corp.) was used to map gata-5 exons 1b and 2. The riboprobes that were used for this analysis spanned either a genomic SmaI fragment (exon 1b; see Fig. 4) or a genomic BamHI/NcoI fragment (exon 2; see Fig. 3). These riboprobes were prepared using commercial reagents (Promega).
PCR Protocols-Poly(A)+ mRNA was isolated from various tissues as described previously (6). Anchor and nested antisense primers for 5′-RACE assays were directed against unique sequences located within the 5′-untranslated region of GATA-5 cDNA: anchor primer, 5′-GTCCTGGGCACGTAGACG-3′; and nested primer, 5′-GATACATGTTCCGTCCTCG-3′. These primers were used in conjunction with generic sense primers from a 5′-RACE kit (CLONTECH). The PCR products were cloned into the pCRII plasmid (Invitrogen), and the inserts were sequenced using SP6 and T7 primers and Sequenase reagents (U. S. Biochemical Corp.).
Poly(A)+ mRNAs from various tissues (see above) were used to prepare cDNAs as described previously (19). These cDNAs were then used as templates in RT-PCR reactions. Sense strand primers specific for exons 1a and 1b and an antisense primer for the 3′-untranslated region of the GATA-5 cDNA are as follows: exon 1a sense primer, 5′-AATTGCCACCCTCCCGACG-3′; exon 1b sense primer, 5′-CATGGTCTGAGCGCAGC-3′; and antisense primer, 5′-GGGATGCGTTTATTTGCT-3′. The 40 PCR cycles (1 min at 95°C, 1 min at 58°C, and 1.5 min at 72°C) were followed by a 4-min incubation at 72°C. The PCR buffer (Perkin-Elmer) was supplemented with 4% dimethyl sulfoxide. The resultant PCR products were cloned into the pCRII plasmid and sequenced using SP6 and T7 primers and Sequenase reagents.
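As a rough cross-check on the RT-PCR primers listed above, the following Python sketch computes their length, GC content, and an approximate melting temperature. It assumes that the hyphens in the printed sequences are line-break artifacts, and it uses the Wallace rule of thumb (2°C per A/T plus 4°C per G/C), which is not a method described in this report and is only a coarse estimate for short oligonucleotides. The function and variable names are illustrative.

```python
# Primer sequences as printed above (hyphens removed; an assumption).
PRIMERS = {
    "exon_1a_sense": "AATTGCCACCCTCCCGACG",
    "exon_1b_sense": "CATGGTCTGAGCGCAGC",
    "antisense_3utr": "GGGATGCGTTTATTTGCT",
}


def gc_count(seq: str) -> int:
    """Number of G and C bases in the sequence."""
    return sum(1 for base in seq.upper() if base in "GC")


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    return gc_count(seq) / len(seq)


def wallace_tm(seq: str) -> int:
    """Wallace rule estimate: 2 * (A+T) + 4 * (G+C), in degrees Celsius."""
    gc = gc_count(seq)
    at = len(seq) - gc
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, GC {gc_fraction(seq):.0%}, "
              f"Wallace Tm ~{wallace_tm(seq)} C")
```

Under this estimate the exon 1a primer is the most GC-rich of the three, which fits with the need for a dimethyl sulfoxide-supplemented buffer, although the report does not state that rationale.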
Construction of Expression and Reporter Vectors-The RT-PCR products described in the preceding paragraph were shuttled into a cytomegalovirus-driven eukaryotic expression vector (pcDNA3, Invitrogen) for cotransfection assays. Recombinant plasmids with inserts in the correct orientation were sequenced to verify that the inserts were devoid of mutations.
The reporter plasmid for cotransfection assays (see below) was made by inserting a consensus WGATAR binding site (12) into the unique BamHI site that resides immediately upstream of the minimal liver/bone/kidney alkaline phosphatase promoter in the pTATA/CAT reporter plasmid (20). The consensus binding site that was cloned into this reporter plasmid was also used as a probe for the gel shift assays described below.
Transfection Assays-COS-7 cells (American Type Culture Collection) were grown to 70-90% confluency on 60-mm dishes in Dulbecco's modified Eagle's medium (supplemented with 10% fetal bovine serum) and cotransfected with 12 μg of Lipofectamine (Life Technologies, Inc.) and varying amounts of expression and reporter plasmids (see Fig. 9). Standard protocols were used to make extracts and to assay protein concentrations and chloramphenicol acetyltransferase activities (21). The amounts of radiolabeled substrate and product were quantified using an AMBIS radioanalytic imaging system.
Gel Shift Assays-Nuclear extracts were prepared from transfected COS-7 cells (see above) essentially as described previously (22) except that leupeptin (1 μg/ml) and aprotinin (1 μg/ml) were added to all buffers. These extracts were used to program gel shift assays in combination with a consensus GATA binding site probe (12) that was prepared by annealing the following pair of oligonucleotides: 5′-GATCTGCGGATAAGATAAGGCCGGAATTCG-3′ and 5′-GATCCGAATTCCGGCCTTATCTTATCCGCA-3′.
The mutated site that was used as a competitor fragment to demonstrate binding site specificity (see Fig. 8) differs from the consensus site at the underlined bases (GAT was changed to CCC, and ATC was changed to GGG). The gel shift binding buffer contained 0.1% bovine serum albumin, 25 mM KCl, 10 mM Tris (pH 8.0), 1 mM EDTA, 1 mM dithiothreitol, and 4 μg of poly(dI·dC)/30-μl reaction. Binding reactions were carried out at 4°C for 30 min. The samples were then supplemented with loading buffer and resolved on a 6% acrylamide gel cast in 0.25× Tris borate/EDTA (19). The buffer was circulated manually at 30-min intervals. Following electrophoresis, the gels were dried under vacuum and then exposed to X-Omat film (Eastman Kodak Co.) at −70°C between intensifying screens.
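The structure of the annealed gel shift probe can be checked with a short Python sketch. It assumes that the hyphen in the second printed oligonucleotide is a line-break artifact, and the helper names are illustrative. The sketch confirms that the two strands pair over a 26-bp core, leaving 4-nucleotide 5′-GATC overhangs at each end (the kind of overhang that is compatible with a BamHI-cut vector), and it locates matches to the WGATAR consensus (W = A or T, R = A or G) on the top strand.

```python
import re

# Probe oligonucleotides as printed above (hyphens removed; an assumption).
TOP = "GATCTGCGGATAAGATAAGGCCGGAATTCG"
BOTTOM = "GATCCGAATTCCGGCCTTATCTTATCCGCA"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def duplex_core(top: str, bottom: str, overhang: int = 4):
    """Return the paired core if the strands anneal with 5' overhangs of the
    given length on both ends; otherwise return None."""
    core_top = top[overhang:]
    core_from_bottom = reverse_complement(bottom)[: len(bottom) - overhang]
    return core_top if core_top == core_from_bottom else None


def wgatar_sites(seq: str):
    """0-based start positions of WGATAR matches (overlaps allowed)."""
    return [m.start() for m in re.finditer(r"(?=[AT]GATA[AG])", seq)]


if __name__ == "__main__":
    core = duplex_core(TOP, BOTTOM)
    print("paired core:", core, f"({len(core)} bp)" if core else "")
    print("5' overhangs:", TOP[:4], BOTTOM[:4])
    print("WGATAR starts on top strand:", wgatar_sites(TOP))
```

Note that the probe carries two adjacent GATA cores (GGATAA and AGATAA), but only the second is preceded by A or T and therefore matches the strict WGATAR pattern used in this sketch.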

RESULTS
Isolation and Characterization of gata-5 Genomic Clones-We previously sequenced a cDNA clone that spans the chicken GATA-5 open reading frame and includes 64 and 168 bp of 5′- and 3′-untranslated sequences, respectively (6). This cDNA insert was used as a probe to isolate two overlapping gata-5 genomic phage clones. Fragments containing gata-5 coding exons were mapped using Southern blots, cloned into plasmids, and sequenced with gata-5-specific primers. The GATA-5 open reading frame was thus revealed to span six exons (Figs. 1 and 2). Consensus splice donor and acceptor sites were found to flank each of the coding exons (i.e. exons 2-7; the noncoding first exons are discussed below) as expected (Fig. 2).
We next used an RNase protection assay to map the upstream end of the first coding exon. As shown in Fig. 3, an 89-nucleotide fragment was protected when this assay was programmed with RNA from adult heart or adult gut (which express GATA-5). No protected fragments were obtained when brain or skeletal muscle (which do not express GATA-5) was instead used as the source of RNA. The fact that this exon breakpoint mapped precisely to a consensus splice acceptor site (denoted site 3 in Fig. 2) and the fact that several other gata genes have been shown to have noncoding first exons (14,23,24) suggested that the chicken gata-5 gene might similarly contain a noncoding first exon(s).
Evidence for Two Distinct Noncoding GATA-5 First Exons-Since exhaustive screens of several cDNA libraries failed to provide evidence for such a noncoding first exon, we resorted to a directed RACE/PCR analysis of embryonic heart and adult gut mRNAs (two tissues in which GATA-5 is expressed robustly). Unexpectedly, two distinct cDNA sequences were found to lie immediately upstream of the presumptive second exon sequence. We resolved that the genomic copies of these two novel sequences were located 3.5 and 1.5 kilobases, respectively, upstream of the common second exon and that these sequences were flanked by consensus splice donor sites (see sites 1 and 2, respectively, in Fig. 2). These results indicate that the gata-5 gene has two alternative (presumably first) exons, which we will refer to henceforth as exons 1a and 1b.
The RNase protection assay shown in Fig. 4 further revealed that exon 1b is 270-285 bp in length. We infer that this is a first exon for two reasons. First, the predominant 5′-ends map to sequences that are typical of polymerase II transcriptional start sites (25,26), namely, purines embedded within pyrimidine-rich tracts (Fig. 5). Second, consensus splice acceptor sites do not map in the vicinity of these 5′-ends (Fig. 5). Although primer extension assays are often used to confirm the assignment of transcriptional start sites, we have been unable to synthesize cDNA copies of this sequence even when we use in vitro transcribed templates. We presume that this technical limitation is attributable to the fact that this exon is extremely GC-rich.
Since RNase protection and primer extension assays designed to detect exon 1a-containing transcripts in embryonic heart yielded negative results (data not shown), we infer that these transcripts are relatively rare in this tissue. Consistent with this inference, we note that only 5 of the 44 RACE cDNA clones that were obtained using an antisense primer from the second exon (described above) contained exon 1a sequences; all of the others contained exon 1b sequences (data not shown). However, as discussed below, we have been able to deduce that exon 1a is at least 256 bp in length.
Differential Promoter Usage and Alternative Splicing-The fact that exon 1a RACE cDNA clones were obtained from embryonic heart (but not from adult heart or gut) suggested that this first exon might be expressed in a development-specific or tissue-specific manner. We explored this possibility by carrying out RT-PCR assays with sense oligonucleotides specific for either exon 1a or 1b in combination with a common antisense oligonucleotide specific for the 3′-untranslated region of GATA-5 mRNA (Fig. 6). The cDNA templates that were used for this analysis were derived from embryonic (day 10) heart, adult heart, and adult skeletal muscle. As predicted, exon 1b-containing RT-PCR products of the expected size and sequence (1516 bp; data not shown) were obtained using heart (both embryonic and adult), but not skeletal muscle, cDNA templates. In contrast, whereas exon 1a-containing transcripts were detected in embryonic (but not adult) heart, the predominant RT-PCR product (indicated by the lower arrow in Fig. 6) was smaller than expected (943 bp instead of 1540 bp). An analysis of this 943-bp RT-PCR product revealed that exon 1a was precisely juxtaposed to exon 3 rather than to exon 2 (Fig. 7); all of the other exons were spliced normally (data not shown). We also sequenced the minor (1540-bp) exon 1a-specific RT-PCR product and verified that it included the exon 2 sequence as expected (data not shown). By carrying out similar RT-PCR assays with primers that map upstream of the exon 1a primer used for the analysis presented in Fig. 6, we determined that exon 1a is at least 256 bp and contains termination codons in all three reading frames (Fig. 7).
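The size difference between the two exon 1a-specific RT-PCR products can be made explicit with a small calculation. Because the shorter product joins exon 1a directly to exon 3 while all other exons are spliced normally, the difference should correspond to the length of the skipped exon 2. This inference is ours for illustration; the exon 2 length is not stated directly in the text above, and the names below are hypothetical.

```python
# Product sizes reported above for the exon 1a-specific RT-PCR.
EXPECTED_PRODUCT_BP = 1540   # exon 1a joined to exon 2 (minor product)
OBSERVED_PRODUCT_BP = 943    # exon 1a joined to exon 3 (predominant product)


def skipped_length(with_exon_bp: int, without_exon_bp: int) -> int:
    """Length of sequence missing from the shorter product, in bp."""
    return with_exon_bp - without_exon_bp


# Inferred length of exon 2, assuming exon 2 is the only difference.
inferred_exon2_bp = skipped_length(EXPECTED_PRODUCT_BP, OBSERVED_PRODUCT_BP)

if __name__ == "__main__":
    print(f"Inferred exon 2 length: {inferred_exon2_bp} bp")
```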
FIG. 3. RNase protection assay for the first coding exon of GATA-5. A riboprobe for the indicated genomic fragment was hybridized to tissue lysates, digested with RNase, resolved by electrophoresis, and imaged by autoradiography. Adult heart and stomach (but not brain or muscle) lysates yielded 89-nucleotide (nt) protected fragments as indicated. E1a, exon 1a; E1b, exon 1b; E2, exon 2.

Characterization of a Single-zinc Finger Isoform of GATA-5-The predominant splicing pathway for the exon 1a-containing transcripts yields mRNAs that lack the previously reported translational initiation site for GATA-5. Based on Kozak rules (27), translation initiation is predicted to occur at the first methionine codon that is embedded in a favorable sequence context, which, in the case of this novel GATA-5 mRNA, lies within the exon 3 sequence (Fig. 7). Indeed, this ATG codon functions as an efficient translational initiation site in vitro as predicted (data not shown). Since this methionine residue is located within the coding region for the first zinc finger, the resultant GATA-5 isoform contains only one (i.e. the second) zinc finger. This raised three obvious questions. First, can the predicted single-zinc finger isoform of GATA-5 localize to the nuclei of transfected cells? Second, can this truncated isoform bind specifically to a consensus GATA site? And third, can this novel isoform transactivate a simple GATA-dependent target gene?
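The scanning prediction invoked above can be illustrated with a minimal Python sketch. It walks an mRNA from the 5′-end and returns the first ATG that has a purine at position −3 and a G at position +4. Requiring both features is a simplified stand-in for a "favorable context" and is an assumption of this sketch, as are the function name and the toy sequence, which is not the GATA-5 sequence.

```python
import re


def first_favorable_atg(mrna: str):
    """Return the 0-based index of the first ATG with a purine at -3 and a
    G at +4 (A of ATG is +1), or None if no such codon exists."""
    for match in re.finditer(r"(?=ATG)", mrna):
        i = match.start()
        has_flanks = i >= 3 and i + 3 < len(mrna)
        if has_flanks and mrna[i - 3] in "AG" and mrna[i + 3] == "G":
            return i
    return None


# Hypothetical toy mRNA: the first ATG (index 5) sits in a weak context
# (T at -3, C at +4); the second (index 20) has A at -3 and G at +4.
TOY_MRNA = "CCTTTATGCCCTTTGCCACCATGGCGTAA"

if __name__ == "__main__":
    print("predicted start index:", first_favorable_atg(TOY_MRNA))
```

In this toy case the scan skips the upstream ATG and selects the downstream one, which mirrors the logic by which initiation is predicted to occur within exon 3 of the truncated isoform mRNA.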
To address these questions and to compare the properties of the full-length and truncated GATA-5 isoforms, we cloned RT-PCR products that span the respective open reading frames (Fig. 6) into a eukaryotic expression plasmid and transfected these plasmids into COS-7 cells, which do not express endogenous GATA factors. Nuclear extracts from these transfected cells were used to program the gel shift assay shown in Fig. 8. This analysis revealed that both isoforms can be stably expressed in vivo and that both isoforms can bind a consensus GATA site in vitro (lanes 1 and 6, respectively). These protein-DNA interactions are sequence-specific since an excess of the unlabeled consensus site competed for binding (lanes 3, 4, and 8), whereas a similar excess of a mutated site did not (lanes 2 and 7). The distinct mobilities of these complexes are consistent with the fact that the full-length isoform is 391 amino acids long, whereas the truncated isoform is only 190 amino acids long.

FIG. 6. … (marked primers a-c). The predominant splicing patterns for transcripts from the two alternative first exons of the gata-5 gene are indicated by dashed lines. The RT-PCR products that indicated these splicing patterns are shown in the lower portion. These RT-PCRs were programmed with total cDNAs derived from embryonic heart (lanes 1 and 2), adult heart (lanes 3 and 4), and adult skeletal muscle (lanes 5 and 6). Molecular weight markers were run in lane 7. E1a, exon 1a; E1b, exon 1b; E2-E7, exons 2-7; kb, kilobase.

FIG. 7. Composite representation of the exon 1a promoter region and the 5′-untranslated region of the mRNA for the truncated GATA-5 isoform. Sequences numbered 1-542 correspond to genomic sequences from the exon 1a genomic region. We infer that the transcriptional start site(s) maps upstream of position 287 since a primer from region 287-303 (as well as a primer from region 452-470; the latter corresponds to primer a in Fig. 6) yielded the expected RT-PCR products (data not shown). The resultant mRNA juxtaposes exons 1a and 3, as indicated by the vertical line. The translational initiation codon for this GATA-5 isoform maps to position 596 in this arbitrary numbering system and is preceded by a short open reading frame as indicated.

FIG. 8. … 1-4) or the full-length isoform (lanes 6-8) of GATA-5. Competition assays were carried out using a 50-fold excess of a specific competitor (lane 3) or a 100-fold excess of either a specific (lanes 4 and 8) or a nonspecific (lanes 2 and 7) competitor as indicated. Complexes that contain the truncated and full-length isoforms are marked with arrows to the left and right of the gel, respectively.
We next addressed whether this truncated GATA-5 isoform can function as a transcriptional activator. Expression vectors for the two isoforms of GATA-5 (see above) were cotransfected into COS-7 cells along with a reporter plasmid that has a consensus GATA site in the promoter region (12). The results of this analysis are presented in Fig. 9. As expected, the full-length isoform of GATA-5 was able to transactivate this GATA-dependent reporter plasmid. Note that the -fold induction decreased when an excess of this full-length isoform was expressed, presumably due to squelching. The truncated isoform was also able to transactivate this reporter construct, albeit much less efficiently than the full-length isoform.

DISCUSSION
The six GATA factors that have been identified from vertebrate species can be grouped into two distinct subfamilies (i.e. GATA-1/2/3 and GATA-4/5/6) on the basis of cDNA sequence comparisons as well as expression profile comparisons. Thus, it is perhaps not surprising to find that a member of the GATA-4/5/6 subfamily has a gene structure that is distinct from the conserved gata-1/2/3 gene structure. On the other hand, the extent to which these gene structures differ is rather remarkable. Indeed, only two features are conserved across these two subfamilies, i.e. noncoding first exons and comparable second zinc finger exons (Fig. 10). Assuming that the two GATA subfamilies were founded by the duplication of an ancestral gene, the fact that the gata-4/5/6 gene structures are similar to each other (see Footnote 2) but distinct from the gata-1/2/3 gene structures implies that a total of three introns must have been lost or gained from the gata-1/2/3 ancestral gene and/or from the gata-4/5/6 ancestral gene prior to the expansions of the respective subfamilies. Thus, the ancestral gata-1/2/3 and/or gata-4/5/6 genes presumably existed for a long period of evolution before each spawned multiple progeny.
Two of the introns that are unique to the gata-4/5/6 gene subfamily (introns 5 and 6; see Figs. 2 and 10) appear to coincide with the boundaries of functional domains. For example, based on structural studies carried out with GATA-1 (28), we infer that intron 5 maps precisely to the carboxyl-terminal end of the minimal DNA-binding domain of the second zinc finger of GATA-5. It is also interesting that introns 5 and 6 delimit a domain that is rich in PEST residues (66% of the residues in exon 6 are Pro, Glu, Ser, Thr, or Asp), which suggests that this domain may be a determinant of GATA-5 instability (29). In support of this conjecture, we note that the PEST-rich amino acid composition (but not primary sequence) for this exon is conserved within the GATA-4/5/6 subfamily.
As noted above, noncoding first exons are a common feature of vertebrate gata genes. Moreover, alternative first exons have been identified for both the mouse gata-1 gene (23) and the chicken gata-5 gene (this report). The gata-1 gene first exons are differentially transcribed in erythroid cells and in the testis. Since these alternative noncoding exons are each spliced to a common second exon, the same GATA-1 protein is encoded in both cell types. In the case of the gata-5 gene, however, transcripts that include the distal first exon are preferentially spliced to the third exon, which results in an mRNA that encodes a novel single-zinc finger isoform of GATA-5. So far as we are aware, this is the first evidence of a single-zinc finger GATA factor being encoded in a vertebrate species. On the other hand, a novel (albeit two-zinc finger) GATA-1 isoform has been reported to result from the use of an internal translational initiation site (30).

2 C. Z. He and C. MacNeill, unpublished result.

FIG. 10. The gata-5 gene structure is markedly distinct from the gata-1/2/3 gene structures. The chicken gata-5 gene structure deduced in this study is compared with the chicken gata-1 (36), frog gata-2 (37), and mouse gata-3 (24) gene structures. The two zinc finger exons (ZF1 and ZF2) are indicated by closed boxes, and exons that map upstream and downstream of these zinc finger exons are indicated by boxes with leftward and rightward stripes, respectively. Open boxes denote noncoding regions. The numbers of amino acids encoded by each exon are indicated; note that ZF2 is the only coding exon that has a conserved structure for gata-1/2/3 and gata-5. Although not shown in this figure, the chicken gata-4 and gata-6 gene structures are very similar to the chicken gata-5 gene structure (see Footnote 2). gata-5 gene introns (i1a, i1b, and i2-i6) are numbered at the bottom.
Since mutational studies have revealed that the second zinc fingers of other vertebrate GATA factors are necessary and sufficient for binding to consensus GATA sites (12,31), it is not surprising that the truncated isoform of GATA-5 can also bind to these sites. However, since the DNA binding specificities of normal and mutant (single-zinc finger) GATA factors are not identical, we presume that the two naturally occurring GATA-5 isoforms will similarly be found to have somewhat distinct binding specificities. We are in the process of using a site selection protocol to test this prediction.
Based on the results of cotransfection assays (Fig. 9) and an in situ assay of epitope-tagged GATA-5 isoforms (data not shown), we infer that the truncated isoform has a nuclear localization signal. Whereas GATA-1 and GATA-3 appear to have multiple nuclear localization signals (12,31,32), the short stretches of basic amino acids that resemble consensus nuclear localization signals are not conserved between these GATA factors and GATA-4/5/6. For example, the RPKKR and KGKKK motifs that flank the second zinc finger of GATA-3 are replaced by KPQKR and KGKTS, respectively, in GATA-5. Conversely, a presumptive nuclear localization signal for GATA-5 (RKRKPK; located in the carboxyl-terminal region of the second zinc finger) is not conserved for GATA-1/2/3. Furthermore, based on structural studies (28), we infer that this RKRKPK motif probably also functions as an essential determinant for binding to consensus WGATAR sites.
We have shown that the single-zinc finger isoform of GATA-5 is compromised with respect to its ability to transactivate a simple reporter construct. Whereas single-zinc finger mutants of GATA-1 are also compromised with respect to their ability to transactivate simple target genes in cotransfection assays, these mutant factors can still cause early myeloid cells to differentiate into megakaryocytes in cell culture (32). Similarly, a single-zinc finger GATA factor from Aspergillus nidulans can rescue erythroid differentiation in GATA-1-deficient embryonic stem cells (33). Thus, we presume that the single-zinc finger GATA-5 isoform can regulate critical subsets of GATA target genes in the tissues in which it is expressed.
Finally, it may be noteworthy that the mRNAs for both isoforms of GATA-5 contain short open reading frames in their 5′-untranslated regions (Figs. 5 and 7). Based on the ribosome scanning model (27), these short open reading frames would be expected to impair the efficiency of translation initiation at the downstream (GATA-5) open reading frames. This may allow yet another level of regulation for differentially expressing these GATA-5 isoforms (34,35). On the other hand, since these upstream open reading frames were included in the respective GATA-5 expression vectors (Figs. 8 and 9), it is clear that these open reading frames do not preclude expression of these isoforms.
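The notion of a short upstream open reading frame can be made concrete with the following Python sketch, which lists every ATG that lies entirely upstream of a given main start codon and is followed by an in-frame stop codon. The function name, the coordinate convention (0-based, end-exclusive), and the toy sequence are assumptions for illustration only; the sequence is not taken from the GATA-5 mRNAs.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def upstream_orfs(mrna: str, main_start: int):
    """Return (start, end) pairs for ORFs whose ATG lies fully upstream of
    main_start and that terminate at an in-frame stop codon. Coordinates
    are 0-based and end-exclusive (the stop codon is included)."""
    orfs = []
    for i in range(0, main_start - 2):
        if mrna[i:i + 3] != "ATG":
            continue
        # Read codon by codon in the frame set by this ATG.
        for j in range(i + 3, len(mrna) - 2, 3):
            if mrna[j:j + 3] in STOP_CODONS:
                orfs.append((i, j + 3))
                break
    return orfs


# Hypothetical toy mRNA: a 3-codon upstream ORF (ATG CCC TGA) at index 2,
# followed by the main ORF (ATG GCT TAA) beginning at index 18.
TOY_MRNA = "GGATGCCCTGAGGCCACCATGGCTTAA"
MAIN_START = 18

if __name__ == "__main__":
    for start, end in upstream_orfs(TOY_MRNA, MAIN_START):
        print(f"uORF at {start}-{end}: {TOY_MRNA[start:end]}")
```

Under the scanning model cited above, a ribosome that initiates at such an upstream ATG must terminate and reinitiate (or bypass that ATG altogether) in order to translate the main open reading frame, which is why these elements are expected to reduce, but not abolish, expression.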