Transcriptional Regulation of the Human Erythroid 5-Aminolevulinate Synthase Gene

We have characterized the 5′-flanking region of the human erythroid-specific 5-aminolevulinate synthase (ALAS) gene (the ALAS2 gene) and shown that the first 300 base pairs of promoter sequence give maximal expression in erythroid cells. Transcription factor binding sites clustered within this promoter sequence include GATA motifs and CACCC boxes, critical regulatory sequences of many erythroid cell-expressed genes. GATA sites at −126/−121 (on the noncoding strand) and −102/−97 were each recognized by GATA-1 protein in vitro using erythroid cell nuclear extracts. Promoter mutagenesis and transient expression assays in erythroid cells established that both GATA-1 binding sites were functional, and exogenously expressed GATA-1 increased promoter activity through these sites in transactivation experiments. A noncanonical TATA sequence at the expected TATA box location (−30/−23) bound either GATA-1 or TATA-binding protein (TBP) in vitro. Conversion of this sequence to a canonical TATA box reduced expression in erythroid cells, suggesting a specific role for GATA-1 at this site. However, expression was also markedly reduced when the −30/−23 sequence was converted to a consensus GATA-1 sequence (that did not bind TBP in vitro), suggesting that a functional interaction of both factors with this sequence is important. A sequence comprising two overlapping CACCC boxes at −59/−48 (on the noncoding strand) was demonstrated by mutagenesis to be functionally important. This CACCC sequence bound Sp1, erythroid Krüppel-like factor, and basic Krüppel-like factor in vitro, while in transactivation experiments erythroid Krüppel-like factor activated ALAS2 promoter expression through this sequence. A sequence at −49/−39 with a 9/11 match to the consensus for the erythroid-specific factor NF-E2 was not functional.
Promoter constructs with 5′-flanking sequence from 293 base pairs to 10.3 kilobase pairs expressed efficiently in COS-1 cells as well as in erythroid cells, indicating that an enhancer sequence located elsewhere or native chromatin structure may be required for the tissue-restricted expression of the gene in vivo.

5-Aminolevulinate synthase (EC 2.3.1.37) is a nuclear-encoded mitochondrial matrix enzyme that catalyzes the formation of 5-aminolevulinate from glycine and succinyl-CoA in the heme biosynthetic pathway and is of particular interest, since it is the rate-controlling enzyme (1-3). There are two closely related isozymes of 5-aminolevulinate synthase (ALAS), designated ALAS1 and ALAS2, which are encoded by separate genes located on different chromosomes (4-6). The housekeeping enzyme, ALAS1, is probably expressed in all tissues to provide heme for respiratory cytochromes and other hemoproteins (1,7). The second isozyme, ALAS2, is an erythroid cell-specific enzyme, the synthesis of which is developmentally regulated and is markedly increased during erythropoiesis to meet the demand for heme during hemoglobin production (1).
The genes for ALAS1 and ALAS2 have been isolated from various species (8-12) and show a similar exon/intron organization (1). We have characterized the human ALAS2 gene (11) and shown that it consists of 11 exons spanning 22 kb (13) on the X chromosome (5). In the human disorder X-linked sideroblastic anemia, point mutations have been identified in ALAS2 that result in impaired enzyme activity and consequently reduced hemoglobin production (2,14).
Expression of the ALAS2 gene is regulated at both the transcriptional and post-transcriptional levels. Translation of the ALAS2 mRNA in erythroid cells is controlled by intracellular iron levels through an iron-responsive element located in the 5′-untranslated region to ensure that the production of protoporphyrin is coordinated with iron availability (1,11,15). Furthermore, heme may regulate activity of ALAS2 by preventing its import into mitochondria (1,16). During erythropoiesis, transcription of the ALAS2 gene is markedly up-regulated (1) together with an increase in the transcription of genes for the other heme pathway enzymes (17) and for globin (1,3,18). Only a small number of erythroid cell-restricted transcription factors have been identified that are involved in erythroid gene transcriptional activation (19), and these include GATA-1 (the prototype of a family of GATA proteins), NF-E2, and the CACCC box-binding protein, EKLF. In the present study, we have identified transcription factors that bind to the ALAS2 promoter to drive its expression and have examined, in detail, the role of GATA and CACCC box-binding proteins in this process. Gel shift assays have been employed to investigate the specificity of protein-DNA interactions in the ALAS2 promoter, and the functional contribution of such binding sites was evaluated by site-directed mutagenesis and transient expression analysis of ALAS2 promoter/reporter gene constructs.

EXPERIMENTAL PROCEDURES
Construction of Promoter/Reporter Gene Plasmids-A series of 5′-flanking ALAS2 deletion constructs were generated from subcloned
The polymerase chain reaction was performed using pTC-EA1 as the template and the following primers: primer 1, 5′-CCCAAGCTTGCACTGAGGACGAACG-3′ at +12/+36 (containing an introduced HindIII site, AAGCTT), and 5′-GGGTTCTGTAACTACATTGCC-3′, which bound upstream of an AvrII site at −718/−699 and resulted in the amplification of a 730-bp promoter fragment. The amplified product was digested with BglII and HindIII, and a 321-bp fragment was ligated into the similarly digested pGL2-Basic vector. The resulting construct is designated pALAS−293-LUC and contains ALAS2 promoter sequence from −293 to +28. The amplified product was also digested with SacI and HindIII, and a 420-bp fragment was ligated into the similarly digested pBluescript KS+ phagemid (pKS+-ALAS). To synthesize plasmids with promoter lengths of −124 and −27, a PvuII site was introduced at these positions by site-directed mutagenesis, and the resulting modified plasmids were digested with SmaI (polylinker) and PvuII and religated to form pALAS−124-LUC and pALAS−27-LUC.
The synthesis of the longer promoter constructs was performed in several steps. In separate studies, a HindIII site was introduced at −7/−2 in the ALAS2 promoter by site-directed mutagenesis in a subclone containing −6.0 to +5.0 kb of contiguous human ALAS2 sequence. Subsequent digestion of this subclone with XbaI or KpnI together with HindIII gave promoter lengths of 1.9 and 5.7 kb that were cloned into pGL2-Basic linearized with NheI/HindIII and KpnI/HindIII, respectively. These initial constructs terminated at position −4 and therefore did not contain the native transcription initiation site. To permit strict comparison with the shorter promoter constructs, the sequence from around the native transcription initiation site was then reintroduced into these constructs as follows. An AvrII-HindIII fragment (−700 to −4) was excised from the 1.9 kb promoter construct and replaced with an AvrII-HindIII fragment (−700 to +28) that was amplified by the polymerase chain reaction, resulting in pALAS-1.9kb-LUC. An NcoI-HindIII fragment (−1.0 kb to −4) was removed from the 5.7 kb promoter construct and replaced with the NcoI-HindIII fragment (−1.0 kb to +28) isolated from pALAS-1.9kb-LUC to generate pALAS-5.7kb-LUC. To synthesize the construct containing 10.3 kb of 5′-flanking region (pALAS-10.3kb-LUC), pTC-EA1 was digested with ClaI and XhoI, and a 5.7-kb fragment was cloned into the similarly digested vector pSP72 (Promega). An EcoRV-XhoI fragment isolated from this plasmid was used to replace a 1.1-kb SmaI-XhoI fragment in the construct pALAS-5.7kb-LUC.
Constructs with 124 bp of wild type promoter (pALAS−124A-LUC) or a mutation in the −54 bp CACCC site were also synthesized for use in transactivation experiments. A 152-bp fragment (−124 to +28) was generated by the polymerase chain reaction using the plasmids pALAS−293-LUC and pALAS−293mut8-LUC as templates, and two primers: 5′-GGTTTAGATCTTAGCAAGGAAGGGA-3′ at −131/−106 (containing an introduced BglII site, AGATCT) and primer 1. Following digestion of the product with BglII and HindIII, the resultant fragment was cloned into the appropriately linearized pGL2-Basic vector. Additional constructs synthesized for use in transactivation experiments included pβ-glob-LUC and p(CAC)4tk-LUC derived from constructs provided by Dr. J. Bieker (20). pβ-glob-LUC contained 205 bp of murine β-globin promoter fused to the luciferase reporter gene, and p(CAC)4tk-LUC contained four copies of the murine β-globin CACCC site ligated upstream of the thymidine kinase promoter-luciferase reporter gene fusion. All constructs were verified by restriction mapping and DNA sequence analysis.
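The fragment sizes quoted for the reporter inserts follow from the promoter coordinates if one assumes the usual convention that the transcription start is +1 and there is no position 0. The sketch below is only an illustration of that arithmetic; the function name `span_length` is hypothetical and not part of the original methods.

```python
def span_length(start: int, end: int) -> int:
    """Length in bp of an interval given in promoter coordinates.

    Assumption (not stated explicitly in the text): the transcription
    start site is +1 and position 0 does not exist, so an interval that
    crosses the start site is one base shorter than end - start + 1.
    """
    if start == 0 or end == 0:
        raise ValueError("position 0 does not exist in this convention")
    if start > end:
        raise ValueError("start must not exceed end")
    length = end - start + 1
    if start < 0 < end:
        length -= 1  # skip the non-existent position 0
    return length


if __name__ == "__main__":
    # Insert of pALAS-293-LUC: promoter sequence from -293 to +28
    print(span_length(-293, 28))
    # PCR fragment used for the -124 constructs: -124 to +28
    print(span_length(-124, 28))
    # Primer 1 anneals at +12/+36
    print(span_length(12, 36))
```

Under this convention the −293 to +28 insert comes out at 321 bp and the −124 to +28 fragment at 152 bp, in agreement with the sizes given above.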
For electroporation, exponentially growing K562 cells were washed in PBS and 10⁷ cells in 200 μl of PBS containing 10 mM Hepes, pH 7.5, were electroporated with 2 pmol of the reporter construct at 200 V, 960 microfarads using the Bio-Rad Gene Pulser. MEL cells were grown to 80% confluency, harvested by trypsinization, resuspended in media, and washed twice in PBS. MEL cells (10⁷) in 500 μl of cold PBS containing 10 mM Hepes, pH 7.5, were electroporated with 2 pmol of the reporter construct at 300 V, 960 microfarads. COS-1 cells were grown to 80% confluency and harvested by trypsinization. COS-1 cells (5 × 10⁶) were resuspended in 500 μl of cold buffer containing 20 mM Hepes, pH 7.05, 137 mM NaCl, 5 mM KCl, 0.7 mM Na2HPO4, 6 mM dextrose and electroporated with 2 pmol of the reporter construct (300 V, 960 microfarads). All transfections contained 250 μg of sheared salmon sperm DNA (Sigma) as a carrier. As an internal control, K562 and COS-1 cells were co-transfected with 5 μg of the β-galactosidase expression vector, RSV-β-gal, and MEL cells with 10 μg of this vector. Cells were seeded in 60 × 15-mm Petri dishes containing 5 ml of medium and harvested 24 h after transfection. Cell lysates were assayed for luciferase and β-galactosidase activity.
Plasmid DNA was prepared by the CsCl/ethidium bromide equilibrium density gradient procedure (21), quantified by spectrophotometry and analyzed by agarose gel electrophoresis to confirm concentration and supercoiling. All transient transfections were performed in quadruplicate with at least three different plasmid DNA preparations.
Reporter Gene Assays-Transfected cells were harvested, washed once in PBS, and treated with 100 μl of cell culture lysis reagent (Promega) on ice for 10 min. Cells were then snap frozen, thawed on ice, and centrifuged for 5 min to remove cellular debris. Supernatants were assayed to determine total protein concentration (Bio-Rad protein microassay). Subsequent assays (luciferase and β-galactosidase) were performed with 100 μg of cell lysate. Luciferase activity was measured using a luciferase assay system (Promega), and measurements were determined in a Berthold model LB9502 luminometer. β-Galactosidase activity was measured by the procedure of Herbomel et al. (22) and expressed as (A420/μg of protein/h) × 100. Luciferase activities were normalized for transfection efficiency using the β-galactosidase activity as an internal control, and the data were expressed as "relative luciferase activity."
Gel Shift Assays-All nuclear extracts were prepared by the procedure of Partington et al. Single-stranded oligonucleotides were gel-purified, and 100 ng of each was labeled with [γ-32P]ATP using T4 polynucleotide kinase. A 3-fold molar excess of the unlabeled complementary oligonucleotide was annealed to the 32P-labeled oligonucleotide in 100 mM NaCl by incubation at 100°C for 2 min and 70°C for 10 min and then allowing the samples to slowly cool to room temperature. The labeled annealed oligonucleotides were precipitated and washed to remove unincorporated radioactivity and resuspended in 100 μl of water. Unlabeled oligonucleotides were also annealed for use in competition assays. Binding reactions used in the detection of GATA-binding proteins contained 5 μg of nuclear protein and 2 μg of poly(dI-dC) in 15 μl of 25 mM Hepes, pH 7.9, containing 60 mM KCl, 7.5% glycerol, 0.1 mM EDTA, 5 mM MgCl2, 0.75 mM dithiothreitol, and 2 mM spermidine and were incubated on ice for 10 min. Radiolabeled probe (1 ng) was added to the reaction and incubated on ice for a further 30 min.
In supershift assays, 2 μl of the GATA-1-specific monoclonal antibody, N-6 (28) (provided by Dr. G. Partington), was incubated in the binding reaction prior to the addition of probe. Retarded nuclear protein complexes were resolved on a 6% nondenaturing polyacrylamide gel in 0.25 × Tris borate-EDTA buffer at 180 V for 2.5 h at 4°C. The gels were dried and exposed to Kodak X-Omat AR film. For the detection of protein binding to the −27 GATA site, the binding reaction protocol described by Fong and Emerson (29) was used. Purified recombinant human TBP was obtained from Promega, and poly(dI-dC) was omitted from the binding reaction. In experiments designed to determine the binding affinity constants (Kd) of GATA-1 and TBP, binding reactions and electrophoresis conditions were as described above with a constant amount of radiolabeled oligonucleotide probes and serial dilutions of TBP or a purified GST-GATA-1 zinc finger fusion protein (GST-GATA-1(f)) (30) prepared as described previously (31). Binding reactions used in the detection of CACCC-binding proteins and the supershift assays using polyclonal antibodies to BKLF, EKLF, and Sp1 were performed as described by Crossley et al. (32). The polyclonal antibodies to BKLF and EKLF were generously provided by Dr. M. Crossley. The Sp1 polyclonal antibody, Sp1 (PEP 2) (Santa Cruz Biotechnology, Inc.), was a gift from Dr. M. F. Shannon. Gel shift competition assays were performed with unlabeled competitor oligonucleotides included in the binding reactions.
Transactivation Studies-Transactivation experiments in COS-1 cells were performed with 2 pmol of the reporter construct and 5 μg of the murine GATA-1 cDNA expression clone, pXM/GF-1 (provided by Dr. S. H. Orkin) or 10 μg of each of the cDNA expression clones, pMT2/RINFE and pMT2/p18w-1, for NF-E2 (provided by Dr. N. Andrews). For transactivation experiments in K562 cells, 2 pmol of the reporter construct and 7.5 μg of the EKLF cDNA expression clone, pSG5/EKLF (26) (provided by Dr. J. Bieker), were employed. The vectors pGL2-Basic and ptk-LUC containing the thymidine kinase promoter were included as controls. Cells were harvested 24 h after transfection, and 100 μg of total protein was assayed for luciferase activity. The -fold transactivations were determined following subtraction of the background activity obtained with the appropriate progenitor vectors.
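The normalization and -fold transactivation calculations described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, the numbers in the usage section are invented for demonstration, and the way background is subtracted (the promoterless progenitor vector measured under the same condition) is an assumption about how the stated procedure was applied.

```python
def relative_luciferase(luciferase: float, beta_gal: float) -> float:
    """Luciferase activity normalized to the co-transfected
    beta-galactosidase internal control (transfection efficiency)."""
    if beta_gal <= 0:
        raise ValueError("beta-galactosidase activity must be positive")
    return luciferase / beta_gal


def percent_of_wild_type(mutant_rel: float, wild_type_rel: float) -> float:
    """Express a mutant construct relative to the wild-type construct,
    which is set at 100%."""
    return 100.0 * mutant_rel / wild_type_rel


def fold_transactivation(reporter_with: float, reporter_without: float,
                         vector_with: float, vector_without: float) -> float:
    """Fold induction by a co-expressed factor after background subtraction.

    Assumption: the background is the activity of the progenitor vector
    measured with and without the expression clone, respectively.
    """
    return (reporter_with - vector_with) / (reporter_without - vector_without)


if __name__ == "__main__":
    # Hypothetical raw values, for illustration only.
    wt = relative_luciferase(500.0, 2.0)
    mut = relative_luciferase(180.0, 2.0)
    print(percent_of_wild_type(mut, wt))
    print(fold_transactivation(50.0, 14.0, 10.0, 4.0))
```

Dividing by the β-galactosidase signal corrects for well-to-well differences in transfection efficiency, so the percentages and fold values compare promoter activity rather than DNA uptake.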

RESULTS
The First 300 bp of Human ALAS2 Promoter Produces Maximal Expression of a Reporter Gene-We previously reported the isolation of genomic clones for human ALAS2 (11), and a partial restriction map of the first 10.3 kb of 5′-flanking sequence of the gene is shown in Fig. 1A. To determine regions that contribute to expression, constructs generated with different 5′ lengths (−10.3 kb to −27 bp) and with a common 3′ end (+28) were fused to the firefly luciferase reporter gene (Fig. 1B). These constructs were transiently transfected into K562, MEL (F4-12B2), or COS-1 cells, the latter as a nonerythroid control, and luciferase activity was determined in cell lysates. The activity of the longest construct (pALAS-10.3kb-LUC) in each cell line was assigned a value of 100 (Fig. 1B). The promoter expressed strongly in both erythroid cell lines, and maximal activity was seen with 293 bp of promoter (pALAS−293-LUC). A low level of activity was obtained with the −27 bp promoter construct (pALAS−27-LUC). Expression of the constructs was also observed in COS-1 cells and followed a similar pattern to that in erythroid cells except that 1.9 kb of promoter (pALAS-1.9kb-LUC) gave maximal expression (Fig. 1B).
In this study, we have investigated the basis for the strong transcriptional activity of the first 300 bp of promoter. Sequence analysis of this region revealed a clustering of potential binding sites in the first 140 bp, including those for the erythroid-specific transcription factors GATA-1 and NF-E2 as well as CACCC (19) and CCAAT (33) box proteins and the Ets family of proteins (34) (Fig. 1C).
GATA-1 Protein Binds at −124 and −100 Sites in the Promoter-Three putative GATA-1 binding sites were identified at −126/−121 (on the noncoding strand), −102/−97, and −30/−23 (Fig. 1C). The sites centered at −124 and −100 were first investigated. The −124 GATA site (5′-AGATAA-3′) conforms to the consensus for GATA-1 (35,36) while the −100 site (5′-AGATAC-3′) deviates by one nucleotide. Binding of nuclear proteins to these sites was determined using GATA−124 and GATA−100 probes in gel shift assays with nuclear extracts from K562, MEL, or COS-1 cells and also from COS-1 cells transfected with the murine GATA-1 cDNA expression vector, pXM/GF-1. A β-globin GATA-1 consensus sequence (GATA-cons) was employed as a control probe (25). A major protein complex was obtained with the GATA−124 probe (Fig. 2A, lanes 2 and 3) and GATA-cons probe (lanes 12 and 13) using nuclear extracts from K562 and MEL cells. A complex with the same mobility was detected with the GATA−100 probe, although the intensity was reduced (lanes 7 and 8). Similar results were also observed with all three probes using nuclear extracts from COS-1 cells expressing recombinant GATA-1 (lanes 5, 10, and 15) but were not detected with nuclear extracts from mock-transfected COS-1 cells (lanes 4 and 9), although a minor band was observed with the GATA-cons probe (lane 14).
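The comparison of each site with the GATA-1 consensus can be illustrated with a small motif check. The sketch assumes the commonly cited consensus (A/T)GATA(A/G), written WGATAR in IUPAC code; the paper cites references 35 and 36 for the consensus without spelling it out, so the exact consensus string used here is an assumption, and the function name is hypothetical.

```python
# IUPAC codes needed for the consensus used below.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "AT",   # weak: A or T
    "R": "AG",   # purine: A or G
}


def mismatches(site: str, consensus: str) -> int:
    """Count positions where `site` is not allowed by the IUPAC `consensus`."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(base not in IUPAC[code]
               for base, code in zip(site.upper(), consensus.upper()))


if __name__ == "__main__":
    consensus = "WGATAR"  # assumed GATA-1 consensus, (A/T)GATA(A/G)
    for name, site in [("-124", "AGATAA"), ("-100", "AGATAC"), ("-27", "GGATAA")]:
        print(name, site, mismatches(site, consensus))
```

With this consensus the −124 site shows no mismatch and the −100 site shows one, in line with the description above; the first six bases of the −27 sequence (5′-GGATAA-3′) also differ at one position.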
To confirm that the protein complex that bound to the −124 and −100 sites in the erythroid cell extracts was indeed GATA-1, gel supershift assays were undertaken with the GATA-1 monoclonal antibody, N-6 (28), and nuclear extracts from either MEL cells or COS-1 cells expressing recombinant GATA-1. The antibody substantially supershifted the major band obtained with the GATA−124 probe, the GATA-cons probe (Fig. 2B), and the GATA−100 probe (data not shown).
Competition experiments using the GATA-cons probe and nuclear extracts from COS-1 cells expressing recombinant GATA-1 (Fig. 2C) showed that the binding of GATA-1 (lane 1) was effectively and specifically inhibited by a 10-fold molar excess of either GATA-cons in self-competition (lane 2) or GATA−124 (lane 5), but a 50-fold molar excess of GATA−100 was required for a similar level of inhibition (lane 9). These experiments indicated that the affinity of the −124 GATA site for protein binding is comparable with that of the β-globin GATA-1 consensus sequence and greater than that of the −100 GATA site. The bands of higher and lower mobility relative to the major retarded complex that were competed in these experiments most likely represent a dimer of GATA-1 (30, 37) and degraded GATA-1 protein, respectively.
FIG. 2. ... (lanes 4, 9, and 14), and COS-1 cells expressing recombinant GATA-1 (lanes 5, 10, and 15). Nuclear extract was omitted from lanes 1, 6, and 11. The retarded complex corresponding to GATA binding is indicated by the arrow. B, for supershift assays, the GATA-1 monoclonal antibody, N-6, was added to nuclear extracts from MEL cells (lanes 2 and 6) and COS-1 cells expressing recombinant GATA-1 (lane 4) prior to the addition of the GATA−124 and GATA-cons probes. The retarded complex in the absence of antibody and the supershifted complex are indicated by arrows. C, radiolabeled GATA-cons probe was incubated with nuclear extracts from COS-1 cells expressing recombinant GATA-1 (lanes 1-11). The retarded complex (arrow) was competed with a 10-, 50-, and 100-fold molar excess of the GATA-cons in self-competition (lanes 2, 3, and 4), GATA−124 (lanes 5, 6, and 7), GATA−100 (lanes 8, 9, and 10), and 100-fold molar excess of a nonspecific (NS) competitor (lane 11).
GATA and TBP Bind to the −27 Site-The sequence (5′-GGATAAAT-3′) centered at −27 in the ALAS2 promoter (see Fig. 1C) represents a noncanonical TATA box that exhibits some similarity with a GATA motif. Similar sequences in this location have been identified in the promoters of other erythroid cell-specific genes such as the chicken β-globin (29), rat pyruvate kinase (38), and human glycophorin B (39). Gel shift experiments were used to determine whether the sequence in the ALAS2 promoter binds GATA-1 and TBP (Fig. 3). A major retarded protein complex was observed following incubation of the GATA−27 probe with nuclear extracts from COS-1 cells expressing recombinant GATA-1 (Fig. 3A, lane 3) but not with nuclear extracts from mock-transfected COS-1 cells (lane 2). In competition experiments, the complex was abolished using a 50-fold molar excess of either GATA−27 as a self-competitor (lane 4) or GATA-cons (lane 6) but not with a canonical TATA box oligonucleotide (lane 5).
The complex was identified as GATA-1, since it was supershifted with the GATA-1 monoclonal antibody (data not shown). A retarded complex of similar mobility to GATA-1 was observed with the TATA probe and nuclear extracts from mock-transfected COS-1 cells (lane 8) or COS-1 cells expressing recombinant GATA-1 (lane 9), but mobility of the complex was not affected with the GATA-1 monoclonal antibody (data not shown), and its identity is unknown.
In other experiments, the DNA binding affinity of GATA-1 for the −27 GATA sequence was compared with that of the β-globin GATA consensus site (GATA−27G) in gel shift assays using a purified GST-GATA-1(f) fusion protein. An increasing concentration of GST-GATA-1(f) was incubated with a constant amount of each probe, and the extent of DNA binding was determined. An approximately 20-40-fold difference in the concentration of protein required to give 50% DNA binding was observed, with GATA-1 exhibiting a higher binding affinity for the GATA−27G probe compared with the GATA−27 probe (data not shown).
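The link between the protein concentration giving 50% DNA binding and the binding affinity can be sketched with a simple one-site equilibrium model. This is an illustrative assumption rather than the analysis reported here: it supposes a single binding site per probe, a probe concentration well below K_d, and protein in excess, with θ the fraction of probe bound and [P] the free protein concentration.

```latex
% One-site binding isotherm (assumed model)
\theta = \frac{[P]}{K_d + [P]}
\qquad\Longrightarrow\qquad
\theta = \tfrac{1}{2} \ \text{ when } \ [P]_{1/2} = K_d

% Ratio of apparent dissociation constants for the two probes
\frac{K_d^{\,\text{GATA}-27}}{K_d^{\,\text{GATA}-27\text{G}}}
\approx
\frac{[P]_{1/2}^{\,\text{GATA}-27}}{[P]_{1/2}^{\,\text{GATA}-27\text{G}}}
\approx 20 \text{ to } 40
```

Under these assumptions, the 20-40-fold difference in half-saturating protein concentration would correspond to a roughly 20-40-fold higher apparent K_d for the GATA−27 probe than for the GATA−27G probe.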
Gel shift assays were performed to determine whether the −27 GATA site could bind TBP. A specific protein complex was detected following incubation of the GATA−27 probe with recombinant human TBP (Fig. 3B, lane 2), and a corresponding complex was seen with the canonical TATA box probe (lane 4). These results demonstrate that in addition to GATA-1, the −27 noncanonical TATA box can bind TBP in vitro. The DNA binding affinity of TBP for the −27 GATA site was compared with that of the canonical TATA probe in experiments where an increasing concentration of purified TBP (ranging from 0.1 to 100 nM) was incubated with a constant amount of each probe. A specific retarded protein complex was detected with the TATA probe with 1 nM of TBP, but a corresponding complex was not observed with the GATA−27 probe over this range of TBP concentrations (data not shown). The data demonstrate that TBP has a weak affinity for the −27 GATA sequence compared with a consensus TATA box.
Mutational Analysis of the GATA-1 Binding Sites-To investigate the functional contribution of the GATA-1 binding motifs identified in the ALAS2 promoter, these sites were inactivated by mutagenesis in the plasmid construct, pALAS−293-LUC, and expression was analyzed in K562, MEL, and COS-1 cells (Fig. 4). Mutagenesis of either the −124 GATA site (pALAS−293mut1-LUC) or the −100 GATA site (pALAS−293mut2-LUC) reduced promoter expression relative to wild type in K562 cells to 64 and 73%, respectively, and this was further reduced to 57% when both sites were mutated (pALAS−293mut3-LUC) (Fig. 4). In MEL cells, mutagenesis of these sites lowered expression to 36 and 78%, respectively, and a value of 34% was obtained when both sites were mutated. The reason for the greater contribution of the −124 GATA site in MEL cells compared with K562 cells is not known. These mutations had no effect when tested in COS-1 cells, demonstrating the inactivity of these GATA-1 sites in nonerythroid cells.
To investigate the requirement for the −27 GATA site in transcription initiation, the sequence was mutated (5′-GGATAAAT-3′ to 5′-GCAGCTGT-3′) so that binding of both GATA-1 and TBP was abolished in gel shift assays (data not shown). Expression of the mutated promoter construct, pALAS−293mut4-LUC, was reduced to 36% of wild type in both K562 and MEL cells (Fig. 4) and to 33% in COS-1 cells. The −27 GATA site was also converted to a sequence (5′-AGGGATAAAT-3′ to 5′-CATGATAAG-3′), which bound GATA-1 but not TBP in gel shift assays. This mutation (pALAS−293mut5-LUC) reduced expression in K562 and COS-1 cells to 30 and 40%, respectively, compared with wild type (Fig. 4).
The −27 binding site was also mutated to a canonical TATA box (5′-GGATAAAT-3′ to 5′-GTATAAAT-3′), which, in gel shift assays, bound TBP (Fig. 3B, lane 4) but not GATA-1 (Fig. 3A, lane 9). This mutation (pALAS−293mut6-LUC) consistently reduced promoter activity in K562 and MEL cells to 81 and 67%, respectively, compared with wild type, but increased expression in COS-1 cells to 132% (Fig. 4). Inactivation of the −124 and −100 GATA sites and conversion of the −27 GATA site to a TATA box (pALAS−293mut7-LUC) reduced expression in K562 cells to 41% relative to wild type. Hence, for maximal expression in transiently transfected erythroid cells, a noncanonical TATA box is required at the −27 position that can bind both GATA-1 and TBP in vitro.
GATA-1 Transactivates the Promoter in Nonerythroid Cells-The ability of exogenous GATA-1 to transactivate the plasmid pALAS−293-LUC was investigated in co-transfection experiments. In K562 cells, transactivation was not observed, most likely because of high endogenous GATA-1 levels. However, as seen in Fig. 4, exogenous GATA-1 increased the expression of pALAS−293-LUC by 4.0-fold in COS-1 cells. Mutagenesis of either the −124 GATA site or the −100 GATA site reduced the transactivation in COS-1 cells to 2.2- and 2.8-fold, respectively, and this was further reduced to 1.4-fold when both sites were mutated in combination. Conversion of the −27 site to the canonical TATA box sequence slightly reduced the level of transactivation to 3.2-fold (Fig. 4), which is consistent with the reduced activity of the same construct in K562 and MEL cells. Transactivation by GATA-1 was virtually abolished following the inactivation of the −124 and −100 GATA sites and conversion of the −27 site to the canonical TATA box (Fig. 4).
The sequence (5′-GGGTGGGTGGGG-3′) located at −59/−48 in the ALAS2 promoter contains two putative overlapping CACCC boxes on the noncoding strand (Fig. 1C). The CACCC−54 probe encompassing this sequence bound three major protein complexes from MEL cell nuclear extracts in gel shift assays (Fig. 5A, lane 2). Of these, the most rapidly migrating complex was identified as BKLF, since a BKLF antibody (lane 4) partially but specifically inhibited binding, whereas an EKLF antibody (lane 5) or preimmune serum (lane 3) had no effect. The slowest major migrating complex contained Sp1 and probably Sp1-related proteins, since it was supershifted with an antibody to Sp1 (lane 6). However, the Sp1 antibody also partially inhibited binding to the second and third (BKLF) protein complexes. The remaining major retarded complex was unaffected by the antibodies to BKLF or EKLF, and its identity is unknown. The CACCC−54 probe was also incubated with nuclear extracts from mock-transfected CV-1 cells (lane 7) and CV-1 cells expressing recombinant murine EKLF (lane 8). A complex of high mobility was observed only with nuclear extracts from cells expressing recombinant EKLF, and this complex was confirmed as EKLF using an antibody to EKLF (lane 11). The slowest migrating complex observed with mock-transfected CV-1 nuclear extracts was confirmed immunologically as Sp1 (lane 12). Together, the data demonstrate that the CACCC−54 probe can bind Sp1, BKLF, and EKLF, but the EKLF complex cannot be detected in the MEL cell nuclear extracts employed.
In similar gel shift experiments, Crossley et al. (32) have shown, using nuclear extracts from a different MEL cell line, that an EKLF-responsive CACCC box at −94/−87 in the promoter of the murine adult β-globin gene (26) strongly binds BKLF and Sp1 but only weakly binds EKLF. For comparison, we investigated protein binding by this β-globin CACCC box using our MEL cell nuclear extracts. The results were almost identical to those observed with the CACCC−54 probe (Fig. 5A, lane 2) with major complexes detected for BKLF and Sp1 but no complex corresponding to EKLF (result not shown). Apparently, there is insufficient EKLF in our MEL cell nuclear extracts for detection by gel shift assays using either CACCC probe. Competition experiments with nuclear extracts from CV-1 cells expressing recombinant EKLF protein indicated that the CACCC−54 sequence and the β-globin CACCC box bind EKLF with similar affinities (Fig. 5B). EKLF binding to the β-globin CACCC box probe (lane 3) was substantially reduced by competition with a 25-fold molar excess of either the β-globin CACCC oligonucleotide in self-competition (lane 6) or the CACCC−54 oligonucleotide (lane 9).
FIG. 4. Effect of mutating the GATA motifs on ALAS2 promoter expression. GATA sites located at −124, −100, and −27 in pALAS−293-LUC were each mutated to a PvuII site represented by ×, and the −27 GATA site was converted to a canonical TATA box and to a consensus GATA-1 binding site (boxed). These constructs were co-transfected with a β-galactosidase expression construct (RSV-β-gal) and transiently expressed in K562, MEL, and COS-1 cells. The normalized luciferase activities of the mutant constructs are expressed relative to pALAS−293-LUC, which was set at 100%. The data are averages obtained from constructs tested in quadruplicate in at least three experiments and are represented as the mean ± S.D. ND (not determined) corresponds to those constructs not tested in a particular cell line. Constructs were co-transfected with the murine GATA-1 cDNA expression clone, pXM/GF-1, in COS-1 cells, and luciferase activities were determined as described previously. pGL2-Basic was included as a control for transactivation by GATA-1 and assigned a value of 1.0.
Mutational Analysis of the CACCC Sequence-The −54 CACCC sequence was mutated (5′-GGGTGGGTGGGG-3′ to 5′-GGCAGCTGGGGG-3′) so that both of the constituent overlapping CACCC boxes were destroyed. Expression of this mutant promoter construct (pALAS−293mut8-LUC) in K562 and MEL cells was reduced to 59 and 46%, respectively, relative to pALAS−293-LUC, demonstrating the functional importance of the CACCC sequence (Fig. 6). The effect of mutating both the CACCC sequence and the GATA sites was also investigated. Mutagenesis of the CACCC sequence and the −124 GATA site (pALAS−293mut9-LUC), or a triple mutation of the CACCC sequence together with the −124 GATA and −100 GATA sites (pALAS−293mut10-LUC), reduced expression in K562 cells to 44 and 38%, respectively.
In COS-1 cells, expression of the promoter construct with only the CACCC sequence mutated (pALASϪ293mut8-LUC) was markedly reduced to 33% relative to wild type, and mutations in the GATA sites did not further lower expression, establishing that promoter activity in these cells is driven predominantly by a CACCC-binding protein, perhaps Sp1 or a Sp1-related protein.
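The statement that the −59/−48 sequence carries two overlapping CACCC boxes on the opposite strand, and that the mutation removes both, can be checked directly from the sequences quoted in the text. The following is a small sketch; the helper names are hypothetical, and only the two 12-nucleotide sequences given above are taken from the paper.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_overlapping(motif: str, seq: str) -> list[int]:
    """0-based start positions of all (possibly overlapping) motif matches.

    A zero-width lookahead is used so that overlapping occurrences,
    which a plain search would skip, are all reported.
    """
    return [m.start() for m in re.finditer(f"(?={re.escape(motif)})", seq)]


if __name__ == "__main__":
    wild_type = "GGGTGGGTGGGG"   # -59/-48 as written in the text
    mutant = "GGCAGCTGGGGG"      # sequence in the mut8 construct

    for label, seq in [("wild type", wild_type), ("mutant", mutant)]:
        other_strand = reverse_complement(seq)
        print(label, other_strand, find_overlapping("CACCC", other_strand))
```

On the complementary strand the wild-type sequence reads 5′-CCCCACCCACCC-3′, in which two CACCC boxes share a single cytosine; the mutant sequence contains no CACCC match on either strand.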
EKLF Transactivates the ALAS2 Promoter-To investigate whether the −54 CACCC sequence can respond transcriptionally to EKLF, transactivation experiments were performed in K562 cells (43) (Fig. 7). To eliminate the possibility of CACCC-like sequences being located upstream in the ALAS2 promoter, the construct pALAS−124A-LUC, containing 124 bp of ALAS2 promoter, was used in these experiments and was consistently induced 3.1-fold by exogenously expressed EKLF (Fig. 7). Mutagenesis of the −54 CACCC sequence (pALAS−124mut-LUC) reduced this to 1.8-fold, indicating that EKLF can function through this site. Transactivation of the ALAS2 promoter by EKLF was compared with pβ-glob-LUC and p(CAC)4tk-LUC, which were transactivated 4.0- and 9.5-fold, respectively, by EKLF (Fig. 7). Similar transactivation experiments were performed with exogenously expressed murine BKLF in COS-1 cells. However, BKLF failed to transactivate the construct pALAS−124A-LUC through the −54 CACCC sequence (data not shown).
Mutational Analysis of the NF-E2-like Sequence-Partially overlapping the CACCC sequence is an NF-E2-like sequence at −49/−39 (Fig. 1C) with a 9/11 match to the consensus NF-E2 binding motif (44). When this sequence was mutated (5′-GGCTGAGTCAG-3′ to 5′-GGCAGCTGCAG-3′) in pALAS−293mut11-LUC (Fig. 6), expression in K562 and MEL cells was unaffected. In transactivation experiments in COS-1 cells, overexpression of recombinant murine NF-E2 protein (erythroid p45 (44) and ubiquitous p18 (45) subunits) failed to increase expression of pALAS−293-LUC (data not shown). These experiments established that the NF-E2-like sequence is inactive, a finding that is in agreement with gel shift competition studies performed by Andrews et al. (44).
DISCUSSION
Deletion analysis of the 5′-flanking region from the human ALAS2 gene established that the first 300 bp of promoter sequence directs strong transient expression in erythroid cells. This region contained several putative transcription factor binding sites (see Fig. 1C) clustered within the first 140 bp, notably GATA and CACCC box motifs, which are a feature of the regulatory regions of many other erythroid-specific genes (38, 39, 46-49). Two potential GATA-1 binding sites were identified, centered at −124 and −100 with an inverted palindromic arrangement. These sites were functionally active and shown to bind GATA-1 protein in erythroid cell nuclear extracts. Transactivation assays with exogenously expressed GATA-1 in nonerythroid cells confirmed the response of each of these sites to GATA-1. The contribution of the −124 site to ALAS2 expression was moderately greater than that of the −100 site, consistent with the deviation of the −100 GATA site by a single nucleotide from the consensus sequence (35, 36).
The ALAS2 promoter lacks a canonical TATA box, but located at −30/−23 there is the sequence 5′-GGATAAAT-3′, which binds TBP or GATA-1 in vitro. Protein binding reactions performed with purified GST-GATA-1(f) and TBP indicated that the affinities of these proteins for this site were considerably reduced compared with consensus sites for these proteins. Conversion of the −30/−23 sequence to a consensus GATA-1 binding site, which binds GATA-1 in vitro but not TBP, significantly reduced transient expression in erythroid cells to 30% of the wild type. This finding demonstrated the importance of a functional TATA box and presumably the requirement of the general transcription factor, TFIID, in the transcriptional initiation of this gene. Conversion of the −30/−23 sequence to a canonical TATA box, which binds TBP in vitro but not GATA-1, consistently reduced transient expression in erythroid cells to 70-80% of the wild type, also supporting a role for GATA-1 in transcriptional initiation. A similar role for GATA-1 has been proposed for the erythroid-specific human glycophorin B (39) and chicken β-globin (29) gene promoters, which also possess noncanonical TATA boxes. For the chicken β-globin promoter, there is evidence that GATA-1 bound at the −30 position prevents the assembly of a repressive nucleosome (50) and imparts erythroid cell specificity through the interaction with another GATA-1 molecule bound to the 3′-enhancer (29). Perhaps, by analogy, GATA-1 bound to the −27 site of the ALAS2 promoter may facilitate transcriptional initiation in vivo by inhibiting nucleosome formation. The mechanism by which such bound GATA-1 could be replaced by TFIID in vivo is not known, but an "initiator-like" element (51) located at +7/+12 (5′-TCATTC-3′) (see Fig. 1C) may play a role in this process.
In the ALAS2 promoter, a CACCC sequence at −59/−48 was identified, which consists of two overlapping CACCC boxes on the noncoding strand (Fig. 1C). This sequence was shown to be functionally important for erythroid cell expression, although the contributions of the two overlapping CACCC sites remain to be elucidated. While transcriptional synergism between GATA-1- and CACCC-binding proteins has been reported (52-54), mutational analysis of the GATA and CACCC sites in the present study did not provide evidence for a cooperative interaction in the ALAS2 promoter.
CACCC boxes are bound by several proteins in vitro, including members of the Krüppel family of transcription factors, Sp1 (40), EKLF (26), and BKLF (32). While the in vivo function of these proteins has been difficult to define, a specific role for EKLF in adult β-globin gene transcription has been established (20, 43, 55-60), and an EKLF-responsive CACCC box has been identified at −94/−87 in the murine adult β-globin gene promoter (20). Gel supershift assays demonstrated that the ALAS2 CACCC sequence mimics this β-globin CACCC box (32) and is able to bind not only EKLF but also Sp1 and BKLF. Since competition experiments indicated that the two CACCC sites bound EKLF with a similar affinity, this raised the possibility that EKLF may also regulate expression of the ALAS2 gene. Transactivation experiments provided support for this, with the ALAS2 promoter being consistently transactivated approximately 3-fold by exogenous EKLF (comparable with the 4-fold level observed with the β-globin promoter), and mutagenesis of the −54 CACCC sequence significantly inhibited this transactivation. A direct role for EKLF on expression of ALAS2 in vivo is now being investigated in EKLF−/− mice (56).
In addition to GATA and CACCC box sequences, other possible binding sites for transcription factors were identified in the ALAS2 promoter (see Fig. 1C). An Ets-like sequence (22) located between the −124 and −100 GATA sites was examined, but mutagenesis of this site did not alter promoter expression in erythroid cells (data not shown). An NF-E2 site, with a mismatch at both extremities of the 11-bp consensus sequence, 5′-(T/C)GCTGA(G/C)TCA(C/T)-3′ (44), partially overlapped the CACCC sequence and was also found to be inactive in erythroid cells. A putative CCAAT box located at −90/−84 in the ALAS2 promoter (see Fig. 1C) is identical to the functional CCAAT box located in the human β-globin promoter (46) but has not been investigated in this study.
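The 9/11 match of the NF-E2-like sequence can be reproduced by comparing the ALAS2 sequence at −49/−39 (5′-GGCTGAGTCAG-3′) position by position with the consensus quoted above. The sketch below is illustrative only and is not part of the original analysis; the function name `mismatch_positions` and the encoding of the consensus as a list of allowed bases per position are assumptions made for this example.

```python
# Illustrative comparison of the ALAS2 NF-E2-like sequence with the NF-E2 consensus.
# Each consensus entry lists the bases allowed at that position:
# 5'-(T/C)GCTGA(G/C)TCA(C/T)-3'
CONSENSUS = ["TC", "G", "C", "T", "G", "A", "GC", "T", "C", "A", "CT"]
ALAS2_SITE = "GGCTGAGTCAG"  # -49/-39 sequence as quoted in the text


def mismatch_positions(site, consensus):
    """Return 1-based positions where the site departs from the consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return [
        i + 1
        for i, (base, allowed) in enumerate(zip(site, consensus))
        if base not in allowed
    ]


mismatches = mismatch_positions(ALAS2_SITE, CONSENSUS)
matches = len(CONSENSUS) - len(mismatches)

print(f"{matches}/{len(CONSENSUS)} match; mismatches at positions {mismatches}")
# 9/11 match; mismatches at positions [1, 11]
```

Both mismatches fall on the outermost positions of the 11-bp motif, in line with the description of a mismatch at both extremities.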
There is evidence that globin enhancers activate gene transcription by increasing the number of expressing cells rather than the level of transcription in expressing cells (61-63). In the present study, it cannot be distinguished whether the reduced expression with mutated ALAS2 promoter/reporter constructs results from a decrease in the proportion of transfected cells expressing the reporter gene or from a decrease in promoter activity. Further experiments will be required to address this issue.
FIG. 6. Effect of mutating the −54 CACCC and −44 NF-E2-like sequences on ALAS2 promoter expression. The −54 CACCC sequence and the −44 NF-E2-like sequence in pALAS−293-LUC were each mutated to a PvuII site. To examine the possible interaction of the CACCC sequence with GATA sites, the −124 GATA and −100 GATA sites were mutated in combination with the −54 CACCC sequence. Mutated sites are represented by ×. Constructs were transiently expressed in K562, MEL, and COS-1 cells and co-transfected with a β-galactosidase expression construct (RSV-β-gal). The luciferase activities were standardized relative to β-galactosidase activity as an internal control for transfection efficiency and expressed relative to pALAS−293-LUC (set at 100%) as described in the legend to Fig. 4.
FIG. 7. Transactivation of the ALAS2 promoter by EKLF. The constructs p(CAC)4tk-LUC, ptk-LUC, pβ-glob-LUC, pALAS−124A-LUC, pALAS−124mut-LUC, and pGL2-Basic were co-transfected with the EKLF cDNA expression clone, pSG5/EKLF, in K562 cells, and luciferase activities were determined. The mutated −54 CACCC sequence is represented by ×. The data are averages obtained from constructs tested in quadruplicate in at least six experiments and are represented as the mean ± S.D. The plasmids ptk-LUC and pGL2-Basic were included as controls for transactivation by EKLF and assigned a value of 1.0, and transactivation of the plasmid constructs by EKLF was corrected for background.
ALAS2 promoter deletion constructs from −10.3 kb to −293 bp expressed efficiently in COS-1 cells, and similar observations have been made for the promoters of other erythroid cell-specific genes (39, 64, 65). Expression of the ALAS2 promoter in these cells most likely reflects an inadequate assembly of nucleosomes on transiently transfected constructs, and our studies show that the CACCC sequence in the promoter is a major contributor to this expression, presumably through the action of Sp1. Tissue-specific expression of the ALAS2 gene in vivo would then reflect the absence of repressive nucleosomes, e.g. through the binding of GATA-1 to the −27 site as proposed earlier. Erythroid cell-specific enhancers have been identified in the flanking regions of several erythroid genes (29, 64, 66), and such sequences could contribute to tissue-specific expression of the ALAS2 gene.
All of the enzymes of the heme biosynthetic pathway have now been cloned (1, 67). The large requirement for heme during erythropoiesis in contrast to nonerythroid cells may have necessitated the evolution of distinct transcriptional regulatory processes for expression in erythroid cells. To highlight this, there are two genes encoding the rate-limiting ALAS enzyme, the housekeeping gene and the erythroid gene, and these are located on different chromosomes (4-6). As expected, the promoter architecture of the housekeeping gene is different from that of the erythroid gene and contains multiple binding sites for the ubiquitous transcription factors, Sp1 and NRF-1 (68). For the other enzymes of the heme pathway, there is only one structural gene, and these have either a composite promoter, which contains binding sites for both ubiquitous and erythroid-specific transcription factors (1, 67), or, alternatively, two separate promoters, one with a housekeeping function and the other erythroid cell-specific (69, 70). Functional sites for GATA-1, NF-E2, and CACCC box-binding proteins have been characterized in the erythroid promoter for human porphobilinogen deaminase (47), although the NF-E2 site is absent from the corresponding murine porphobilinogen deaminase promoter (71). Binding sites for GATA-1, NF-E2, and CACCC box-binding proteins have also been identified in the human ferrochelatase promoter (72). In contrast, the chicken ALAS2 promoter contains multiple binding sites for Sp1 (12). These studies, together with the information on globin gene expression (19), confirm that there is likely to be only a small number of erythroid cell-specific factors that act in a combinatorial fashion to ensure the coordinated regulation of heme and globin synthesis during erythropoiesis.