National Foundation for Infectious Diseases Fellow. To whom correspondence should be addressed: Rm. 2115, MR 4 Bldg., 300 Park Place, University of Virginia Health Sciences Center, Charlottesville, VA 22908. Tel.:/Fax: 804-924-0075;
* This work was supported by National Institutes of Health Grants R01-AI 37941 (to W. A. P.) and K08-AI 01453 (to U. S.). The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Entamoeba histolytica, an enteric protozoan, is the third leading parasitic cause of death worldwide. Investigation of the transcriptional machinery of this eukaryotic pathogen has revealed an unusual core promoter structure that consists of nonconsensus TATA and initiator (Inr) regions and a novel third conserved core promoter sequence, the GAAC element. Mutation of this region in the hgl5 promoter decreases reporter gene expression and alters the transcription start site. Using positional analysis of this element, we have now demonstrated that it is able to direct a new transcription start site, 2–7 bases downstream of itself, independent of TATA and Inr regions. Nuclear run on analysis also showed that the GAAC region controls the rate of transcription, and an amebic nuclear protein was demonstrated to interact specifically with this sequence. This is the first description in the eukaryotic literature of a third conserved core promoter element, distinct from TATA or initiator regions, that is able to direct a transcription start site. We have formulated two models for the role of the GAAC region: (i) the GAAC-binding protein is a part of the TFIID complex and (ii) the GAAC-binding protein functions to “tether” TATA-binding protein to the TATA box.
Entamoeba histolytica is a single-cell eukaryote that causes invasive amebic colitis and liver abscess. Infection with this organism is an important contributor to morbidity and mortality in developing countries, and worldwide it is the third leading parasitic cause of death. During its life cycle E. histolytica undergoes developmental changes such as transformation from the cyst to trophozoite and adaptation from an anaerobic to aerobic environment upon invasion. How E. histolytica regulates these events is not understood, although regulation of transcription is likely to be an important mechanism of this control. Recently two papers have described transcriptional control of a drug resistance gene in E. histolytica (
), thus demonstrating a relationship between pathogenesis and regulation of transcription.
At a molecular level, little is known about the control of gene expression in this organism. As an early diverging member of the eukaryotic tree, E. histolytica has many unusual characteristics with regard to gene organization. These include a genome that is AT-rich (67% within coding regions and 78% overall) (
). A putative E. histolytica TBP has been reported (GenBank accession number Z48307) that has significant sequence divergence from the TBP of Drosophila melanogaster, Caenorhabditis elegans, and Plasmodium falciparum (
U. Singh, J. Rogers, and W. A. Petri, Jr., unpublished results.
It has been shown that amebic promoter sequences do not function in a mammalian system and that viral promoters (cytomegalovirus, human immunodeficiency virus long terminal repeat, and simian virus 40) and promoters from other systems (Dictyostelium) are nonfunctional in amebic trophozoites (
). Thus, it would appear that species-specific transcription factors may be utilized in amebic gene expression.
The core promoter region in metazoans is the target of a variety of regulatory proteins that work in concert to direct the complex mechanisms of transcriptional control. Transcription of mRNA relies on the assembly of RNA polymerase II and a variety of other transcription factors (TFIID, TFIIA, TFIIB, TFIIE, TFIIF, and TFIIH) into a stable and functional preinitiation complex (
). The preinitiation complex may assemble in a sequential manner on the DNA sequence of the core promoter, or a “holoenzyme” complex may form, which then binds specifically to the core promoter region (
). It has a variable location between the TATA and Inr sequences, a characteristic not defined previously for core promoter transcription elements in metazoans. Mutation of this region in the hgl5 gene promoter had a greater effect on gene expression and selection of the site of transcription initiation than mutation of either the TATA or Inr regions (
). These characteristics of the GAAC region would seem to usurp the dominant role of the TATA element in transcriptional control in E. histolytica.
Our goal was to determine how the GAAC element in the hgl5 gene of E. histolytica regulates the rate and site of transcription initiation. This was undertaken by performing positional analysis of the GAAC region. We determined that it affected the rate of transcription, did not function as an enhancer element, and was able to direct a site of transcription initiation independent of a TATA or Inr element. An amebic nuclear protein was demonstrated to specifically interact with this DNA sequence as shown by EMSA. Based on the data, we have formulated a model for the role of the GAAC region in transcriptional control in E. histolytica.
MATERIALS AND METHODS
Cultivation of E. histolytica and Stable Transfection
E. histolytica strain HM-1:IMSS trophozoites were cultured in TYI-S-33 medium containing penicillin (100 units/ml) (Life Technologies, Inc.) and streptomycin (100 μg/ml) (Life Technologies, Inc.) (
). The trophozoites were maintained at 24 μg/ml G418, and all Northern blot and primer extension analyses were performed on RNA isolated from stably transfected parasites maintained at this drug concentration.
Positional analysis of the GAAC region was undertaken on the plasmids pTP.4i and pTP.GAAC-CLA (
). Construction of the pTP.4i plasmid was described earlier and consists of a 272-bp hgl5 upstream region fused to a luciferase reporter gene and the 5′ actin region fused to the neomycin drug selection gene (
). In the plasmid pTP.GAAC-CLA the 5′ core promoter region of the hgl5 gene has the GAAC region mutated to a ClaI site (GAACT to CGATT). To introduce a GAAC region in the upstream location, we used a two-round PCR technique (
). Previous analysis of the 5′-noncoding region of the hgl5 promoter had revealed that the region 20 bases upstream of the TATA sequence could be replaced by an AT-rich sequence without affecting gene expression (
). This region was replaced in both pTP.4i and pTP.GAAC-CLA plasmids using a primer that introduced an intact GAAC region 22 bases upstream of the beginning of the TATA region. The primer used to generate this was 5′-AAAAGAAGGAAAGGAATGAACTAGTAATAATAGGAAAGG-3′, which introduced an SpeI site (underlined) downstream of the GAAC region (bold) that was used for diagnostic digests of possible clones. The first round of PCR used the above primer with the primer 5′-CTTTCTTTATGTTTTTGGCG-3′, which hybridized with the coding region of luciferase (bases 1727–1746 of pGEM-luc, Promega). This resulted in a 150-bp fragment that was then used as a primer for the second round of PCR using the primer 5′-CTACTGAAGCTTAGTAAAGAATAGTATTGA-3′ (containing a HindIII restriction site shown underlined for cloning) that hybridized at the 5′-end of the pTP.4i plasmid. Using the same approach, the primer 5′-AAAAGAAGGAAAGGTGATCAAGTAAAATAATAGGAAAGG-3′ was used to clone an intact but inverse orientation GAAC (bold) with a BclI restriction site (underlined) into the coding strand at a region 22 bases upstream of the TATA region. Similarly a primer was designed to generate an intact GAAC region 10 bases downstream of the initiator region (5′-GACAAAGATATGAAAAATGAACTATGGATCCAAATG-3′). This resulted in a change in the amino acid sequence of the hgl5 fusion region between the start codons of the hgl5 and luciferase genes from that of wild type, but did not generate any nonsense or stop codons. The wild type amino acid sequence is (Lys-Leu-Leu-Leu-Trp-Ile-Glu) and was changed to (Lys-Asn-Glu-Leu-Trp-Ile-Glu). Each primer had at least 14 bases of sequence homology upstream and downstream of the mutations to allow for hybridization with the backbone. These colonies were screened by restriction enzyme analysis where appropriate or by sequence analysis. All constructs used in our experiments were sequenced in their entirety to rule out PCR-induced mutations.
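The primer features described above can be checked with a short script. The following is a minimal Python sketch, not part of the original methods: it searches each primer quoted in the text for the GAACT motif and for the standard recognition sequences of SpeI (ACTAGT), BclI (TGATCA), and HindIII (AAGCTT). The names `PRIMERS`, `SITES`, `find_all`, and `annotate` are illustrative. Treating the "inverse orientation" element as the reversed string TCAAG is an assumption read off the printed primer sequence.

```python
# Primer sequences copied from the text (written 5'->3').
PRIMERS = {
    "upstream_GAAC": "AAAAGAAGGAAAGGAATGAACTAGTAATAATAGGAAAGG",
    "upstream_inverse_GAAC": "AAAAGAAGGAAAGGTGATCAAGTAAAATAATAGGAAAGG",
    "downstream_GAAC": "GACAAAGATATGAAAAATGAACTATGGATCCAAATG",
    "hindiii_cloning": "CTACTGAAGCTTAGTAAAGAATAGTATTGA",
}

# Standard recognition sequences of the enzymes named in the text.
SITES = {"SpeI": "ACTAGT", "BclI": "TGATCA", "HindIII": "AAGCTT"}
GAAC = "GAACT"


def find_all(seq, motif):
    """Return 0-based start indices of every (possibly overlapping) match."""
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


def annotate(seq):
    """Map each feature name to the list of positions where it occurs."""
    report = {
        "GAACT": find_all(seq, GAAC),
        # Assumption: "inverse orientation" means the reversed motif string.
        "TCAAG (reversed)": find_all(seq, GAAC[::-1]),
    }
    for enzyme, site in SITES.items():
        report[enzyme] = find_all(seq, site)
    return report


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        found = {k: v for k, v in annotate(seq).items() if v}
        print(name, found)
```

In the first primer the SpeI site overlaps the last three bases of GAACT, which is consistent with the description of the site lying immediately downstream of the element.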
Northern Blot Analysis
Northern blot analysis was done as described previously (
) with the following modifications: 13 μg of RNA was utilized for each construct, and the blot was probed with radiolabeled denatured DNA probes. The DNA probes consisted of the coding regions of luciferase and neomycin, which were extracted by digestion of the pTP-Luc plasmid (
). Briefly, stably transfected trophozoites, maintained in TYI-S-33 medium supplemented with 24 μg/ml G418, were chilled on ice, counted, and harvested. The cells were then centrifuged at 200 × g for 5 min, washed once in phosphate-buffered saline (pH 7.5), and lysed in 100 μl of lysis buffer with the addition of protease inhibitors E64-C and leupeptin. Lysates were assayed following serial dilution to 10⁻³ or 10⁻⁴. Samples were prepared in triplicate and assayed at room temperature with a Turner luminometer model TD-20E (Promega). Luciferase activity per cell was calculated as a measure of reporter gene expression.
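The per-cell calculation is not spelled out in the text. The Python sketch below shows one plausible way to derive it, assuming the value is the mean of the triplicate luminometer readings, corrected for the lysate dilution and divided by the number of cells lysed. The function names and all numbers in the usage example are hypothetical.

```python
from statistics import mean


def luciferase_per_cell(readings, dilution_factor, cells_lysed):
    """Mean luminometer reading, corrected for dilution, per cell.

    dilution_factor is the fold dilution (1000 for a 10^-3 dilution).
    This formula is an assumption; the text only states that activity
    per cell was calculated.
    """
    if dilution_factor <= 0 or cells_lysed <= 0:
        raise ValueError("dilution_factor and cells_lysed must be positive")
    return mean(readings) * dilution_factor / cells_lysed


def percent_of_wild_type(sample, wild_type):
    """Express a per-cell activity as a percentage of the wild type value."""
    if wild_type <= 0:
        raise ValueError("wild_type activity must be positive")
    return 100.0 * sample / wild_type


if __name__ == "__main__":
    # Hypothetical triplicate readings at a 10^-3 dilution from 1e6 cells.
    wt = luciferase_per_cell([120, 130, 125], 1000, 1_000_000)
    mutant = luciferase_per_cell([50, 55, 45], 1000, 1_000_000)
    print(wt, mutant, percent_of_wild_type(mutant, wt))
```

Expressing each construct as a percentage of wild type in this way matches how the reporter results are reported later in the text.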
Nuclear Run On Analysis
Nuclei were harvested from 5 × 10⁷ logarithmically growing trophozoites stably transfected with the plasmid of interest and stored at −70 °C (
). RNA extraction was performed using the guanidinium isothiocyanate method (RNagen kit, Promega). Approximately 1.2 pmol of DNA probes (luciferase, neomycin, and 272-bp promoter of the hgl5 gene) were purified, denatured, and dot-blotted onto Zeta-Probe GT genomic blotting membrane (Bio-Rad). The membrane was incubated with the prehybridization solution at 65 °C for 20 min, denatured RNA probe was added, and the mixture was incubated overnight at 65 °C. The membrane was washed at 65 °C according to the manufacturer's instructions and exposed on a PhosphorImager (Molecular Dynamics).
Primer Extension Analysis
Primer extension analysis was performed as described previously using polyadenylated mRNA from stably transfected amebae (
). Primer extension was performed using the SuperScript II RNase H⁻ reverse transcriptase system (Life Technologies, Inc.) and run next to the appropriate sequencing ladder on a 6% polyacrylamide gel. Sequencing was performed using the CircumVent thermal cycle sequencing system (New England Biolabs) using the α-³⁵S-labeled dATP incorporation method. To rule out contaminating or nonspecific extension products, all mRNA samples were treated in DNase buffer (50 mM Tris-HCl (pH 7.4), 1 mM EDTA (pH 8.0), 10 mM MgCl₂, 1 mM 1,4-dithiothreitol) with 10 units of RNase-free DNase at 37 °C for 60 min followed by overnight ethanol precipitation prior to primer extension experiments.
Nuclear Extract Preparation and Electrophoretic Mobility Shift Assay
Nuclear extracts were prepared by the methods described previously (
). The double-stranded oligonucleotide used for the electrophoretic mobility shift assays was 5′-AAGACAATGAACTAGAATG-3′ with an intact GAAC region (bold) but with a mutated and truncated Inr region (italics). The oligonucleotide used for the gel shift assays did not contain a TATA region. An E. histolytica hgl5 promoter sequence (5′-AATTCTGTTATATGATCATTTGGTTTGTAATTACAGCTGG-3′) and an oligonucleotide (5′-AAGACCTACGATAAGAATG-3′) with a mutated GAAC region (bold) were used as double-stranded competitors for the gel shift assay. The probe was purified by a polyacrylamide gel extraction procedure (
). Modifications to this method included the incubation of the polyacrylamide gel section containing the probe at 37 °C overnight in elution buffer (TE (10 mM Tris, 1 mM EDTA, pH 8.0) with 100 mM NaCl) prior to centrifugation for 30 min at 10,000 × g. The supernatant was then saved and the pellet washed with an additional 400 μl of elution buffer and re-centrifuged. The two supernatants were combined, the pellet discarded, and the filtered probe was ethanol-precipitated at −70 °C. One pmol of this purified probe was labeled with [α-³²P]dATP using the Klenow fragment of DNA polymerase I and purified from unincorporated nucleotide by a NucTrap column (Stratagene, La Jolla, CA).
The protein-DNA interaction occurred in band shift buffer (10 mM Tris-HCl (pH 7.9), 50 mM NaCl, 1 mM EDTA, 0.05% non-fat milk powder (Carnation), 3% glycerol, 0.05 mg of bromphenol blue). To this reaction mixture 0.5 μg of salmon sperm DNA, 25 fmol of DNA probe, and 1.2 μg of nuclear extract were added (
). Since the GAAC element has been shown to have a variable location between the TATA and Inr regions, we hypothesized that it may function in a position-independent manner to control the rate of gene expression. To test this hypothesis, we performed positional analysis of the GAAC region in the hgl5 gene promoter linked to a luciferase reporter gene. The GAAC region was placed upstream of the TATA region by 22 bases. Alternatively GAAC was placed downstream of the Inr by 10 bases or one helical turn (Fig. 1). These plasmids were electroporated into trophozoites and selected for stable transfectants by drug (G418) selection.
Northern blots were performed with total RNA harvested from the trophozoites (Fig. 2). Luciferase assays were performed on cells stably transfected with various constructs (Fig. 2). Mutation of the GAAC region in the core promoter resulted in a decrease in luciferase message to 38% of wild type, as reported previously (Fig. 2A) (
). Insertion of a wild type GAAC sequence upstream of a mutated core promoter, in either the native or inverse orientation, did not reconstitute luciferase message to wild type levels (Fig. 2, C and D) and in fact resulted in luciferase enzyme activity that was 17 and 7.4% of wild type. Insertion of a wild type GAAC sequence 22 bases upstream of a wild type core promoter also resulted in a marked diminution of reporter gene message levels and luciferase expression was 43% of wild type (Fig. 2B). Insertion of a wild type GAAC sequence downstream of a wild type core promoter resulted in moderate enhancement of message accumulation and luciferase levels were 284% of wild type (Fig. 2E). Insertion of a GAAC region downstream of a mutated core promoter resulted in decreased luciferase message and enzyme levels were 44.5% of wild type (Fig. 2F). Thus, the GAAC element was not able to regulate the level of gene expression in a position-independent manner. In addition, it controls the transcription start site; therefore this sequence does not meet the classical definition of an enhancer element.
Mutation of the GAAC Region Resulted in a Decreased Rate of Transcription as Determined by Nuclear Run On Analysis
mRNA in E. histolytica usually has a short 5′-untranslated region, and transcripts with long 5′-untranslated regions in this organism may be unstable. Thus Northern blot and primer extension data on mRNA may not accurately reflect the rate of transcription initiation from a particular site. Nuclear run on assays were done on selected constructs to determine the role of this region in regulating the rate of transcription from a particular site. Fig. 3 shows the data from nuclear run on assays done on nuclei harvested from trophozoites stably transfected with the wild type construct, a core promoter with a mutated GAAC region (Fig. 3, lane A), and a core promoter with a mutated GAAC region but an upstream wild type GAAC (Fig. 3, lane C). Newly transcribed RNA obtained from the run on assays was hybridized with probes for neomycin, luciferase, and the 5′-noncoding region of the hgl5 gene. Neomycin was utilized as a control message, and all exposures were developed to obtain equivalent neomycin signal. In the wild type construct, the luciferase message was evident; however, there was no measurable signal at the 5′-noncoding region of the hgl5 gene. In the construct with a core promoter with a mutated GAAC region, the luciferase message was not detected. In a construct with a core promoter with a mutated GAAC region and a GAAC region inserted in the upstream region (Fig. 3C), RNA from both the luciferase and 5′-untranslated regions was detected. The magnitude of these signals relative to the neomycin signal was roughly equivalent. This distribution of transcript abundance was reflected in the intensity of bands from the two transcription start sites seen in the primer extension analysis of this construct (Fig. 4C), indicating that the RNA generated from the various start sites was stable and that primer extension analysis is an accurate method for quantitating the abundance of message from a particular transcription initiation site.
These data confirmed that the GAAC element controlled gene expression by regulating the rate of transcription.
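The comparison of run on signals against the neomycin control can be made explicit. The Python sketch below assumes each dot-blot signal is simply divided by the neomycin signal from the same membrane; the function name, probe labels, and all values are hypothetical and are not taken from Fig. 3.

```python
def normalize_to_control(signals, control="neomycin"):
    """Divide every probe signal by the control signal on the same membrane.

    `signals` maps probe name to measured intensity (arbitrary units).
    Returns a dict of ratios for all probes except the control itself.
    """
    ctrl = signals[control]
    if ctrl <= 0:
        raise ValueError("control signal must be positive")
    return {
        probe: value / ctrl
        for probe, value in signals.items()
        if probe != control
    }


if __name__ == "__main__":
    # Hypothetical intensities for a single construct.
    membrane = {"neomycin": 200.0, "luciferase": 100.0, "hgl5_5prime": 90.0}
    print(normalize_to_control(membrane))
```

Normalizing within each membrane lets constructs be compared even when absolute exposures differ, which is the purpose the neomycin control serves in the text.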
The GAAC Sequence Can Function in a Position-independent Manner to Direct a Site of Transcription Initiation
The Northern blot data showed that addition of a wild type GAAC region upstream of a core promoter with a mutated GAAC element did not reconstitute reporter gene expression to wild type levels. Previous data have shown that mutation of the GAAC region results in a major transcription start site at the Inr and multiple minor (4% compared with the Inr start site) transcription start sites up to −90 (Fig. 4A). We therefore wished to determine the effect of changes in position of wild type GAAC on the transcription start site.
Insertion of a wild type GAAC region upstream of a wild type or mutated core promoter resulted in the appearance of new transcription start sites (Fig. 4, B and C). In the wild type core promoter construct (Fig. 4B), transcription initiated 2–7 bases downstream of the new GAAC element and at the Inr region. The new start site downstream of GAAC was quantitated to be 19% of the Inr start site in that construct. When a GAAC region was placed upstream of a mutated core promoter GAAC region (Fig. 4C) a new transcription start site appeared 3–4 bases downstream of this element, which was relatively equal in intensity (80%) to the wild type start site in that construct. The placement of an inverse GAAC region upstream of a mutated core promoter resulted in a new transcription start site 5 bases downstream of the GAAC region, which was 56% of the wild type start site in that construct (Fig. 4D). Analysis of downstream GAAC regions showed that placement of a wild type GAAC region downstream of a wild type or mutated core promoter resulted in the generation of a new start site 2–3 bases downstream of the GAAC sequence (Fig. 4, E and F). Similar to earlier results, in the context of a mutated core promoter (Fig. 4F), the new start sites generated by the GAAC region appeared to be of equal intensity to that of wild type (113% compared with the Inr site), whereas in the context of a wild type core promoter (Fig. 4E), the wild type start site at the Inr was dominant (new start site was 49% of the wild type site). Thus, these data indicated that the GAAC region was capable of controlling a transcription start site independent of TATA and Inr. In eukaryotes this function had previously been assigned exclusively to the TATA and Inr regions of the core promoter.
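To make the "2–7 bases downstream" window concrete, the Python sketch below locates each GAACT motif in a sequence and lists the bases that fall 2 to 7 positions past its last base. Counting offsets from the last base of the motif is an assumption, since the text does not state the reference point. The function name and the test sequence are hypothetical; the sequence is not the hgl5 promoter.

```python
def downstream_window(seq, motif="GAACT", min_offset=2, max_offset=7):
    """List candidate start-site positions downstream of each motif match.

    Returns a list of (motif_start, window) tuples, where window is a list
    of (offset, index, base) entries. Offsets are counted from the last
    base of the motif (an assumption); indices are 0-based.
    """
    results = []
    start = seq.find(motif)
    while start != -1:
        motif_end = start + len(motif) - 1  # index of the last motif base
        window = []
        for offset in range(min_offset, max_offset + 1):
            idx = motif_end + offset
            if idx < len(seq):  # skip positions past the end of the sequence
                window.append((offset, idx, seq[idx]))
        results.append((start, window))
        start = seq.find(motif, start + 1)
    return results


if __name__ == "__main__":
    toy = "AAGAACTACGTACGTAA"  # hypothetical sequence for illustration only
    for motif_start, window in downstream_window(toy):
        print(motif_start, window)
```

Observed start sites from primer extension could then be compared against these windows to see whether they fall within the reported range.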
An E. histolytica Nuclear Protein Binds to the GAAC Region of the hgl5 Gene in a Sequence-specific Manner
To determine whether the GAAC sequence functioned by interacting with a sequence-specific amebic nuclear protein, we utilized EMSA. Amebic nuclear extracts were incubated with a double-stranded oligonucleotide containing the GAAC sequence to identify DNA-protein interactions. EMSA was performed with a probe that contained no TATA region, a mutated and truncated Inr region, and a wild type GAAC region from the hgl5 gene. This oligonucleotide was constructed to prevent DNA-protein interactions with the other core promoter elements, the TATA and Inr.
Incubation of the probe with crude amebic nuclear extract revealed two bands; the lane with probe alone and no amebic nuclear protein had no bands (Fig. 5). Competition experiments were done to determine the specificity of the DNA-protein interaction. The lower band (small arrow) represents a nonspecific DNA-protein interaction, as its intensity is not altered by self, unrelated, or mutant competition. Since the oligonucleotide for the EMSA did not contain functional TATA or Inr regions, any specific DNA-protein interaction seen in the EMSA can be ascribed to the GAAC region. Competition assays with unlabeled self probe at 2× and 4× revealed that a band (Fig. 5, large arrow) was competed by the cold competitor. An unlabeled, unrelated amebic promoter sequence and an oligonucleotide with a mutated GAAC region did not compete this specific band to the same degree. This demonstrated specificity of the DNA-nuclear protein interaction for the oligonucleotide with the GAAC sequence and indicated that an amebic nuclear protein specifically recognizes the GAAC region.
The major conclusion from this study is that the third core promoter element GAAC (GAACT) in the hgl5 gene of E. histolytica (i) independently directs a new site of transcription initiation, (ii) controls the rate of transcription initiation, and (iii) interacts in a sequence-specific manner with an amebic nuclear protein (GAAC binding protein(s), GBP(s)). The role of the GAAC region in the hgl5 gene was determined by reporter gene assays, Northern blot analysis, and nuclear run on assays, all of which indicated that the GAAC region controls the rate of transcription.
The GAAC element of the hgl5 gene in E. histolytica is able to control a transcription start site independent of TATA and Inr regions. We demonstrated that positional manipulation of the GAAC region, separated from the TATA and Inr core promoter, resulted in new transcription start sites 2–7 bases downstream of itself. This result occurred consistently with upstream and downstream positioning of the GAAC element and regardless of whether the core promoter region contained a wild type or mutated GAAC sequence. In the context of a wild type core promoter (i.e. wild type TATA, GAAC, and Inr regions) the dominant start site was always in the Inr region regardless of whether a GAAC region was inserted in an upstream or downstream location. However, when the core promoter contained a mutated GAAC sequence, the insertion of a wild type GAAC region in the upstream or downstream location resulted in new transcription start sites that were of equal intensity to that of the wild type Inr site. Thus, in the wild type promoter the interactions between the three regions and the proteins that bind to them are dominant in controlling the transcription initiation site. However, in the context of a mutated GAAC region, this dominance is lost, and new transcription initiation sites (directed by the GAAC region) occur. The GAAC sequence was also able to direct a new site of transcription initiation in an inverse orientation, most likely through the creation of a cryptic GAAC site. The identification of a third core promoter element that controls the site of transcription initiation is unprecedented in eukaryotes.
We hypothesized that the GAAC sequence functioned to control the site of transcription initiation via a DNA-protein interaction. The results of the EMSA analysis revealed that an amebic nuclear protein(s), GBP, interacted specifically with the GAAC region. Competition assays with an unrelated hgl5 promoter sequence and an oligonucleotide with a mutated GAAC region pointed to the specificity of this interaction. In the analysis of the EMSA results, it is important to realize that the GAAC-GBP interaction apparently was not dependent on DNA-TBP or DNA-Inr binding protein interactions, since the EMSA probe did not contain functional TATA or Inr regions. The implication is, therefore, that although TBP may require GBP for DNA binding, the GBP does not require TBP or Inr binding protein(s) for accurate and specific DNA binding. Once the amebic GBP and TBP have been isolated, the specific DNA-protein and protein-protein interactions can be characterized in greater detail.
The requirement for a gene or a family of genes to have three core promoter elements is unclear: why would transcription of protein encoding genes in E. histolytica be dependent on a third regulatory region? Perhaps the E. histolytica TFIID complex has multiple DNA binding regions composed of TBP, GBP, and Inr binding proteins. A variety of pre-assembled TFIID complexes could exist, containing some or all of the core promoter-binding proteins. These different TFIID complexes could differentially regulate a variety of core promoters containing all three or only one or two of these regulatory regions.
A second model is based on the fact that TBP in vitro is able to bind to multiple AT-rich sequences (
). In an AT-rich organism such as E. histolytica a mechanism may have developed in which a factor such as GBP localizes TBP to the promoter. Thus a model can be hypothesized in which transcription in E. histolytica hgl5 genes may be dependent on protein-protein interactions in which GBP functions to “tether” or localize TBP/TFIID to the core promoter.
Precedence for both these models can be found in the metazoan literature. It has been shown that TFIID can bind multiple regions of the core promoter, including the TATA and Inr regions (
). Both models provide for a basal complex that has multiple sites to regulate transcription in response to cellular and environmental stimuli.
In conclusion, we have described, for the first time in the eukaryotic transcription literature, a third core promoter region, GAAC, which is independently able to direct a site of transcription initiation. This sequence from the hgl5 gene of E. histolytica has been shown to interact in a sequence-specific manner with an amebic nuclear protein (GBP). The presence of this regulatory core promoter region raises intriguing questions regarding transcriptional control in this primitive protozoan parasite. It is important to consider the possible presence of similar, yet to be identified, regulatory proteins in other eukaryotes. The isolation of the GAAC-binding protein and characterization of its role in transcriptional control are the next steps in elucidating the role of GBP in the transcriptional machinery of E. histolytica.
We thank Barbara Mann, David Auble, and William A. Petri, Jr. for excellent discussions and scientific input.