Identification of Two E. histolytica Sequence-Specific URE4 Enhancer-Binding Proteins with Homology to the RNA-Binding Motif RRM

We report here the purification and cloning of two sequence-specific URE4-binding proteins, EhEBP1 and EhEBP2.

SUMMARY
In order to study transcriptional regulation in the lower branching eukaryote Entamoeba histolytica, we have identified two sequence-specific DNA-binding proteins that recognize the upstream regulatory element URE4, an enhancer that regulates expression of the Gal/GalNAc lectin heavy subunit gene hgl5. A chromatographic purification of E. histolytica nuclear extracts by gel filtration, cation exchange, and sequence-specific DNA affinity chromatography led to a 700-fold increase in URE4-binding activity and the appearance of two dominant protein species with molecular masses of 28 and 18 kD. These proteins, termed E. histolytica enhancer-binding proteins 1 and 2 (EhEBP1 and EhEBP2), were sequenced by tandem mass spectroscopy and their corresponding cDNA clones identified. Recombinant EhEBP1 and EhEBP2 were able to bind double-stranded oligonucleotides bearing the URE4 motif in a sequence-specific manner, and antibodies raised against EhEBP1 were able to interfere with the formation of URE4-protein complexes in crude nuclear extracts. Overexpression of EhEBP1 in E. histolytica trophozoites resulted in a seven-fold drop in promoter activity in transiently transfected reporter gene constructs when the URE4 motif was present, confirming its ability to specifically recognize the URE4 motif and suggesting that additional cofactors may be

Trichomonas vaginalis is an earlier branching eukaryote than E. histolytica. Its core promoter contains an Inr element similar to the metazoan Inr motif, but a TATA element has not been identified (3). The T. vaginalis RNA polymerase that transcribes protein-coding genes is α-amanitin resistant, and cloning of the polymerase gene demonstrated a lack of conserved amino acids in the region thought to bind this drug (4). Giardia lamblia is another early branching eukaryote about which little is known. Three conserved promoter elements have been identified by comparing 5' flanking sequences, none of which share homology with metazoan core promoter motifs (5). No sequence-specific DNA-binding proteins have been identified in either of these two protists.
Entamoeba histolytica is another protozoan parasite whose mechanisms of transcriptional regulation are beginning to be investigated. Causing amebic colitis and amebic liver abscesses, E. histolytica is responsible for an estimated 50 million cases of invasive disease and 70,000 deaths per year (6).
It possesses a divergent core promoter, with nonconsensus TATA and Inr elements, and a third core promoter element "GAAC" unique to E. histolytica (7-9). Like T. vaginalis, transcription of protein-coding genes in E. histolytica is α-amanitin resistant (10). The only protein related to transcriptional regulation that has been identified in this organism is TATA binding protein. However, the ability of this protein, which was cloned by sequence identity, to bind DNA or function in assembly of the basal transcriptional apparatus remains to be demonstrated (11).
The 5' flanking regions of a handful of E. histolytica genes have been studied via truncation and mutational analysis (8,12-14). In the hgl2 promoter, a CCAAT-like motif was identified that decreased reporter gene expression when mutated, but the existence of proteins recognizing this element has not been demonstrated (3). In the EhPgp1 and EhPgp2 genes, several metazoan-like upstream regulatory elements were identified by sequence homology (9,21). Some of these elements were able to compete for DNA binding in electrophoretic mobility shift assays, but a mutational analysis was not performed. The promoter that has been best characterized, however, is the Gal/GalNAc lectin hgl5 promoter,

EXPERIMENTAL PROCEDURES
Cultivation and transfection of amebae-E. histolytica trophozoites of strain HM-1:IMSS were grown at 37°C in TYI medium containing penicillin (100 U/ml) and streptomycin (100 µg/ml) (17). Amebae in logarithmic phase growth (approximately 6 x 10^4 trophozoites/ml) were used for transfection experiments and nuclear extract preparation. Stable or transient transfections were performed as described previously except that for transient transfections, cells were lysed after six instead of ten hours (18,19).
Nuclear extract preparation and protein purification-Nuclear extracts were prepared as described previously (20). Approximately 5 x 10^8 amebae were used as starting material, resulting in 22.8 mg of protein in 6 ml. The extract was transferred to dialysis tubing, and the volume was reduced to 2 ml by covering the tubing with PEG 8000 for several hours. Concentrated nuclear extract was loaded onto a HiPrep 16/60 Sephacryl S-200 column (Amersham-Pharmacia).
After the void volume was discarded, outflow from the gel filtration column was passed onto a 1 ml HiTrap SP cation exchange column (Amersham-Pharmacia).
After fractions containing proteins with native molecular masses of approximately 100 to 25 kD (as calculated by column calibration using standards of known molecular weight) were allowed to flow into the cation exchange column, the gel filtration column was disconnected. The cation exchange column was then washed with several column volumes of DNA-binding buffer (10 mM Tris-HCl (pH 7.9), 50 mM NaCl, 1 mM EDTA, 5% glycerol). Bound protein was eluted with an NaCl gradient of 0.2 M to 0.6 M over 20 ml. Fractions (0.5 ml) were collected, aliquoted, and stored at -70°C for further analysis.
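The salt concentration at which a given fraction eluted can be estimated from the gradient parameters above. The following is a minimal sketch only: it assumes the gradient was linear (the text states only "gradient") and that fractions are counted from the start of the gradient, which may not match the fraction numbering used in the figures. The function names are hypothetical.

```python
def gradient_concentration(volume_ml, start_m=0.2, end_m=0.6, gradient_ml=20.0):
    """NaCl molarity at a given elution volume, ASSUMING a linear gradient."""
    if not 0.0 <= volume_ml <= gradient_ml:
        raise ValueError("volume lies outside the gradient")
    return start_m + (end_m - start_m) * volume_ml / gradient_ml


def fraction_midpoints(fraction_ml=0.5, start_m=0.2, end_m=0.6, gradient_ml=20.0):
    """List of (fraction index, NaCl molarity at the fraction midpoint).

    Fraction 1 is ASSUMED to begin at the start of the gradient.
    """
    n_fractions = round(gradient_ml / fraction_ml)
    return [
        (i + 1,
         gradient_concentration((i + 0.5) * fraction_ml, start_m, end_m, gradient_ml))
        for i in range(n_fractions)
    ]


if __name__ == "__main__":
    for index, molarity in fraction_midpoints():
        print(f"fraction {index:2d}: ~{molarity:.3f} M NaCl")
```

Under these assumptions a 20 ml gradient collected in 0.5 ml fractions yields 40 fractions, each spanning 10 mM NaCl.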
Fractions enriched for gel shifting ability were purified further by sequence-specific DNA affinity chromatography. The sequence-specific affinity resin was prepared as follows: a DNA construct containing four head-to-tail copies of the URE4 motif served as template for PCR using a primer with a 5' biotin modification (Gibco Life Technologies). The PCR product was incubated for 16 hours at 4°C with a slight excess of magnetic, streptavidin-coated beads (Promega) in DNA-binding buffer plus 2 M NaCl. Binding to beads was monitored by agarose gel electrophoresis. The beads were then washed in DNA-binding buffer and stored at 4°C. Affinity chromatography was performed by incubating a single fraction of nuclear extract purified by gel filtration and cation exchange chromatography (containing 48 µg of protein) with the URE4-complexed beads in DNA-binding buffer. After 1 hour at 4°C, beads were immobilized on a magnetic stand and flow-through material collected. Beads were then washed twice with DNA-binding buffer before elution with this buffer plus 1 M NaCl.
Electrophoretic mobility shift assay (EMSA)-This assay was performed as described previously (20). Briefly, radiolabeled probe was created by annealing complementary oligonucleotides to form double-stranded DNA followed by "filling in" with Klenow and α-32P-dATP. A typical reaction contained 0.01 pmoles radiolabeled probe, 0.2 µg poly(dI·dC), and from 0.001 to 5 µg protein.

The authenticity of PCR products was checked by examining the amplified sequence for the next few amino acids predicted by the peptide sequence but not incorporated into the PCR primer. Amino terminal sequence was obtained by designing a new reverse primer based on amplified sequences, which was CTACCTTCTACTTCAAAGTTATCCATTTC for EhEBP1 and CCCTTTGATCTTACATTTCCTCTATG for EhEBP2. The forward primer was complementary to vector flanking sequences upstream from the cDNA inserts.
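Because the two primers given above are reverse primers, the coding-strand sequence they anneal to is their reverse complement. The short sketch below shows that conversion; the primer sequences are taken from the text, while the function and dictionary names are hypothetical and introduced only for illustration.

```python
# Translation table mapping each DNA base to its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("sequence contains characters other than A, C, G, T")
    return seq.translate(_COMPLEMENT)[::-1]


# Reverse primers as listed in the text.
REVERSE_PRIMERS = {
    "EhEBP1": "CTACCTTCTACTTCAAAGTTATCCATTTC",
    "EhEBP2": "CCCTTTGATCTTACATTTCCTCTATG",
}

if __name__ == "__main__":
    for name, primer in REVERSE_PRIMERS.items():
        print(name, len(primer), "nt; target strand 5'->3':", reverse_complement(primer))
```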
Northern blot analysis-Total RNA from either lab-passaged or gerbil-passaged amebae was isolated using RNeasy columns (Qiagen). Ten in the same buffer for one hour at room temperature. After three five-minute washes, a one-hour incubation with Fc-specific, anti-mouse antibody conjugated to horseradish peroxidase (Sigma) was performed (diluted 1:1500), and signal was detected by ECL (Amersham-Pharmacia).
Immunoprecipitation of the Flag-EhEBP1 was performed by incubating a lysate of 2.0 x 10^6 trophozoites with 1 µg anti-Flag antibody (Sigma) and 30 µl of Protein G beads (Sigma) for 1.5 hours at 4°C. Beads were washed, boiled in sample buffer, and released protein complexes were analyzed by polyacrylamide gel electrophoresis and immunoblot as described above.

Expression of EhEBP1 in E. histolytica-An amino terminal Flag-EhEBP1 fusion protein was engineered by PCR. This fusion was inserted into the BglII and SalI sites of the pGIR209 vector, placing Flag-EhEBP1 expression under tetracycline-inducible control (22,23). Amebae bearing the two expression constructs, Flag-EhEBP1 and TetR, were cultured by selection using G418 and hygromycin. Flag-EhEBP1 expression was induced by the addition of tetracycline (5 µg/ml) 16 hours prior to transient transfection with reporter gene constructs. Reporter activity was measured 8 hours after transfection of the reporter (a total of 24 hours after induction of Flag-EhEBP1 expression).

RESULTS
Column chromatography was used to purify two sequence-specific URE4-binding proteins from E. histolytica nuclear extracts-In a pilot experiment, nuclear extracts were prepared from E. histolytica trophozoites and fractionated by gel filtration. a sample enriched for complex 2-forming activity, was used as input material, enrichment of sequence-specific URE4-binding activity was detected in the elution fraction by electrophoretic mobility shift assay (Fig. 1). This final step resulted in an approximately 800-fold increase in specific activity as compared to the starting material, crude nuclear extract (Table 1). When fractions enriched for complex 1-forming activity, such as fractions 22 and 23 (Fig. 1), were used in the same affinity purification procedure, however, URE4-binding activity was detected in the input and flow-through fractions but not in the elution fraction (data not shown). SDS-PAGE analysis of the affinity chromatography fractions shown in Fig. 1 revealed the presence of two dominant bands with molecular masses of 28 and 18 kD in the salt-eluted fraction (E, Fig. 2A). A parallel purification using fraction 27 as input material resulted in a similar increase in sequence-specific URE4-binding activity as measured by electrophoretic mobility shift assay in the elution fraction (data not shown), which also contained two dominant proteins with molecular masses of 28 and 18 kD (E', Fig. 2A). The URE4-binding ability of these proteins was tested by southwestern blot: proteins were transferred to a PVDF membrane and allowed to interact with radioactively labeled double-stranded URE4 oligonucleotide (Fig. 2B). This assay confirmed the URE4-binding ability of the 28 and 18 kD proteins and also identified a third protein with a molecular mass of 45 kD. The 45 kD protein was not present in sufficient quantities for sequencing and was not further analyzed.

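The fold increase in specific activity reported for a purification is conventionally the ratio of specific activities (binding activity per mg of protein) between a purified fraction and the crude extract. The sketch below illustrates that arithmetic only. The activity units and the example values in the `__main__` block are hypothetical placeholders, not data from Table 1, and the function names are introduced for illustration.

```python
def specific_activity(activity_units, protein_mg):
    """Binding activity per mg of protein (arbitrary activity units)."""
    if protein_mg <= 0:
        raise ValueError("protein amount must be positive")
    return activity_units / protein_mg


def fold_purification(fraction_sa, crude_sa):
    """Ratio of a fraction's specific activity to that of the crude extract."""
    if crude_sa <= 0:
        raise ValueError("crude specific activity must be positive")
    return fraction_sa / crude_sa


def percent_yield(fraction_units, crude_units):
    """Share of the starting activity recovered in a fraction."""
    return 100.0 * fraction_units / crude_units


if __name__ == "__main__":
    # HYPOTHETICAL numbers chosen only to show the calculation.
    crude = specific_activity(activity_units=1000.0, protein_mg=20.0)
    eluate = specific_activity(activity_units=50.0, protein_mg=0.002)
    print("fold purification:", fold_purification(eluate, crude))
    print("yield (%):", percent_yield(50.0, 1000.0))
```

A large fold purification can coexist with a low yield, which is why both quantities are usually tabulated for each chromatographic step.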
Cloning of EhEBP1 and EhEBP2-The 28 and 18 kD proteins derived from sequence-specific DNA affinity chromatography, named Entamoeba histolytica enhancer binding protein 1 and 2 (EhEBP1 and EhEBP2), were trypsinized and the resulting peptides sequenced by tandem mass spectroscopy. Degenerate oligonucleotides were designed based on the first five or six amino acids of several tryptic peptides and used as PCR primers to amplify two cDNA clones from an E. histolytica cDNA library (Fig. 3). The authenticity of amplified PCR products was determined by identification of amino acids predicted by the peptide sequence but not incorporated into the PCR primer. Furthermore, for both EhEBP1 and EhEBP2, the sequence of every tryptic peptide identified by mass spectroscopy was found in the predicted protein sequence, confirming that the cloned genes corresponded with the sequenced URE4-binding proteins. For EhEBP1, which had an apparent molecular mass of 28 kD as determined by SDS-PAGE (Fig. 2A), an open reading frame was predicted that corresponded to a protein with an approximate molecular mass of 28 kD. For EhEBP2, which had an apparent molecular mass of 18 kD as determined by SDS-PAGE (Fig. 2A), an open reading frame was predicted that corresponded to a protein with an approximate molecular mass of 22 kD. This discrepancy may reflect an aberrant SDS-PAGE mobility for EhEBP2, or it may result from a post-translational modification.
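When degenerate primers are designed from a five or six residue stretch of a peptide, the number of distinct oligonucleotides in the primer pool is the product of the codon counts of those residues. The sketch below illustrates this and picks the least degenerate window in a peptide. It assumes the standard genetic code, and the peptide used in the example is an arbitrary, hypothetical string rather than one of the tryptic peptides sequenced here. All function names are likewise hypothetical.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code (an assumption
# for this sketch; one-letter amino acid codes).
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def degeneracy(peptide):
    """Number of distinct DNA sequences that can encode the peptide."""
    return prod(CODON_COUNTS[residue] for residue in peptide.upper())


def least_degenerate_window(peptide, length):
    """Return (start index, window, degeneracy) for the best window.

    Ties are resolved in favor of the earliest window.
    """
    if length > len(peptide):
        raise ValueError("window is longer than the peptide")
    windows = [
        (start, peptide[start:start + length])
        for start in range(len(peptide) - length + 1)
    ]
    start, window = min(windows, key=lambda item: degeneracy(item[1]))
    return start, window, degeneracy(window)


if __name__ == "__main__":
    hypothetical_peptide = "LSRMKWDEF"  # arbitrary example, not from the paper
    print(least_degenerate_window(hypothetical_peptide, 6))
```

Windows rich in Met and Trp (one codon each) give the smallest primer pools, whereas Leu, Ser, and Arg (six codons each) inflate the pool quickly.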

EhEBP1 and EhEBP2 contain regions homologous to the RNA recognition motif-
Examination of the amino acid sequences of EhEBP1 and EhEBP2 by searching computer databases did not detect any motifs commonly associated with sequence-specific DNA-binding proteins. Surprisingly, both EhEBP1 and EhEBP2 contained regions of homology to the RNA recognition motif (RRM), a nucleotide binding motif found in a large group of RNA-binding proteins (24,25).
The RRM is also present in several sequence-specific DNA-binding proteins, such as stage specific activator protein (SSAP) (26). EhEBP1 contained two full copies of the RRM as well as a 54 amino acid region with many acidic and basic residues (Fig. 4A). EhEBP2 contained one and a half copies of the RRM as well as a 58 amino acid region with many acidic and basic residues. Alignment with representative RNA- and DNA-binding, RRM-containing proteins revealed the presence of many well-conserved amino acids within the RRMs of EhEBP1 and EhEBP2 (Fig. 4B). Sequence homology was also seen between the RRMs of EhEBP1 and EhEBP2 in regions not well conserved between different RRM-containing proteins, such as loop 3.

Identification of EhEBP1 and EhEBP2 message in E. histolytica trophozoites-
Northern analysis of RNA derived from E. histolytica trophozoites demonstrated the existence of single transcripts for both EhEBP1 and EhEBP2 (Fig. 5A).
Interestingly, both EhEBP1 and EhEBP2 were approximately two-fold more abundant in animal-passaged versus laboratory-passaged trophozoites after loading correction based on hexokinase message levels (Fig. 5B). Animal-passaged trophozoites have been shown to be more virulent than laboratory-passaged trophozoites, but whether expression of EhEBP1 and EhEBP2 is involved is a topic for further investigation.
Recombinantly expressed EhEBP1 and EhEBP2 bind URE4 in a sequence-specific manner-The sequence-specific URE4-binding activity of recombinant EhEBP1 and EhEBP2 was tested using the electrophoretic mobility shift assay. While purified GST protein alone was unable to bind to the URE4 sequence, a GST-EhEBP1 fusion protein was able to bind URE4 double-stranded oligonucleotide in a sequence-specific manner (Fig. 6A). Binding specificity was demonstrated by competition with an excess of URE4 but not unrelated oligonucleotides. EhEBP2 was also able to specifically bind double-stranded URE4 when expressed as a GST fusion protein (Fig. 6B). Binding specificity for this protein was demonstrated by competition with an excess of URE4 but not by mutant oligonucleotides. The combination of EhEBP1 and EhEBP2 GST fusion proteins did not result in the appearance of any higher order complexes in EMSA (data not shown). This failure to associate may indicate that EhEBP1 and EhEBP2 do not form heterodimers. Alternatively, it may result from steric hindrance by the GST epitope tag or from a lack of post-translational modifications required for dimerization.

Anti-EhEBP1 antibodies recognize 28 and 18 kD E. histolytica proteins and inhibit URE4-native protein complex formation in crude nuclear extracts-The GST epitope tag was removed from the GST-EhEBP1 fusion protein by site-specific protease digestion, and the freed EhEBP1 used for mouse immunization. Sera derived from this mouse recognized the GST-EhEBP1 fusion protein but not GST protein alone in an immunoblot assay. Additionally, these sera recognized several proteins in E. histolytica lysates; the molecular masses of the two predominant proteins recognized were 28 and 18 kD (data not shown).

The involvement of proteins recognized by the anti-EhEBP1 antibodies in forming complexes with URE4 was tested using E. histolytica nuclear extracts. Preincubation of nuclear extracts with sera from a mouse immunized against EhEBP1, but not nonimmune sera from a mouse of the same strain, resulted in an inhibition of URE4-protein complex formation (Fig. 7). This suggests that EhEBP1 is a critical part of the URE4-protein complex formed by native E.

induction, the induced protein was detected by anti-Flag immunoprecipitation followed by immunoblot with anti-EhEBP1 antisera (Fig. 8A). The Flag-EhEBP1 migrated in SDS-PAGE at its expected molecular mass of 29.9 kDa and was only present in the induced amebae; the heavy chain of the anti-Flag antibody was visible in both lanes (Fig. 8A).

both proteins show sequence-specific binding to both single-stranded and double-stranded DNA. It has been suggested that these AT-rich sites exhibit some single-stranded character in the cell, allowing for interaction between the conserved aromatic residues within the beta sheets of the RRM and nucleotides in the recognition site (26). Interestingly, the URE4 motif recognized by EhEBP1 and EhEBP2 is also AT-rich, which might pose problems for sequence-specific recognition in the context of the AT-rich E. histolytica genome. Perhaps the URE4 sequence assumes a partially melted structure, facilitating specific recognition by its RRM-containing binding proteins.
Another interesting aspect of the URE4 sequence is that it is composed of two direct nine basepair repeats. Crystallization of the single-stranded DNA-binding protein hnRNP A1, which is involved in regulation of telomere length, revealed that an hnRNP A1 homodimer could recognize two copies of its DNA recognition site simultaneously via its two RRM domains (29). Perhaps another determinant of URE4-binding specificity is interaction between EhEBP1 and EhEBP2 homo- or heterodimers and both nine basepair sequences.
In the RRMs of EhEBP1 and EhEBP2, loop 3 contains many regions of sequence identity not seen in other RRM-containing proteins. This conservation may point to residues important for sequence-specific DNA binding. Swapping of amino acid sequences between U1A and U2B" snRNP-associated proteins identified loop 3 as important for specific binding site recognition (30).
Additionally, loop 3, which is not well conserved among RNA-binding proteins, varying in length and amino acid composition, was found for one RNA-binding protein to have a disordered structure until complexed with RNA, suggesting an induced-fit mechanism for nucleotide-protein interaction (31). Future studies of EhEBP1 and EhEBP2 will include truncation and mutational analysis to map the domains involved in sequence-specific recognition of URE4.
When overexpressed as a Flag fusion protein, EhEBP1 repressed transcription from the hgl5 promoter. There are several possible explanations for this effect. EhEBP1 may have an ability to activate transcription that is disrupted by fusion of the Flag epitope to its amino terminus.
Alternatively, excess amounts of EhEBP1 could titrate out a cofactor, present in limiting amounts, which is required for transcriptional activation by URE4, either by blocking cofactor access to the DNA or by binding and sequestering the cofactor away from the DNA. If this model were true, co-expression of the missing cofactor would result in transcriptional activation at the hgl5 promoter.
Candidates for this coactivating protein include EhEBP2 as well as the 45 kD protein identified by southwestern blot of purified nuclear extracts (Fig. 2B).
Co-immunoprecipitation with anti-EhEBP1 antibodies could determine whether EhEBP1 exists in a protein complex either with EhEBP2 or other proteins.
Another possibility that would explain the repressing effect of EhEBP1 overexpression is that EhEBP1 is a bona fide inhibitor of transcriptional activation. The ability of URE4 to function as a positive regulatory element may be due to the presence of other URE4-binding proteins that function as transcriptional activators and whose activity dominates over that of EhEBP1 in laboratory-cultured trophozoites. This possibility could be tested by assaying the transcriptional activity of nuclear extracts immunodepleted with anti-EhEBP1 antibodies in an in vitro transcription assay, a technique that has not yet been adapted for E. histolytica. The fact that EhEBP1 message is more abundant in the more virulent animal-passaged trophozoites, however, seems to suggest that EhEBP1 has an activating rather than a repressing function.
The identification of two sequence-specific enhancer binding proteins in E.

Shading highlights conserved residues, with black shading indicating typically invariant residues and gray shading representing residues typically occupied by a conservative grouping of amino acids. The secondary structure is modeled after the crystallized RNA-binding protein U1A snRNP-associated protein and is indicated above the sequence. The starting amino acid for each RRM repeat is indicated to the left of the amino acid sequence.

were transfected with either the wildtype (WT) or mutant (mut) luciferase reporter constructs. Six hours later cells were lysed and luciferase activity measured. As a control, expression of an unrelated protein was induced from the same tet-inducible promoter and its effect on wildtype reporter gene activity measured.