The Stage-regulated Expression of Leishmania mexicana CPB Cysteine Proteases Is Mediated by an Intercistronic Sequence Element*

The tandemly arranged CPB genes of Leishmania mexicana are polycistronically transcribed and encode cysteine proteases that are differentially stage-specific; CPB1 and CPB2 are expressed predominantly in metacyclics, whereas CPB3–CPB18 are expressed mainly in amastigotes. The mechanisms responsible for this differential expression have been studied via gene analysis and re-integration of individual CPB genes, and variants thereof, into a CPB-deficient parasite mutant. Comparison of the nucleotide sequences of the repeat units of CPB1 and CPB2 with CPB2.8 (typical of CPB3–CPB18) revealed two major regions of divergence as follows: one of 258 base pairs (bp) corresponding to the C-terminal extension of CPB2.8; another, designated InS, of 120 bp, with insertions totaling 57 bp, localized to the intercistronic region downstream of CPB1 and CPB2. Cell lines expressing CPB2.8 or CPB2 with the 3′-untranslated region and intercistronic sequence of CPB2.8 showed up-regulation in amastigotes. Conversely, metacyclic-specific expression occurred with CPB2 or CPB2.8 with the 3′-untranslated region and intercistronic sequence

Protozoan parasites of the genus Leishmania are diploid eukaryotes that cause a range of cutaneous and visceral diseases that afflict ~12 million people worldwide (www.who.int/health-topics/leishmaniasis.htm). Leishmania species have a digenetic life cycle, passing between a sandfly vector and mammalian hosts. Leishmania exist as extracellular flagellated promastigotes within the alimentary canal of the insect, and these differentiate into the highly infectious metacyclic form that is found within the mouth parts of the insect. Parasites are transmitted to a mammal when the vector takes a blood meal, and following macrophage invasion they reside within phagolysosomes as aflagellated amastigotes (1). A stringently coordinated pattern of gene expression is crucial to life cycle progression.
The haploid Leishmania genome consists of 36 chromosomes with sizes that range between 0.3 and 2.5 megabases (2-4). Sequencing of chromosome 1 of Leishmania major revealed that it has an unusual distribution of predicted protein-coding genes. Remarkably, the first 29 genes are all encoded on one strand, whereas the remaining 50 genes are encoded in a head-to-head manner on the opposite strand (5). Several highly expressed protein-coding genes of Leishmania, such as those encoding the surface glycoproteins 63 (6) and 46 (7), α- and β-tubulin (8), A2 (9), and the cysteine proteases CPB (10), occur in tandemly arranged clusters of identical or very similar genes that can show stage-specific expression (10-13). To date, there is no evidence for the existence of promoter elements for an α-amanitin-sensitive RNA polymerase II (5). Thus transcription of protein-coding genes is considered to be constitutive in Leishmania and related trypanosomatids, and the regulation of gene expression is believed to occur predominantly at the post-transcriptional level (14,15).
The polycistronic precursor mRNAs of trypanosomatids are converted into monocistronic mRNAs by two cleavage reactions that occur within the intercistronic regions. One cleavage is associated with trans-splicing of a capped 39-nucleotide spliced leader sequence to the 5′-end of the mature mRNA (16). The second cleavage occurs in a region several hundred bases upstream of the splice acceptor site and is necessary for polyadenylation (17). The trans-splicing and polyadenylation reactions are tightly coupled and regulated by polypyrimidine tracts present within the intercistronic sequences (17). Numerous studies have shown that differential gene expression in Leishmania can be mediated by sequence elements in the 3′-UTRs¹ that affect mRNA stability (13, 18-21). Intercistronic regions have also been implicated in controlling gene expression (22), probably by mediating the events involved in pre-mRNA processing (trans-splicing and cleavage/polyadenylation), but no cis-regulatory elements have been defined (22). The precise mechanisms that govern leishmanial gene expression are still not well understood, although the recent identification and characterization of the L. major poly(A)-binding protein I (23) should shed some light on this area.
Leishmania mexicana contains numerous lysosomal cysteine proteases, the majority of which are cathepsin L-like and are expressed in increasing amounts during life cycle progression from promastigotes to amastigotes (24,25). The type I cysteine proteases are encoded by the CPB genes (26), which map to one genomic locus as a tandem array of 19 copies (10). Targeted deletion of this array to generate the null mutant ΔCPB has shown that the genes encode virulence factors (27,28). Re-expression of different CPB genes in the ΔCPB cell line and analyses of enzyme substrate preferences have shown that some CPBs have differing substrate specificities (10). Moreover, the first two genes of the array, CPB1 and CPB2, are atypical because they encode enzymes that lack the C-terminal domain characteristic of trypanosomatid type I cysteine proteases (10,29,30). Furthermore, CPB1 and CPB2 are expressed almost exclusively in the infective metacyclic stage, whereas the remaining isogenes are expressed predominantly in amastigotes (10).
The present study was undertaken to investigate the mechanism controlling this differential stage regulation of CPBs. The strategy utilized was to restore by genetic manipulation CPB repeat units, and variants thereof, into the endogenous CPB-depleted locus of ΔCPB. Importantly, this allowed gene expression to be analyzed in its correct chromosomal context. The resulting stage-regulated CPB activities of these mutants demonstrated that the differential gene expression profile of the CPB array is dependent upon the presence or absence of short sequence elements in the respective intercistronic regions and that these sequence elements influence processing of precursor mRNA. This mechanism of stage-regulated control differs significantly from previous findings that stage-regulated expression in Leishmania is regulated post-transcriptionally by mRNA stability and translation.

EXPERIMENTAL PROCEDURES
Isolation of CPB2 and Constructs for Transfection-A 2.8-kb SalI fragment from λ13 containing CPB2 (10) was subcloned into the SalI site of pBluescript SK+ to generate pGL26. This fragment was sequenced (Applied Biosystems, Inc. 373 DNA Sequencer) on both strands by primer walking, and the data were analyzed using the Wisconsin GCG package. For episomal re-expression, CPB2 was excised from pGL26 as a 1951-bp EcoRV fragment and subcloned into the SmaI site of pX (33) to generate pGL67. The parental plasmid used to generate the constructs for targeting to the CPB-depleted locus of ΔCPB was plmcpb2-hyg (27). A number of subcloning steps were performed.
(i) Digestion of plmcpb2-hyg with SpeI/BamHI released the HYG gene that was replaced with SAT by SpeI/BamHI digestion of plmcpasat (27) to give pGL51.
(ii) The 635-bp HindIII-SalI fragment containing the 5′-flank of CPB was excised from plasmid pGL147 and used to replace the 436-bp HindIII-SalI unique 5′-flanking region of CPB present in pGL51. The resulting plasmid, pGL146, contains the original 436-bp unique 5′-flank of the CPB array and an additional 199 bp of non-unique 5′-flank that includes the sequences required for spliced leader addition.
(iii) To complete the upstream control sequence (from the SalI site to the ATG start codon of the CPBs), a PCR product was generated with primers lmcpb5′5′ (27) and OL711 (5′-CTAGTCGCGGACGCGGGCAGCGAGG-3′) using pGL147 as the template. Parameters for thermocycling with Taq polymerase (Applied Biosystems, Inc.) were as follows: 94°C for 5 min; 20 cycles of 94°C for 30 s, 60°C for 30 s, and 72°C for 1 min; 72°C for 5 min. The resulting 803-bp PCR product was cloned into pGEM T-vector (Promega) to generate plasmid pGL157.
(iv) The 168-bp SalI-SpeI fragment from pGL157 (which contains the remainder of the upstream sequence to the start codon that is required for expression of CPBs) was subcloned into pGL146 to generate pGL159. Plasmid pGL159 was subsequently used to integrate the native and chimeric CPBs by cloning into the SalI site.
(v) The chimeric plasmids were first generated in pBS (Stratagene) prior to subcloning into pGL159. The 3684-bp EcoRV fragment from pGL28 (defined as plasmid lmcpbg2.8 in (27)) was used to replace the corresponding EcoRV fragment in pGL26 to generate chimeric CPB2 (pGL153). The 3741-bp EcoRV fragment from pGL26 was used to replace the corresponding fragment in pGL28 to generate chimeric CPB2.8 (pGL154).
(vi) The 2731-bp SalI fragment from pGL26 was subcloned into pGL159 to give the CPB2 re-integration plasmid pGL165.
(viii) The SalI fragment of pGL153 was subcloned into the SalI site of pGL159 to generate the chimeric CPB2 re-integration plasmid pGL167.
(ix) The SalI fragment of pGL154 was subcloned into the SalI site of pGL159 to generate the chimeric CPB2.8 re-integration plasmid pGL168.
The bacterial chloramphenicol acetyltransferase (CAT) gene was amplified from pHD52 (34) using primers OL221 (CTCGAGATGGAGAAAAAAATCACTGGATATACC) and OL222 (GATATCTTACGCCCCGCCCTGCCACTCATC) with Pfu Turbo (Stratagene) (20 cycles; annealing at 60°C for 30 s and extension at 72°C for 40 s). The 725-bp PCR product was cloned into pPCR-Script Amp(SK+) (Stratagene) to generate pGL294. The CPB2 EcoRV-SalI fragment from the non-coding region of pGL26 was subcloned into the corresponding sites of pGL294 to generate pGL298. The CPB 5′-sequence that includes the trans-splicing site was subcloned into pGL298 as a SacI-SacII fragment derived from pGL157 (a pGEM T-vector clone of the CPB 5′-flank region) to generate pGL299. The CPB 5′-flank CAT-CPB2 non-coding region was excised from pGL299 as a SalI fragment and subcloned into the corresponding site of pGL159 to generate the construct used for re-integration (pGL300) into the ΔCPB cell line.
Further details on the sequence of the plasmids can be obtained from the authors upon request.
Transfection-Transfection of L. mexicana was as described previously (27). Briefly, 4 × 10⁷ late-log phase ΔCPB promastigotes were subjected to electroporation with either 2 μg of pXCPB2 (pGL67) or 10 μg of the purified re-integration cassette obtained by restriction digestion (HindIII and BglII) from the appropriate plasmids (pGL165-pGL168 and pGL300). After overnight incubation in medium lacking antibiotics, the cell line transfected with the episome pGL67 was selected in 25 μg ml⁻¹ G418 (Geneticin, Life Technologies, Inc.). Cell lines transfected for integration were spread onto modified Eagle's medium plates supplemented with 10 μg ml⁻¹ phleomycin (Cayla, France) and 25 μg ml⁻¹ nourseothricin hydrosulfate (Hans-Knöll Institute, Germany). Colonies, visible following 10-14 days of incubation at 25°C, were picked and inoculated into complete liquid modified Eagle's medium containing appropriate antibiotics.
Southern Blot Analysis of Transfectants-Parasites that were found to be resistant to 10 μg ml⁻¹ phleomycin and 25 μg ml⁻¹ nourseothricin hydrosulfate, but sensitive to 50 μg ml⁻¹ hygromycin B, were grown to stationary phase, and DNA was prepared according to Medina-Acosta and Cross (35). The DNA (5 μg) was digested with restriction enzymes and processed as described previously (27). The probe consisted of a 1:1 ratio of a 203-bp KpnI-SacI fragment derived from pGL26 and pGL28 that composed part of the ORFs of CPB2 and CPB2.8.
Gelatin Gel Analysis-Parasite cysteine protease activity was determined using substrate SDS-PAGE as described previously (10). Briefly, parasite cell lysates (10⁷ cells) were subjected to electrophoresis under reducing conditions using 12% (w/v) acrylamide gels incorporating 0.2% (w/v) gelatin. Following electrophoresis, the gel was washed for 1 h with 2.5% (v/v) Triton X-100 to allow refolding of proteins and then incubated in 0.1 M sodium acetate, pH 5.5, 1 mM dithiothreitol. Hydrolysis of gelatin was detected by staining with 0.25% (w/v) Coomassie Blue R-250.
CAT Assay-The CAT assay was performed using the CAT assay system (Promega). Metacyclics and axenic amastigotes were lysed at 10⁹ cells ml⁻¹, and 50 μl was assayed at 37°C for 3 h as described by the manufacturer. Reactions were terminated by addition of 300 μl of mixed xylenes (Sigma), and the organic phase, containing n-butyryl-[¹⁴C]chloramphenicol, was subjected to liquid scintillation counting.

RESULTS
Sequence Comparison of the Repeat Units of L. mexicana CPB1, CPB2, and CPB2.8-The 19 CPB genes of L. mexicana are arranged in a tandem array of 2.8-kb repeat units (Fig. 1A). In previous work (10,27) we reported the sequence of CPB2.8 (a randomly cloned gene from the middle of the array) and CPB1 open reading frames, and we showed that expression of CPB1 and CPB2 is metacyclic specific, whereas CPB2.8 is predominantly expressed in amastigotes. In expectation that sequence elements regulating the stage-specific expression of the CPB genes would reside in the 3′-UTR or intercistronic regions of the genes, the complete repeat unit (defined as a 2.8-kb SalI fragment, Fig. 1) was sequenced for CPB1 and CPB2.8 (additional sequence deposited as updates to EMBL data base accession numbers Z49962 and Z49963). As both CPB1 and CPB2 were found to be metacyclic specific, the SalI repeat unit containing the CPB2 gene was subcloned from a bacteriophage clone (λ13, see "Experimental Procedures") (10) and sequenced for comparative purposes.
A high level of nucleic acid sequence identity exists in the SalI repeat units of CPB1, CPB2, and CPB2.8, with the exception of two regions (Fig. 1B; the CPB1 sequence is identical to CPB2 over these two regions). First, there is significant sequence divergence in the region comprising the CTE of CPB2.8 (Fig. 1C and Ref. 10). Second, a region of 120 bp (termed the insertion sequence (InS), defined in Fig. 1D) is characterized by 4 short insertions, totaling 57 bp, in the CPB2 and CPB1 sequences. The final insertion of 32 bp is composed predominantly of GT and GC repeats (12 repeats in total). Additionally, CPB1 and CPB2 have a 31-bp region within the diverged sequence that is 90% AT-rich and includes 14 TA repeats. In contrast, comparison of CPB1 and CPB2 sequences from the stop codon to the downstream SalI site (a region comprising 1490 bp) revealed only 4 nucleotide changes (not shown). In addition, in the 1183 bp of sequence between the stop codon of CPB2.8 and the downstream SalI site (excluding the 120-bp InS), only 7 nucleotides differed from CPB1 and 3 differed from CPB2.
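The kinds of figures quoted above (nucleotide changes between aligned regions, AT content, and tandem dinucleotide repeats) can be tallied with a few lines of code. The following Python sketch is illustrative only: the function names and the short test strings are hypothetical, it assumes that the two sequences being compared have already been aligned to equal length, and it is not the GCG-based analysis used in this study.

```python
def count_differences(seq_a: str, seq_b: str) -> int:
    """Count mismatched positions between two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a.upper(), seq_b.upper()))


def at_fraction(seq: str) -> float:
    """Return the fraction of A and T bases in a non-empty sequence."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return sum(base in "AT" for base in seq) / len(seq)


def longest_tandem_run(seq: str, unit: str) -> int:
    """Return the number of units in the longest uninterrupted tandem run of `unit`."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    seq, unit = seq.upper(), unit.upper()
    best = 0
    for start in range(len(seq)):
        run, pos = 0, start
        # Extend the run one unit at a time while the unit keeps matching.
        while seq.startswith(unit, pos):
            run += 1
            pos += len(unit)
        best = max(best, run)
    return best


if __name__ == "__main__":
    # Hypothetical toy sequences, not CPB sequence data.
    print(count_differences("ACGTACGT", "ACGAACGT"))
    print(at_fraction("TATATAGC"))
    print(longest_tandem_run("CCGTGTGTGCAA", "GT"))
```

Applied to real data, the same functions would be run on the aligned stop codon-to-SalI regions and on the InS itself; how gaps in the alignment are scored is a choice that this sketch leaves open.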
CPB2 Encodes a Major, High Mobility Gelatinase Activity Present in L. mexicana Metacyclics-CPB2 is predicted to encode a protein, CPB2, of 359 amino acids with a high degree of sequence identity to CPB1. In the pre-pro region, CPB2 shows absolute identity to that of CPB2.8, whereas there are 4 amino acid changes relative to the pre-pro region of CPB1 (Table I).
The primary amino acid sequence of the mature domain of CPB2 is the most divergent of the isoenzymes of the CPB array examined so far and encodes four amino acids that are unique to date: Glu21, Ser30, Ser115, and Glu160. Nevertheless, the mature domain of CPB2 is 95% identical to that of CPB2.8 and 98% identical to that of CPB1. The CPB2 coding region, like CPB1 (10), has a single base pair deletion that results in a frameshift and premature termination relative to the CTE of CPB2.8 (Fig. 1C). Thus, both of the metacyclic-specific genes encode CPB enzymes with truncated CTEs, whereas those genes expressed at a high level in amastigotes (CPB2.8, CPB cDNA (26), and CPB18) encode enzymes with a full-length C-terminal extension.
Populations of L. mexicana stationary phase promastigotes that contain a high proportion of metacyclics express two predominant cysteine proteases that show activity toward gelatin in a non-denaturing activity gel (36) (Fig. 2, lane 1). When CPB1 was expressed in the L. mexicana CPB null mutant (ΔCPB), the activity of the encoded protein was found to correspond to the slower mobility gelatinase activity present in wild type metacyclic extracts (Fig. 2, lane 2). To investigate whether CPB2 encoded the faster mobility activity in wild type metacyclic promastigotes, CPB2 was transfected into ΔCPB on an episomal shuttle vector. The cysteine protease activity of the resultant mutant was analyzed toward gelatin. CPB2 was found to encode an active protease (Fig. 2, lane 3) that corresponded in size to the faster mobility activity detected in wild type metacyclic promastigote cell extracts (Fig. 2, lane 1). Notably, fewer lower mobility cysteine protease activities, labeled activated precursors in Fig. 2, were observed with CPB1 or CPB2 (10,27). The equivalent lower mobility activity bands associated with CPB2.8 (~34 kDa) have been confirmed recently by protein sequence analysis of a recombinant CPB2.8 as being activated zymogen (37). These differences between the wild type parasites and the mutants re-expressing CPB1 and CPB2 are likely to reflect the different number of isogenes being expressed. Alternatively, they could relate to the presence or absence of the C-terminal extension domain in different CPB isoenzymes.
Generation of L. mexicana Mutants with CPB Genes Reintegrated into the Endogenous Locus-We postulated that the InS identified from comparison of the non-coding sequences of CPB2 and CPB2.8 contained the control elements necessary to modulate the expression of the different CPB genes. Thus, the presence of the InS would result in metacyclic-specific expression, as was shown previously by Northern blotting with a probe to detect CPB1 and CPB2 transcripts (10). Conversely, amastigote-specific expression is predicted in the absence of the InS. This was also suggested previously by Northern blotting with a CTE-specific probe (10). To test this hypothesis, we utilized an L. mexicana cell line that has all CPB genes deleted (ΔCPB) (27) to generate a series of mutants in which single CPB genes were restored to the CPB locus. The following four constructs used to target the CPB locus encoded: (i) the full CPB2 repeat unit, including the CPB2 ORF, the natural 3′-UTR, and downstream intercistronic region (giving the
Transfection of ΔCPB with these constructs followed by double antibiotic selection with phleomycin and nourseothricin resulted in the replacement of the HYG gene with SAT. Confirmation that the CPB constructs had been correctly targeted to the natural CPB locus of ΔCPB was obtained by Southern blotting (Fig. 3B). The probe, derived from the ORF of CPB2 and CPB2.8, detected an EcoRV fragment of 5.0 kb in all four cell lines containing re-integrated copies of CPBs (lanes 1-4). confirmed by the absence of hybridization to ΔCPB DNA restricted with EcoRV (lane 5). The sizes of all fragments detected are as predicted from the map of the CPB locus (Fig. 1A).
The InS Regulates Expression of L. mexicana CPBs-The expression of CPB in the four cell lines was analyzed by Northern blotting to determine levels of steady-state mRNA and by gelatin SDS-PAGE to determine the proteolytic activity (Fig. 4). Two life cycle stages of the parasite were analyzed: stationary phase promastigotes, which contain a high proportion of metacyclics, and amastigotes grown in axenic culture. RNA samples prepared from each cell line were separated by gel electrophoresis, blotted, and probed with a radiolabeled CPB-specific probe. After scanning on a PhosphorImager, the intensity of hybridization was plotted relative to ΔCPB::CPB2 for the metacyclic form and relative to ΔCPB::CPB2.8 for the amastigote form (Fig. 4A). Similar levels of expression were found for CPB2 and CPB2.8 in the metacyclic form (lanes 1 and 2, respectively). In the amastigote form, CPB2.8 was expressed at >3-fold higher levels than in metacyclics (lane 6), whereas CPB2 expression was below the limit of detection (lane 5). These data correspond well with the CPB expression profile found in wild type L. mexicana (10) and provide evidence that re-integration of the genes into the endogenous CPB locus results in a similar level of stage-regulated expression as in wild type cells. The size of the mature mRNA in the mutant lines was the same as wild type (data not shown), providing further evidence for correct processing of precursor RNA.
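The normalization described above, in which each hybridization signal is expressed relative to a reference cell line for the same life cycle stage, amounts to a simple division. The Python sketch below illustrates it under stated assumptions: the function name, the lane labels, and the intensity values are all hypothetical placeholders and are not measurements from Fig. 4A.

```python
from typing import Dict


def relative_intensity(intensities: Dict[str, float], reference: str) -> Dict[str, float]:
    """Express each band intensity as a ratio to the chosen reference lane."""
    ref_value = intensities[reference]
    if ref_value <= 0:
        raise ValueError("reference intensity must be positive")
    return {lane: value / ref_value for lane, value in intensities.items()}


if __name__ == "__main__":
    # Hypothetical PhosphorImager counts for one life cycle stage (not real data).
    metacyclic_counts = {"dCPB::CPB2": 1000.0, "dCPB::CPB2.8": 900.0}
    print(relative_intensity(metacyclic_counts, reference="dCPB::CPB2"))
```

Each stage would be normalized separately, with ΔCPB::CPB2 as the reference for metacyclics and ΔCPB::CPB2.8 as the reference for amastigotes, so that ratios are only compared within a stage.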
The stage-regulated CPB activity of the mutant cell lines was analyzed by gelatin SDS-PAGE (Fig. 4B). The ΔCPB::CPB2 cell line was found to have substantial gelatinase activity in the metacyclic form (lane 1) but little activity in the amastigote form (lane 5), confirming the expression profile shown for mRNA levels by Northern blotting (Fig. 4A). Likewise, the ΔCPB::CPB2.8 cell line had CPB2.8 activity in the amastigote form (lane 6), but little was detected in the metacyclic form (lane 2). This stage-regulated difference parallels the data from the Northern blots that show that expression of CPB2.8 is up-regulated in the amastigote form.
TABLE I. Amino acid differences between the coding sequences of CPB1, CPB2, and CPB2.8. The N terminus of the mature CPB is designated as residue 1, and the pre-pro region is assigned negative numbers decreasing towards the N terminus (10). Shading shows identities.
These data therefore are consistent with the hypothesis that stage regulation of CPBs is mediated via sequence element(s) present in the downstream regions of the respective genes. More precisely, the stage regulation correlates with the presence or absence of the InS located within the EcoRV-SalI non-coding region of CPB2. The presence of these short insertions results in down-regulation of gene expression and hence CPB activity in the amastigote form (lanes 1 and 4), whereas amastigote-specific expression and CPB activity occur in the absence of the insertions (lanes 6 and 7). Of note is the finding that expression of CPB2.8 resulted in significant levels of zymogen (appearing as a slower mobility activity band of >30-kDa apparent molecular mass in the gels due to in situ activation) in both the metacyclic (lane 4) and amastigote (lane 6) forms, whereas less precursor was apparent with metacyclic or amastigote forms expressing CPB2 (lanes 1 and 7).
This provides further evidence that the processing of the zymogen in vivo differs between isoenzymes with and without a C-terminal extension.

The InS Is Located in the Intercistronic Region between Poly(A) Addition Sites and the Site of Trans-splicing-
The above data confirm that CPB1 and CPB2 are expressed in the metacyclic form, whereas CPB2.8 is up-regulated in the amastigote form and that the InS is likely to be of importance in controlling the differential stage-regulated expression. To determine whether the InS is located in the 3′-UTR of CPB2 or in the intercistronic region between the polyadenylation site and the site of addition of the spliced leader, the poly(A) addition sites were determined for CPB in wild type L. mexicana metacyclics by RT-PCR. Fig. 5 shows the localization of poly(A) addition sites using a primer combination to detect CPB2 and CPB1 mRNAs, a primer combination to detect CPB3-CPB18 (including CPB2.8) mRNAs, and a generic primer combination that detects all CPB mRNAs (see "Experimental Procedures"). A number of poly(A) addition sites were utilized, and these mapped to two regions, designated S1 and S2. Region S1 was between 825 and 810 bp 5′ of the AG dinucleotide previously identified as the splice acceptor site (26). The second region, S2, was located between 520 and 486 bp 5′ of the splice acceptor site. Further experiments using the generic primer combination showed that both sites were also utilized in axenic amastigotes. With the exception of one PCR product, generated with a 5′-oligonucleotide (DB1) present in both CPB1 and CPB2, showing a poly(A) addition site to exist within the InS, all other identified sites were upstream of the InS.
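The assignment of mapped poly(A) addition sites to regions S1 and S2 can be expressed as a lookup on the distance upstream of the splice acceptor AG. The Python sketch below uses the region boundaries quoted above; the function and constant names are hypothetical, and treating the quoted boundaries as inclusive is an assumption made for illustration.

```python
# Region boundaries in bp upstream (5') of the splice acceptor AG, as quoted in the text.
# Treating both ends as inclusive is an assumption of this sketch.
POLYA_REGIONS = {
    "S1": (810, 825),
    "S2": (486, 520),
}


def classify_polya_site(distance_upstream_bp: int) -> str:
    """Return the region name for a poly(A) site, or 'other' if it lies outside S1 and S2."""
    for name, (low, high) in POLYA_REGIONS.items():
        if low <= distance_upstream_bp <= high:
            return name
    return "other"


if __name__ == "__main__":
    # Hypothetical site positions, not sequenced clones from this study.
    for distance in (815, 500, 600):
        print(distance, classify_polya_site(distance))
```

A site that falls outside both windows, such as the single product mapping within the InS, is reported as "other" rather than being forced into the nearest region.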
To confirm that the same polyadenylation sites were used in the mutant cell lines, 3′-RACE was performed on mRNA isolated from metacyclic and axenic amastigote forms of ΔCPB::CPB2, ΔCPB::CPB2.8, ΔCPB::CPB2ᴬ, and ΔCPB::CPB2.8ᴹ using the generic primer combination. Two major sites of polyadenylation, correlating with the data obtained from wild type metacyclics and axenic amastigotes, were found in all four cell lines. This shows that correct polyadenylation is taking place in the mutant cell lines and that the stage-regulated expression mimics the wild type state.
InS Directs Stage-regulated Expression of the CAT Reporter Gene-The 3′-UTR and downstream intercistronic region of CPB2.8 have been shown recently to direct expression of reporter genes in L. mexicana amastigotes (38). To test the ability of the InS to direct stage-regulated expression of a reporter gene, the EcoRV-SalI non-coding region downstream of CPB2 was fused to the CAT gene and the cassette targeted to the CPB locus of ΔCPB. Clones resistant to 10 μg ml⁻¹ phleomycin and 25 μg ml⁻¹ nourseothricin hydrosulfate, and sensitive to 50 μg ml⁻¹ hygromycin B, were isolated. Southern blotting confirmed that the HYG cassette in ΔCPB had been replaced by homologous recombination with the CAT cassette to generate ΔCPB::CAT (data not shown). Assays performed with 5 × 10⁷ cell equivalents showed that CAT activity was down-regulated by ~20-fold in axenic amastigotes relative to stationary phase (metacyclic) promastigotes. This provides evidence that the InS element can down-regulate in amastigotes of Leishmania not only CPB genes but also a heterologous gene.

DISCUSSION
This study addressed the mechanism mediating differential stage-regulated expression of the L. mexicana CPB genes. We have identified a 120-nucleotide sequence (designated the InS) containing 4 insertions totaling 57 nucleotides that is downstream of the metacyclic-specific genes CPB1 and CPB2 and absent from the downstream region of the amastigote-specific gene CPB2.8. By analyzing expression of CPB genes reintegrated into their native genomic environment, we have shown that the InS element can modulate expression of cysteine protease genes in vivo in a stage-specific manner. The InS is also capable of modulating the expression of a heterologous gene, as demonstrated using CAT that was expressed some 20-fold less in amastigotes than in metacyclics of ΔCPB::CAT mutants.
This degree of down-regulation is consistent with the 30-fold reduction in CPB2.8 mRNA levels observed between ΔCPB::CPB2.8 and ΔCPB::CPB2.8ᴹ amastigotes (Fig. 4A). Given the apparent lack of promoters within the intercistronic region of transcription units transcribed by α-amanitin-sensitive RNA polymerase II in Leishmania and other trypanosomatids (5,14,15), it is reasonable to propose that the CPB genes are transcribed polycistronically and that differential stage-regulated gene expression is controlled post-transcriptionally. For many genes of Leishmania (13, 17, 18, 22, 39-41) and trypanosomes (42-45) it is thought that mRNA stability is the major mechanism of post-transcriptional control. By demonstrating that the InS is located in the intercistronic region between the CPB genes and is therefore present in the polycistronic pre-mRNA but not in the mature mRNA, we have provided evidence that pre-mRNA processing is a major mechanism for control of stage-regulated CPB gene expression. Polycistronic pre-mRNA processing events in trypanosomatids include the tightly coupled processes of trans-splicing and polyadenylation (17, 46-48). Thus, we propose a model whereby a trans-acting RNA-binding factor (or factors) associates with sequences in the InS and interacts with the trans-splicing and polyadenylation machinery to modulate polycistronic pre-mRNA processing. This modulation leads to increased expression of CPB1 and CPB2 in metacyclics but conversely down-regulation of expression of CPB1 and CPB2 in amastigotes.
It is interesting to note that the AT-rich nature of the sequence within the InS resembles motifs known to bind trans-acting factors involved in 3′-end processing of pre-mRNA, although the spatial context with respect to the poly(A) addition sites is altered in Leishmania. For example, the AAUAAA motif binds the multisubunit cleavage and polyadenylation specificity factor that is required for cleavage and polyadenylation of mammalian pre-mRNA (49,50). There is also an absolute requirement for this motif in the nematode Caenorhabditis elegans, an organism that like Leishmania has a significant number of genes within operons and a requirement for trans-splicing (51,52). Another major component of mammalian pre-mRNA 3′-end processing machinery is the cleavage-stimulatory factor, which binds GU-rich elements downstream of poly(A) addition sites (49,50). Interestingly, a series of 12 GT repeats is found in insertion IV of the InS. As pre-mRNA, GU repeats have the potential to form stable hairpin-loop structures that could interact with regulatory factors involved in 3′-end processing and are themselves developmentally regulated (53). Furthermore, a conserved U-rich element localized in the intercistronic region of several C. elegans and Caenorhabditis briggsae operons has been recently identified (54). Mutational analysis has shown that the U-rich element is essential for pre-mRNA processing of genes within a C. elegans operon (54). A U-rich RNA-binding protein, TcUBP-1, has also recently been identified in Trypanosoma cruzi. It binds a 44-nucleotide AU-rich RNA instability element, located in the 3′-UTR of mucin SMUG mRNAs (55). TcUBP-1 is stage-regulated and is thought to control the stability of mRNAs containing the AU-rich instability sequence.
As a family of genes encoding similar U-rich-binding proteins has been identified in the Leishmania genome (55), it is possible that a member of this RNA-binding protein family associates with the InS and regulates stage-specific expression of CPB1 and CPB2 in L. mexicana.
FIG. 5. Localization of poly(A) addition sites relative to InS. A, comparison of the CPB2 and CPB2.8 repeat units. The annotation shows the relative positions of the 5′-oligonucleotides used for RT-PCR analysis and the position of polyadenylation sites S1 and S2. Other labeling as in Fig. 1B. B and C, sequences of the S1 and S2 poly(A) addition sites. Nucleotides in bold represent poly(A) addition sites determined by RT-PCR. Arrows above the poly(A) addition sites indicate the origin of the clones sequenced and the primer combinations used for RT-PCR (DB1 or DB3; OL699) or the poly(A) addition site of the amastigote-expressed CPB cDNA (26). The InS sequence is double underlined.
3′-RACE analysis of transcripts from wild type and mutant L. mexicana metacyclic promastigotes indicates that a number of poly(A) sites were used within two closely defined regions (designated S1 and S2, Fig. 5). The lack of an identifiable recognition sequence for polyadenylation in Leishmania has been reported previously for several genes (see Ref. 15 for details). All the poly(A) sites identified were located >400 bp upstream of the trans-splicing site and were therefore consistent with the trypanosomatid model of the poly(A) polymerase scanning the pre-mRNA for suitable addition sites (17). Most important for this study, the sites of polyadenylation in the mutant cell lines where the CPB genes have been reintroduced into the CPB locus were very similar to those of wild type parasites. Indeed it is a notable aspect of this investigation that the experimental strategy allowed gene regulation to be studied in its correct chromosomal context, rather than from episomal expression vectors as used in previous studies on the expression of other Leishmania genes (13, 17, 18, 22, 39-41, 56). Integration allowed the study of clones and therefore precluded the problems associated with variations in copy number and rates of transcription that occur with episomal elements.
The validity of the approach was confirmed by an analysis of CPB2, which in wild type L. mexicana is metacyclic-specific, and CPB2.8, which is up-regulated in the amastigote form (10,27). Integration of the complete repeat units for CPB2 and CPB2.8 resulted in the expected stage-regulated expression (Fig. 4). The resulting mutants showed stage-regulated CPB activity that is dependent upon the context of the non-coding sequences downstream of the CPB ORFs.
A recent study has shown that sequences present in the 3′-UTR of some amastigote-specific genes stabilize the mRNA, leading to higher levels of expression.² Comparative sequence analysis of the 3′-UTR of genes expressed at elevated levels in amastigotes revealed a 150-nucleotide region that is capable of inducing reporter gene expression specifically in amastigotes. This sequence element, however, does not appear to be present in the 3′-UTR of the L. mexicana CPB cDNA.² The results of this study, coupled with previous work that showed that the 3′-UTR and downstream intercistronic sequence of CPB2.8 can be used to generate high level expression of a reporter gene in amastigotes from constructs targeted to the rRNA locus (38), indicate that CPB2.8 mRNA also contains amastigote-specific stabilization sequences.
So far the complete CPB locus has not been characterized in detail in any other species of Leishmania, so it is not known if the InS is present downstream of the first two genes in the corresponding CPB loci. Also, we have been unable to identify an InS sequence downstream of any other multicopy genes of Leishmania. However, the small size and the repetitive nature of these insertions make other functional elements difficult to identify in DNA sequence data bases. It is interesting to note that the complete tandem gene array that encodes the Trypanosoma brucei cysteine proteases homologous to CPB of Leishmania has been sequenced recently as part of the African trypanosome genome project (GenBank™ accession number AC073906). There are no insertions in the intercistronic sequences downstream of the first two copies of the array. Thus the tight expression of individual genes within a tandem array, which is a feature of the CPBs of L. mexicana, appears to be absent from African trypanosomes such as T. brucei. Unlike the L. mexicana CPBs, which vary significantly in sequence between different encoding genes of the array, especially in the C-terminal extension, the 11 T. brucei cysteine proteases encoded on the tandem array are very similar (only 5 of 450 amino acids have variation between some of the 11 isoenzymes). Thus the enzymes of T. brucei may all be functionally very similar such that different expression profiles would not be advantageous.
ΔCPB has reduced infectivity to macrophages and animals compared with the wild type parasites, implicating the CPB enzymes as virulence factors (27). The mutants in which individual CPB genes have been chromosomally re-integrated are powerful tools to study the roles of individual CPBs in parasite virulence. To this end, detailed phenotype/immunological studies are underway to shed light on why L. mexicana requires such an exquisitely stage-regulated array of CPB genes.