Differential expression of the expression site-associated gene I family in African trypanosomes.

A minimum of 20 different mRNA species encoding related members of the expression site-associated gene I (ESAG-I) family occur in metacyclic variant antigen type 4 bloodstream trypanosomes. None of these ESAG-I mRNAs are derived from the metacyclic variant antigen type 4 variant surface glycoprotein (VSG) gene expression site, and some appear to come from pseudogenes. The ESAG-Is are transcribed in both procyclic and bloodstream trypanosomes, but their mRNAs accumulate to a detectable steady state level only in bloodstream trypanosomes. At least five different groups of 3'-untranslated regions (3'-UTRs) are represented among these ESAG-I mRNAs, suggesting that the 3'-UTR does not contribute to their differential expression. Some ESAG-I mRNAs completely lack a 3'-UTR or have only a single nucleotide as a 3'-UTR. Transcription of the ESAG-Is is sensitive to α-amanitin, indicating that they are transcribed by a different RNA polymerase than the VSG genes. These results collectively demonstrate that ESAG-Is are a heterogeneous population that can be expressed independently of VSG genes, but like the VSG genes, their mRNAs are present in the bloodstream stage of the parasite and not in the procyclic stage.

African trypanosomes evade their hosts' immune response by sequentially expressing different variant surface glycoproteins (VSGs) from a repertoire of 1000 or more VSG genes. The 20 or more potential expression sites for a VSG gene are invariably situated near a telomere, whereas the transcriptionally silent VSG genes are scattered throughout the chromosomes. The mechanisms that activate one and only one of these telomere-linked expression sites at a time in a given trypanosome are only partially understood. In some cases, activation is associated either with duplicative transposition of a silent donor VSG gene to a telomere-linked expression site or with a telomere exchange event. In other cases, a silent VSG gene already at a telomere-linked site is activated in situ without apparent DNA rearrangement (for recent reviews, see Refs. 1-3). Transcription of at least some telomere-linked VSG expression sites is initiated 45-60 kb upstream of the VSG gene and proceeds through as many as nine or 10 members of different gene families called expression site-associated genes (ESAGs). The resultant polycistronic pre-mRNA is processed into individual mRNAs by 5' trans-splicing and 3' polyadenylation (4-7). The steady state levels of the ESAG mRNAs are as much as 100-700-fold less than that of the VSG mRNA, indicating that expression of these co-transcribed genes is regulated at least in part by post-transcriptional events such as pre-mRNA processing and/or mRNA stability (8). The different ESAGs in an expression site are distinguished from one another by numbers or Roman numerals. ESAG-1 (or ESAG-I) is designated as the first gene preceding the VSG gene, and in general the larger the number or numeral, the further upstream within the expression site the ESAG is (the exception being ESAG-8, which lies between ESAG-3 and -4 (3)).
Most of the 31-kb sequence of the AnTat 1.3A VSG gene expression site has been reported (3,6,9), and known protein products of its nine ESAG representatives include an adenylate cyclase (ESAG-4), two transferrin receptor subunits (ESAG-6 and -7), and a putative zinc finger protein. The functions of the other ESAG products encoded in this expression site remain to be elucidated.
The ESAG-I family was the first ESAG family to be discovered (8,10). Its 14-25 members encode amphiphilic glycoproteins of about 46 kDa whose function and cellular location are not known (10,11). We have previously reported the genomic sequence of an ESAG-I that is located several kb upstream of the VSG gene expressed by the MVAT4 trypanosome clone (12). A promoter has been found to occur between the ESAG-I and the VSG gene in this telomere-linked expression site (13). Nuclear run-on assays, primer extension experiments, and reporter gene transfections all indicate that this promoter initiates synthesis of a monocistronic pre-mRNA encoding only one protein, the VSG. No additional open reading frame occurs in this pre-mRNA, suggesting that the upstream ESAG-I must be part of another transcription unit, if indeed it is transcribed at all.
To resolve the question of whether the upstream ESAG-I is expressed in MVAT4 trypanosomes, we isolated two dozen ESAG-I cDNAs from an MVAT4 cDNA library. We discovered that none of these cDNAs were identical to the ESAG-I upstream of the MVAT4 VSG gene. This finding led to the study described here, which demonstrates that many different ESAG-Is are transcribed in bloodstream trypanosomes by an α-amanitin-sensitive RNA polymerase and suggests that the term "expression site-associated gene" may be a misnomer for this gene family.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) U40840, U40841, and U41223-U41234 (for ESAG-Ia through ESAG-In cDNAs) and M21052 (for MVAT4 ESAG-I).
Analysis of Nascent (Run-on) RNA in Isolated Nuclei-The nuclei of procyclic or bloodstream trypanosomes were isolated using a protocol kindly provided by Etienne Pays and described previously (13,18,19) and stored at −70°C until used. They were thawed and incubated with [α-32P]UTP, and their RNAs were isolated for use as probes in Southern blots as described (13). In some experiments α-amanitin (500 μg/ml) was added to the nuclei prior to incubation.
Other Procedures-The bloodstream T. brucei rhodesiense cDNA libraries were constructed in ZAP (Stratagene) as described earlier for the MVAT4 cDNA library (20). The libraries were screened with 32P-labeled probe A (see Fig. 2) using standard procedures (21) under the following moderately stringent hybridization and washing conditions: 42°C for 16 h with the labeled probe in 50% formamide, 6× SSC, 0.1% SDS, 5× Denhardt's solution, and 100 μg/ml denatured salmon sperm DNA followed by a single washing at 45°C for 1 h in 0.2× SSC and 0.1% SDS. Genomic DNAs (22) and total RNAs (23) were isolated from bloodstream or procyclic trypanosomes for Southern and Northern blots (21). Probes A-F shown in Fig. 2 were generated by either polymerase chain reaction amplification or restriction enzyme digestions. Hybridization probes were labeled with 32P using the random priming method (24). DNA sequences in plasmids were determined by a combination of manual sequencing (25) using a Sequenase kit (U.S. Biochemical Corp.) and automated sequencing using an ABI 373 automated sequencer (Perkin-Elmer). Sequences were aligned using the HIBIO Macintosh DNASIS program (Hitachi) and the CLUSTAL algorithm (26).

Comparison of ESAG-I cDNAs-The telomere-linked expression sites for the genes encoding the MVAT4 VSG and the WRATat1.1/1.19 VSG were characterized in earlier experiments (13,39) and are shown in Fig. 1. MVAT4 bloodstream trypanosomes express a metacyclic VSG gene without apparent DNA rearrangement from a promoter located about 2 kb upstream of the VSG gene's start codon. An ESAG-I situated about 3 kb upstream of this promoter will be referred to as MVAT4 ESAG-I. Nuclear run-on experiments have demonstrated that most, if not all, of the 3-kb intergenic region between MVAT4 ESAG-I and the promoter is not transcribed (13).
WRATat1.1 and WRATat1.19 are separately isolated trypanosome clones expressing the same bloodstream VSG gene, also without apparent DNA rearrangement. WRATat1.19 was originally cloned from a mouse infected by tsetse flies that had ingested WRATat1.1 trypanosomes. In both WRATat1.1 and WRATat1.19, the expressed VSG gene is preceded by a barren region of 25 kb or more that is composed predominantly of 76-bp repeats. The promoter for this VSG gene has not been identified, but nuclear run-on assays using ultraviolet (UV)-irradiated nuclei from WRATat1.1 indicated that it is located far upstream of the gene and perhaps in front of the 76-bp repeats. Thus, this expression site resembles other bloodstream VSG expression sites whose primary transcripts have been shown to be 45-60 kb in length (3-6). It is not known if an ESAG-I is located in front of this barren region or if one is represented in the very long primary transcript encoding the WRATat1.1/1.19 VSG.
To examine the expression of ESAG-Is in these trypanosomes, the 660-bp probe A indicated in Figs. 1 and 2 was used to screen cDNA libraries constructed from poly(A)+ RNA of each of these three trypanosome clones. When 70,000 clones in the MVAT4 cDNA library were screened, 24 clones were identified (0.034%). Since about 4% of the cDNAs in the same library encode the MVAT4 VSG, the ratio of ESAG-I cDNAs to MVAT4 VSG cDNAs in this library is 0.034:4 or about 1:120. Likewise, the WRATat1.1 and WRATat1.19 cDNA libraries were found to contain a similar ratio of ESAG-I cDNAs to VSG cDNAs.
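The library arithmetic in the preceding paragraph can be checked in a few lines. The Python sketch below uses only the counts stated in the text (24 positives among 70,000 clones, and about 4% VSG cDNAs); the variable names are illustrative and are not part of the original analysis.

```python
# Counts taken from the text; variable names are illustrative only.
clones_screened = 70_000
esag1_positives = 24
vsg_percent = 4.0  # approximate percentage of VSG cDNAs in the same library

# Percentage of the library made up of ESAG-I cDNAs.
esag1_percent = 100 * esag1_positives / clones_screened

# Number of VSG cDNAs per ESAG-I cDNA.
vsg_per_esag1 = vsg_percent / esag1_percent

print(f"ESAG-I cDNAs: {esag1_percent:.3f}% of the library")
print(f"ESAG-I:VSG ratio is about 1:{vsg_per_esag1:.0f}")
```

Because the 4% figure for VSG cDNAs is itself approximate, the computed ratio agrees with the rounded value of about 1:120 given in the text.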
The 24 ESAG-I cDNAs from an MVAT4 cDNA library and four additional ESAG-I cDNAs (two each from WRATat1.1 and WRATat1.19 cDNA libraries) were chosen for further study. Partial sequence determinations of the 24 ESAG-I cDNAs from the MVAT4 cDNA library revealed that 20 possessed unique coding sequences, none of which was identical to MVAT4 ESAG-I. Thus, a minimum of 20 different ESAG-I mRNA species occur in MVAT4 trypanosomes, and it seems likely that additional unique ESAG-I cDNAs could be found if the library were rescreened and more positive clones were sequenced. The complete sequences of 11 of the 24 cDNAs were determined (Fig. 2). The complete sequences of the remaining 13 cDNAs were not determined because their partial sequences indicated that they were very similar to at least one of the other sequences. As it turned out, 2 of the 11 cDNAs were found to be identical (collectively called ESAG-Ik in Fig. 2), and another two differ only in the lengths of their 3'-UTRs (ESAG-Ic and ESAG-Ij in Fig. 2). In addition, the complete sequences of the four ESAG-I cDNAs from the other two libraries were determined and were also found to have nucleotide differences. These 14 distinct ESAG-I cDNAs are compared schematically in Fig. 2 along with the MVAT4 ESAG-I coding sequence. The 14 cDNAs are called ESAG-Ia to -In, with ESAG-Ia being chosen as the reference sequence for the sake of comparison because it is the longest sequence determined.
As is readily apparent from Fig. 2, some ESAG-I cDNAs are more similar to each other than are others. For example, ESAG-Ia to -Ij have very similar coding regions, but only the coding regions of ESAG-Ic and ESAG-Ij are identical. Among these 10 ESAG-Is, the first four share long 3'-UTRs (>1 kb) that are very similar, the fifth has a related intermediately sized 3'-UTR, and the next three (ESAG-If, -Ig, and -Ih) share an unrelated short 3'-UTR (207 bp) whose divergence extends into the last 16 codons of the coding region. The last two cDNAs among this common group of 10 have either a very short 3'-UTR of 15 bp (ESAG-Ii) or completely lack a 3'-UTR (ESAG-Ij). In the latter case, the last residue of the termination codon, TGA, is the first residue of the poly(A) tail. Five of these 10 common cDNAs have an interior termination codon (indicated by the black dot in Fig. 2, ESAG-Ia) that disrupts the open reading frame, suggesting they are derived from pseudogenes. At least one other ESAG-I sequence with an internal termination codon has been reported (27). ESAG-Ik appears to be a hybrid ESAG; the 5'-half resembles the above group of 10, whereas the 3'-half, including the 3'-UTR, is divergent. Another unexpected feature of this group of 10 ESAG-I cDNAs is that some of those whose 5'-ends extend to the 5'-spliced leader do not possess the usual ATG start codon. Instead, they have the codon ATA at this position (small black rectangles in Fig. 2). Although it is not known if the corresponding AUA in the RNA can serve to initiate protein synthesis in trypanosomes, this triplet can function as a start codon in prokaryotes and mitochondria (28-30). Whether it can serve as a start codon in eukaryotes probably depends on the flanking nucleotides (31) and remains to be demonstrated in trypanosomes. In contrast, ESAG-Il, ESAG-Im, and the MVAT4 ESAG-I have the conventional ATG start codon.
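The two coding-region features noted above, an in-frame termination codon inside the open reading frame and an ATA in place of the ATG start codon, are straightforward to screen for computationally. The following Python sketch is an illustration only: the function name `internal_stops` is hypothetical, and the toy sequences are invented for the example and are not ESAG-I sequences.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def internal_stops(coding_region):
    """Return 0-based codon indices of in-frame stop codons before the final codon."""
    # Split the sequence into complete codons, reading from the first base.
    codons = [coding_region[i:i + 3] for i in range(0, len(coding_region) - 2, 3)]
    # The final codon is the expected terminator, so it is excluded from the scan.
    return [n for n, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]

# Hypothetical toy coding regions, NOT real ESAG-I sequences.
toy = {
    "intact":      "ATGGCTTGTCCAGAATGA",
    "interrupted": "ATGGCTTAGCCAGAATGA",  # TAG at codon index 2
    "ata_start":   "ATAGCTTGTCCAGAATGA",  # ATA instead of ATG at the start
}

for name, seq in toy.items():
    print(name, "| start codon:", seq[:3], "| internal stops:", internal_stops(seq))
```

Only in-frame triplets are examined, so a stop-like triplet that occurs out of frame (as in the first four bases of the `ata_start` example) is not reported.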
The last three cDNAs shown in Fig. 2 (ESAG-Il, -Im, and -In) differ more substantially in sequence, both among each other and from the other ESAG-Is, than do the first 11. These differences are represented by the differently patterned segments. ESAG-Il has a 3'-UTR consisting of a single cytosine between the termination codon TGA and the poly(A) tail. MVAT4 ESAG-I, the last sequence represented in Fig. 2, is also quite divergent from any of the above 14 cDNAs. Fig. 3 displays the deduced amino acid sequences for the eight most divergent ESAG-I coding regions, again with ESAG-Ia as the reference sequence. Three general features are apparent from this amino acid alignment. First, the N-terminal halves of the ESAG-I coding sequences are clearly more similar than are the C-terminal halves, an observation made previously for a smaller number of ESAG-I sequences (8,11,12). The only exception to this rule is ESAG-Id (second line in Fig. 3), which displays more divergence from ESAG-Ia in the front half than in the back half. Thus, from a functional standpoint, the more highly conserved N-terminal half appears to tolerate fewer changes than the C-terminal half. The second apparent feature is that individual ESAG-Is often differ from each other in small blocks of amino acids. Correspondingly, two or more ESAG-Is may share a block of 2-10 amino acids that the other ESAG-Is do not have. This result suggests that ESAG-I family members may undergo occasional internal cross-over events among themselves, diversifying their sequences. An example of this possibility is ESAG-Ik, whose N-terminal half closely resembles the common group of 10 and whose C-terminal half is clearly derived from a different sequence that nevertheless has within it some conserved blocks of amino acids.
Another illustration comes from a comparison of ESAG-Im and ESAG-In, two of the more highly divergent ESAG-Is whose amino acid sequences are much more similar in their N-terminal halves than in their C-terminal halves. Finally, six of the eight cysteines in ESAG-Ia are conserved in all of the ESAG-Is, and some are conserved within highly divergent regions, suggesting that these cysteines may play an important structural or functional role.
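The cysteine comparison amounts to asking, for each cysteine in the reference sequence, whether every other aligned sequence also has a cysteine in the same alignment column. A minimal Python sketch of that check follows; the function name and the three-sequence toy alignment are hypothetical and stand in for the Fig. 3 alignment, which is not reproduced here.

```python
def conserved_cysteine_columns(aligned, reference=0):
    """Return alignment columns where the reference and all other sequences have 'C'."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    ref = aligned[reference]
    return [
        col for col, residue in enumerate(ref)
        if residue == "C" and all(seq[col] == "C" for seq in aligned)
    ]

# Hypothetical toy alignment ('-' marks a gap); NOT the ESAG-I alignment of Fig. 3.
toy_alignment = [
    "MKCLAVCTRC",  # reference, cysteines in columns 2, 6 and 9
    "MKCLSVCT-C",
    "MRCIAVSTRC",  # serine replaces the cysteine in column 6
]

conserved = conserved_cysteine_columns(toy_alignment)
total = toy_alignment[0].count("C")
print(f"{len(conserved)} of {total} reference cysteines conserved at columns {conserved}")
```

In this toy case two of the three reference cysteines are conserved; applied to a real alignment, the same column-wise test would give the kind of tally reported in the text.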
Northern and Southern Blot Analyses-Fig. 4A shows Northern blots of RNAs from procyclic trypanosomes (lane P) and from bloodstream trypanosome clones MVAT4 and MVAT7 (lanes 4 and 7). The blots are probed with three probes indicated in Fig. 2 and a tubulin probe. Probe B is a representative of the coding sequence shared by the common group of 10 cDNAs described above. Probe C represents the long 3'-UTR possessed by a subset of this common group, and probe D is the divergent 3'-half of ESAG-Ik.
Probe B hybridizes to two RNA size classes, one of 2.3 kb and one of 1.3 kb. If the poly(A) tail accounts for about 0.2 kb, the two classes correspond in size to the ESAG-I cDNAs in the common group of 10 that have either the long 3'-UTR (ESAG-Ia to -Id) or the unrelated short 3'-UTR (ESAG-If to -Ih). In some lanes an intermediately sized band can be seen that likely corresponds to ESAG-Ie, which has an intermediately sized 3'-UTR. No hybridization to procyclic RNA was detected, even under very long exposure times. Thus, the steady state level of ESAG-I RNA is much higher in bloodstream parasites than in procyclic organisms.
Probe C does not hybridize to the 1.3-kb RNA, supporting the above interpretation that this RNA size class corresponds to ESAG-I cDNAs lacking this long 3'-UTR. Probe C does hybridize to the 2.3-kb RNA, as expected, and also hybridizes to a larger RNA species not recognized by probe B. The significance of this larger RNA is uncertain. Perhaps all or part of the probe C sequence occurs in an RNA species that does not have an ESAG-I coding region or has an ESAG-I coding region that has diverged enough to not cross-hybridize with probe B under the moderately stringent conditions used. Since a representative of this larger RNA species was not among the cDNAs examined, its presence was not studied further. Probe D recognizes an RNA slightly smaller than 1.3 kb, as expected from the size of its cDNA (Fig. 2). The tubulin probe (which does not have small amounts of vector sequence that hybridize to the marker DNAs as do the other probes) generates signals of similar intensities in the other three lanes, showing that similar amounts of RNA were loaded in each lane. Fig. 4B shows that similar hybridization patterns are obtained when RNAs from the bloodstream WRATat1.1 and WRATat1.19 trypanosome clones are probed with fragments B and C. Thus, multiple ESAG-I mRNA species are present in these trypanosome clones as well, consistent with the finding that the four ESAG-I cDNAs examined from the WRATat1.1 and WRATat1.19 cDNA libraries are all different. Since the expression site for the 1.1/1.19 VSG is transcribed into a long polycistronic pre-mRNA, the presence of multiple, heterogeneous ESAG-I transcripts is not an aberration of active expression sites transcribed into monocistronic VSG pre-mRNAs.
When probe A (the MVAT4 ESAG-I) was used in Northern blots under the same hybridization stringency conditions, no detectable signal was observed to either bloodstream or procyclic RNAs, even after long exposure times (not shown). This result is consistent with the fact that none of the 24 ESAG-I cDNAs isolated from the MVAT4 cDNA library encode the MVAT4 ESAG-I. Thus, if MVAT4 ESAG-I RNA is present in MVAT4 trypanosomes, it is at a level too low to detect on Northern blots and at best represents only a few percent of the total ESAG-I RNA population in MVAT4 organisms.
Fig. 5 shows Southern blots of genomic DNAs from MVAT4, MVAT7 and WRATat1.1 trypanosomes hybridized to probes B, C, and F. EcoRI and HindIII were used to digest the DNAs because none of the ESAG-I cDNA sequences possess their cleavage sites. The hybridization patterns were identical for all three genomes with all of the probes tested, providing no evidence for DNA rearrangements among the ESAG-Is. Probe B (coding region of the common 10) recognizes at least eight HindIII fragments, one of which is much more intense than the others, suggesting that either multiple copies of this particular fragment exist or multiple genes occur within the fragment. Probe C (long 3'-UTR) hybridizes to a subset of the EcoRI and HindIII fragments to which probe B binds, as expected if this 3'-UTR is encoded by only some of the genes recognized by probe B. Probe F, which should recognize another subset of these same genes, generates a simple banding pattern but appears to detect an EcoRI and HindIII fragment not recognized by probe B. The interpretation of this result is not clear, but it suggests that at least one copy of the probe F sequence in the genome might be flanked by a region other than the probe B sequence.
Analysis of Nascent ESAG-I Transcripts in Nuclei of Procyclic and Bloodstream Trypanosomes-Nuclear run-on assays were used to detect the nascent transcripts of the ESAG-Is. Radiolabeled run-on RNAs from nuclei of procyclic and bloodstream trypanosomes incubated in the presence or absence of α-amanitin were used to probe the ESAG-I coding region. Transcription from the VSG and PARP gene expression sites is known to be resistant to α-amanitin, suggesting that these expression sites are transcribed by RNA polymerase I or a modified RNA polymerase II (32,33). The procyclic portion of Fig. 6 (left panel) shows the results when procyclic run-on RNA was used as the probe. As expected, procyclic RNA synthesized in the absence of α-amanitin (panel C) hybridizes to the PARP and tubulin genes but not to the VSG gene. It also hybridizes with a reduced signal to probe B (the coding region of the common ten) but not to probe A (MVAT4 ESAG-I coding region). This result indicates that at least some of the ESAG-Is are transcribed in procyclic organisms even though procyclic ESAG-I RNA was not detected on Northern blots. When α-amanitin was present during the procyclic nuclei incubation (panel B), transcription of the PARP genes was unaffected but transcription of the genes for tubulin and ESAG-I was greatly reduced, indicating that the ESAG-Is are transcribed by a conventional RNA polymerase II, similar to the tubulin genes but distinct from the PARP and VSG genes. The bloodstream portion of Fig. 6 (right panel) shows a similar experiment using run-on RNA from bloodstream MVAT4 nuclei. In the absence of α-amanitin this RNA hybridizes strongly to the MVAT4 VSG and weakly to probe B. Thus, the single MVAT4 VSG gene is transcribed at a much higher rate than are the multiple ESAG-Is, consistent with the 120:1 ratio of their respective cDNAs in the MVAT4 cDNA library.
Again, no hybridization to probe A was detected, consistent with the finding that transcripts of MVAT4 ESAG-I are only a small fraction of the pool of heterogeneous ESAG-I RNAs in MVAT4 trypanosomes, if indeed they even exist at all. The α-amanitin had no effect on VSG gene transcription, as expected, but diminished transcription of the ESAG-Is to an undetectable level, indicating that the MVAT4 VSG gene and the ESAG-Is are transcribed by different RNA polymerase complexes. The run-on RNAs were also used to probe other unique regions within the collection of ESAG-I cDNAs (not shown). Although the strongest signals were obtained to the probe B sequence, consistent with its presence in more than half of the ESAG-I cDNAs, weak α-amanitin-sensitive signals were also obtained to probes C and D and to fragments unique to ESAG-Il, -Im, and -In.
DISCUSSION
Recent reviews on the molecular mechanisms of trypanosome antigenic variation, including one from this lab (1), invariably refer to ESAG-Is as genes that are co-transcribed with the VSG gene in the active VSG gene expression site. In MVAT4 trypanosomes this clearly is not the situation. No evidence was obtained for the expression of MVAT4 ESAG-I in MVAT4 organisms. None of the ESAG-I cDNAs had the same sequence as MVAT4 ESAG-I (Figs. 1 and 2), nuclear run-on RNA from MVAT4 nuclei did not hybridize to MVAT4 ESAG-I (Fig. 6), and Northern blots probed with the MVAT4 ESAG-I probe (probe A) did not detect a transcript. Thus, if MVAT4 ESAG-I is expressed in MVAT4 organisms, its mRNA is a very small fraction of the total pool of heterogeneous ESAG-I mRNAs and too rare to detect by conventional techniques.
A possible explanation for this unexpected finding is that the monocistronic MVAT4 expression site is not representative of the polycistronic expression sites identified for at least some other bloodstream VSG genes (3-6). However, the expression site for the WRATat1.1/1.19 VSG gene does resemble these other polycistronic VSG expression sites, and WRATat1.1/1.19 trypanosomes likewise contain a heterogeneous population of ESAG-I mRNAs. Indeed, Northern blots (Fig. 4) suggest that four different trypanosome clones (MVAT4, MVAT7, WRATat1.1, and WRATat1.19) express essentially the same array of different ESAG-Is. Yet, in those cases where an ESAG-I does occur within the polycistronic transcription unit of a VSG gene expression site, its RNA sequence is the predominant ESAG-I mRNA in that trypanosome clone (8). The simplest explanation of these results is that an expressed ESAG-I need not be in a telomere-linked VSG gene expression site, but it does no harm if it is there. Perhaps ESAG-Is become part of a telomere-linked, polycistronic expression site only if they inadvertently land there as a partner in a recombination event such as a transposon-mediated transposition, a VSG gene conversion, or some other DNA rearrangement event. Our preliminary characterizations of genomic DNA clones containing ESAG-Is suggest that they are scattered about the genome, sometimes in clusters of at least four gene copies, and Southern blots of CHEF gels probed with probes B and C indicate that they occur predominantly on large chromosomes of 2,000 kb or more (not shown). In addition, Southern blots of restricted genomic DNA (Fig. 5) provide no evidence that they are near large barren regions such as the 76-bp repeats or telomeric repeats. Thus, although some members of the ESAG-I family are located in a VSG expression site, others are not and manage to be transcribed from these other sites.
The nuclear run-on assays (Fig. 6) indicated that the ESAG-Is are transcribed in both procyclic and bloodstream trypanosomes by an α-amanitin-sensitive RNA polymerase, in contrast to VSG genes whose RNA synthesis is resistant to α-amanitin. Yet Northern blots showed that the ESAG-I mRNAs accumulate to a detectable steady state level only in bloodstream trypanosomes, similar to the VSG mRNAs. These results are consistent with the earlier findings of Graham and Barry (34), except they found that synthesis of ESAG-I transcripts in procyclic organisms was resistant to α-amanitin rather than sensitive. Although the reason for this difference is unclear, the distribution of ESAG-Is within the genome of the T. brucei 221 clone used in their work is undoubtedly different from that in our trypanosome clones, so at least some ESAG-Is of the two respective serodemes could be on different transcription units in procyclic organisms.
In the case of VSG mRNAs, their semiconserved 3'-UTRs are crucial in conferring the bloodstream stage specificity to the VSG mRNAs (35). However, at least five different general groupings of 3'-UTR sequences exist among the heterogeneous ESAG-I cDNAs described here (represented in ESAG-Ia, -If, -Ik, -Im, and -In). Our attempts to identify substantive sequence similarities among these five main 3'-UTR groupings via either pairwise alignments or a group alignment of all five were not particularly revealing. The longest stretch of sequence identity among all five is five bp. The maximum sequence identity between any two of the five groups is 17 of 23 positions.
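The two similarity measures quoted above (the longest stretch of identity shared by all sequences, and the number of identical positions in a pairwise comparison) can be illustrated with a short Python sketch. This is not the DNASIS/CLUSTAL procedure used in the study: it is a brute-force substring search plus an ungapped position count, the function names are hypothetical, and the toy sequences are invented rather than taken from the ESAG-I 3'-UTRs.

```python
def longest_shared_stretch(seqs):
    """Return the longest substring present in every sequence (brute force)."""
    shortest = min(seqs, key=len)
    # Try candidate lengths from longest to shortest; the first hit is the answer.
    for size in range(len(shortest), 0, -1):
        for start in range(len(shortest) - size + 1):
            piece = shortest[start:start + size]
            if all(piece in s for s in seqs):
                return piece
    return ""

def identity(a, b):
    """Return (matching positions, positions compared) for two pre-aligned segments."""
    if len(a) != len(b):
        raise ValueError("segments must be the same length")
    return sum(x == y for x, y in zip(a, b)), len(a)

# Hypothetical toy sequences, NOT ESAG-I 3'-UTR sequences.
toy_utrs = ["GATTACAGG", "CCTTACATT", "ATTACGGAA"]
shared = longest_shared_stretch(toy_utrs)
matches, compared = identity("GATTACA", "GACTATA")

print(f"longest stretch shared by all: {shared} ({len(shared)} bp)")
print(f"pairwise identity: {matches} of {compared} positions")
```

The brute-force search is adequate for a handful of short sequences; for long UTRs a suffix-based method would scale better, but the quantity reported is the same.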
Still other poly(A)+ ESAG-I cDNAs were found to have 3'-UTRs of only 15, 1, or 0 nucleotides, suggesting but not proving that the 3'-UTR is not responsible for the bloodstream stage-specific stability of the ESAG-I mRNAs as it is for VSG mRNAs. One possibility for the existence of these unique ESAG-I 3'-UTR sequences is that they might confer different properties to the ESAG-I RNAs within bloodstream trypanosomes, such as different half-lives, specific cytoplasmic compartmentalization, or differential expression in stumpy and slender forms. Another possibility is that their bloodstream stage specificity could be conferred not by the 3'-UTRs but by sequences within a conserved segment of the more highly conserved N-terminal coding regions. It should be possible to test these and other models for the bloodstream stage specificity of ESAG-I mRNAs using transient transfections with a reporter gene containing different segments of the ESAG-I coding regions and 3'-UTRs.
The small blocks of identity and/or dissimilarity among some of the deduced ESAG-I amino acid sequences (Fig. 3) are reminiscent of similar properties of a few duplicated VSG genes (36-38). In those VSG examples, the newly duplicated VSG gene is a mosaic of segments from two or more closely related donor VSG isogenes and is likely created by multiple cross-over events among the isogenes during duplication. Sometimes, these donor VSG genes are actually pseudogenes with internal termination codons that prohibit their own expression into protein but do not interfere with the conversion of other segments of their sequence into a new gene. Similar events could also scramble blocks of sequences among the multiple ESAG-Is and pseudogenes, leading to the patterns revealed in Figs. 2 and 3.
The function of the ESAG-I proteins remains an enigma. Their sequences and the rarity of their mRNAs suggest that they are minor surface proteins (3,8), although attempts to verify their surface location using antibodies have been precluded by their low abundance (10). The results described here do not shed new light on their possible functions but do demonstrate that their presence is not linked to a specific VSG or VSG gene expression site. In addition, the complete sequence conservation of some segments of their N-terminal halves and the complete conservation in all ESAG-Is of six of the eight cysteines in ESAG-Ia (see Fig. 3) suggest that, despite their heterogeneity and low abundance, ESAG-I proteins do serve a role for bloodstream trypanosomes. This is a role that apparently can be fulfilled by heterogeneous mixtures of related proteins rather than a homogeneous protein population such as is required for VSG function. The challenge now is to identify that ESAG-I role.