A Candidate U1 Small Nuclear RNA for Trypanosomatid Protozoa*

In trypanosomatid protozoa, all mRNAs obtain identical 5′-ends by trans-splicing of the 5′-terminal 39 nucleotides of a small spliced leader RNA to appropriate acceptor sites in pre-mRNA. Although this process involves spliceosomal small nuclear (sn) RNAs, it is thought that trypanosomatids do not contain a homolog of the cis-spliceosomal U1 snRNA. We show here that a trypanosomatid protozoon, Crithidia fasciculata, contains a novel small RNA that displays several features characteristic of a U1 snRNA, including (i) a methylguanosine cap and additional 5′-terminal modifications, (ii) a potential binding site for common core proteins that are present in other trans-spliceosomal ribonucleoproteins, (iii) a U1-like 5′-terminal sequence, and (iv) a U1-like stem/loop I structure. Because trypanosomatid pre-mRNAs do not appear to contain cis-spliced introns, we argue that this previously unrecognized RNA species is a good candidate to be a trans-spliceosomal U1 snRNA.

In trypanosomatid protozoa, all mRNAs have identical 5′-terminal sequences. A 39-nucleotide (nt) spliced leader (SL) sequence is transferred from the 5′-end of a small SL RNA to the pre-mRNA in a process known as trans-splicing, which is very similar to the spliceosomal cis-splicing found in other eukaryotes (1). However, it is thought that trypanosomes and their relatives contain neither cis-spliced introns (1,2) nor an equivalent of the U1 small nuclear (sn) RNA required for spliceosomal cis-splicing (3). The SL RNA is considered to be a trans-spliceosome-specific snRNA because it is found in a ribonucleoprotein particle that contains the same core proteins that are associated with other trans-spliceosomal snRNAs (4,5). It has been proposed that during trans-splicing, sequences within the SL RNA are able to substitute for the function that U1 snRNA normally supplies in cis-splicing (3,6,7). Our discovery of a U1 snRNA homolog in Euglena gracilis (8), an organism that is specifically related to trypanosomatid protozoa (9), prompted us to re-evaluate a possible role for U1 snRNA in trypanosomatids.
In cis-splicing, during early stages of spliceosome assembly, the 5′-terminal region of U1 snRNA base pairs across the 5′-splice site (10). In the case of trans-splicing, it is commonly held that base pairing between U1 snRNA and the 5′-splice site may not be required for splicing of SL RNA sequences (1,3,6,7). This view is supported by a study in which a trypanosomatid (Leptomonas collosoma) SL RNA sequence was placed upstream of a 3′-splice site, with the resulting chimeric substrate being efficiently spliced in a HeLa cell nuclear extract even after the 5′-end of >99% of the endogenous U1 snRNA had been removed by oligonucleotide-directed RNase H cleavage (7). This result provided support for an earlier proposal that U1 snRNA-like base pairing could be supplied by a region of the SL RNA upstream of the 5′-splice site (6). However, it was subsequently shown that U1 snRNP is, in fact, required for cis-splicing of the chimeric substrate, and that base pairing between the 5′-end of U1 snRNA and the SL RNA 5′-splice site does occur in these extracts when the 5′-end of U1 snRNA is intact (11). It has also been demonstrated recently that the proposed internal SL RNA base pairing across the 5′-splice site is not essential for trans-splicing in Leishmania tarentolae (12). In sum, the proposal that trypanosomatid SL RNA substitutes for U1 snRNA in trans-splicing has not gained experimental support, and the data do not definitively rule out the possibility that a U1 snRNA homolog is present in trypanosomatid protozoa but has not yet been identified. Here we show that a representative trypanosomatid, Crithidia fasciculata, contains a novel small RNA that displays several characteristic features expected of a U1 snRNA homolog.
Sequence Analysis: 5′-End-labeled PCR and RT-PCR products, as well as an RT product generated using 5′-end-labeled primer CfU1A,
* This work was supported by Grant MT-11212 (to M. W. G.) from the Medical Research Council of Canada and by a fellowship (also to M. W. G.) from the Canadian Institute for Advanced Research (Program in Evolutionary Biology). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EBI Data Bank with accession number(s) AF157481.

RESULTS AND DISCUSSION
Antibodies specific for m₃²,²,⁷G have been used to enrich for capped snRNAs in Trypanosoma brucei (3). Although no U1 snRNA homolog was found among the four largest m₃²,²,⁷G-capped RNAs, the experiments did detect other, smaller capped RNAs that were not further characterized (3). Additional snRNAs have also been detected by immunoprecipitation using antibodies directed against core proteins common to T. brucei spliceosomal snRNPs (5). Fig. 1 shows an electrophoretic profile of C. fasciculata RNAs that were immunoprecipitated using a monoclonal anti-m₃²,²,⁷G antibody. The antibody reacted efficiently with homologs of the m₃²,²,⁷G-capped RNAs identified previously in T. brucei. It also reacted efficiently with the 7-monomethylguanosine (m⁷G)-capped SL RNA (20) and with an intermediate of the trans-splicing reaction, the 39-nt free SL RNA exon. In addition, the antibody enriched for a subset of tRNAs that appear to have internal m⁷G in their variable loops, as judged by chemical reactivity during sequencing.
When the amount of antibody relative to RNA was reduced in immunoprecipitation experiments, the RNA yield decreased but the intensity of individual bands relative to each other remained unchanged (data not shown). Although the monoclonal antibody did not distinguish between m⁷G and m₃²,²,⁷G caps, this procedure did allow us to identify a previously unrecognized methylguanosine (mG)-capped RNA that was present at approximately the same concentration as the known trans-spliceosomal snRNAs (U1 snRNA? in Fig. 1).
When the preliminary sequence of this candidate U1 snRNA was used as a query in a GenBank™ search, we detected a similar sequence in a tRNA gene cluster from L. tarentolae (23), located on the opposite strand between upstream tRNAᴬʳᵍ and downstream tRNAᴸᵉᵘ genes. PCR experiments (primer combination CfU1A and 3′ARG, see "Experimental Procedures") confirmed that in C. fasciculata there is also a tRNAᴬʳᵍ gene on the opposite strand ~100 base pairs upstream of the candidate U1 snRNA sequence. Attempts to amplify the region in C. fasciculata DNA between this new sequence and a possible downstream tRNAᴸᵉᵘ gene (primer combination CfU1 and 5′LEU, see "Experimental Procedures") were unsuccessful, indicating that this particular gene linkage may not be conserved between L. tarentolae and C. fasciculata.
The first two residues of the novel RNA were identified as A by DNA sequencing (Fig. 2A) but were resistant to cleavage by RNases and alkali during enzymatic sequencing (not shown), indicating that they are O²′-methylated. The first residue was also resistant to cleavage in the A reaction during chemical sequencing (Fig. 2B), suggesting that it contains additional modifications. No other post-transcriptional modifications were encountered during sequencing, but we did detect a single site of heterogeneity, A/C at position 60 (Fig. 2, C and D).
In order to further characterize the modified A at the 5′-end of the molecule, the RNA was 5′-end-labeled after removal of the mG cap structure by tobacco acid pyrophosphatase treatment. Gel-purified RNA was digested with snake venom phosphodiesterase, and the resulting radioactive mononucleotide was analyzed by thin layer chromatography. This experiment (Fig. 3, A and B) demonstrates that the candidate U1 snRNA has the same hypermodified 5′-terminal nucleoside, N⁶,N⁶,O²′-trimethyladenosine (m₂⁶Am), that is known to be present at the 5′-end of SL RNAs from C. fasciculata and T. brucei (20); in contrast, the U2 (Fig. 3A) and U4 (not shown) snRNAs have 5′-terminal O²′-methyladenosine (Am) residues. The similarity in methylation patterns observed between this new RNA and the SL RNA may be explained by the fact that six of the seven 5′-terminal nucleotides are identical in the two RNA species (Fig. 3C).
2 D. F. Spencer, unpublished protocol.
Trypanosomatid snRNAs contain sequences that bind spliceosomal core proteins (4-6, 24, 25). The C. fasciculata SL RNA core-protein binding site, which is capable of interacting with mammalian Sm proteins (6) as well as with T. brucei core proteins (4), appears to consist of an AAAUUUUGA sequence followed by a short G + C-rich hairpin. The novel RNA reported here has an almost identical sequence, AUAUUUUGA (residues 44-52), followed by a 3′-terminal hairpin structure that is supported by compensating base changes in the L. tarentolae sequence (Fig. 4B).
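The similarity between the two core-protein binding sequences quoted above can be checked position by position. The following is a minimal sketch, not part of the original analysis; the function and variable names are illustrative assumptions, and only the two nine-residue sequences and the residue range 44-52 come from the text.

```python
# Position-by-position comparison of the two core-protein binding sequences
# quoted in the text. Names below are illustrative, not from the paper.

SL_RNA_SITE = "AAAUUUUGA"      # C. fasciculata SL RNA site (from the text)
CANDIDATE_SITE = "AUAUUUUGA"   # candidate U1 snRNA, residues 44-52 (from the text)


def mismatches(a: str, b: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, base in a, base in b) for each differing position."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


diffs = mismatches(SL_RNA_SITE, CANDIDATE_SITE)
identical = len(SL_RNA_SITE) - len(diffs)

print(f"{identical} of {len(SL_RNA_SITE)} positions identical")
for pos, sl_base, cand_base in diffs:
    print(f"position {pos} within the site: {sl_base} (SL RNA) vs {cand_base} (candidate)")
```

Positions reported by the sketch are counted within the nine-residue site, not within the full-length RNA.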
FIG. 4. ... (27,28) showing the locations of modified nucleosides (Um = O²′-methyluridine, Ψ = pseudouridine) and the Sm core-protein binding site. B, potential secondary structure of the C. fasciculata RNA showing differences in its homolog from L. tarentolae (circled residues next to the C. fasciculata sequence). A trypanosomatid-specific spliceosomal core-protein (CP) binding site, analogous to the Sm site in higher eukaryotes, is indicated (Sm/CP site). C, 5′-terminal sections of the C. fasciculata (C.f.) nucleotide sequence are compared with the corresponding sections of U1 snRNA sequences (8,29) from Euglena gracilis (E.g.), Physarum polycephalum (P.p.), Saccharomyces cerevisiae (S.c.), and H. sapiens (H.s.). Dots represent residues that are identical to the C. fasciculata sequence. The loop region of stem/loop I is enclosed by parentheses. Secondary structure diagrams were generated using the program XRNA developed by B. Weiser and H. Noller (University of California, Santa Cruz, CA).
The presence of a core-protein binding site and a modified 5′ terminus, both of which are specifically related to their trans-spliceosomal SL RNA counterparts, provides strong evidence that this new capped RNA is also a component of the trans-spliceosome. Given that U1 snRNA is the only typical spliceosomal snRNA that has not yet been discovered in trypanosomatid protozoa (3,24,25), it is not surprising that we also found primary sequence conservation between this C. fasciculata candidate U1 snRNA and known U1 snRNAs from other organisms (Fig. 4C; compare Fig. 4, A and B). Notably, the highly conserved 5′-terminal region contains the ACCU sequence (residues 6-9, overlined in Fig. 4) that has the potential to interact with conserved splice site sequences (10). The C. fasciculata candidate U1 snRNA also contains a stem/loop I sequence that, in mammalian systems, is the binding site for U1-70K, a U1 snRNP-specific protein (30).
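To make the base-pairing potential of the ACCU sequence (residues 6-9) concrete, the sketch below computes the RNA sequence that would pair with it through Watson-Crick base pairs alone. This is an illustrative assumption-laden example rather than the authors' method: it ignores G-U wobble pairs and any structural context, and the helper name is hypothetical.

```python
# Watson-Crick partner of the conserved ACCU motif, written 5' to 3'.
# Simplification (assumption): only A-U and G-C pairs are considered;
# G-U wobble pairs, which also occur in RNA duplexes, are not modeled.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the strand that pairs antiparallel with `rna`, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


U1_MOTIF = "ACCU"  # residues 6-9 of the candidate U1 snRNA (from the text)
partner = reverse_complement(U1_MOTIF)
print(f"5'-{U1_MOTIF}-3' can pair with 5'-{partner}-3'")
```

The resulting four-nucleotide partner can then be compared against whichever splice site sequence is of interest.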
In view of the unusually small size of the C. fasciculata candidate U1 snRNA (69 nt), it is also not surprising that it lacks some of the characteristic features of a U1 snRNA (Fig. 4, A versus B). The highly conserved stem/loop II, which serves as the binding site for the U1-A protein in other eukaryotes (30), is the most notable structural element that is missing. One could argue that stem/loop II functions are either unnecessary for trans-splicing or are supplied by another component of the spliceosome (RNA and/or protein). Other trypanosomatid snRNAs show a similar pattern of reduced size and sequence divergence compared with their homologs from other systems (1,3,19,24,25).
The presence of structurally divergent snRNAs in trypanosomatid protozoa may indicate that the highly accurate splice site selection required for cis-splicing systems is not as important in a "trans-splicing only" system (1). For example, the demands on 5′-splice site selection machinery (including U1 snRNA) are considerably reduced in trans-splicing, where the 5′-splice site is already present in the spliceosome as part of the spliceosomal SL RNA. Similarly, reduced accuracy of branch point and 3′-splice site selection (1) in SL RNA trans-splicing (which occurs upstream of coding sequences) is likely to be tolerated, because errors would not disrupt the reading frame in the spliced product (1). On the other hand, with the discovery of an apparent U1 snRNA homolog, trypanosomatid protozoa may possess a complete set of spliceosomal snRNAs, raising the possibility that the mechanisms of trans- and cis-splicing are not as different as was previously thought. In this regard, in the absence of a complete genomic sequence, we cannot rule out the possibility that at least a few cis-spliced introns may eventually be found in the trypanosomatid group of protozoa.