The trans-Spliceosomal U4 RNA from the Monogenetic Trypanosomatid Leptomonas collosoma

U4 small nuclear RNA is essential for trans-splicing. Here we report the cloning of the U4 snRNA gene from Leptomonas collosoma and an analysis of the elements controlling its expression. The trypanosome U4 RNA is the smallest known; it carries an Sm-like site and has the potential for extensive intermolecular base pairing with U6 RNA. Sequence analysis of the U4 locus indicates the presence of a tRNA-like element 86 base pairs upstream of the gene that is divergently transcribed to yield a stable small tRNA-like RNA. Two additional tRNA genes, tRNAPro and tRNAGly, were found upstream of this element. By stable expression of a tagged U4 RNA, we demonstrate that the tRNA-like gene, but not the upstream tRNA genes, is essential for U4 expression, and that the B box, but not the A box, of the tRNA-like gene is crucial for expression in vivo. Mapping of the 2′-O-methyl groups on U4 and U6 small nuclear RNAs suggests that modifications are present at canonical positions, although fewer nucleotides are modified than in the mammalian homologues. The U4 genomic organization, which includes both tRNA-like and tRNA genes, may represent a relic of a process whereby trypanosomatids “hired” tRNA genes to provide extragenic promoter elements. The close proximity of tRNA genes to the tRNA-like element in the U4 locus further suggests that the tRNA-like gene may have evolved from a tRNA member of this cluster.

In trypanosomes, all pre-mRNAs are processed by trans-splicing. In this process, a short common spliced leader (SL) sequence, derived from a small RNA called the SL RNA, is added to each mRNA. trans-splicing is mechanistically related to cis-splicing (1).
As in cis-splicing, U snRNAs are required. Trypanosome counterparts of U2, U4, and U6 have been characterized and shown to function in trans-splicing (2,3). Trypanosome snRNAs are generally smaller than their cis-splicing counterparts. Only recently was the trypanosome U5 homologue identified and its gene cloned and sequenced (4,5). Evidence supports the presence of a trypanosome tri-snRNP complex carrying the U4/U6·U5 snRNAs (5). However, the trypanosome U5 snRNA may have a unique role in trans-splicing, based on its potential to base pair with the intron region of the SL RNA, as supported by in vivo cross-linking experiments (4) and phylogenetic conservation (5).
Most trypanosome snRNAs carry a divergent Sm site. In other eukaryotes, the Sm site was shown to bind core proteins that are common to all snRNPs and are recognized by sera from autoimmune patients (6). Trypanosome snRNPs are so far the only ones not recognized by anti-Sm sera, which recognize Sm proteins from yeast to man. Surprisingly, however, trypanosomes do possess Sm proteins: an SmE homologue was recently identified among the core proteins that bind the SL RNA (7).
In mammals and yeast, U snRNA genes, except U6, are transcribed by RNA polymerase II (8). The promoters of these genes include distal and proximal elements and TATA boxes. In trypanosomes, in contrast, all small RNAs, including U2, U3, U6, and 7SL RNA, are transcribed by polymerase III (9). In the last three cases, A and B box elements located upstream of the gene were shown to be essential for transcription in vivo. These control elements are part of the internal control regions of divergently oriented tRNA genes or, in the case of the U2 snRNA, of a tRNA-like gene (10).
To further study the assembly of trypanosome U snRNAs in vivo, we have cloned and sequenced the U4 gene from the monogenetic trypanosomatid Leptomonas collosoma. This completes the set of spliceosomal RNAs from this trypanosomatid species. The U4 gene locus represents a new mode of genomic organization, harboring both tRNA-like and tRNA genes upstream of the coding region. As for the U2 snRNA gene, but unlike the U6 snRNA gene, the B box of the tRNA-like element plays a major role in expression of the U4 gene.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EBI Data Bank with accession number(s) AF204671.
Cloning of the Gene Encoding the U4 RNA-L. collosoma DNA (10 μg) was digested to completion with Sau3AI, and the digest was electrophoresed on a 1% agarose gel. Fragments between 0.5 and 1.2 kilobases were excised and electroeluted. The eluted DNA was extracted with phenol-chloroform and precipitated in ethanol. The concentrated DNA (0.25 μg) was ligated in 500 μl of buffer containing 50 mM Tris-HCl, pH 7.4, 10 mM MgCl2, 10 mM dithiothreitol, 1 mM ATP, 10 mg/ml gelatin, and 1 unit/ml ligase (New England Biolabs), at a DNA concentration of 0.5 μg/ml. The ligation mixture was extracted again with phenol-chloroform, precipitated in ethanol, and used as the inverse PCR template. PCR was performed with primers 424 and 425 for 30 cycles of 94°C for 1 min, 50°C for 1 min, and 72°C for 2 min. The PCR product (600 bp) was gel-purified and cloned into the pCR™II cloning vector (Invitrogen). The clone, termed U4 -1, was sequenced with SP6 and T7 primers and found to contain the U4 coding region but only 157 bp of upstream sequence. To obtain the full sequence of the upstream regulatory region, a λEMBL3 library was screened with an RNA probe synthesized with T7 polymerase (Promega) using the U4 -1 plasmid as a template. Three positive plaques were obtained, and their DNA was digested with AluI. A 1.2-kilobase fragment was cloned into pBluescript and sequenced with the T3 and T7 primers and the internal primer 25001.
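The logic of the inverse PCR step can be sketched as a toy simulation. It rests on stated assumptions: only the top strand is tracked, Sau3AI is modeled as cutting immediately 5′ of each GATC, and the genome, primer sites, and flanks are hypothetical placeholders, not the L. collosoma sequence or primers 424 and 425. The dilute ligation (0.25 μg in 500 μl, i.e. 0.5 μg/ml) favors self-ligation into circles, so outward-facing primers in the known region extend across the re-formed GATC junction into the unknown flanks.

```python
import re

# Assumption: Sau3AI modeled as cutting immediately 5' of every GATC (^GATC);
# only the top strand is tracked, so sticky-end overhangs are implicit.
SAU3AI_SITE = "GATC"


def sau3ai_digest(seq: str) -> list[str]:
    """Split a linear top-strand sequence at every ^GATC cut point."""
    cuts = [m.start() for m in re.finditer(SAU3AI_SITE, seq) if m.start() > 0]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def inverse_pcr(fragment: str, fwd_primer: str, rev_site: str) -> str:
    """Amplicon from outward-facing primers on a self-ligated (circular) fragment.

    fwd_primer: top-strand sequence of the forward primer (extends rightward).
    rev_site:   top-strand sequence annealed by the reverse primer (extends leftward).
    """
    f0 = fragment.index(fwd_primer)
    r_end = fragment.index(rev_site) + len(rev_site)
    if r_end > f0:
        raise ValueError("primers must face outward (reverse site left of forward primer)")
    # Extension runs off the right end of the fragment, across the ligation
    # junction (which re-forms GATC), and on to the end of the reverse primer site.
    return fragment[f0:] + fragment[:r_end]


# Hypothetical toy locus: unknown flanks around a known core holding both primer sites.
genome = "TTTT" + "GATC" + "AAACCC" + "GCGCAT" + "TGTGCA" + "GGGTTT" + "GATC" + "CCAA"
fragments = sau3ai_digest(genome)
target = next(f for f in fragments if "GCGCAT" in f and "TGTGCA" in f)
product = inverse_pcr(target, fwd_primer="TGTGCA", rev_site="GCGCAT")
print(product)  # TGTGCAGGGTTTGATCAAACCCGCGCAT

# Ligation concentration from the text: 0.25 ug DNA in 500 ul (0.5 ml).
dna_ug, volume_ml = 0.25, 0.5
print(dna_ug / volume_ml, "ug/ml")  # 0.5 ug/ml
```

In the product, the downstream flank (GGGTTT) now sits upstream of the junction and the upstream flank (AAACCC) downstream of it, which is why inverse PCR recovers sequence on both sides of a known region from one primer pair.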
Plasmid Construction-Tagging of the U4 gene and site-directed mutagenesis were performed by PCR using primers carrying the mutation and oligonucleotides from the 5′- and 3′-ends of the gene (indicated above). The mutations were confirmed by sequencing. The mutated genes were cloned into the pX vector (11). L. collosoma cells were transformed with the constructs, and cell lines were selected in the presence of G418 as described previously (11).
RNA Isolation and Analysis-Total RNA was prepared with TRIzol (Life Technologies, Inc.). RNA samples were fractionated on a 10% polyacrylamide-7 M urea gel and electroblotted onto a Nytran membrane (Hybond, Amersham Pharmacia Biotech). Hybridization with labeled oligonucleotides was performed at 42°C in 5× SSC, 0.1% SDS, 5× Denhardt's solution, and 100 μg/ml salmon sperm DNA. Primer extension was performed using end-labeled oligonucleotides (100,000 cpm/pmol). Reactions were performed as described previously (12) and analyzed on a 6% polyacrylamide-7 M urea gel next to a DNA sequencing reaction performed with the same primer. Primer extension for mapping 2′-O-methylations was performed at 0.02 and 5 mM dNTPs, using 5′-end-labeled oligonucleotides complementary to the 3′-ends of the U4, U5, and U6 snRNAs (13,14).

RESULTS AND DISCUSSION
The U4 Gene Is Linked to tRNA-like and Two tRNA Genes-By inverse PCR using primers derived from the Trypanosoma brucei U4 snRNA sequence (2), a 600-bp Sau3AI genomic fragment was cloned and sequenced. An RNA probe synthesized from this clone was used to screen a genomic library, and a 1.2-kilobase AluI fragment carrying the gene was subcloned and sequenced. The sequence contains the U4 coding region together with 432 bp of upstream and 253 bp of downstream sequence (Fig. 1). Sequence analysis indicated the presence of two tRNA genes, tRNAPro and tRNAGly, located at positions −223 and −372, respectively. The tRNAPro is 90% identical to the tRNAPro of humans, mice, and Caenorhabditis elegans. Positions 4-53 of tRNAGly are identical to the tRNAGly of humans, C. elegans, and mice. The tRNAPro gene is transcribed in the same direction as the U4 gene, whereas the tRNAGly gene is divergently transcribed. The A boxes of tRNAPro and tRNAGly (5′-TAGTCTAGTGG-3′ and 5′-TGGTCTAGTGG-3′, respectively) conform to the box A consensus sequence 5′-TRRYNNAGTGG-3′ (the most conserved positions are underlined). The B boxes of these tRNAs (5′-GTTCAATTCC-3′ and 5′-GTTCGATTCC-3′, respectively) are consistent with the box B consensus 5′-GTTCRANNCC-3′. Both tRNAs can fold into the canonical tRNA cloverleaf structure.
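Checking a box against these consensus sequences amounts to position-by-position matching against IUPAC ambiguity codes (R = A or G, Y = C or T, N = any base). A minimal Python sketch, assuming ungapped alignment and the 11-nt box A and 10-nt box B consensus sequences given above; it treats all positions equally rather than emphasizing the most conserved ones, and the function name is illustrative only:

```python
# IUPAC nucleotide codes used in the box A and box B consensus sequences.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "N": "ACGT"}

BOX_A = "TRRYNNAGTGG"
BOX_B = "GTTCRANNCC"


def matches_consensus(seq: str, consensus: str) -> bool:
    """True if seq has the consensus length and every base is allowed by its IUPAC code."""
    if len(seq) != len(consensus):
        return False
    return all(base in IUPAC[code] for base, code in zip(seq.upper(), consensus.upper()))


# Box sequences as reported in the text for the two upstream tRNA genes.
reported = {
    "tRNAPro box A": ("TAGTCTAGTGG", BOX_A),
    "tRNAGly box A": ("TGGTCTAGTGG", BOX_A),
    "tRNAPro box B": ("GTTCAATTCC", BOX_B),
    "tRNAGly box B": ("GTTCGATTCC", BOX_B),
}

for name, (box, consensus) in reported.items():
    print(name, matches_consensus(box, consensus))  # True for all four boxes
```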
The genomic arrangement of the L. collosoma U4 locus does not resemble the loci of T. brucei U3 (15), L. collosoma U5 (5), and L. collosoma U6 (11,16,17), each of which has a divergently transcribed tRNA gene located 95-98 bp upstream of the U snRNA sequence (Fig. 2). We therefore searched for a tRNA-like element upstream of the U4 gene, analogous to that described for the U2 snRNA (10). Indeed, bona fide A and B boxes were found (5′-TCGCGGAGTGGG-3′ and 5′-GGTTCGATCCC-3′, respectively), both of which agree with the consensus sequences (underlined). Comparing these with the corresponding boxes in the U2 locus (10) shows that the A box of U2 is located at position −104, whereas the U4 A box is located at position −107. The U2 A box sequence (5′-TGGCCCGGGTGT-3′) deviates more from the consensus than the U4 A box does: the U4 A box contains all the highly conserved positions (underlined), whereas the U2 A box deviates at positions 8 and 12, with G and T instead of A and G. The B box in the U2 locus (5′-GGTTCGAGCCT-3′) is located at position −155, and that of U4 at position −153; both the U2 and U4 B boxes agree well with the consensus. Thus, the tRNA-like boxes of U2 and U4 are positioned similarly with respect to the snRNA coding regions.
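The B box comparison and the box positions can be made concrete with a short sketch that slides the 10-nt box B consensus along each reported tRNA-like B box, keeps the best ungapped fit, and subtracts the reported A and B box positions. This is a plain mismatch count that weights every position equally, and it uses the positions only as numbers to subtract (the text does not say whether they mark box starts or ends); the helper names are illustrative. Only B boxes are scored, because the A box comparison in the text relies on specific highly conserved positions that a flat count does not capture.

```python
# IUPAC codes needed for the box B consensus.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "N": "ACGT"}
BOX_B = "GTTCRANNCC"


def best_window(seq: str, consensus: str) -> tuple[int, int]:
    """Slide consensus along seq; return (offset, mismatches) of the best ungapped fit."""
    k = len(consensus)
    if len(seq) < k:
        raise ValueError("sequence shorter than consensus")
    scores = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        mismatches = sum(base not in IUPAC[code] for base, code in zip(window, consensus))
        scores.append((mismatches, i))
    mismatches, offset = min(scores)  # ties resolved toward the leftmost window
    return offset, mismatches


reported = {
    # locus: (tRNA-like B box, A box position, B box position), as given in the text
    "U4": ("GGTTCGATCCC", -107, -153),
    "U2": ("GGTTCGAGCCT", -104, -155),
}

spacing = {}
for locus, (b_box, a_pos, b_pos) in reported.items():
    offset, mismatches = best_window(b_box, BOX_B)
    spacing[locus] = abs(a_pos - b_pos)
    print(locus, "B box mismatches:", mismatches, "| A-to-B spacing:", spacing[locus], "bp")
# U4 B box mismatches: 0 | A-to-B spacing: 46 bp
# U2 B box mismatches: 1 | A-to-B spacing: 51 bp
```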
The U4 gene organization is unique among U snRNA genes (Fig. 2): it not only contains a tRNA-like element, like the T. brucei U2 locus, but also carries two additional upstream tRNA genes, similar to the U3, 7SL RNA, U6, and U5 snRNA gene loci. However, the companion tRNA located 95-98 bp upstream of those snRNAs is always divergently transcribed, whereas the tRNA gene adjacent to the U4 tRNA-like sequence is transcribed from the same strand as the U4 gene. The presence of both tRNA-like and tRNA sequences suggests that the former may have originated from a bona fide tRNA gene that accumulated major changes and lost its identity. This tRNA may have been part of a tRNA cluster, because most tRNA genes in trypanosomatids are clustered. In the case of 7SL, U5, U6, and U3, the identity of the tRNA gene was retained, whereas in the case of U2 and U4, mutations accumulated in the tRNA gene but the A and B boxes, including their proper spacing, were preserved (Fig. 3A).
Because the tRNAs upstream of small RNA genes are transcribed (16,18), we examined whether the tRNA-like gene is also transcribed. Primer extension analysis using oligonucleotide 26264 yielded a discrete extension product, suggesting the presence of a tRNA-like RNA transcript (Fig. 3B). The tRNA-like structure is compared with the tRNAPro secondary structure in Fig. 3A. The tRNA-like structure deviates from the canonical structure mainly in the D stem and TψC stem-loops. The acceptor stem is extended at the 5′-end, and the anticodon loop is composed of 5 instead of 7 nt. However, the RNA can still be folded into the cloverleaf structure, and the A and B boxes are located in their conserved positions.
To examine the genomic organization of the U4 gene, L. collosoma DNA was subjected to Southern blot analysis. A single hybridizing band was observed after digestion with restriction enzymes having 4-bp and 6-bp recognition sites, suggesting that the U4 gene, like all other U snRNA genes in trypanosomatids (2), is a single-copy gene (Fig. 4A). Northern blot analysis revealed a single transcript of 114 nt (Fig. 4B).
The potential base pairing interactions between L. collosoma U4 and U6 are presented in Fig. 6. The first interacting domain (Stem II) is composed of 17 perfect base pairs, as in T. brucei (2). This stem is more stable than that in yeast, where 11 perfect base pairs are interrupted by a bulged C and followed by a perfect 5-bp duplex (23). Stem I is composed of a 9-bp duplex interrupted by two bulged nt in both L. collosoma and T. brucei.
The position of the bulged A in U4 snRNA is conserved in both trypanosomatid species, whereas the position of the bulged nt in the U6 strand of the duplex differs (the bulged A in T. brucei lies between positions 4 and 5 of the duplex, whereas the bulged U in L. collosoma lies between positions 6 and 7; Fig. 6). In yeast, stem I is composed of 8 perfect base pairs (23). Interestingly, the single sequence difference (U at position 57) between the T. brucei and L. collosoma U4 RNAs is compensated for by the A at position 51, which is bulged in the T. brucei U4-U6 duplex.
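Counting perfect base pairs in a candidate intermolecular stem, as in the stem I and Stem II comparisons above, can be sketched as follows. The sketch assumes an ungapped antiparallel alignment with Watson-Crick and G-U wobble pairs and does not model bulged nucleotides, so a bulge shows up only as a break in pairing; the strands in the usage lines are hypothetical placeholders, not the U4 and U6 sequences of Fig. 6.

```python
# Watson-Crick pairs plus G-U wobble, for RNA strands written 5'->3'.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pair_profile(strand: str, partner: str) -> list[bool]:
    """Pair strand with partner antiparallel (no bulges or loops); flag each position."""
    if len(strand) != len(partner):
        raise ValueError("this sketch only handles equal-length, unbulged duplexes")
    return [(a, b) in PAIRS for a, b in zip(strand, reversed(partner))]


def longest_perfect_run(profile: list[bool]) -> int:
    """Length of the longest uninterrupted stretch of base pairs."""
    best = run = 0
    for paired in profile:
        run = run + 1 if paired else 0
        best = max(best, run)
    return best


# Hypothetical strands: one fully paired duplex, one interrupted by an A-A mismatch.
perfect = pair_profile("GCAUCG", "CGAUGC")
interrupted = pair_profile("GCAUCG", "CGAAGC")
print(sum(perfect), longest_perfect_run(perfect))          # 6 6
print(sum(interrupted), longest_perfect_run(interrupted))  # 5 3
```

Distinguishing a stem of 17 uninterrupted pairs from one of similar total length broken by a bulge is exactly the difference between the two numbers printed for each duplex.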
Most of the differences between the trypanosomatid U4 RNAs lie in the central domain and central loop (Fig. 6). The central domain and 3′-stem-loop of the yeast U4 RNA are quite tolerant to mutation (24). Interestingly, however, when the T. brucei stem-loop structure homologous to the 3′-stem-loop of the yeast RNA was used to replace this domain in yeast, the chimeric RNA could not complement a null allele of the yeast U4 snRNA (25). This chimeric U4 RNA did not associate well with the U6 snRNA, suggesting that, in addition to base pairing in stems I and II, other interactions are required to promote U4/U6 complex formation. Only a single mutation in the yeast 3′-stem-loop was found to result in a cold-sensitive phenotype (24). This suggests that the overall structure of the 3′-stem-loop, rather than its particular sequence, may be important for U4 function.
A third potential base pairing interaction between U4 and U6 involves the Sm-like site of U4 at positions 90-95 (Fig. 6), similar to a more stable potential interaction of the T. brucei U4 snRNA at positions 86-91 (also in the Sm-like site) (2). The significance of this potential interaction is currently unknown. Recently, a phylogenetically conserved stem structure (Stem III) was discovered that has a counterpart in the highly diverged U4atac and U6atac snRNAs (26). In T. brucei this domain involves nt 22-27 of U6 and 86-91 of U4 snRNA; the L. collosoma duplex would be formed by nt 90-95 of U4 and 24-30 of U6. It was suggested that at a particular stage of splicing, stem III could sequester a specific stretch of U6 before it base pairs with U2. The potential for interaction between U6 and U2 (helix III) also exists for the trypanosome snRNAs (27). Interestingly, the U6 sequence block involved in forming stem III with U4 is also involved in forming helix III with U2. So far, no genetic evidence supports these phylogenetically conserved stem III interactions.
The central domain is the most variable region of the U4 molecule. Indeed, deletion of almost the entire region in yeast yielded only a mild cold-sensitive phenotype (24). Thus, the sequence of this domain may not be important, except to serve as a spacer that separates the 5′- and 3′-domains of the U4 RNA.
It has long been debated whether trypanosome snRNAs carry a bona fide Sm binding site. The L. collosoma U4 Sm-like site (positions 87-93) deviates from the canonical sequence, because its U stretch is interrupted by an A. Many trypanosomatid SL RNA and snRNA Sm-like sites deviate from the canonical consensus. In L. collosoma U2 snRNA the U stretch is interrupted by a C (27), whereas the U6 Sm site is very divergent, with only a single U (18). The only two RNAs that carry canonical Sm sites are the SL RNA (28) and the U5 snRNA (5). However, this property is not shared by other trypanosomatids; for instance, the T. brucei SL RNA and U5 Sm sites are not canonical (4,29,30). By compiling all the Sm-like sites, a consensus trypanosome Sm site can be derived: AAAN4G (where N is a U in 71% of cases). Based on the strong deviation of these sites from the consensus Sm sequence and the finding that trypanosome proteins are not recognized by anti-Sm sera, it was proposed that trypanosome core proteins and their binding site may differ significantly from those in mammals. Affinity selection using antisense biotinylated oligonucleotides to T. brucei snRNP and SL RNP RNAs suggests that core proteins are shared among all these particles (31). Moreover, antibodies raised against these proteins immunoprecipitate all the spliceosomal RNAs, including U6 and SL RNA (32). The recent finding of an SmE homologue in trypanosomes, which carries both Sm motifs 1 and 2, suggests that Sm proteins do exist in trypanosomes (7). Yet, the lack of recognition of these proteins by anti-Sm sera is intriguing. Changes in the Sm-like site and in the core proteins may have co-evolved. These deviations may stem from the need of these snRNAs to interact with different capping enzymes, because the SL RNA has a unique cap structure and U5, unlike U2 and U4, does not possess a trimethylguanosine cap (5), whereas U6 most probably possesses an inverted cap, as in all other eukaryotes (2).
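Applying a consensus such as AAAN4G, and measuring how often its N positions are U (the statistic behind the 71% figure), can be illustrated with a regular-expression scan. This sketch takes the consensus exactly as stated in the text, written with U for RNA; the input sequence is hypothetical, and a real analysis would start from a set of aligned Sm-like sites rather than a free scan of one RNA.

```python
import re

# Trypanosome Sm-like consensus from the text, AAAN4G, written for RNA (U, not T).
# The match sits inside a lookahead so overlapping occurrences are also reported.
SM_LIKE = re.compile(r"(?=(AAA[ACGU]{4}G))")


def sm_like_sites(rna: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched site) for every AAAN4G occurrence."""
    return [(m.start(), m.group(1)) for m in SM_LIKE.finditer(rna.upper())]


def u_fraction_in_n(sites: list[tuple[int, str]]) -> float:
    """Fraction of the N4 positions occupied by U, pooled over all sites."""
    ns = "".join(site[3:7] for _, site in sites)
    return ns.count("U") / len(ns) if ns else 0.0


hits = sm_like_sites("CCAAAUUAUGCC")  # hypothetical RNA, not a real snRNA
print(hits, u_fraction_in_n(hits))  # [(2, 'AAAUUAUG')] 0.75
```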
Extragenic Sequence Elements Controlling the Expression of the U4 Gene-To investigate the elements that control expression of the U4 gene, the coding region was tagged in internal loop III by inserting 6 bp between positions 82 and 83 (Figs. 5 and 7). The tagged gene, with 210 bp of upstream sequence (including the tRNA-like coding region) and 91 bp of downstream sequence, was cloned into the pX expression vector (Fig. 7A). RNA prepared from transfected cell lines was analyzed by primer extension using oligonucleotide 25000, complementary to the 3′-end of the U4 snRNA. The tagged gene was expressed efficiently (Fig. 7B, lane 2). Because the tRNAPro (−223) and tRNAGly (−372) genes lie outside the 210-bp upstream fragment, this result suggests that they are not required and that the tRNA-like element is important for U4 expression. To further explore the elements essential for expression, we mutated the A and B boxes separately and established cell lines. Analysis of U4 expression (Fig. 7B) demonstrated that mutation of the A box had no effect, whereas mutation of the B box completely inactivated the gene. To control for gene dosage (plasmid copy number), the level of the neo mRNA encoded by the pX vector was examined by primer extension and found to be almost identical in all cell lines (Fig. 7B, bottom). This pattern of dependence on the extragenic A and B box elements of the tRNA-like gene is identical to that obtained for the U2 gene, except that mutation of the T. brucei U2 A box also had a slight effect on expression (10). The T. brucei U2 B box was identified as a binding site for a nuclear protein (33), which might be involved in chromatin organization. Indeed, a B box located downstream of the yeast U6 coding region was shown to affect chromatin organization (34).
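The tagging and dosage-control steps can be sketched in Python. The 114-nt placeholder sequence and the 6-nt tag are hypothetical (the text gives neither sequence), positions follow the text's 1-based numbering, and the band intensities are illustrative numbers, not measurements from Fig. 7B. Dividing each tagged-U4 signal by the neo signal from the same RNA sample corrects for differences in plasmid copy number between cell lines.

```python
def insert_tag(seq: str, after_pos: int, tag: str) -> str:
    """Insert tag between 1-based positions after_pos and after_pos + 1."""
    if not 0 < after_pos < len(seq):
        raise ValueError("insertion point must lie inside the sequence")
    # seq[:after_pos] holds 1-based positions 1..after_pos.
    return seq[:after_pos] + tag + seq[after_pos:]


def normalized_expression(tagged_signal: float, neo_signal: float) -> float:
    """Tagged-U4 primer-extension signal divided by the neo mRNA signal of the same RNA."""
    return tagged_signal / neo_signal


u4_placeholder = "A" * 82 + "C" * 32   # hypothetical 114-nt stand-in for U4
tagged = insert_tag(u4_placeholder, 82, "GGGGGG")  # hypothetical 6-nt tag
print(len(tagged), tagged[80:90])  # 120 AAGGGGGGCC

# Hypothetical band intensities: (tagged U4, neo) for two cell lines.
for line, (u4_band, neo_band) in {"line 1": (800.0, 400.0), "line 2": (0.0, 410.0)}.items():
    print(line, normalized_expression(u4_band, neo_band))
```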
Mapping 2′-O-Methyl Modifications on the L. collosoma U4, U5, and U6 snRNAs-An important characteristic of U snRNAs is their specific modification in regions that are highly conserved in evolution. In vertebrate U4 and U6, modifications are clustered in the 5′-terminal region of U4 and the central region of U6 (35). Modifications are less numerous in yeast (36). We therefore examined the modifications of snRNAs from trypanosomes, which diverged from the eukaryotic lineage much earlier than yeast. The positions of the 2′-O-methyl groups were determined by primer extension at low dNTP concentration (14), which causes reverse transcriptase to pause one nt before the methylated site (13). The presence of specific stops at low (0.02 mM) but not high (5 mM) dNTP concentration (Fig. 8) suggests 2′-O-methylation at positions A63 and A72 of U4. The A63 position is conserved in vertebrates and is the nucleotide nearest to the potential stem I of the U4/U6 duplex. A72 is in the vicinity of U4/U6 stem III (see Fig. 6), but this position is not modified in vertebrates (35). In U6 snRNA, 2′-O-methyl groups can be assigned to positions G42, G47, U50, C53, and U56, located in stems I and II of the U4/U6 duplex (Figs. 6 and 8). The modified nt of trypanosome U6 RNA agree well with their positions in vertebrate U6 (35), where they appear in regions involved in intramolecular base pairing and are expected to be at or near the catalytic center of the spliceosome. Interestingly, U5 is modified (Fig. 8) at position A19 of the invariant loop, which is the one position that deviates from the canonical U5 invariant loop sequence (5). We have suggested that this position base pairs with position +2 of the SL RNA (4,5). Another U5 modification appears at position A28; it seems to be specific to trypanosomes, because no modified nt have been found downstream of the invariant loop in other eukaryotes.
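The inference from the low and high dNTP primer extension reactions can be written out as a small function. This sketch assumes the stops have already been converted from cDNA lengths (read against the sequencing ladder) into 1-based RNA coordinates of the last templated nucleotide; the function name and the stop positions are hypothetical and are not read from Fig. 8.

```python
def call_2ome_sites(low_dntp_stops: set[int], high_dntp_stops: set[int]) -> list[int]:
    """Infer 2'-O-methylated nucleotides from reverse transcriptase stops.

    Each stop is the 1-based RNA position of the last nucleotide copied into cDNA.
    Reverse transcriptase reads the RNA 3'->5' and, at low dNTP, pauses one nt
    before a 2'-O-methylated residue m, i.e. after copying m + 1. Stops seen only
    at low dNTP are therefore attributed to methylation at (stop - 1); stops also
    seen at high dNTP are treated as structural pauses and ignored.
    """
    return sorted(p - 1 for p in low_dntp_stops - high_dntp_stops)


# Hypothetical stop positions: 35 appears at both concentrations, so it is not called.
low = {20, 35, 51}
high = {35}
print(call_2ome_sites(low, high))  # [19, 50]
```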
A recent study has shown that 2′-O-methylation of vertebrate U6 snRNA is guided by box C+D snoRNAs (37), suggesting that the modification may take place in the nucleolus or in coiled bodies, which are closely related to the nucleolus. Indeed, modification of U6 position U47 was shown to be directed by a guide RNA that also guides methylation of 28 S rRNA. So far, no snoRNAs with the potential to guide modification of trypanosome U4 or U6 snRNAs have been discovered, although a guide that directs modification of 5.8 S rRNA has been described (19).