Identification of the First Trypanosome H/ACA RNA That Guides Pseudouridine Formation on rRNA*

In trypanosomes small nucleolar RNA (snoRNA) genes are clustered, and the clusters encode either single or multiple RNAs. We previously reported on a genomic locus in Leptomonas collosoma that encodes multiple C/D snoRNAs whose expression is regulated at the processing level (Xu, Y., Liu, L., Lopez-Estraño, C., and Michaeli, S. (2001) J. Biol. Chem. 276, 14289–14298). In this study we have characterized, in the same genomic locus, the first trypanosome H/ACA RNA, which we termed h1. Having a length of 69 nucleotides, h1 has the potential to guide pseudouridylation on 28 S rRNA. The h1 is processed from a long polycistronic transcript that carries both the C/D and h1 snoRNAs. The h1/rRNA duplex obeys the rules for guiding pseudouridylation. Mapping of the pseudouridine site indicated that the predicted U is indeed modified. However, in contrast to all other H/ACA RNAs described so far, h1 consists of a single hairpin structure and is the shortest H/ACA RNA described so far.

All RNAs undergo post-transcriptional site-specific modifications. The most common modifications are conversion of uridine to pseudouridine (Ψ) and 2′-O-methylation of the backbone ribose. The function of the modified nucleotides is currently unknown. Most ribosomal pseudouridines and 2′-O-methyl groups are dispensable for cell growth. However, both pseudouridines and 2′-O-methyl nucleotides are clustered around the functionally important regions in the RNAs, suggesting their importance. Site-specific pseudouridylation and 2′-O-methylation of rRNA are directed by snoRNAs in the nucleolus (1-3). The snoRNAs that guide 2′-O-methylation carry two conserved boxes: the C (5′-RUGAUGA-3′), where R represents a purine (A or G), and D (5′-CUGA-3′) boxes, which often form a 5′-3′ terminal stem (1). Fibrillarin shares common motifs with known methyltransferases and is likely to be the enzyme that catalyzes the formation of the 2′-O-methyl nucleotide (4-6). Many snoRNAs also carry internal C′ and D′ boxes. The D and/or D′ boxes are preceded by 10-21 nt that are a perfect match to the rRNA sequences. The modified nucleotide is always present 5 nt upstream of the D/D′ box. This is known as the +5 rule (1).
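The +5 rule described above can be illustrated with a short sketch. This is only a toy model under stated assumptions: the guide is taken to pair perfectly (Watson-Crick only) with the rRNA and to end immediately upstream of the D box, and the sequences, the function name `plus5_target`, and its parameters are hypothetical, not taken from this study.

```python
# Toy illustration of the +5 rule for C/D snoRNAs.
# All sequences below are invented for demonstration purposes.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (Watson-Crick only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def plus5_target(snorna: str, box_start: int, guide_len: int, rrna: str):
    """Return the 0-based rRNA index predicted to be 2'-O-methylated.

    Assumes the guide is the `guide_len` nt directly upstream of the
    D (or D') box that starts at `box_start`, and that it pairs perfectly
    with the rRNA. Returns None if no perfect match is found.
    """
    guide = snorna[box_start - guide_len:box_start]
    site = reverse_complement(guide)  # rRNA stretch (5'->3') pairing with the guide
    start = rrna.find(site)
    if start < 0:
        return None
    # The snoRNA nucleotide 5 nt upstream of the box pairs with the
    # fifth nucleotide (index 4) of the matching rRNA stretch.
    return start + 4


guide = "UCGGAUCCAUGC"                          # hypothetical 12-nt guide
snorna = "GAUGAUGAUU" + guide + "CUGA" + "UC"   # C-box-like prefix, guide, D box
rrna = "AAUU" + reverse_complement(guide) + "GGCC"

box_start = snorna.rfind("CUGA")                # D box lies near the 3'-end
site_index = plus5_target(snorna, box_start, len(guide), rrna)
print(site_index, rrna[site_index])
```

Real guides may end short of the box or include non-canonical pairs, so a practical search would need to relax the perfect-match assumption made here.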
More relevant to this study are the snoRNAs that guide pseudouridylation. Structurally these snoRNAs consist of two hairpin domains connected by a single-stranded hinge, the H (ANANNA) domain, and a tail region, the ACA box. Two short rRNA recognition motifs of the snoRNA base pair with rRNA sequences flanking the uridine to be pseudouridylated. The Ψ is always located 14-16 nt upstream of the H or ACA box of the snoRNA (2, 3, 7).
Although the snoRNAs select the site to be modified on the target RNA, the snoRNP proteins may carry out the actual modification. Four H/ACA binding proteins have been identified: Gar1p, Cbf5p (which shows striking structural similarities to pseudouridine synthase), Nhp2p, and Nop10p (5, 8-10). So far none of these proteins has been identified in trypanosomes. Interestingly a nucleolar protein was identified in Trypanosoma brucei that may play a role in RNA metabolism in the nucleolus (11).
In vertebrates many of the guide snoRNAs are encoded by introns of host genes that encode proteins involved in ribosome biogenesis and function (6, 12). In yeast, only a few snoRNAs are encoded by introns, and most of them are independently transcribed (13). The maturation of most of the intron-encoded snoRNAs involves debranching of the lariat and exonucleolytic trimming (14). The independently transcribed snoRNAs are processed from a precursor by endonucleolytic cleavage and exonucleolytic trimming (15). In yeast, two exonucleases, Rat1 and Xrn1, were shown to carry out 5′ to 3′ trimming (15-17), and the endonuclease that cleaves the snoRNA precursors carrying either H/ACA or C/D snoRNA is Rnt1, which is the yeast homologue of bacterial RNase III (18, 19). A splicing-independent processing pathway that functions in processing clustered snoRNAs carrying both C/D and H/ACA snoRNAs operates in plants (20).
Trypanosomatids are protozoan parasites that diverged early in the eukaryotic lineage and possess unique RNA processing pathways such as trans-splicing and RNA editing. Trypanosome rRNAs undergo a nonconventional processing pathway that results in cleaving the 28 S rRNA into two large and six small rRNA fragments (21).
Very little is known about ribosome biogenesis and modification in trypanosomatids. However, C/D snoRNAs were characterized in several trypanosomatid species (22-26). The first study on trypanosome snoRNAs suggested that trypanosomes obey the +5 rule for snoRNA-mediated methylation (22). However, studies carried out on snoRNAs that are located within the spliced leader-associated (SLA1) RNA loci in several trypanosomatid species suggested that trypanosomes do not obey the general methylation rules and indicated that the methylation site can have an alternate position located 6 or even 1 nt upstream of the D or D′ box (23). Further studies of T. brucei analyzing 17 C/D snoRNAs identified by immunoprecipitation using antibodies raised against the T. brucei fibrillarin protein concluded that T. brucei obeys the +5 rule (24). A more recent study performed on snoRNAs present in two clusters in Leptomonas collosoma suggested that the methylation-guiding rule of trypanosomatid snoRNA is not unusual; L. collosoma also obeys the +5 rule (26). As opposed to C/D snoRNAs, nothing is known about trypanosome H/ACA RNA.
Cloning and sequencing of trypanosomatid snoRNA genes suggest that the snoRNAs are organized in clusters that carry single or multiple RNAs (22,26). The snoRNA genes analyzed are transcribed as polycistronic RNAs (25,26) that are further processed to generate the mature RNAs. Expression studies on snoRNA-2, which encodes for a single C/D snoRNA, suggest that expression of the gene, when cloned into the pX neo episomal vector, requires at least a 20-bp flanking sequence. However, expression of the tagged gene, although at a lower level, was detected in the absence of an upstream sequence, suggesting the lack of a conventional promoter adjacent to the gene. The expression of the snoRNA genes, however, is dependent on the transcription from the upstream episomal neo gene. Data obtained from transcription in permeable cells suggest that snoRNA genes are transcribed by RNA polymerase II. Interestingly all C/D snoRNAs are flanked by sequences that form a perfect stem structure. The significance of this stem for the processing of the snoRNA is currently unknown (26).
In this study, we have cloned and sequenced the entire repeat that was recently described (26). Three additional C/D snoRNAs (B3, B4, and B5) and two copies of the first trypanosome H/ACA RNA, termed h1 RNA, were revealed. h1 is 69 nt long and can be folded into an H/ACA-like RNA structure but consists of only a single hairpin rather than the canonical two hairpins connected by a single-stranded hinge and followed by a tail region. In addition, h1 carries an AGA box, rather than an ACA box, at the 3′-end. Two short motifs of h1 in the internal loop base pair with the 28 S rRNA sequence flanking the target uridine. Mapping of the pseudouridines present in this region indicated that U3643 on 28 S rRNA, the predicted site, is indeed pseudouridylated. The h1 is present on the same polycistronic transcript that encodes the C/D snoRNAs. Like the C/D snoRNAs, h1 is flanked by a stem structure. This is the first report on a trypanosome H/ACA RNA. h1 is the shortest H/ACA RNA described so far.

EXPERIMENTAL PROCEDURES
Plasmid Construction and DNA Sequencing of the snoRNA Cluster-To clone the entire snoRNA repeat, DNA of g2 was digested with ClaI (complete and partial digestion) and subjected to Southern analysis with a B2-specific probe. A 2.2-kilobase repeat unit was cloned into pBluescript KS+, and the entire repeat was sequenced. The plasmid was termed g2ClaI.
Mapping the Position of the Pseudouridine on 28 S rRNA-Total RNA of L. collosoma was prepared with TRI Reagent (Sigma) from 5 × 10^8 cells and was divided into two parts. Half of the RNA sample (100 μg) was treated with 30 μl of CMC (N-cyclohexyl-N′-β-(4-methylmorpholinium)ethylcarbodiimide p-tosylate) at 37°C for 20 min. To remove CMC groups, the CMC-treated RNA was subjected to alkali hydrolysis in the presence of 50 mM Na2CO3 (pH 10.4) at 37°C for 4 h as described previously (27). As a control, half of the RNA sample, untreated with CMC, was subjected to alkali hydrolysis under the same conditions mentioned above. The RNA (40 μg) was used as template in primer extension analysis. Primer extension was performed with end-labeled primer 44252 (100,000 cpm/pmol). After annealing at 60°C for 15 min, the sample was kept on ice for 1 min. Subsequently 1 unit of reverse transcriptase (Expand RT, Roche Molecular Biochemicals) and 1 unit of RNase inhibitor (Promega) were added, and extension was performed at 42°C for 90 min. The reaction was analyzed on an 8% polyacrylamide denaturing gel next to an RNA sequencing reaction performed with the same primer.
RT-PCR-Total RNA prepared using TRI Reagent (Sigma) was treated with RNase-free DNase (RQ1 from Promega) at 37°C for 1 h. After phenol extraction and ethanol precipitation, reverse transcription was performed on total RNA with the antisense primers (20406, 43391, and 43388). RNA samples were heated for 5 min at 95°C followed by annealing for 15 min at 65°C. After chilling on ice for 2 min, 1 unit of reverse transcriptase (Expand RT, Roche Molecular Biochemicals) and 1 unit of RNase inhibitor (Promega) were added, and the reaction was incubated at 42°C for 90 min. Next the cDNA was ethanol-precipitated, and one-tenth of the reaction was used in PCR amplification. The PCR was carried out with different primer pairs (43289-20406, 26556-43388, and 44363-43391) specific to different regions of the cluster as shown in Fig. 5A. As a positive control, PCR products were generated with the same primers using the plasmid containing the entire repeat as a template. To control for the absence of chromosomal DNA contamination, RNA treated with RQ1 but not reverse-transcribed was used as the template for PCR. PCR was performed with Taq polymerase (TaKaRa).

RESULTS AND DISCUSSION
The Structure of the Genomic Locus Encoding Multiple snoRNAs-To further elucidate the genomic structure of the loci encoding the clustered C/D snoRNAs (26), we subcloned a 2.2-kilobase ClaI fragment from a phage carrying a C/D snoRNA gene and sequenced the entire repeat. Based on mapping the phage by Southern blot analysis, five repeats of the 2.2-kilobase ClaI fragment were identified (data not shown). Fig. 1A presents a schematic illustration of the g2 locus. Downstream of the last repeat we identified part of a gene encoding CDC2-related kinase (CRK3). The part of CRK3 we have sequenced shares 58.9% and 60% identity with the T. brucei (GenBank™ accession number X74617) and Leishmania major (GenBank™ accession number AF073381) genes, respectively.
The sequence of the entire repeat is presented in Fig. 1B. Examining the sequences presented in this repeat, we identified three additional C/D snoRNAs (their positions are schematically presented in Fig. 1, A and B). These snoRNAs were termed B3-B5. The potential for base pair interactions of these snoRNAs with their corresponding rRNA sites is illustrated in Fig. 1C. B3 can potentially guide the 2′-O-methylation of G1261 on 28 S rRNA, B4 can potentially direct the methylation of U4046 on 28 S rRNA, and B5 can direct the methylation of G2382 on 28 S rRNA. Interestingly these new C/D snoRNAs each seem to guide a single site, unlike the snoRNAs identified previously in the same cluster, which have dual target sites (26).
The Cluster Encodes for H/ACA RNA-In searching the sequence downstream of B5, we noticed the presence of a sequence that can potentially be folded into a hairpin structure analogous to half of an H/ACA RNA. We first examined whether such a putative RNA exists in L. collosoma. An oligonucleotide (43392) complementary to the 3′-end of the RNA was used in Northern analysis, and a single transcript of 69 nt was revealed (Fig. 2A). The +1 position of the RNA was mapped by primer extension (Fig. 2B) and is marked in Fig. 1B. An additional minor stop was mapped at position −12 and may correspond to an endonucleolytic cleavage site that mediates the release of the RNA from a polycistronic transcript (see below). Since the folding of this RNA agrees well with the structure of H/ACA RNA, we next searched for the potential of a base pair interaction with rRNA sequences. The structure of h1 RNA as an H/ACA RNA is illustrated in Fig. 3A, and the potential interaction of h1 RNA with 28 S rRNA is illustrated in Fig. 3B. The interaction agrees well with the rules for creating the pseudouridylation pocket. The proposed pseudouridine is always separated from the 3′-box by 14-16 nt. In the case of h1, 15 nt separate the U from the AGA box. The presence of an AGA box instead of an ACA box deviates from the canonical rules. However, this feature is not unprecedented, since many yeast and Tetrahymena RNAs carry AGA, AUA, or AAA 3′-boxes (2, 28). Two duplexes flanking the proposed pseudouridine exist; the duplex toward the 3′-end of the h1 is 6 bp long, and the duplex at the opposite side is 3 bp long. However, in contrast to all H/ACA RNAs described so far, h1 consists of a single hairpin structure. Interestingly we have recently demonstrated that SLA1, an RNA that was discovered by virtue of its efficient cross-linking to the spliced leader RNA, can potentially guide pseudouridine formation at position −12 relative to the 5′ splice site of the spliced leader RNA. In all trypanosomatids tested, the −12 position of the spliced leader RNA is a U and is always pseudouridylated. SLA1 can be folded as an H/ACA RNA that also carries a single hairpin structure. Like h1, SLA1 carries an AGA box but not an ACA box.
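The two criteria applied to h1 in this paragraph (an accepted 3′-box and a spacing of 14-16 nt between the target uridine and that box) can be summarized in a minimal sketch. The set of accepted boxes simply collects the variants named in the text; the function name `fits_guide_rules` and the way the distance is supplied are assumptions made for illustration, and the exact counting convention for the distance is not specified here.

```python
# Minimal check of the H/ACA guide rules as stated in the text.
# ACA is the canonical box; AGA, AUA, and AAA are the variants mentioned
# for yeast and Tetrahymena RNAs.
ACCEPTED_3P_BOXES = {"ACA", "AGA", "AUA", "AAA"}


def fits_guide_rules(box: str, distance_nt: int) -> bool:
    """Return True if the 3'-box is an accepted variant and the target
    uridine lies 14-16 nt from the box (distance supplied by the caller)."""
    return box in ACCEPTED_3P_BOXES and 14 <= distance_nt <= 16


# h1 as described: AGA box, 15 nt between the target U and the box.
print(fits_guide_rules("AGA", 15))
```

A fuller model would also score the two flanking duplexes (6 bp and 3 bp for h1), which this sketch deliberately leaves out.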
Interestingly there is potential to form a duplex between the sequences flanking the 5′- and 3′-ends of the h1 RNA. Such a proposed structure can also be formed with the sequences flanking all the C/D snoRNAs we described so far, including snoRNA-2, B2, and G2 (26) as well as B3, B4, and B5 as presented in Fig. 3C. We have proposed that this structure may serve as a signal for endonucleolytic cleavage of the polycistronic transcripts. This may imply that h1 and C/D snoRNA are processed by the same machinery.
h1 Can Potentially Guide the Pseudouridylation on Position 3643 on 28 S rRNA and Is a Homologue of Yeast snR34-To examine whether the predicted U is indeed pseudouridylated, total RNA from L. collosoma was treated with CMC, and primer extension was performed with an oligonucleotide complementary to the 28 S rRNA downstream of the predicted site. The pseudouridine reacts with CMC at its N-3 position, and this modified base then creates an obstacle for the reverse transcriptase. The reverse transcriptase stop should be located 1 nt before the pseudouridine, whereas there will be no stop in the control reaction. Indeed a stop was observed 1 nt before U3643 (Fig. 4). An additional pseudouridine at position U3628 was detected in the same experiment. Interestingly these two pseudouridines are conserved from yeast to human (29). The modification of U3643 (U2876 in yeast) is guided by snR34 in yeast (2, 3). However, the duplexes formed by snR34 and by h1 with the target rRNA differ in size. The yeast 5′-duplex (with respect to the 5′-end of the snoRNA) is 8 bp, whereas the 3′-duplex is 5 bp (2). In h1, the duplexes are 3 bp (relative to the h1 5′-end) and 6 bp (opposite side).
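The logic of reading Ψ positions from CMC primer extension stops can be sketched as follows. This is a simplified illustration, not the analysis performed in the study: stops are assumed to be given as the rRNA coordinate of the last nucleotide copied by reverse transcriptase, the function name `psi_candidates` is hypothetical, and the input lists are illustrative values chosen to reproduce the two reported sites (position 3700 is an invented background stop present in both lanes).

```python
def psi_candidates(treated_stops, control_stops):
    """Infer candidate pseudouridine positions from primer extension stops.

    Stops present only in the CMC-treated lane are taken as CMC-specific.
    Reverse transcriptase moves toward the 5'-end of the rRNA and halts
    1 nt before the modified base, so the last copied nucleotide has a
    coordinate one higher than the pseudouridine.
    """
    cmc_specific = set(treated_stops) - set(control_stops)
    return sorted(stop - 1 for stop in cmc_specific)


# Illustrative stop coordinates (not measured data).
treated = [3644, 3629, 3700]
control = [3700]
print(psi_candidates(treated, control))
```

Subtracting the control lane is what separates CMC-dependent stops from pauses caused by RNA structure or breaks, which appear in both lanes.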
The h1 Is Present on a Polycistronic RNA That Carries Both C/D and h1 RNA-Since we have previously demonstrated that the C/D snoRNAs present in the g2 locus are transcribed as a polycistronic RNA by RNA polymerase II and because we identified sequences that can potentially form a stem flanking the h1 coding region that could serve as signals for endonucleolytic cleavage, we examined whether h1 is also present on a polycistronic transcript carrying both the C/D and h1 RNA. To this end, an RT-PCR assay was performed with different sets of oligonucleotides as indicated in Fig. 5A. Fig. 5B shows the results of the RT-PCR with three sets of primers. To avoid DNA contamination the RNA sample was extensively treated with RNase-free DNase. To ensure that the RNA sample was free of DNA contamination, PCR was carried out without reverse transcription, and in this case no product was detected (Fig. 5B, lanes 2, 5, and 8). As a positive control, PCR was carried out on the plasmid carrying the entire repeat (Fig. 5B, lanes 1, 4, and 7). In the first set of experiments, we examined the presence of B3 to B5 on the same transcription unit with the upstream snoRNAs. The cDNA was synthesized using an oligonucleotide complementary to B5, and the cDNA was amplified using a sense oligonucleotide in the intergenic region between TS1 and G2. The expected product of 800 bp was obtained (Fig. 5B, lane 3). In the second and third sets, the presence of h1 on the polycistronic transcript carrying the C/D snoRNA was examined. An oligonucleotide complementary to B2, situated downstream of h1, was used to produce cDNA, and the cDNA was amplified with an oligonucleotide upstream of the h1 genes. The expected product of 450 bp was produced (Fig. 5B, lane 6). cDNA produced with an oligonucleotide located in the intergenic region between the two copies of h1 was amplified with an oligonucleotide in the coding region of h1, and a product of 230 bp was obtained (Fig. 5B, lane 9), indicating that two h1 snoRNAs are part of the polycistronic transcript. The difference in the intensity of the RT-PCR products reflects the length of the transcripts to be extended. For example, weak PCR products were obtained for fragments in the range of 800 and 450 bp (Fig. 5B, lanes 3 and 6), but a strong signal was obtained for the 230-bp fragment (Fig. 5B, lane 9).

FIG. 2. A, Northern analysis of h1 RNA. Total L. collosoma RNA (30 μg) was separated on a denaturing gel and subjected to Northern analysis with oligonucleotide 43392 as described previously (37). The marker (M) was a pBR322 HpaII digest. B, primer extension of h1 RNA. Primer extension (PE) was performed using the end-labeled oligonucleotide 43392. The products of the sequencing reactions with g2ClaI were used as a reference. The sequence of the cDNA is indicated, and the major stop at the +1 position is marked with a black arrow. The minor extension product at position −12 is indicated with an open arrow.

FIG. 5. B, separation of the RT-PCR and PCR products on a 1% agarose gel. In lanes 1, 4, and 7 are PCR products derived from primer sets (a, b, and c indicated in A) using the repeat DNA unit as template (positive control). In lanes 2, 5, and 8 (negative control) the template for PCR was DNase I-treated RNA but without reverse transcription using the same set of primers (a, b, and c). Lanes 3, 6, and 9, the RT-PCR performed on the same DNase I-treated RNA with the set of primers as described above. M is the 1-kilobase (kb) DNA ladder. The arrows indicate the RT-PCR products.
The genomic organization and polycistronic transcription unit carrying both the C/D and H/ACA snoRNA described in this study most closely resemble the organization of snoRNA genes in plants, where both C/D and H/ACA RNA are transcribed from a common upstream promoter. The plant snoRNAs are most probably processed by endonucleolytic activity followed by trimming (30). In vertebrates, snoRNAs (both C/D and H/ACA) are encoded by introns of host genes involved in ribosomal biogenesis and function and are processed from the debranched lariat by exonucleolytic trimming (31). A minor pathway also exists that is splicing-independent and involves cleavage within the pre-mRNA (32). In yeast, as in plants and trypanosomes, there are independent genes that encode two to seven snoRNAs. In yeast these polycistronic transcripts are processed by the endonuclease Rnt1p and are trimmed by the 5′ to 3′ exonucleases Rat1p and Xrn1p (15-17). So far there is no evidence in trypanosomes for enzymatic activities of this kind that process the snoRNA precursors.
A clue to understanding the mode of processing in the snoRNA cluster described here is the presence of a conserved double-stranded stem that can potentially be formed by the sequences flanking the snoRNA coding region (presented in Fig. 3C). Interestingly this structure also flanks the h1 RNA, suggesting that the same endonuclease may cleave the C/D and H/ACA snoRNA flanking sequences.
The next step in elucidating the function of snoRNAs in trypanosomes would be to disrupt or modulate the snoRNA function. We have engineered a B2 snoRNA carrying a 1-nt deletion in the region complementary to the rRNA immediately upstream of the D box. Such an engineered snoRNA was efficiently expressed in transgenic parasites but failed to direct modification on a novel rRNA site (33). The failure to generate a new modification site may stem from competition between the authentic snoRNA and the ectopic one, leading to the exclusion of the ectopic snoRNA from the rRNA substrate. Another possibility is that the chromosomal location of the snoRNA affects the transport and delivery of the RNA from the nucleus to the nucleolus where these snoRNAs function, so that modification by the plasmid-encoded snoRNA failed to take place.
The h1 RNA is the shortest H/ACA-like RNA reported so far. By being such a short H/ACA RNA, it joins a large group of trypanosome small RNAs that are shorter than their counterparts in other eukaryotes including U1, U2, U4, and U5 small nuclear RNAs (34-38). h1 can be visualized as a half-canonical H/ACA snoRNA since it carries a single hairpin domain and has the potential to guide pseudouridylation on a single site. h1 is most closely related to the trypanosomatid SLA1 since we have recently demonstrated that SLA1 guides pseudouridylation on position −12 (relative to the 5′ splice site) of the spliced leader RNA. SLA1, like h1, consists of a single hairpin and carries an AGA but not an ACA box at the 3′-end of the molecule. A description of additional trypanosome H/ACA-like RNAs would make it possible to determine whether all trypanosome H/ACA RNAs possess a single hairpin structure, a property that differentiates them from the canonical structure established so far in plants, vertebrates, yeast, and ciliates. Interestingly we could not detect h1 RNA homologues in T. brucei and L. major by Northern analysis or primer extension. This is not that surprising since the L. collosoma C/D snoRNA-2 is related to its T. brucei homologue only in the regions that are complementary to rRNA sequences, but no sequence similarity exists outside these domains (22, 24). The origin of the primordial C/D snoRNA and H/ACA RNA is unknown, but it was proposed that these RNAs may have originated from rRNA or even tRNA that acquired the ability to function as trans-acting cofactors (39). The appearance of these snoRNAs most probably took place early in eukaryotic evolution since trypanosomes are ancient eukaryotes and yet possess both 2′-O-methylation and pseudouridylation guide snoRNAs.