A major tyrosine-phosphorylated protein of Trypanosoma brucei is a nucleolar RNA-binding protein.

We have previously identified a set of tyrosine-phosphorylated proteins with apparent molecular masses of 44-46 kDa as some of the major tyrosine-phosphorylated species in the protozoan parasite Trypanosoma brucei. We now show that these molecules, herein named Nopp44/46, are localized in the nucleolus. Using monoclonal antibodies, we have isolated Nopp44/46 cDNA clones from expression libraries. Sequence analysis reveals that the predicted amino acid sequence of the molecule is composed of an N-terminal unique region, an internal acidic region, and a C-terminal repeat region. Analysis of the cDNA clones and genomic Southern analysis indicated that Nopp44/46 belongs to a multigene family in which different gene copies are very similar but vary in the number of repeats. Interestingly, the repetitive amino acid sequence motif contains multiple RGG (Arg-Gly-Gly) boxes characteristic of RNA-binding proteins. In vitro binding experiments demonstrated that Nopp44/46 is indeed capable of binding nucleic acids. Competition experiments with different RNA homopolymers demonstrated that Nopp44/46 preferentially binds to poly(U). These studies suggest that Nopp44/46 may play a role in RNA metabolism in trypanosomes and raise the possibility that tyrosine phosphorylation may regulate the process.

The processing, transport, and function of RNA are aided by numerous proteins. Many of these proteins are phosphoproteins, and indeed, recent evidence suggests that phosphorylation may play a role in regulating protein-RNA interactions. In contrast to higher eukaryotes, little is known about the expression and functions of RNA-binding proteins in the lower eukaryote, Trypanosoma brucei, an ancient representative of the eukaryotic kingdom. RNA processing in these organisms shows several novel features, including obligate trans-splicing rather than the cis-splicing common in most eukaryotes (1). trans-Splicing provides the novel cap structure present on mature mRNAs of T. brucei (it contains four modified nucleotides in lieu of the usual two) (2). Also unusual is the finding that protein-coding genes can be expressed from an rDNA promoter (3,4) and that the major surface proteins of the organisms appear to be transcribed by a polymerase I-like polymerase (5,6).
Studies of RNA-binding proteins have revealed that many of these proteins have specific motifs that contribute to the binding activity, including in some cases distinctive Arg-Gly-Gly (RGG) repeats. This RGG motif is defined as closely spaced RGG tripeptides interspersed with phenylalanine and/or other aromatic amino acids. RGG boxes are present in the nucleolar RNA-binding proteins nucleolin (7), fibrillarin (8), and SSB-1 (9) as well as the hnRNP proteins U (10), A1 (11), and p38 (12). In 1992, Ghisolfi et al. (13) demonstrated that this domain in nucleolin contains repeated β-turns and destabilizes RNA helices. Although RGG motifs are usually found in conjunction with other types of RNA-binding domains, RGG boxes are capable of binding nucleic acids independently, at least in some cases. For example, in hnRNP U protein, the RGG box is the only apparent RNA-binding element (10). Many RGG box RNA-binding proteins contain the modified amino acid NG,NG-dimethylarginine (14), although the significance of this modification has not yet been determined.
In previous studies, we have characterized a set of T. brucei proteins with apparent molecular masses of 44-46 kDa (15). The abundance of these proteins in their tyrosine-phosphorylated state varies approximately 9-fold through the parasite's developmental cycle due to both increased stoichiometry of tyrosine phosphorylation and increased protein abundance. In addition, these phosphoproteins, previously designated pp44/46 (and herein renamed Nopp44/46), are serine-phosphorylated and associated with a tyrosine kinase activity upon immunoprecipitation. In this report, we first demonstrate that pp44/46 is localized within the nucleolus. We also provide evidence that pp44/46 is a member of the RGG box family of RNA-binding proteins. All of the members of the pp44/46 multigene family contain RGG repeats, though the number of repeats varies from one gene copy to another. The protein can selectively bind to poly(U) and to a lesser extent to poly(G). The localization, together with its nucleic acid binding characteristics, suggests that the protein may play a role in regulation of rRNA processing and maturation.
Immunoelectron Microscopy-Trypanosomes were processed at −20°C for embedding in Lowicryl K4M (Ted Pella, Redding, CA), and the resin polymerized by UV light at −35°C. Sections were cut with an RMC MT-7 ultramicrotome and labeled with monoclonal antibody ID2 at 2 μg/ml and goat anti-mouse coupled to 10 nm gold (1:30, Amersham Corp.). Photographs were taken with a Zeiss EM 910 electron microscope.
Library Construction and Immunoscreening-RNA was isolated from bloodform and procyclic stage trypanosomes (strain IsTAR1) (18). The libraries were constructed using the Stratagene Unizap system for directionally cloning cDNAs into λZap. Approximately 1.25 × 10^6 recombinant phage were obtained in each library. Amplified libraries were plated and induced for 5 h at 42°C with filters impregnated with 10 mM isopropyl-1-thio-β-D-galactopyranoside. The lifts were screened with antibody ID2 at 2 μg/ml. Reactive plaques were purified and the plasmids excised using helper phage as outlined by the manufacturer (Stratagene).
DNA Sequencing-The cDNA sequence was obtained by sequencing nested deletions generated by exonuclease III deletion (19). Both strands of the insert for pNopp44/46c1 were sequenced. For comparison purposes, one strand of the insert for pNopp44/46c2 was sequenced. Sequencing was carried out on an Applied Biosystems automated DNA sequencer.
Southern and Northern Analyses-For genomic Southern blots, 3 μg of procyclic form DNA was digested with restriction endonucleases and separated on 1% agarose gels. DNA was transferred to nitrocellulose membranes, hybridized with a [32P]UTP riboprobe overnight, and washed at a final stringency of 0.1× SSC at 65°C (1× SSC is 150 mM NaCl, 15 mM sodium citrate). For Northern analyses, 10 μg of total RNA was loaded onto formaldehyde-agarose (1.4%) gels and, following electrophoresis, transferred to Nytran and hybridized to a riboprobe from pNopp44/46c1. Final washes were at the same stringencies as the Southern blots.
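The wash stringency quoted above is a simple dilution of the stated 1× SSC recipe. The following is a minimal sketch of that arithmetic; the function name and dictionary layout are illustrative assumptions and are not part of the original methods.

```python
# 1x SSC composition exactly as defined in the text (millimolar).
SSC_1X_MM = {"NaCl": 150.0, "sodium citrate": 15.0}


def ssc_concentrations(fold):
    """Return the millimolar composition of a `fold`x SSC buffer.

    `fold` is the strength relative to 1x SSC (0.1 for the final wash).
    Values are rounded to avoid floating-point noise in the printout.
    """
    return {solute: round(mm * fold, 6) for solute, mm in SSC_1X_MM.items()}


if __name__ == "__main__":
    # Final wash used for both Southern and Northern blots.
    for solute, mm in ssc_concentrations(0.1).items():
        print(f"{solute}: {mm} mM")
```

Under these definitions the 0.1× SSC wash works out to 15 mM NaCl and 1.5 mM sodium citrate.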
Oligonucleotides and Polymerase Chain Reaction (PCR) Amplification-PCR was performed using an annealing temperature of 50°C and extension at 72°C. PCR products were analyzed by Southern hybridization. Southern blots were hybridized with either randomly primed 32P-labeled DNA fragments or [32P]UTP riboprobes overnight and washed at a final stringency of 0.1× SSC at 65°C. The oligonucleotides used for PCR are indicated in Fig. 3. The T7-Nopp primer (5′-GACCTAATACGACTCACTATAGGGTCGGCAGCAATGGAGGGT-3′, start codon underlined) was used to add a T7 promoter to the Nopp44/46 coding region for purposes of in vitro translation. Following amplification of either genomic DNA or the c1 cDNA with T7-Nopp and stop primer (Fig. 3), approximately 1 μg of product was used in in vitro transcription reactions using the Ampliscribe kit from Epicenter Technologies.
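The layout of the T7-Nopp primer can be checked with a short script. This is a sketch under two assumptions that are not stated in the text: the hyphen inside the printed primer is a line-break artifact and is removed here, and the promoter is taken to be the widely used T7 promoter sequence TAATACGACTCACTATAGGG. The segment labels ("clamp", "spacer") are hypothetical names chosen for illustration.

```python
# Primer as printed in the text, with the internal line-break hyphen removed
# (assumption: the hyphen is not part of the oligonucleotide).
T7_NOPP_PRIMER = "GACCTAATACGACTCACTATAGGGTCGGCAGCAATGGAGGGT"

# Commonly used T7 promoter sequence (general knowledge, not given in the text).
T7_PROMOTER = "TAATACGACTCACTATAGGG"


def dissect_primer(primer, promoter=T7_PROMOTER):
    """Split a primer into 5' clamp, promoter, spacer, and coding segments.

    The coding segment is assumed to begin at the first ATG that follows
    the promoter.
    """
    promoter_start = primer.find(promoter)
    if promoter_start < 0:
        raise ValueError("promoter sequence not found in primer")
    promoter_end = promoter_start + len(promoter)
    start_codon = primer.find("ATG", promoter_end)
    if start_codon < 0:
        raise ValueError("no ATG found downstream of the promoter")
    return {
        "clamp": primer[:promoter_start],
        "promoter": primer[promoter_start:promoter_end],
        "spacer": primer[promoter_end:start_codon],
        "coding": primer[start_codon:],
    }


if __name__ == "__main__":
    for name, segment in dissect_primer(T7_NOPP_PRIMER).items():
        print(f"{name:>8}: {segment}")
```

With these assumptions the primer resolves into a four-nucleotide 5′ extension, the promoter, a nine-nucleotide spacer, and nine nucleotides beginning with the ATG start codon.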
Oligonucleotide-directed RNase H Digestion of RNA and in Vitro Translation-20 μg of procyclic RNA was incubated either in the presence or absence of 100 ng of oligonucleotide U3 at 42°C for 10 min. Then RNase H was added, and the incubation was continued for another 30 min at 37°C. RNAs were precipitated and each incubation mix was divided in half. One half was subjected to Northern analysis, while the other was translated in vitro using the rabbit reticulocyte system (Promega). 10 μg of total RNA (control or RNase H-treated) or an aliquot of in vitro transcripts at various dilutions were used for in vitro translations. [35S]Methionine was used to label the synthesized proteins as indicated. Translation products were immunoprecipitated with antibody ID2 (15) or with single-stranded DNA (ssDNA)-agarose (see below). The [35S]methionine-labeled precipitates were separated on a 10% SDS-polyacrylamide gel, fluorographed, and analyzed by autoradiography.
Immunopurification of Nopp44/46-Monoclonal antibody ID2 was conjugated to CNBr-Sepharose-4B (Pharmacia Biotech Inc.) according to the manufacturer's protocol. Procyclic forms were lysed in lysis buffer (15) and the clarified supernatant incubated with the ID2-Sepharose beads overnight at 4°C. The slurry was packed in a 1-ml column and washed with lysis buffer followed by two volumes of lysis buffer containing 1 M NaCl, and then with 0.1 M Tris-HCl (pH 7.5). The bound protein was eluted with 0.1 M glycine-HCl, 1 M NaCl (pH 2.5). The eluate was neutralized immediately with 1 M Tris-HCl (pH 8.0) and concentrated using an Amicon concentrator. The purified protein was analyzed on a 10% SDS-polyacrylamide gel and transferred to membranes (Immobilon, Millipore Corp.). The Coomassie-stained band was submitted for compositional analysis. The amino acid composition was determined by the Protein Structure Laboratory at the University of California at Davis, where standards of monomethyl and dimethyl arginine were included. The N terminus of the protein was blocked.
Nucleic Acid Binding Assay-Procyclic form parasites were extracted with 10 mM Tris-HCl (pH 7.5), 2.5 mM MgCl2, 100 mM NaCl, and 0.5% Triton X-100 at 4°C for 30 min. The Triton X-100 lysate was centrifuged at 4°C at 12,000 rpm for 30 min. 4 μg of the supernatant protein was incubated with 30 μl of ssDNA-agarose at 4°C for 30 min. Beads were pelleted by brief centrifugation and washed four times with 1 ml of the buffer above. The bound material was eluted by boiling in sodium dodecyl sulfate (SDS) sample buffer and analyzed by Western blot using antibody ID2 (15). For competition assays, either RNA homopolymers, dsDNA, or ssDNA were also added during incubation with ssDNA-agarose.

RESULTS
pp44/46 Is a Nucleolar Protein-In earlier studies we identified a set of phosphorylated proteins, pp44/46, whose abundance and tyrosine phosphorylation are regulated in the life cycle of T. brucei. To investigate the subcellular localization of pp44/46, we performed indirect immunofluorescence and immunoelectron microscopy using the monoclonal antibodies we previously generated. In immunofluorescence analysis of fixed cells, antibody ID2 recognized a bright spot that appeared to coincide with the nucleus as revealed by counterstaining with the DNA-specific dye 1,4-diazobicyclo-(2,2,2)octane, although the staining seemed more circumscribed and in an area of weaker 1,4-diazobicyclo-(2,2,2)octane staining. No staining of the compact mitochondrial DNA network was seen. To further investigate the subnuclear localization, we carried out immunogold labeling of procyclic form cells using antibody ID2. As shown in Fig. 1, the gold particles are seen directly over the nucleolus, confirming this as the subcellular localization of the antigen. The same results were obtained with a second anti-Nopp44/46 antibody, IIB3, and with bloodform stage cells (data not shown). Accordingly, we have renamed this set of molecules Nopp44/46 (nucleolar phosphoprotein of apparent molecular mass 44-46 kDa).
Structure of the Nopp44/46 cDNA-Using antibody ID2, we screened a T. brucei λZap expression library and isolated four cDNA clones. All reacted with a second anti-Nopp44/46 antibody, IIB3, and excised plasmids had similar restriction maps. Two plasmids were characterized in detail (Fig. 2). The cDNAs were incomplete at the 5′ end as no spliced leader (the 39-nucleotide sequence trans-spliced to form the 5′ end of all nuclearly encoded mRNAs) was found. The 5′ end of the Nopp44/46 mRNA was obtained by cloning a fragment obtained by reverse transcription and PCR amplification, using a Nopp44/46-specific primer (primer 486; see Fig. 3 for primer locations) for cDNA synthesis, and a spliced leader primer plus a Nopp44/46-specific primer for amplification (primer 380). Five PCR clones were sequenced; all matched the cDNA clones and each other, and contained an additional 17 base pairs beyond the longest cDNA, at which point the spliced leader was found.
The assembled nucleic acid and predicted protein sequences are shown in Fig. 3. The start codon is located 27 nucleotides following the spliced leader. No protein kinase consensus motifs were found. The predicted protein encoded by the cDNAs can be divided into three domains: a 169-amino acid unique region, an internal 51-amino acid acidic region composed almost exclusively of glutamic acid and aspartic acid, and a C-terminal repeat region. The repeat motifs fit the consensus (F/N/D)-R-G-G, with this particular cDNA containing 22 repeats. The repeats are somewhat degenerate, with 13 of 22 fitting the consensus and the remainder showing minor variation. The predicted molecular mass (34,748 Da) is smaller than that estimated from migration on SDS-PAGE. The aberrant migration appears to be a property of the Nopp44/46 protein (see below).
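Counting repeats against the (F/N/D)-R-G-G consensus is a straightforward pattern search. The sketch below shows one way to do it with a regular expression; the example sequence is a short hypothetical string invented for illustration, not the Nopp44/46 sequence from Fig. 3, and the function names are assumptions.

```python
import re

# (F/N/D)-R-G-G consensus from the text, written as a regular expression.
CONSENSUS_REPEAT = re.compile(r"[FND]RGG")
# Any RGG tripeptide, regardless of the preceding residue.
RGG_TRIPEPTIDE = re.compile(r"RGG")


def count_consensus_repeats(protein):
    """Count non-overlapping matches to the (F/N/D)-R-G-G consensus."""
    return len(CONSENSUS_REPEAT.findall(protein.upper()))


def count_rgg_boxes(protein):
    """Count all RGG tripeptides, including degenerate repeats."""
    return len(RGG_TRIPEPTIDE.findall(protein.upper()))


if __name__ == "__main__":
    # Hypothetical toy sequence: four consensus repeats plus one degenerate
    # repeat (SRGG) that keeps the RGG core but not the first position.
    toy_repeat_region = "FRGGNRGGDRGGSRGGFRGG"
    consensus = count_consensus_repeats(toy_repeat_region)
    total = count_rgg_boxes(toy_repeat_region)
    print(f"{consensus} of {total} repeats fit the consensus")
```

Applied to the real repeat region, the same two counts would correspond to the "13 of 22" figure reported above, provided the degenerate repeats retain the RGG core, which the text does not specify.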
To ensure that the clones encode the proper immunoreactive proteins, procyclic RNA was incubated with oligonucleotide U3 and then digested with RNase H to specifically cleave the RNA-DNA duplex. The RNAs were then analyzed by Northern analysis to verify cleavage. As shown in Fig. 4 (left panel), the probe detected a 1.6-kilobase message. When oligomer U3 was added along with RNase H, the 5′ 520 nucleotides of the RNA were removed as predicted. Upon in vitro translation and subsequent immunoprecipitation, antibody ID2 precipitated Nopp44/46 from procyclic form RNA (Fig. 4, right panel). However, no Nopp44/46 was detected in the sample treated with both oligomer U3 and RNase H. These experiments confirm that we indeed cloned a Nopp44/46 cDNA.
Characterization of Nopp44/46 Genes and cDNAs-To examine the organization of pp44/46 genes, genomic Southern analysis was carried out. A radiolabeled probe spanning the acidic and repeat region (indicated in Fig. 2) was hybridized to restriction enzyme digestions of genomic DNA (Fig. 5A). The pattern obtained suggests there are at least four copies in the diploid genome, either resulting from polymorphic alleles at two loci or from monomorphic alleles at multiple loci. PCR analysis of other strains of T. brucei showed fragments of slightly varying size indicating a high degree of polymorphism within the gene copies (data not shown), suggesting the former possibility is quite likely. DNA sequence analysis indicated that clones pNopp44/46c1 and c2 were likely derived from distinct loci, as their 3′ untranslated regions diverge approximately 350 base pairs past the stop codon (marked by dashed line in Fig. 3).
To further investigate how the gene copies differ, unique, acidic, and repeat regions were amplified from genomic DNA. The PCR products were analyzed by Southern blot using a randomly primed probe derived from pNopp44/46c1. The radiolabeled probe detected three bands when the entire coding region was amplified from genomic DNA (Fig. 5B, lane 2). According to the intensity of hybridization, the middle band probably corresponds to two copies. Similarly, primers corresponding to the EcoRI and NsiI sites yielded three fragments, identically sized to those obtained from direct genomic Southern analysis (Fig. 5B, lane 1). Primers designed to amplify the unique plus acidic region or the acidic region alone (Fig. 5B, lanes 3 and 4) yielded a single radiolabeled band from genomic DNA, indicating that these regions are very similar in all of these genes. On the other hand, amplification of the repeat region yielded a set of bands (three or four depending on the gel conditions) that reproducibly hybridized with the probe (Fig. 5B, lane 5), suggesting that gene copies have different numbers of repeats. The size of the bands suggests that the number of repeats may vary from approximately 11 to 29 in this particular strain of trypanosomes. Within our clonal line, the pattern is identical between subclones that have been independently propagated for several years (data not shown). Thus the repeats are not undergoing rapid expansion and contraction during clonal propagation.
The size of the amplification product corresponding to the acidic region appeared identical in genomic DNA and three of the four cDNA clones (data not shown). The acidic region of the pNopp44/46c1 clone was smaller, indicating that part has been deleted. Indeed, this cDNA was missing 13 codons found in pNopp44/46c2 (shown in lowercase in Fig. 3), which presumably reflects the true genomic sequence of the Nopp44/46 genes. The repeat region amplification products of clones c3 and c4 did not match those from genomic DNA, apparently having suffered from deletion or rearrangement during cloning. These results are perhaps not surprising, given the propensity for repeated sequences to be deleted in Escherichia coli. On the other hand, repeat region products from clones c1 and c2 precisely matched the larger two products from genomic DNA. Preliminary sequence analysis of clone c2 (i.e. one strand sequenced in entirety) showed that it has an additional 27 amino acids within the repeat region (~7 repeats) as compared to clone c1. Aside from the artifactual deletion in the acidic region and differences in numbers of repeats, there are only a few single-base differences between clones c1 and c2 within the coding region. Taken together, our results indicate that Nopp44/46 proteins are encoded by a multigene family whose members differ primarily in the length of the repeat region.
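The repeat estimate for clone c2 follows from simple arithmetic on the figures given in the text. The sketch below makes that reasoning explicit; it assumes that each repeat is exactly the four residues of the (F/N/D)-R-G-G consensus with no spacer residues, which is a simplification, and the function name is illustrative.

```python
# Residues per repeat unit, assuming each repeat is a bare (F/N/D)-R-G-G
# tetrapeptide with no intervening spacer residues (simplifying assumption).
RESIDUES_PER_REPEAT = 4

# Figures taken from the text.
C1_REPEATS = 22          # repeats in the sequenced c1 cDNA
C2_EXTRA_RESIDUES = 27   # additional amino acids in the c2 repeat region


def residues_to_repeats(residues, unit=RESIDUES_PER_REPEAT):
    """Convert a length in amino acids to an approximate number of repeats."""
    return residues / unit


if __name__ == "__main__":
    extra = residues_to_repeats(C2_EXTRA_RESIDUES)
    c2_estimate = C1_REPEATS + round(extra)
    print(f"extra repeats in c2: {extra:.2f} (about {round(extra)})")
    print(f"estimated repeats in c2: {c2_estimate}")
```

On these assumptions, 27 residues correspond to 6.75 repeats, or about 7, giving roughly 29 repeats for clone c2. That agrees with the upper end of the range of approximately 11 to 29 repeats inferred from the genomic PCR products.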
Data Base Searches-Data base searches with the unique region did not reveal significant homology to any known protein. The acidic domain and the repeat region had homologies to a number of proteins, particularly nucleic acid-binding proteins. Two proteins, nucleolin and fibrillarin, had homology to both the acidic and repeat regions. Like Nopp44/46, these proteins are nucleolar. Nucleolin and fibrillarin are RNA-binding proteins, and both have Arg residues modified as NG,NG-dimethylarginine. Compositional analysis of individual Nopp44 and Nopp46 bands did not reveal any indication of either dimethyl or monomethyl arginine. It did, however, reveal a highly biased composition rich in glycine and acidic amino acids as predicted by the sequence of the cloned cDNAs.
Nopp44/46 Binds to Nucleic Acids in Vitro-Nopp44/46 contains several RGG motifs, a characteristic feature of many RNA-binding proteins. It has been shown that many RNA-binding proteins bind to ssDNA in vitro (10, 20, 21). To examine the nucleic acid binding potential of Nopp44/46, cell lysates were incubated with ssDNA-agarose beads. Bound proteins were eluted from the beads with SDS sample buffer and tested for the presence of Nopp44/46 by Western blot. Fig. 6 shows that Nopp44/46 bound to ssDNA-agarose, but not to protein A-agarose. Immunopurified Nopp44/46 also bound to ssDNA-agarose in vitro (data not shown).
Many RNA-binding proteins show a preference for certain homopolymers. We performed a competition assay testing RNA homopolymers for their ability to bind Nopp44/46. RNA homopolymers poly(A), poly(C), poly(G), and poly(U) were added as competitors to the incubation mixture. The binding of Nopp44/46 to ssDNA-agarose could be competed out very efficiently with soluble poly(U) and to a lesser extent by poly(G) (Fig. 7). However, poly(A) and poly(C) were ineffective competitors. Surprisingly, poly(G) at low concentrations reproducibly augmented the binding of Nopp44/46 to ssDNA. This unusual augmentation disappeared when the poly(G) was heated and quick-cooled just prior to the assay. Denatured poly(G) competed weakly for Nopp44/46 binding to ssDNA (Fig. 7). ssDNA and dsDNA also competed for Nopp44/46, with dsDNA showing the augmentation seen with poly(G) (Fig. 7). These results indicate that Nopp44/46 binds preferentially to poly(U).
To evaluate whether nucleic acid binding is an intrinsic property of the Nopp44/46 molecule, the coding regions of the molecule were amplified from the c1 cDNA as well as from genomic DNA using a 5′ primer attached to the T7 promoter and the stop primer. The PCR product was transcribed by T7 polymerase in vitro, and the RNA was then translated in the rabbit reticulocyte system. Fig. 8A shows the results of Western analysis. A band at approximately 44 kDa (and a faint band at about 46 kDa) resulted from translation of the genomic DNA reactions (lane 2), whereas a single band resulted from the cloned cDNA (lane 3). These translation products migrate very similarly to Nopp44/46 immunopurified from cell lysate (lane 1), demonstrating that the unusual migration on SDS-PAGE is a property of the polypeptide sequence. This discrepancy in apparent molecular mass may be due to the presence of long stretches of repeated amino acids, as has been seen with many other molecules (20,21). In related experiments, the in vitro transcribed and translated proteins were labeled with [35S]methionine and tested for binding to ssDNA-agarose. Fig. 8B shows that the in vitro translated products bind ssDNA.

DISCUSSION
We report here the subcellular localization, cDNA cloning, sequencing, and nucleic acid binding activity of a set of tyrosine-phosphorylated proteins from T. brucei. Antibodies directed against these proteins were previously shown to precipitate a protein kinase (15). However, the predicted amino acid sequence showed no protein kinase consensus motifs. In preliminary studies, we have been unable to detect kinase activity in the in vitro transcribed and translated products from clone c1. This indicates that the tyrosine kinase activity precipitating with Nopp44/46 antibodies is likely the result of an interaction of Nopp44/46 with another protein.
As demonstrated by immunoelectron microscopy, the set of phosphorylated proteins we have studied is localized to the nucleolus; hence, we have renamed them Nopp44/46. To our knowledge, this is the first report of a nucleolar tyrosine-phosphorylated protein, although other tyrosine-phosphorylated RNA-binding proteins have been described (12,23,24). We cannot predict which of the 6 tyrosines and 17 serines in the Nopp44/46 protein are phosphorylated, as no consensus phosphorylation sites have been defined for protein kinases in these divergent organisms. It is unlikely that a high stoichiometry of tyrosine phosphorylation is required for nucleolar localization, since Nopp44/46 is nucleolar in slender bloodforms where it shows little tyrosine phosphorylation (15). The only other nucleolar protein that has been identified in T. brucei is fibrillarin, which is associated with an unusual small RNA, RNA B (25). This protein was detected using heterologous antibody and has not been analyzed at the DNA sequence level, although a Leishmania homologue has been identified at the DNA level (26).
On high resolution protein gel analysis of T. brucei lysates, three Nopp44/46 bands can be observed: a doublet at 44 kDa and a 46-kDa species. We have now shown that there are four gene copies, which differ slightly in the length of their coding regions. Accordingly, the different protein isoforms quite likely result from expression of different gene products. This is supported by the observation that three isoforms can also be observed following in vitro translation of cellular RNA. Nevertheless, while the 44-kDa and 46-kDa species are each phosphorylated on tyrosine and serine, it is possible that the different isoforms are differentially phosphorylated as well. In RNA-binding proteins in higher eukaryotes, a large portion of the Arg residues of the repeat motif are methylated. We were unable to obtain any evidence of the post-translational modification of arginine to NG,NG-dimethylarginine in Nopp44/46. Perhaps, like Saccharomyces cerevisiae (27), these evolutionarily ancient organisms lack this enzymatic function.
The RGG repeat motifs and long stretches of acidic residues occur together in two known protein sequences in addition to Nopp44/46: nucleolin and fibrillarin. These proteins are nucleolar and bind RNA. For nucleolin it has been suggested that an intramolecular interaction occurs between this RGG domain and the rest of the protein to enhance nucleic acid binding activity (13). Also, in the case of the yeast RNA-binding protein NPL3, it has been found that a C terminus rich in (F/Y)RGG motifs can mediate binding to ribohomopolymers (28). No other apparent RNA-binding motifs were found in Nopp44/46, suggesting that the RGG repeats may be the sole mediators of RNA binding, as they are in hnRNP U.
Sequence analysis does not reveal an obvious nuclear localization signal in Nopp44/46. Regarding subnuclear localization, it is unclear as yet how proteins are directed to the nucleolus. It has been suggested that some proteins accumulate in the nucleolus by binding to other nucleolar proteins or to nucleic acids (29). C-terminal repeats of (F/Y)RGG are necessary for proper localization of certain yeast proteins to the nucleus. For example, the RGG domain of nucleolin is responsible for specific localization within nucleoli (20). By analogy, the RGG repeats that likely function in RNA binding of Nopp44/46 may also be involved in nucleolar localization.
Nopp44/46 binds preferentially to poly(U) and weakly to poly(G), ssDNA, and dsDNA, but does not show significant binding affinity toward poly(A) or poly(C). This suggests that Nopp44/46 may preferentially bind U-rich sequences in vivo. Similar nucleic acid binding preferences are also observed in hnRNP proteins A1 and C (30), in Sxl proteins (31), and in an RNA-binding protein from tobacco (22). Whether the RNA binding activity of Nopp44/46 is modulated by its developmentally regulated changes in tyrosine phosphorylation is not yet known. However, in other RNA-binding proteins, phosphorylation has been shown to be an important factor in regulating function. For example, protein phosphorylation regulates the in vitro assembly of the spliceosome (32, 33). Tyrosine phosphorylation of p38, an A/B type hnRNP protein, has been reported to modulate its RNA binding (12).
What role might Nopp44/46 play in nucleic acid metabolism? Since the nucleolus is the site of rRNA transcription, processing, and ribosome assembly, the most obvious speculation is that Nopp44/46 plays a role in one of these processes. Another, perhaps less likely, alternative is that Nopp44/46 could be involved in the expression of other trypanosome genes thought to be transcribed by RNA polymerase I on the basis of the resistance of transcription to α-amanitin. These genes encode the two major surface proteins of T. brucei, both of which are developmentally regulated. Further investigation to determine if Nopp44/46 binds specific RNAs will clarify which of these possibilities, if either, is correct.