Identities and Phylogenetic Comparisons of Posttranscriptional Modifications in 16 S Ribosomal RNA from Haloferax volcanii

Small subunit (16 S) rRNA from the archaeon Haloferax volcanii, for which sites of modification were previously reported, was examined using mass spectrometry. A census of all modified residues was taken by liquid chromatography/electrospray ionization-mass spectrometry analysis of a total nucleoside digest of the rRNA. Following rRNA hydrolysis by RNase T1, accurate molecular mass values of oligonucleotide products were measured using liquid chromatography/electrospray ionization-mass spectrometry and compared with values predicted from the corresponding gene sequence. Three modified nucleosides, distributed over four conserved sites in the decoding region of the molecule, were characterized: 3-(3-amino-3-carboxypropyl)uridine-966, N6-methyladenosine-1500, and N6,N6-dimethyladenosine-1518 and -1519 (all Escherichia coli numbering). Nucleoside 3-(3-amino-3-carboxypropyl)uridine, previously unknown in rRNA, occurs at a site of modification that is highly conserved in all three evolutionary domains but for which no structural assignment in archaea has been previously reported. Nucleoside N6-methyladenosine, not previously placed in archaeal rRNAs, frequently occurs at the analogous location in eukaryotic small subunit rRNA but not in bacteria. H. volcanii small subunit rRNA appears to reflect the phenotypically low modification level in the Euryarchaeota kingdom and is the only cytoplasmic small subunit rRNA shown to lack pseudouridine.

Although posttranscriptional modification of RNA in general serves to modulate regional structural features (1,2), the observed and often conserved patterns of modification (mononucleotide structure and sequence location) result from the dual influences of phylogenetic position and environmental factors such as temperature of growth. For example, modifications used for structural stabilization in tRNAs of bacterial thermophiles exhibit notable differences compared with those found in archaeal thermophiles (3,4). Understanding of the relative importance of these influences and of the functional roles of these modifications at the single nucleotide level requires detailed and accurate knowledge of modification sites in a suitable number of diverse organisms. A more narrowly focused question is the extent to which the modification site, but not necessarily the modified nucleotide structure, may be conserved. Such instances may point to functionally important sites of RNA and to the molecular mechanisms served by modification; the best example of this is the influence of modification in the first position of the anticodon in tRNA on the regulation of codon recognition (5). Knowledge of detailed rRNA modification sites also ultimately bears on a number of related issues, including, for example, the number and sequence specificity of modification enzymes (see discussion in Ref. 6), the utilization (or not) of small nucleolar RNAs as guides for ribose methylation (7,8) or other modifications (6), and the extent to which RNA modification systems may be subject to lateral gene transfer, a prospect first raised by Woese et al. (9).
Whereas information concerning modification sites is readily available for tRNA as a consequence of the relatively large number of reported tRNA sequences (10), much less is known concerning the diversity of ribosomal RNA modifications (11), in part because of the greater experimental complexity of mapping modification sites in large RNAs (discussed in Refs. 8 and 12) compared with tRNA. Reliable rRNA data consist of complete or extensive modification maps of small subunit (SSU) rRNAs from Xenopus laevis, yeast, human (11), and Escherichia coli (Ref. 13 and references therein), of large subunit (LSU) rRNAs of human (11) and E. coli (Ref. 14 and references therein), and of extensive surveys of pseudouridylation sites in rRNA (6), particularly those from the LSU (15). To these studies can be added a number of early reports, e.g. Ref. 16, especially those by Woese and co-workers (17,18) of modifications in RNase T1 fragments obtained in the course of phylogenetic cataloging, even though the structural identities of modified residues, as well as placement of the modified T1 fragment in the RNA sequence (which predated the now common availability of rDNA sequences), were not known in many cases.
We report here an investigation of the modified sites in 16 S rRNA of Haloferax volcanii using LC/ESI-MS. The locations of five modified sites had earlier been reported based on partial RNA sequencing in conjunction with oligonucleotide cataloging (19). Of particular interest was modification at position 966 (E. coli numbering), inferred from the corresponding gene sequence to be a U derivative, a conserved site of modification in the decoding region of the rRNA that has been cross-linked to C-32 of the anticodon loop of P-site tRNA (20). In various eukaryotes, including human (21), X. laevis (21), and yeast (21,22), the modified residue has been established as 1-methyl-3-(3-amino-3-carboxypropyl)pseudouridine, a hypermodified nucleoside (23) regarded as phenotypically eukaryotic (11,12). In E. coli, however, modification in this region occurs as N2-methylguanosine-966, 5-methylcytidine-967 (24), with no apparent structural correlation with the eukaryotic modification. No modification structure assignments in any archaeal SSU rRNA have previously been made at this site, although the occurrence of modification has been documented in Sulfolobus solfataricus (25,26) and in RNase T1 catalogs showing the appropriate conserved sequence for this loop (NCAACG) in Sulfolobus acidocaldarius, Thermoproteus tenax (25), Methanococcus jannaschii (27), and in fifteen other methanogens (17); the modification motif is thus highly conserved.

EXPERIMENTAL PROCEDURES
Isolation and Enzymatic Hydrolysis of rRNA-H. volcanii (ATCC 29605) was grown as reported earlier (28). 30 S ribosomal subunits were prepared (29), and the 16 S rRNA was isolated by extraction with phenol and chloroform (29). Purity of the RNA was assessed by gel electrophoresis (1% agarose) using an Applied Biosystems 230A micropreparative electrophoresis system. 16 S rRNA was hydrolyzed to nucleosides using nuclease P1 (Sigma), venom phosphodiesterase I
LC/ESI-MS of Nucleosides from Total Digestion of H. volcanii 16 S rRNA-A Quattro II mass spectrometer with MassLynx version 3.1 data system (Micromass, Beverly, MA) interfaced to an HP 1090 liquid chromatograph with diode array detector (Hewlett-Packard, Palo Alto, CA) was used for all LC/MS studies. Two hundred picomoles (~100 µg) of rRNA hydrolysate was injected directly onto a 250 × 2.1-mm Supelco LC-18S column fitted with a matching 15 × 2.1-mm precolumn (Supelco, Bellefonte, PA). The column was eluted at a flow rate of 300 µl/min using an ammonium acetate/acetonitrile gradient as described previously (31), except the concentration of ammonium acetate was decreased to 0.005 M for compatibility with electrospray ionization. Diode array UV absorbance data were acquired from 240-320 nm.
The chromatographic effluent was conducted without splitting into the mass spectrometer, using the standard megaflow inlet. The ion source temperature was 180°C. Capillary and lens voltages were 3.1 and 0.24 kV, respectively, for measurement of positive ions. Data were acquired in "centroid" mode over the mass range 105-450 in 0.9 s (with a 0.1-s interscan delay, for a cycle time of 1 s). The "Cluster" algorithm from the MassLynx software was used to interrogate the data set for the presence of unknown nucleosides; cluster values were 132 and 146 units for normal and 2′-O-methylated nucleosides, respectively.
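The mass differences of 132 and 146 units correspond to loss of ribose or 2′-O-methylribose from a protonated nucleoside to give the protonated base. The following Python sketch illustrates the idea of searching a single scan for such ion pairs. It is not the MassLynx "Cluster" implementation; the function name, the tolerance, and the example scan values are assumptions made for illustration only.

```python
def find_nucleoside_pairs(mz_values, losses=(132, 146), tol=0.5):
    """Pair each candidate protonated nucleoside with a lighter ion that
    differs by a sugar loss (132 = ribose, 146 = 2'-O-methylribose)."""
    pairs = []
    for parent in mz_values:
        for loss in losses:
            for fragment in mz_values:
                if abs(parent - fragment - loss) <= tol:
                    pairs.append((parent, fragment, loss))
    return pairs

# Hypothetical centroided m/z values from one scan; 346 and 214 are the
# protonated nucleoside and base values quoted in the text for acp3U.
scan = [150.0, 214.0, 346.0]
pairs = find_nucleoside_pairs(scan)
print(pairs)
```

In this toy scan only the 346/214 pair satisfies the 132-unit spacing, which is the kind of signature used to flag a candidate ribonucleoside.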
LC/ESI-MS of Oligonucleotides from RNase T1 Digests of H. volcanii 16 S rRNA-The mass spectrometer and liquid chromatograph are described in the preceding section; a Z-spray interface was available for these studies. Fifty pmol (~25 µg) of rRNA hydrolysate was injected directly onto a 300 × 1-mm Supelco LC-18S column (Supelco) with a 15 × 1-mm OptiGuard C-18 precolumn cartridge (Optimize Technologies, Oregon City, OR). The solvent system consisted of 0.8 M 1,1,1,3,3,3-hexafluoro-2-propanol (J. T. Baker Inc.) adjusted to pH 7.0 with triethylamine, half of which was diluted 1:1 with HPLC-grade water (Buffer A) whereas the other half was diluted 1:1 with methanol (Burdick and Jackson, Muskegon, MI) (Buffer B) (32). The column was eluted using a linear gradient of 0-100% Buffer B in Buffer A plus Buffer B over 50 min at a flow rate of 60 µl/min. Diode array UV data were acquired from 240-320 nm.
The chromatographic effluent was conducted without splitting into the mass spectrometer. The ion source and desolvation temperatures were 140°C and 300°C, respectively. Capillary and lens voltages were −2.75 and 0.50 kV, respectively, for measurement of negative ions. Two alternating scan functions were used for data acquisition. The first one, used to determine oligonucleotide molecular masses, utilized a 44-V cone setting, a typical value for generating mass spectra with minimal fragmentation. Data were acquired in continuum mode over the mass range 480-1380 in 3 s (with a 0.2-s interscan delay, for a cycle time of 3.2 s). The second scan function utilized a cone setting of 130 V to fragment the oligonucleotides. Data were acquired in continuum mode over the mass range 100-350 in 0.4 s (with a 0.1-s interscan delay, for a cycle time of 0.5 s).

RESULTS
The posttranscriptional modification status of H. volcanii rRNA was examined using a combination of LC/MS-based methods (33) involving analysis of mixtures of nucleosides produced by total enzymatic hydrolysis (31) and of oligonucleotides from RNase T1 digestion. The latter analysis, carried out directly on the total rRNA digest, provides accurate molecular mass values for oligonucleotide products, which can in turn be converted to base compositions (34) and correlated with specific oligonucleotide sequences in the rRNA through comparison with the corresponding gene sequence (35).
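The comparison with the gene sequence amounts to an in silico RNase T1 digestion (cleavage after every G) followed by calculation of the expected mass of each product. A minimal Python sketch is given below. It assumes average residue masses and linear 5′-hydroxyl, 3′-phosphate products; the function names and the short test sequence are hypothetical and are not taken from the original work.

```python
# Average masses (Da) of nucleotide residues (nucleoside monophosphate minus water).
RESIDUE_MASS = {"A": 329.21, "C": 305.18, "G": 345.21, "U": 306.17}
WATER = 18.02

def t1_digest(rna):
    """Cleave after every G; a 3'-terminal fragment may lack a G."""
    fragments, start = [], 0
    for i, base in enumerate(rna):
        if base == "G":
            fragments.append(rna[start:i + 1])
            start = i + 1
    if start < len(rna):
        fragments.append(rna[start:])
    return fragments

def oligo_mass(seq):
    """Average Mr of a linear oligonucleotide with 5'-OH and 3'-phosphate."""
    return sum(RESIDUE_MASS[base] for base in seq) + WATER

# Hypothetical stretch of sequence containing two T1 products named in the text.
fragments = t1_digest("GUACAUUAGCCCGA")
for fragment in fragments:
    print(fragment, round(oligo_mass(fragment), 2))
```

With these residue masses, UACAUUAGp and CCCGp come out close to the values of 2574.5 and 1278.8 quoted in this paper. The mass printed for the final fragment (A) is not meaningful, because a 3′-terminal product would not carry a 3′-phosphate.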
A chromatogram based on UV detection from LC/MS analysis of a total nucleoside digest of 16 S rRNA is shown in Fig. 1 and indicates the presence of three modified nucleosides, acp3U, m6A, and m62A. The assignments shown are based on HPLC retention times compared with tabulated values for RNA nucleoside standards (31) and on mass values for the protonated molecule and the protonated base produced as a fragment ion (31). The molar ratio of m62A:m6A is approximately 2, based on chromatographic peak areas by UV detection using mixture standards (data not shown). The presence of acp3U was unexpected, but of particular interest because it previously was known to occur only in tRNA (36). Both the masses of the protonated nucleoside and base (346 and 214, respectively) and the relatively early retention time of 6.4 min are highly distinctive compared with values for other RNA nucleosides (31,36). Examination of the mass spectra recorded during HPLC elution of A, U, G, and C provided no evidence for additional modified nucleosides that might have co-eluted with the major nucleosides and not have been evident in the UV chromatogram.
To assign each modified nucleoside to the rRNA sequence, molecular masses of oligonucleotides from an RNase T1 digest of the 16 S rRNA were determined by LC/ESI-MS for comparison with calculated masses of (unmodified) T1 oligonucleotides predicted from the gene sequence (35). Shown in Fig. 2 are the traces for absorbance at 260 nm (extracted from photodiode array data; panel A) and for base fragment ions of the modified nucleosides (acp3U, m6A, m62A) expected from results of the initial modification screen shown in Fig. 1 (extracted from the high cone voltage scan function; panels B-D). The chromatographic profiles generated by these three ions (m/z 212, 148, 162) thus mark the elution times of oligonucleotides that contain them.
These mass channels were time aligned with total ion current profiles from a normal cone voltage scan function, recorded in the same analysis, from which full mass spectra of the modified oligonucleotides were derived. Shown in Fig. 3 are five summed mass spectra across the apex of the peak eluting at 25.8 min (Fig. 2, B and C), which is expected to include m6A- and acp3U-containing T1 oligonucleotides. Although the coincident elution times of the two base fragment ions suggest that they may belong to the same oligonucleotide component, the corresponding molecular mass values dictate their presence in different oligonucleotides, as follows. Comparison of the measured masses of the three oligonucleotides A, B, and C with all masses predicted from the gene sequence (19) allowed component B (Mr 2574.5) to be readily assigned as unmodified 1093-UACAUUAGp-1100 (calculated Mr 2574.5). The molecular masses of the remaining oligonucleotides (A, Mr 2305.5 and C, Mr 2673.8) are not present in the calculated T1 catalog, so each one contains one of the two modified nucleotides. To derive the T1 oligonucleotide to which each belongs, the residue mass of each of the modified residues (14 Da for methyl in m6A; 101 Da for aminocarboxypropyl in acp3U) was subtracted in turn from the molecular masses of oligonucleotides A and C. Allowable T1 compositions were obtained only for Mr 2305.5 − 14 and for Mr 2673.8 − 101. The m6A is therefore confined to the oligonucleotide UAACAAGp, whereas the acp3U can be accommodated within either of two (A3,C3,U)Gp sequences. The base composition of the acp3U-containing RNase T1 fragment inferred from the measured molecular mass was independently confirmed by isolation of the 8-mer oligonucleotide by anion exchange and reversed-phase chromatographies (35), digestion to nucleosides, and LC/MS analysis. These results (data not shown) confirm the composition acp3U plus (A3,C3)Gp (derived from chromatographic peak heights) with no unmodified uridine. The published sequence (19) indicated unspecified modified A and U in the sequences UAACAAGp and ACUCAACGp, respectively; the mass data in the present study define the corresponding modified oligonucleotides as 1498-UAm6ACAAGp-1505 and 964-ACacp3UCAACGp-971 (E. coli numbering).
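The subtraction step described above can be sketched in Python as follows. The catalog here is a hypothetical excerpt containing only a few T1 products mentioned in the text, not the full catalog derived from the gene sequence. The average mass increments (14.03 Da for a methyl substitution, 101.10 Da for the aminocarboxypropyl group), the tolerance, and the function names are assumptions of this sketch.

```python
# Average residue masses (Da): nucleoside monophosphate minus water.
RESIDUE_MASS = {"A": 329.21, "C": 305.18, "G": 345.21, "U": 306.17}
WATER = 18.02
# Mass added by each modification relative to the unmodified residue.
INCREMENTS = {"m6A": 14.03, "acp3U": 101.10}

def oligo_mass(seq):
    """Average Mr of a linear oligonucleotide with 5'-OH and 3'-phosphate."""
    return sum(RESIDUE_MASS[base] for base in seq) + WATER

def assign(measured, catalog, tol=0.3):
    """List (modification, oligonucleotide) pairs for which the unmodified
    catalog mass plus one increment matches the measured Mr."""
    hits = []
    for name, delta in INCREMENTS.items():
        for seq in catalog:
            if abs(oligo_mass(seq) + delta - measured) <= tol:
                hits.append((name, seq))
    return hits

# Hypothetical excerpt of the T1 catalog (3'-phosphate implied).
catalog = ["UACAUUAG", "UAACAAG", "ACUCAACG", "CCCG", "AAUCUG"]
print(assign(2305.5, catalog))  # component A
print(assign(2673.8, catalog))  # component C
print(assign(2574.5, catalog))  # component B, unmodified
```

With this excerpt, component A matches only UAACAAGp plus one methyl and component C matches only ACUCAACGp plus one aminocarboxypropyl group, whereas component B gives no hit because it corresponds to an unmodified product. A search against the complete catalog would also return any other sequence of the same composition, which is why the text notes that acp3U fits either of two (A3,C3,U)Gp sequences from the mass alone.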
The remaining modified nucleotide, m62A, is present in an oligonucleotide component eluting at 30.7 min (Fig. 2D); five summed mass spectra spanning the apex of this peak are shown in Fig. 4. Subtraction of the modification element CH3 × 2 (28 Da for one m62A) from the indicated molecular mass of the single oligonucleotide in this peak does not yield a mass value allowed from the gene sequence-based T1 catalog. Subtraction of CH3 × 4 (56 Da), however, yields an allowed composition of (A2,C,U2)Gp, represented in five oligonucleotides in the RNA. Reference to the original RNase T1 catalogs (19) allows assignment of this modified oligonucleotide to one of three AAUCUGp oligonucleotides: 1518-m62Am62AUCUGp-1523 (E. coli numbering). In summary, the experiments described revealed the presence of three posttranscriptionally modified species, each of which was localized to a specific sequence location within four sites in the RNA. Structure assignments and sequence locations (E. coli numbering) are summarized in Table I. No evidence was found for an additional modified C assigned from RNase catalog data, 1401-GCCCGp-1405 (19) (see "Discussion"), as a result of two key experiments: failure to observe additional modified nucleoside(s) in the total nucleoside digest (Fig. 1) and failure to find additional base ions released during analysis of RNase T1 digestion products as in Fig. 2, B-D. Specific attention was paid to mass values corresponding to the known modified C bases in RNA (36) and to any other low mass fragment ions that could be candidates for new bases of unknown or unexpected structure, which would have been revealed in the analysis shown in Fig. 2. In addition, no oligonucleotide molecular mass values were found within the ≥3-mer products (represented by Fig. 2A) that were unassignable based on expected RNA masses calculated from the gene sequence. However, this latter experiment alone is not considered conclusive because of the difficulty in making Mr measurements in the (mass spectrally) complex 3- and 4-mer elution region in the chromatogram. Finally, no modified nucleosides or sites were found that might have evaded detection in small oligonucleotides (e.g. NGp) from partial RNase sequencing and cataloging (19).

DISCUSSION
In the present study three different modified nucleotide species were structurally identified and placed at four sites in the 16 S rRNA sequence. These modification sites (Table I) correlate with those reported from RNase T1 catalogs in conjunction with the corresponding gene sequence (19), but a modified cytidine (shown in the reported sequence as 1401-GCCCG-1405 (19)) was not found in the total nucleoside digest, as a base fragment ion in LC/MS analysis of the RNase T1 digest, or in oligonucleotide molecular masses reflecting incremental mass additions to Mr 1278.8 (unmodified CCCGp). Were such a modified C present (and sufficiently stable to survive isolation and digestion protocols), it would have to elute underneath one of the seven nucleosides apparent in Fig. 1, would have to have a base fragment coincident in mass with a limited number of ubiquitous sugar-phosphate backbone fragment ions (present in significant excess in the analysis shown in Fig. 2), and in any instance would have a previously unknown structure.
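Checks of this kind rest on converting a measured molecular mass into a base composition. The Python sketch below enumerates compositions of RNase T1 products (any number of A, C, and U residues followed by a single Gp) that agree with a measured value. It assumes average residue masses and a tolerance of 0.3 Da; the function name and parameters are hypothetical, and the procedure of Ref. 34 may differ in detail.

```python
from itertools import combinations_with_replacement

# Average residue masses (Da): nucleoside monophosphate minus water.
RESIDUE_MASS = {"A": 329.21, "C": 305.18, "G": 345.21, "U": 306.17}
WATER = 18.02

def t1_compositions(measured, max_len=12, tol=0.3):
    """Return compositions of the residues preceding the 3'-terminal Gp
    whose calculated average Mr lies within tol of the measured Mr."""
    hits = []
    for n in range(max_len):
        for combo in combinations_with_replacement("ACU", n):
            mass = sum(RESIDUE_MASS[b] for b in combo) + RESIDUE_MASS["G"] + WATER
            if abs(mass - measured) <= tol:
                hits.append("".join(combo))
    return hits

print(t1_compositions(1278.8))  # value quoted for unmodified CCCGp
print(t1_compositions(2574.5))  # value quoted for UACAUUAGp
```

Because C and U residues differ by only about 1 Da, the tolerance must stay well below that difference for the composition to be unambiguous, which is one reason accurate mass measurement matters in this approach.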
RNase T1 catalogs of SSU rRNAs have been published by Woese and co-workers for S. solfataricus, S. acidocaldarius, and T. tenax (25), for M. jannaschii (27), and for 15 additional methanogens (17) in addition to H. volcanii (19). In all of these archaea the position analogous to 966 is designated as an unknown modified residue N in the conserved sequence 966-NCAACG. The extent to which the assignment of N-966 as acp3U is common to archaea must await study of a broader range of organisms.
In tRNA, the side chain of nucleoside acp3U was found to be biosynthesized from S-adenosylmethionine, based on 14C and 3H labeling (42). Although the occurrence of acp3U in tRNA, and possibly rRNA, is conserved, its biological function is presently unknown. The effect of side chain substitution by the 3-amino-3-carboxypropyl moiety at N-3 of uridine was studied by nuclear magnetic resonance spectroscopy and found to result in only a small influence on the C3′-endo/C2′-endo ribose conformer population (43). However, evidence from nuclear magnetic resonance spectroscopy that acp3U binds Mg2+ in small oligonucleotides suggests a possible role for acp3U in the stabilization of regional RNA structure (44).
Assignment of m6A in the sequence 1498-UAm6ACAAGp-1505 constitutes the first known sequence placement of m6A in archaeal rRNA; its presence at a level of 1-2 residues was previously reported in S. solfataricus 16 S rRNA (39). The only previously established placement of m6A in SSU rRNA in other phylogenetic domains was in eukaryotes: X. laevis and human (11) and Rattus norvegicus (16,45), at positions analogous to that in which it occurs in H. volcanii, in the highly conserved mRNA decoding region of the SSU rRNA. Interestingly, this nearly universally conserved A (position 1500 in the E. coli numbering system) appears from the limited data available to be commonly modified in other archaea (various methanogens (17,27) and sulfur-dependent thermophiles (25)) but not in bacteria (18,24,40,41,46), yeast (11,47), or Dictyostelium discoideum (48). The extent to which the modified A at this location in archaea is specifically m6A (an uncommon modification in both rRNA and tRNA) remains to be determined. Two m62A nucleotides could have been assigned to any of five (A2,C,U2)Gp sequences present in H. volcanii SSU rRNA. Their assignment as tandem m62As at positions 1518-1519 at the interface of the ribosomal subunits (49) is expected in view of the very highly conserved nature (50) and functional importance of this tandem pair of modifications (51).
With the exception of acp3U-966 (which is otherwise a bacterial tRNA modification), the modification structures and sites found in H. volcanii 16 S rRNA are decidedly more eukaryotic than bacterial in nature (Fig. 5 and Table I). A similar conclusion was reached earlier concerning base and sugar modification motifs found in archaeal tRNA (4). However, this characteristic does not extend to modification levels; the H. volcanii 16 S rRNA is the least modified cytoplasmic SSU rRNA of which we are aware. For example, in eukaryotes the numbers range from about 20-80 modifications per rRNA (yeast, ~22 and X. laevis, ~44 (11); R. norvegicus, ~77 (16)), and there are 11 in E. coli (24). It is possible that lower modification levels are a phenotypic characteristic of Euryarchaeota (52) from the archaeal domain. RNase T1 catalogs (generally ≥5-mers) derived from SSU rRNAs of mostly mesophilic methanogens suggest levels of ten or fewer modified sites (17). Catalogs from hyperthermophiles from the Crenarchaeota show about twice those levels (25), whereas a more accurate estimate from a total nucleoside digest of S. solfataricus 16 S rRNA of ~38 residues was reported (39). The high modification levels in S. solfataricus P2 (optimal growth temperature, 75-80°C) were attributed in part to a role in structural stabilization, particularly from the prevalence of 2′-O-methylated nucleotides (39). However, earlier RNase catalog data for M. jannaschii (optimal growth temperature, ~85°C (27)) from the Euryarchaeota showed only seven modifications (27), reflecting a total of perhaps 10-12 in the molecule, still comparatively low. Taken together with the modification levels of H. volcanii SSU rRNA (Ref. 19 and the present study), the intermediate level of modification exhibited in M. jannaschii (27) may reflect a combination of influences in the latter organism from phenotypically lower modification levels in the Euryarchaeota, coupled with the functional utility of modifications in the stabilization of rRNA in thermophiles.