Mammalian Mitochondrial Ribosomal Proteins (2)

Four different classes of mammalian mitochondrial ribosomal proteins were identified and characterized. Mature proteins were purified from bovine liver and subjected to N-terminal or matrix-assisted laser-desorption mass spectroscopic amino acid sequencing after tryptic in-gel digestion and high pressure liquid chromatography separation of the resulting peptides. The peptide sequences obtained were used to virtually screen expressed sequence tag data bases from human, mouse, and rat. Consensus cDNAs were assembled in silico from the various expressed sequence tag sequences identified. Deduced mammalian protein sequences were characterized and compared with ribosomal protein sequences of Escherichia coli and yeast mitochondria. Significant sequence similarities to ribosomal proteins of other sources were detected for three of the four mammalian protein classes determined. However, the sequence conservation between mitochondrial ribosomal proteins of mammalian and yeast origin is much lower than the sequence conservation between cytoplasmic ribosomal proteins of the same species. In particular, this is shown for the mammalian counterparts of the E. coli EcoL2 ribosomal protein (MRP-L14), which do not conserve the specific and functionally highly important His229 residue of E. coli and the corresponding yeast mitochondrial Rml2p.

Mitochondrial ribosomal proteins (MRPs)¹ are the organellar counterparts of the cytoplasmic ribosomal proteins of the same species. MRPs are thought to be involved in the maintenance of the mitochondrial DNA, since mitochondria defective in protein biosynthesis lose their DNA successively (1). The protein content of mitochondrial ribosomes is higher than that of cytoplasmic ribosomes. It was estimated that mitochondrial ribosomes contain up to 100 different proteins (2,3). Only a few MRPs have been characterized on the molecular level in the past (4-7). Recently, mammalian MRP sequences became available by direct N-terminal sequencing of purified proteins and screening of data bases for corresponding DNA sequences (8).² The direct approach applied by these authors is a versatile supplement to the laborious ways of MRP identification by mutational cloning or screening for similar proteins. Because MRPs are very heterologous among distantly related species, it is difficult to identify corresponding mammalian genes by screening with probes derived from, for example, yeast MRPs. The direct characterization of MRPs from purified mitochondrial ribosomes offers the possibility to identify proteins that otherwise would remain unknown due to the lack of suitable reference proteins. Several "additional" MRPs have been identified in yeast and mammals (3,8)²,³ that have no counterparts in bacterial or eukaryotic cytoplasmic ribosomes. Thus, it is our intention to characterize mammalian MRPs based on bovine MRPs as model proteins. The latter have been characterized by various biochemical and physical methods (2, 10-13). In providing molecular information on MRPs, we begin to understand the role of mammalian MRPs and their function in the maintenance of healthy mitochondria. In this paper, four different classes of mammalian MRPs are characterized and compared with their bacterial and yeast mitochondrial counterparts.

EXPERIMENTAL PROCEDURES
Analysis of Bovine MRPs-Isolation and purification of bovine MRPs has been described.² After two-dimensional PAGE and Coomassie Blue staining, individual spots of MRPs were cut from the gel. Individual proteins were named in accordance with the established two-dimensional map of bovine MRPs (2). Proteins were subjected to in-gel digestion by trypsin according to the method of Otto et al. (14). The resulting peptides were isolated and concentrated, purified by reverse phase HPLC, and subjected to Edman sequencing (14) or tandem mass spectroscopic analysis with a Q-Tof (Micromass, Manchester, UK) equipped with a nanoflow Z-spray ion source. It should be noted that the latter method cannot differentiate between leucine and isoleucine because these two amino acids have identical MM.
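The leucine/isoleucine ambiguity follows from elemental composition: both residues are C6H11NO, so their masses are identical. The following minimal Python sketch illustrates this; the function name and the small composition table are illustrative assumptions and were not part of the original analysis.

```python
# Monoisotopic atomic masses in daltons (standard values).
ATOM_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

# Elemental composition of each residue (free amino acid minus one water).
RESIDUE_FORMULA = {
    "L": {"C": 6, "H": 11, "N": 1, "O": 1},  # leucine
    "I": {"C": 6, "H": 11, "N": 1, "O": 1},  # isoleucine
    "K": {"C": 6, "H": 12, "N": 2, "O": 1},  # lysine, shown for contrast
}


def residue_mass(code: str) -> float:
    """Return the monoisotopic mass of one amino acid residue."""
    return sum(ATOM_MASS[atom] * n for atom, n in RESIDUE_FORMULA[code].items())


if __name__ == "__main__":
    for code in "LIK":
        print(code, round(residue_mass(code), 5))
```

Because the two compositions are the same, no mass measurement of the intact residue can tell Leu from Ile, whereas Lys differs by an additional NH.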
Computing-Virtual screening of public EST data bases was performed using the blast program of Altschul et al. (15) and the NCBI server. For screening of short peptides the advanced blast program was run using the modified options "expected": 1000 and "other options": -e2. The analysis of the obtained sequences, the sequence comparison, and the assembly of virtual consensus cDNAs were performed as described (8) using several analytic programs of the GCG DNA analysis computer software package (16). Multiple alignments of sequences obtained were made using the "bestfit" program and by subsequent manual arrangements for optimal alignments to avoid domination of groups of, for example, the mammalian sequences. Blast searches for orthologous Caenorhabditis elegans protein sequences were performed on the Sanger Center's server. Calculations of N-terminal protein sequences for signal peptide properties were done with the Center for Biological Sequence Analysis server (17).
* This work was supported by United States Public Health Service Grant GM15438 (to T. W. O.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
** To whom correspondence should be addressed: Inst. for Biology-Genetics, AG Kress, Free University of Berlin, Arnimallee 7, D-14195 Berlin, Germany. Tel.: 49-30-838-2629; Fax: 49-30-838-3649; E-mail: graack@zedat.fu-berlin.de.
¹ The abbreviations used are: MRP, mitochondrial ribosomal protein; bp, base pair(s); MISP, mitochondrial import signal peptide; MM, molecular mass; ORF, open reading frame; PAGE, polyacrylamide gel electrophoresis; HPLC, high pressure liquid chromatography; EST, expressed sequence tag.

RESULTS
Purification and Characterization of Individual MRPs-Ribosomal proteins of the bovine mitochondrial large ribosomal subunit have been purified previously.² We determined the MM of several individual MRPs by SDS-gel electrophoresis (Table I; see also Ref. 2). In parallel, proteins of the large mitoribosomal subunit were separated by two-dimensional PAGE, and individual MRPs were subjected to in-gel tryptic digestion. Peptides derived from individual proteins were separated by RP-HPLC and analyzed either by Edman N-terminal amino acid sequencing or by matrix-assisted laser-desorption mass spectroscopy. Several short peptide sequences were obtained (Table I), although some of them only as mixtures of two different peptides (see below). Peptides that terminate with either an arginine or a lysine residue are considered to be sequenced completely, since trypsin cleaves C-terminal to Lys and Arg residues.
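The completeness criterion above can be illustrated with a simple in silico digest. This is a minimal sketch, assuming the plain rule "cleave after every Lys or Arg"; it deliberately ignores refinements such as reduced cleavage before proline, and the function names and the example sequence are hypothetical.

```python
def tryptic_peptides(protein: str) -> list[str]:
    """Split a protein sequence after every K or R (simplified trypsin rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        if residue in "KR":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        # The C-terminal peptide need not end in K or R.
        peptides.append(protein[start:])
    return peptides


def looks_complete(peptide: str) -> bool:
    """A sequenced tryptic peptide ending in K or R reached its cleavage site."""
    return peptide[-1:] in ("K", "R")


if __name__ == "__main__":
    for pep in tryptic_peptides("AGKLMRPQ"):  # toy sequence, not a real MRP
        print(pep, looks_complete(pep))
```

Under this rule, an internal peptide whose determined sequence stops before a Lys or Arg was probably not read to its end.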
Virtual Screening of Data Banks and Assembly of Consensus cDNA Sequences for Mammalian MRPs-The longest amino acid sequences (Table I) obtained for each of the analyzed MRPs were used to screen human, mouse, and other EST data banks. In all cases, several EST sequences were identified, and consensus cDNAs were assembled by repetitive screens and comparison of EST sequences of one species or among species to eliminate cDNA sequencing errors and to obtain continuous and conserved ORFs of human, mouse, and rat corresponding to the different bovine MRPs analyzed.
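The idea of eliminating single-read sequencing errors by comparing overlapping ESTs can be sketched as a column-wise majority vote. The example below assumes the reads have already been aligned to a common coordinate system and padded with "-" where a read does not cover a position; the actual assembly in this work was done with GCG programs and manual inspection, so this is only an illustration with hypothetical names and toy data.

```python
from collections import Counter


def consensus(aligned_reads: list[str], gap: str = "-") -> str:
    """Majority-vote consensus over pre-aligned, equal-length reads."""
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")
    result = []
    for column in zip(*aligned_reads):
        bases = [base for base in column if base != gap]
        # Positions covered by no read are reported as N.
        result.append(Counter(bases).most_common(1)[0][0] if bases else "N")
    return "".join(result)


if __name__ == "__main__":
    reads = ["ATGGC-", "ATGCCA", "-TGGCA"]  # toy reads; the second has one error
    print(consensus(reads))
```

A vote of this kind only helps where at least three reads overlap; regions covered by a single EST retain that read's errors.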
MRP-L5-Twelve and five "primary hits" were found for human and rat MRP-L5, respectively, by a preliminary screen of the EST data banks using MRP-L5 bov peptide 1 (Table I) as in silico screening probe. The MRP-L5 human consensus cDNA was assembled from four different ESTs, yielding a sequence of 1072 nucleotides encoding a complete (start codon to stop codon) ORF of 1017 bp (Table II). Additionally, two corresponding human genomic DNA clones (GenBank™ accession nos. AP000086 and AP000223) were identified. Sequence information from the genomic clone (AP000086) was used to confirm that no further translational start codon is located in the 5′ direction. Sequences from a putative TATA box in the genomic DNA downstream to the start codon of the most 5′ EST were added to complete the consensus cDNA deduced from the ESTs (Table II). Two consecutive polyadenylation signals, AATAAA, were identified 3′ of the stop codon. A continuous poly(A) stretch starts at bp 1451. Thus, the polyadenylation sites are considered to be the true signal sequences for this modification of the MRP-L5 human mRNA. The ORF of MRP-L5 human (Table II) encodes a protein of 338 amino acid residues (Fig. 1a). The MRP-L5 bovine peptides 1 and 2 (Table I) correspond to a sequence located in the center of the deduced MRP-L5 human sequence (Fig. 1a). The sequence is strongly conserved between bovine and human at these positions, with 14 out of 15 amino acid residues identical (Fig. 1a). For MRP-L5 human, a putative cleavable mitochondrial import signal peptide (MISP) of 21 amino acid residues was calculated by the SignalP program (17). The mature MRP-L5 human after cleavage of the MISP has a calculated MM of 36,583 Da. This value agrees well with the MM of the mature MRP-L5 bovine (i.e. after cleavage of a putative MISP) of 35.5 kDa (Table I), based on the proposal that mammalian MRPs are quite similar in properties such as MM, pI, and amino acid sequence (8).² For the processed form of MRP-L5 human, a pI of 7.36 was calculated. This value coincides with the characterization of MRP-L5 bovine and MRP-L5 rat as relatively less basic MRPs by two-dimensional PAGE (2) and comparative two-dimensional PAGE, respectively (18).
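Two of the checks described for the MRP-L5 human cDNA are simple enough to sketch in Python: converting an ORF length that includes the stop codon into a residue count (1017 bp corresponds to 338 residues), and locating candidate polyadenylation signals. The function names and the toy sequence are assumptions for illustration; positions are reported 1-based, as in the text.

```python
import re


def residues_from_orf(orf_bp: int) -> int:
    """Residues encoded by an ORF whose length includes the stop codon."""
    if orf_bp % 3:
        raise ValueError("a complete ORF must be a multiple of 3 bp")
    return orf_bp // 3 - 1  # the final codon is the stop codon


def polyadenylation_signals(cdna: str, motifs=("AATAAA", "ATTAAA")):
    """Return sorted (1-based position, motif) pairs for each signal found."""
    hits = []
    sequence = cdna.upper()
    for motif in motifs:
        # A lookahead finds overlapping occurrences as well.
        for match in re.finditer(f"(?={motif})", sequence):
            hits.append((match.start() + 1, motif))
    return sorted(hits)


if __name__ == "__main__":
    print(residues_from_orf(1017))
    print(polyadenylation_signals("GGAATAAACCATTAAAGG"))  # toy sequence
```

A motif hit alone does not prove that a signal is used; in the text, the proximity of a poly(A) stretch is taken as the supporting evidence.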
Comparison of the genomic clone and the assembled cDNA revealed the exon/intron structure of the MRP-L5 human gene (Fig. 2). The coding region of MRP-L5 human is spread out into 10 small exons separated by nine introns up to 3.5 kilobases in size. Exon A (bp 1-74 of the consensus cDNA; Table II) ...
TABLE I ... obtained by amino acid sequencing. X represents amino acids that could not be identified, although the position in the respective peptide sequence in correspondence to the tryptic cleavage site is valid. MS, sequence has been obtained by mass spectroscopy: leucine and isoleucine cannot be differentiated by this method. For amino acid residues in lowercase type, determination is uncertain. For MRP-L7 bov and MRP-L14 bov, respective sequences 2 and 3, amino acid sequences were obtained as a mixture, and individual sequences of peptides 2 and 3 were identified by comparison with the deduced amino acid sequence of MRP-L7 human and MRP-L14 mouse, respectively.
For MRP-L5 mouse, no hits were found in the primary screen of mouse EST sequences using MRP-L5 bovine peptide 1 as screening probe. Instead, the complete MRP-L5 human protein sequence (Fig. 1a) was used to screen mouse ESTs for corresponding entries. A consensus cDNA of 1072 bp was assembled from three different ESTs (Table II) encoding a continuous ORF of 1010 bp. However, comparison of the amino acid sequence of MRP-L5 mouse (Fig. 1a) deduced from the completely assembled consensus cDNA shows that only seven out of 11 amino acid residues are identical between MRP-L5s of mouse and bovine in this position. This value is below the threshold of the blast program used (15), explaining the inability to detect mouse ESTs by using the bovine sequences. Similar to the human cDNA, two consecutive polyadenylation signals, ATTAAA and AATAAA, were found 3′ of the stop codon. Only the last 3 nucleotides of the consensus cDNA (bp 1070-1072) may represent the 5′-end of the continuous poly(A) stretch, which can be deduced from the proximity of the two polyadenylation signals mentioned. No putative splice sites were determined in the center and the 3′-end of this consensus cDNA by sequence comparison of various EST sequences. A complete ORF encoding 336 amino acid residues was determined (Fig. 1a). The deduced protein has a calculated MM of 38,548 Da and a ... (17) no cleavage site for a MISP was proposed, since the two proteins lack strong sequence similarity at their respective N termini (Fig. 1a). This may result from the occurrence of a splice variant of the MRP-L5 mouse consensus cDNA found in the 5′ direction from the sequence encoding 3 consecutive glycine residues (Fig. 1a). No mouse splice variant showing a high degree of sequence similarity to the MRP-L5 human was found. Another possibility is that the respective MISPs of MRP-L5s from human and mouse have been less strictly conserved during evolution than the respective functional, mature proteins.
TABLE II Assembly of identified EST sequences: consensus cDNAs in 5′ to 3′ direction. Determination of consensus cDNA sequences for deduction of mammalian MRP ORFs by assembly of EST sequences is shown. None of the EST sequences listed contains incompletely spliced products. Asterisks label incomplete ORFs. r,c, the nucleotide sequence was determined in reverse complement orientation. +, the assigned ORF ignores three frameshifts in accordance with the strong amino acid sequence conservation between the human 1, human 2, and human 3 sequences and the comparison with the CGI-22 sequence identified from the data bases. R-protein family, affiliation of deduced ORFs to existing families of similar ribosomal proteins.
For MRP-L5 rat, only ESTs corresponding to the C terminus were found. From the longest EST (Table II) a peptide of 174 amino acid residues was deduced that is in very good sequence correspondence to the mouse and human MRP-L5s (Fig. 1a). Further, the rat and mouse MRP-L5 cDNAs are quite similar in their respective 3′-untranslated trailer sequences. Like the mouse cDNA, the rat cDNA contains two consecutive polyadenylation signals, ATTAAA (bp 545-560) and AATAAA (bp 560-565), and a putative poly(A) tract beginning at bp 577.
In searching the data bases for similar proteins, no corresponding sequences from eubacterial or mitochondrial sources were found. Mammalian MRP-L5s represent a new class of ribosomal proteins not found in ribosomes of bacterial or eukaryotic cytoplasmic origin. However, screening the C. elegans genomic data base, a contig (Y62F5.ctg01193) was identified that encodes a stretch of 238 amino acid residues showing 26% sequence identity to the MRP-L5 human . It remains speculative if this relatively low sequence similarity is significant.
MRP-L7-Fourteen, 15, and three primary hits of human, mouse, and rat origin, respectively, were found using bovine MRP-L7 peptide 1 (Table I) as virtual screening probe. For MRP-L7 human, a consensus cDNA of 1142 bp was assembled from five different EST sequences (Table II). Splice sites were localized at nucleotide positions 37/38, 52/53, and 392/393 by comparison of several completely and incompletely spliced EST sequences. However, to determine the true mature 5′ splice form, we performed a 5′-rapid amplification of cDNA ends cloning of the corresponding cDNA from a fetal cDNA library and sequenced the obtained cDNA at its 5′ terminus (data not shown). The obtained sequence perfectly matched one of the EST-derived 5′ splice variants to provide the mature 5′ terminus of the native MRP-L7 cDNA. A polyadenylation signal, ATTAAA, was found at bp 1095-1100, and the poly(A) tract begins at bp 1125. A continuous ORF of 296 amino acid residues was determined. The start codon at bp 57-59 is preceded by an in-frame stop codon at bp 33-35. The relative positions of the three different MRP-L7 bov peptides (Table I) were determined, although peptides 2 and 3 were obtained as a mixed sequence by Edman sequencing that could be resolved only by comparison with the deduced MRP-L7 human protein sequence (Fig. 1b). The sequences of the MRP-L7 bov peptides are in very good correspondence to the deduced MRP-L7 human sequence (Fig. 1b). We postulate an MISP of 21 amino acid residues by comparison of the peptide 1 sequence with the deduced MRP-L7 human ORF. However, this MISP of the MRP-L7 human remains speculative, since the N terminus of the corresponding bovine peptide 1 seems to be a product of an experimental tryptic endoproteolytic digest rather than of an MISP cleavage during import into the mitochondrion (Fig. 1b). The SignalP program (17) ... (2). The protein appears as a slow moving, streaking/tailing protein in the first dimension, suggesting that it is significantly more acidic than the average MRPs (2). The strange migration properties result from the tight binding of RNA pieces of variable sizes when extracted by the urea/LiCl method (2). If bovine MRPs are extracted following an RNase treatment of the ribosomal subunits, MRP-L7 bov migrates along with MRPs of higher pI.⁴
FIG. 1 ... (Table I) and with corresponding ribosomal protein sequences of yeast MRPs and E. coli. Numbers give the respective amino acid position. Vertical lines mark identical amino acid positions; colons mark strongly conserved amino acid residues, and periods (.) mark weakly conserved amino acid residues. Dashes show N- or C-terminal ends of incomplete amino acid sequences, and asterisks mark stop codons. Amino acid residues in lowercase letters are uncertain in their determination by amino acid sequencing (see Table I). x, unidentified amino acids, although the positional number is valid in respect to the tryptic digest that created the respective peptides.
For MRP-L7 mouse, a consensus cDNA of 1047 bp was assembled. Putative splice sites were determined at bp 31/32, 245/246, 464/465, and 977/978. A polyadenylation signal, ATTAAA, was identified at bp 983-988, and a consecutive poly(A) stretch begins at bp 1006. The deduced MRP-L7 mouse ORF is incomplete at its N terminus. There was no mouse EST identified corresponding to the N terminus of the MRP-L7 human. The MRP-L7 mouse ORF encodes 306 amino acid residues and shows high sequence similarity compared with the MRP-L7 human protein (91.5% identity over 284 amino acid residues; see Fig. 1b). The deduced protein has a calculated MM of 34,995 Da and a pI of 11.01. Like MRP-L7 human, MRP-L7 mouse is a highly basic protein. For MRP-L7 rat, two different ESTs were identified that encode nonoverlapping fragments (Table II, Fig. 1b). The ORFs deduced from these ESTs show very high sequence similarities compared with the respective mouse and human MRP-L7 sequences (Fig. 1b). Thus, these ESTs can be anticipated to represent the corresponding rat MRP-L7 cDNA. No signal peptide cleavage sites were predicted by the SignalP program (17) within the 50 most N-terminal amino acid residues of the respective mouse and rat MRP-L7 sequences.
Sequence comparison with ribosomal proteins from other sources revealed that the mammalian MRP-L7s correspond to the yeast MRP YmL10 and the E. coli EcoL15 ribosomal protein (Fig. 1b). However, the conservation is not very high (e.g. 38.3% sequence identity and 48.2% similarity between MRP-L7 human and YmL10 over a stretch of 126 amino acid residues). Additionally, the mammalian MRP-L7 and, very unusually, the EcoL15 sequences contain insertions compared with the yeast YmL10 (Fig. 1b). In the C. elegans genomic data base, a contig (Y92H12.Contig307) containing corresponding DNA sequences was identified. Five different DNA stretches encoding 24 -58 amino acid residues, respectively, show sequence identities between 29 and 41% to the mammalian MRP-L7 sequence. Putatively, the scattering of the short peptide-encoding DNA sequences reflects the exon/intron structure of the C. elegans gene ortholog to mammalian MRP-L7.
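Identity values such as "38.3% over a stretch of 126 amino acid residues" depend on how the aligned stretch is counted. The sketch below uses one common convention, assumed here for illustration: identical residues divided by the number of aligned columns in which neither sequence has a gap. The convention used by the GCG "bestfit" program may differ, and the function name and toy alignment are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identical residues over gap-free columns of a pairwise alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # columns with a gap are not counted
        compared += 1
        identical += a == b
    return 100.0 * identical / compared if compared else 0.0


if __name__ == "__main__":
    # Toy alignment: 4 identical residues in 5 gap-free columns.
    print(percent_identity("MKV-LA", "MRVQLA"))
```

Reporting the length of the compared stretch alongside the percentage, as done in the text, is what makes such values comparable between protein pairs.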
MRP-L14-From MRP-L14 bov, several different peptides were obtained after in-gel digestion of mature MRP-L14 bov by trypsin. Seven different peptide sequences were obtained by N-terminal Edman sequencing (Table I), although peptides 2 and 3 were provided as a mixed sequence that could be resolved only by comparison with the deduced MRP-L14 human sequence (see below). The sequence information of peptide 1 was used for a primary screening of human, mouse, and rat EST data banks. Two, 14, and three primary hits were found, respectively. For MRP-L14 mouse, a consensus cDNA of 1012 nucleotides was assembled (Table II). Furthermore, intron splice sites were localized at bp 139/140, 251/252, and 314/315, respectively, by comparison of incompletely spliced individual EST sequences with the assembled consensus cDNA sequence. A continuous ORF of 920 bp was determined (Table II). Between bp 985 and 990 a perfect polyadenylation signal, AATAAA, was identified. A continuous poly(A) tract is found from bp 1006 up to the end of the consensus cDNA sequence. The nucleotide neighborhood of the first ATG codon fits the consensus sequence of eukaryotic start codons (19). Thus, we propose this ATG codon to be the true translation initiator of the MRP-L14 mouse sequence, although no 5′ in-frame stop codon was found. The deduced ORF encodes a protein of 306 amino acid residues (Fig. 1c). Compared with the MM of the mature MRP-L14 bov of 30 kDa (Table I), an MISP of approximately 30 amino acids may be proposed for the MRP-L14 mouse. By the SignalP program (17), a cleavage site between amino acid residues 18 (Ala) and 19 (Ala) is proposed. After cleavage of this proposed MISP, the mature MRP-L14 mouse has a calculated MM of 31,510 Da, which fits the MM of MRP-L14 bov deduced by SDS-PAGE (Table I). The calculated pI is 11.66. This is consistent with the finding of Pietromonaco et al. (18) that bovine and other mammalian MRPs have similar pIs. MRP-L14 bov has been characterized as the most basic protein in its class (18).
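The calculated MM of a mature protein, as quoted above, is obtained by removing the proposed presequence and summing residue masses plus one water. The following is a minimal sketch assuming commonly tabulated average residue masses; the function names and the toy sequence are hypothetical, and the program actually used for the published values is not specified here.

```python
# Average residue masses in daltons (commonly tabulated values).
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water per chain for the free N and C termini


def protein_mass(sequence: str) -> float:
    """Average molecular mass of an unmodified polypeptide chain."""
    return sum(AVG_RESIDUE_MASS[residue] for residue in sequence) + WATER


def mature_mass(precursor: str, presequence_length: int) -> float:
    """Mass after removal of an N-terminal presequence of the given length."""
    return protein_mass(precursor[presequence_length:])


if __name__ == "__main__":
    toy_precursor = "MAGA"  # toy sequence, not MRP-L14
    print(round(mature_mass(toy_precursor, 2), 2))
```

For a cleavage site between residues 18 and 19, `presequence_length` would be 18. Masses estimated by SDS-PAGE are approximate, so agreement within a few percent is the most that can be expected.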
A single EST of rat origin was identified (Table II). From this EST sequence a 5′-truncated ORF of 177 amino acid residues was deduced, which shows strong sequence similarity to all except one of the bovine peptide sequences (Fig. 1c).
Further, several EST sequences were identified for MRP-L14 human. They separate into three different classes by their overlapping sequence identities (Table II). Three differentially spliced versions of the MRP-L14 human cDNA were assembled (Figs. 1c and 3). Only cDNA 1 encodes a complete MRP-L14 human ORF (Fig. 1c). Since the C-terminal part of the consensus cDNA 1 is deduced from a single EST sequence only (Table II), three frameshifts in the deduced amino acid sequence (lowercase and underlined letters in the human1 sequence, Fig. 1c) have to be ignored to obtain a highly conserved peptide sequence compared with the two other splicing variants of MRP-L14 human and MRP-L14 mouse. Additionally, a published human cDNA CGI-22 (accession no. AF132956) was found in the data bases. This cDNA was cloned by comparative gene cloning with the C. elegans proteome as template, but not further characterized. Also, it is not specified in the submission how accurate the sequencing was, e.g. from both strands or single reading only. However, it fits the MRP-L14 human 1 consensus cDNA, overriding the first two frameshifts in the MRP-L14 human 1 consensus cDNA mentioned (Fig. 1c). It does follow the third mentioned frameshift, producing an elongated ORF that is, from the frameshift on, no longer similar to all other MRP-L14 mammalian or orthologous sequences from other sources (Fig. 1c). Comparing all mammalian MRP-L14s at these respective positions (Fig. 1c, C-terminal gap as compared with EcoL2), we conclude that the frameshifts in MRP-L14 human 1 and CGI-22 are caused by sequencing errors for the following reasons. (i) The frameshifts occur in a region that is rich in guanosine bases, which shows a high tendency for sequencing errors. (ii) If MRP-L14 human 1 (frameshifted representation as compared with the consensus cDNA) and CGI-22 (no frameshift) are compared downstream from this particular position, they no longer show any sequence similarity. (iii) However, the annotation of the MRP-L14s to the EcoL2 family of ribosomal proteins depends also on the conserved histidine or, respectively, glutamine residue (boldface type in Fig. 1c) that is not shown in the CGI-22 sequence. (iv) The CGI-22 protein sequence is elongated C-terminally as compared with the MRP-L14s. The (putatively) translated region covers many repetitive nucleotide stretches, e.g. a T-rich sequence (29 out of 32 nucleotides) translated as stretches of cysteines and phenylalanines (Fig. 1c). Similar stretches are typical for untranslated rather than for translated sequences. cDNAs 2 and 3 are incomplete in their respective 5′-ends, however, since no ESTs extending further in the 5′ direction were found (Figs. 1c and 3). Peptide sequences I, C, and II are shared by all three different cDNAs (Fig. 3). Peptide A of cDNA 1 is replaced in cDNA 2 and 3 by peptide B, which shares only 33% identical amino acid residues with peptide A. Peptide D of cDNAs 1 and 2 is replaced in cDNA 3 by peptide E. Additionally, cDNA 3 contains peptide F, which has no equivalent in cDNAs 1 and 2 (Fig. 3). However, if the human MRP-L14 cDNAs are compared with the corresponding mouse sequence (Fig. 1c), a fourth splicing version consisting of peptides I, A (as in cDNA 1), C, E, and F (as in cDNA 3), and II is found (Fig. 3). The MRP-L14 rat sequence follows the mouse version (Fig. 1c). The mature MRP-L14 bov contains the counterpart to the human peptide F, as far as it can be concluded by determination of the corresponding peptide (Table I; Fig. 1c).
⁴ T. W. O'Brien, unpublished observations.
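The argument that a frameshift destroys all downstream similarity can be illustrated with a short translation example. This is a sketch using the standard genetic code and toy sequences (not the MRP-L14 cDNA); the names are illustrative assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate DNA in the given frame (0, 1, or 2); '*' marks a stop codon."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


if __name__ == "__main__":
    correct = "ATGGGGAAATTTTGA"
    # One extra G in the G-rich run, as a sequencing error might introduce.
    with_insertion = "ATGGGGGAAATTTTGA"
    print(translate(correct))
    print(translate(with_insertion))
```

Upstream of the inserted base the two translations agree; downstream they diverge completely and the stop codon is read through, which mirrors the elongated, dissimilar C terminus described for CGI-22.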
By screening the data banks for proteins similar to the mammalian MRP-L14s, several bacterial and chloroplast ribosomal proteins of the L2 family were found. Correspondingly, MRP sequences of Reclinomonas americana, Marchantia polymorpha, Oryza sativa, Acanthamoeba castellanii, and Paramecium tetraurelia were identified. However, these corresponding MRPs are encoded by the respective mitochondrial DNAs. In the C. elegans genome data bases, a single contig (R08B6.ctg00286) was identified that contains several stretches of DNA encoding between 11 and 73 amino acid residues that are similar to the MRP-L14 mammalian sequences. From yeast, the Rml2p protein was identified, which has recently been proposed to be the mitochondrial counterpart of the bacterial L2 ribosomal proteins (21). The amino acid sequences of the yeast Rml2p and of the EcoL2 ribosomal protein were aligned to the mammalian MRP-L14s (Fig. 1c). Interestingly, the yeast Rml2p is longer than the respective mammalian counterparts. Several insertions and a C-terminal extension show that the functional similarities of the yeast and mammalian EcoL2 counterparts may not be as high as the conservation among eubacterial or eukaryotic cytoplasmic ribosomal proteins of the L2 family suggests (21). MRP-L14 mouse and Rml2p show a sequence identity of only 31.7% over 207 amino acid residues. This is underlined by the finding that a specific histidine residue (boldface H in Fig. 1c), which is highly conserved in human cytoplasmic, yeast mitochondrial, and bacterial L2 class proteins (21,22), is not conserved among mammalian MRPs. In mammalian MRPs, this His is replaced by a glutamine in rat and mouse (Fig. 1c). Also, the insertions of 6 and 13 amino acid residues N-terminal to the conserved His residue in yeast Rml2p and EcoL2, respectively, are not conserved in mammalian MRP-L14s (Fig. 1c).
MRP-L26-MRP-L26 bov was not digested by trypsin prior to sequencing, but the mature protein was subjected to N-terminal peptide sequencing. A sequence of 20 amino acid residues was obtained (Table I). Several ESTs of human, mouse, and rat origin were identified using this peptide information as virtual screening probe. For MRP-L26 human, a complete consensus cDNA of 754 nucleotides was assembled (Table II). An ORF of 527 nucleotides was identified encoding a basic protein of 176 amino acid residues (Fig. 1d). No further 5′ extending ESTs were identified. To determine whether the EST-derived sequence represents the true mature form of the MRP-L26 human cDNA, we performed a 5′-rapid amplification of cDNA ends cloning of the corresponding cDNA from a fetal cDNA library and sequenced the obtained cDNA at its 5′ terminus (data not shown). The obtained sequence perfectly matched the 5′-end of the consensus cDNA derived from assembled ESTs; however, no additional sequence information extending the EST sequences in the 5′ direction was obtained for MRP-L26 human cDNA. Therefore, we postulate the primary ATG codon at bp 21-23 to be the natural start codon for translation, since it fits the consensus site for eukaryotic translational start (19). Compared with the N terminus of the mature MRP-L26 bov determined by amino acid sequencing, we postulate the first 8 amino acids of the deduced MRP-L26 human sequence to serve as a MISP. This MISP does not belong to one of the specified classes of MISPs according to Ref. 20, R-2, R-3, or R-10, respectively. The postulated mature MRP-L26 human has a calculated MM of 19,391 Da and a pI of 10.8. The former value is in very good correspondence to the MM of the corresponding mature MRP-L26 bov of 18.8 kDa, which has been determined by SDS-PAGE (Table I). The calculated pI is consistent with the classification of MRP-L26 as the most basic protein of its size class, in accordance with the result that MRPs of different mammals are very similar in their pI values (18). In the 3′-end of the assembled consensus cDNA, two adjacent polyadenylation signals, AATAGATAA, are found in positions 707-717 (overlapping nucleotides are underlined). From nucleotide 734 up to the 3′-end, a stretch of 21 consecutive A residues is found, which is considered to represent the mature poly(A) tail of the MRP-L26 human cDNA.
For MRP-L26 mouse, a consensus cDNA of 1124 nucleotides was assembled, encoding a protein of 177 amino acids (Table II). By comparison with incompletely spliced EST sequences, an intron was localized at the bp 580/581 boundary. The cDNA shows neither a polyadenylation site nor a 3′ poly(A) tract. Thus, compared with the shorter MRP-L26 human cDNA, we suspect putative intron sequences to be present in the 3′-end of the presented sequence. MRP-L26 of human and mouse are quite similar in sequence, although the MRP-L26 mouse contains a single amino acid insertion compared with the MRP-L26 human (Fig. 1d). We postulate a MISP of 8 amino acid residues for the same reasons as for MRP-L26 human. Additionally, the MRP-L26 mouse start codon is preceded by an in-frame TGA stop codon (bp 86-88), making the postulated translational start highly probable. For none of the determined N termini of mammalian MRP-L26s does the SignalP program (17) predict a cleavage site within the first 50 amino acid residues of the deduced ORFs. This feature characterizes the postulated MISPs as rather unusual. However, the short sequences contain a repetitive alanine motif, which has recently been postulated to be a common motif of mammalian MISPs for MRPs.³ The mature MRP-L26 mouse has a calculated MM of 19,435 Da and a pI of 10.53. These values are very similar to those of MRP-L26 human.
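The argument that an in-frame stop codon upstream of an ATG supports that ATG as the translational start can be checked mechanically. The following minimal sketch assumes 0-based indexing and toy sequences; the function name is hypothetical and the check says nothing about the sequence context of the ATG itself.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def upstream_inframe_stop(cdna: str, atg_index: int):
    """Return the 0-based index of the nearest in-frame stop codon
    upstream of the ATG at atg_index, or None if there is none."""
    cdna = cdna.upper()
    if cdna[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    # Step backwards one codon at a time, staying in the ATG's frame.
    for i in range(atg_index - 3, -1, -3):
        if cdna[i:i + 3] in STOP_CODONS:
            return i
    return None


if __name__ == "__main__":
    print(upstream_inframe_stop("TGAGCCATGAAA", 6))   # in-frame TGA upstream
    print(upstream_inframe_stop("TTGAGCATGAAA", 6))   # TGA present but out of frame
```

If an in-frame stop is found and no other ATG lies between it and the candidate, translation of the ORF cannot begin further upstream, which is the reasoning applied to MRP-L26 mouse.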
For MRP-L26 rat, only fragmentary cDNA information was obtained (Table II). The first cDNA fragment shows an ORF of 98 amino acids very similar to the deduced N termini of the mouse and human MRP-L26 proteins (Fig. 1d). Beyond bp position 308 of this cDNA fragment, an intron sequence begins such that the continuing ORF no longer fits the respective mouse and human sequences (see also Fig. 1d). A second cDNA fragment encodes the C-terminal 53 amino acid residues of the MRP-L26 rat (Table II, Fig. 1d). Neither a polyadenylation signal nor a poly(A) tract is found in this EST fragment. However, the encoded peptide sequences of the fragmentary MRP-L26 rat cDNA sequences are very similar to the respective human and mouse sequences (Fig. 1d). Thus, the mammalian MRP-L26s can be characterized by these consensus cDNA sequences.
The MRP-L26 bov protein was identified as a putative early assembly protein by virtue of its topography, essentially buried in the subribosomal particle (10), and its strong RNA binding properties (11). These data are consistent with the identification of MRP-L26 as the counterpart of the E. coli EcoL17 ribosomal protein. EcoL17 has been described as an early assembly protein of the large subunit that is not involved in the catalytic function of the ribosome (23). Additionally, the yeast MRP YmL8 has been identified as the corresponding yeast protein (24). The alignment of the MRPs with the E. coli EcoL17 is shown in Fig. 1d. YmL8 is an essential protein of the yeast mitoribosome. Gene disruption of the MRP-L8 gene not only renders the mutant unable to grow on nonfermentable carbon sources but also hampers the growth on fermentable carbon sources (24). However, the overall similarities between EcoL17, YmL8, and the mammalian MRP-L26s are poor (Fig. 1d). MRP-L26 human and YmL8 share only 31.5% identical amino acid residues over a length of 121 amino acids. YmL8 contains an insert of several amino acid residues and a C-terminal elongation compared with the other ribosomal proteins of this family. These relatively weak similarities between yeast and mammalian MRPs coincide with the generally poor sequence conservation observed between MRPs of mammalian and yeast origin (8).² Similarly, in the C. elegans genomic DNA (Y54E10.Contig159), five separate DNA stretches are found encoding peptides between 24 and 58 amino acid residues in length that show sequence identities with the mammalian MRP-L26s between 29 and 41%. We suppose the scattering of these corresponding C. elegans DNA sequences to represent the exon/intron structure of the orthologous MRP-L26 C. elegans gene.
Altogether, four different classes of mammalian MRPs have been characterized on the molecular level. Three of them show significant sequence similarities to eubacterial and yeast mitochondrial ribosomal proteins. However, the conservation between yeast and mammalian MRPs is significantly lower than between cytoplasmic ribosomal proteins of the same species.

Characterization and Evolution of Mammalian MRPs-
Mammalian MRPs have been characterized on the molecular level only by chance in the past (4 -7). Recently, by combining protein purification and peptide sequencing efforts, genome project results, and computational analysis, several mammalian MRP cDNA and protein sequences have become available (8). 2,3 In general, mammalian MRPs of different origin are quite similar to each other in properties such as amino acid sequence, MM, and pI. The MM values of corresponding MRPs calculated from deduced amino acid sequences are similar to those determined by SDS-PAGE (2). However, corresponding MRPs of mammals and yeast are much less conserved than cytoplasmic ribosomal proteins of the same species. Cytoplasmic ribosomal proteins of yeast and rat show sequence identities ranging between 40 and 80% (25). Yeast and mammalian MRPs are much less conserved, with between 31 and 38% identity (this work). This agrees with the values obtained for other MRPs conserved among yeast and mammals (8). 2 Additionally, several mammalian MRPs have been identified that have no counterpart in the cytoplasmic ribosome (8). 2,3 This result suggests that the evolutionary pressures on the cytoplasmic and the mitochondrial protein biosynthetic apparatus must have differed during the adaptation of the premitochondrial endosymbiont to life within the hosting cell (24).
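The comparison of calculated and SDS-PAGE-derived MM rests on summing residue masses over the deduced amino acid sequence. As a rough sketch (not the software used in this work), the average molecular mass of an unmodified, linear polypeptide can be estimated as shown below. The function name `molecular_mass`, the rounded average residue masses, and the example peptide are illustrative assumptions.

```python
# Average residue masses in Da (amino acid minus one water), rounded values.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water is added back for the free N and C termini


def molecular_mass(sequence: str) -> float:
    """Average molecular mass (Da) of an unmodified linear polypeptide."""
    total = WATER_MASS
    for residue in sequence.upper():
        if residue not in RESIDUE_MASS:
            raise ValueError(f"unknown residue: {residue!r}")
        total += RESIDUE_MASS[residue]
    return total


# Hypothetical peptide, for illustration only.
print(f"{molecular_mass('MAAGR') / 1000:.3f} kDa")
```

For a mature MRP, the mass should be computed from the deduced sequence after removal of any presequence, since the precursor and the processed protein differ in length.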
Structure and Function of MRPs-Three out of four mammalian MRPs described in this paper show significant sequence similarities to yeast mitochondrial and bacterial ribosomal proteins. Thus, similar functions of these MRPs, compared with the bacterial ribosomal proteins, can be inferred. MRP-L7 bov has been characterized as a strong RNA binding protein that is tightly bound to the ribosomal core particle under 4 M LiCl washing conditions (11). The protein is deeply buried in the 39 S subunit, and it is assumed to be involved in early assembly processes (11). EcoL15, the bacterial counterpart of the mammalian MRP-L7s, is a main assembly protein of the large ribosomal subunit, but it is not a true early assembly protein (18). However, the assembly function of MRPs may be different from that of their bacterial counterparts due to the higher protein content of mammalian mitochondrial ribosomes as compared with E. coli ribosomes (2). This might be true for the MRP-L26 protein too. MRP-L5 bovine was identified as a buried protein of mammalian mitochondrial ribosomes that can be stripped from the large ribosomal subunit only by incubation with 4 M LiCl (26). However, it is present in the large mitochondrial subunit in stoichiometric amounts (2). Since this protein lacks a bacterial or yeast mitochondrial counterpart, no further functional description can be given at this time.
MISPs of MRPs-For only a minority of the MRPs analyzed in this paper could MISPs be identified unambiguously. Only in the case of MRP-L26 bov was a mature MRP N-terminally sequenced, revealing the mature N terminus of the processed protein. However, the MISPs postulated by comparison of the mature N terminus of MRP-L26 bov with the deduced ORFs of human, mouse, and rat are unusually short (8 amino acid residues) and were not recognized by the SignalP prediction program (17). Nevertheless, MISPs of comparable lengths have been determined for other mammalian MRPs. 5 In yeast, MISPs of similar length have been detected repeatedly for MRPs (3). Therefore, we conclude that these postulated MISPs are the true import signals of the mammalian MRP-L26s.
In the cases of the three other groups of MRPs, only the computer predictions for MISPs can be evaluated. For MRP-L14 human 1 and CGI-22, cleavage sites were predicted between amino acid residues 22 (Ala) and 23 (Pro) and between residues 21 (Ala) and 22 (Ala), respectively. For MRP-L5 human a MISP is predicted by the computer, but not for the corresponding MRP-L5 mouse. No predictions were made for the mammalian MRP-L7s. In the latter case, the existence of a peptide sequence derived from the mature MRP-L7 bov close to the N termini of the mammalian MRP-L7s implies that putative MISPs do not extend further in the C-terminal direction than the N-terminal Arg residue of this peptide (Fig. 1b). However, the import properties of the MRP-L7s, which remain to be elucidated, seem to be rather unusual. Nevertheless, unusual MISPs have also been determined for other mammalian MRPs. 5 It is striking that mammalian MRPs whose yeast counterparts do not possess an N-terminal cleavable MISP have always adopted additional N-terminal cleavable MISP sequences (YmL8/MRP-L26 mammalian (this work), YmL33/MRP-L28 mammalian (8), YmL38/MRP-L32 mammalian (8)). In the cases of the yeast MRPs without cleavable N-terminal MISPs, mitochondrial import signals can be inferred to reside in the interior of the protein sequence. YmL8, the yeast mitochondrial counterpart of mammalian MRP-L26, is transported into the mitochondria without cleavage of an N-terminal MISP (27). The signal sequence for mitochondrial import is localized internally, in the middle of the protein. Interestingly, the N terminus of YmL8 (unprocessed) corresponds very well to the N termini of the mature mammalian MRP-L26s (Fig. 1d). The same phenomenon is also observed in the case of YmL38/MRP-L32 (8).
These findings suggest that (i) mitochondrial import signaling by internal peptide sequences is restricted to yeast MRPs, and/or (ii) the mammalian mitochondrial protein import mechanisms differ strongly from those of yeast, or (iii) mammalian MRPs have lost the ability to be imported by internal peptide signaling, which had to be compensated for by the adoption of additional N-terminal MISPs. It should be noted at this point that mammalian MRPs seem to possess a mitochondrial import signaling pathway that is more strongly conserved than that of yeast MRPs, which are imported by different mechanisms. One hint is the conservation of a small alanine-rich sequence motif in MISPs of mammalian MRPs. 5
Splice Variants and Functional Significance of MRP-L14-Splice variants and gene families for cytoplasmic ribosomal proteins are found in yeast as well as in mammals. In contrast to cytoplasmic ribosomal protein genes, MRP genes were considered to be present only once in the haploid genome. Splicing itself occurs very rarely on yeast MRP pre-mRNAs, and no splice variants have been reported from yeast (3). However, this seems not to be true in mammals. For MRP-L28 human (8), two different cDNA sequences were assembled from the EST data banks. One cDNA (MRP-L28 human 2) could be found in fetal liver spleen only, whereas the other version (MRP-L28 human 1) is omnipresent during development and in different tissues (8). For MRP-L14, four different splice variants were determined from mouse and human (this work). The different MRP-L14 human cDNAs could not be assigned to a specific tissue or developmental stage, with the exception that cDNAs 2 and 3 (Fig. 2) derive from malignant tissues only. However, the occurrence of splice variants hints at a possible modification of the important function of mammalian MRP-L14.
The bacterial and cytoplasmic counterparts of MRP-L14 are involved in the peptidyltransferase function of the ribosome and can replace each other in functional ribosomes (22). The functional relevance of individual amino acid residues has been shown (21, 22). The yeast mitochondrial counterpart of mammalian MRP-L14, Rml2p, is essential for mitochondrial function (21). Substitution of Rml2p His 343 by Gln causes a conditional respiratory growth phenotype without detectable effect on the assembly or stability of the large mitoribosomal subunit (21). The corresponding His 229 of EcoL2 has been shown to be involved in the peptidyltransferase activity of the ribosome (9,22). The replacement of EcoL2 His 229 by Gln does not affect the assembly of the ribosomes but completely abolishes the peptidyltransferase activity (9). Similar results have been reported for the substitution of His 229 by glycine, arginine, and alanine (22). The authors claim that His 343 is highly conserved throughout evolution to preserve this function. However, in the mammalian MRP-L14 proteins the corresponding His residue is not strictly conserved (Fig. 1c). In mouse and rat MRP-L14s, this His is substituted by Gln (Fig. 1c). Additionally, the N-terminal part of the 14-mer peptide sequence around Rml2p His 343, which has been postulated to be very important for the L2 ribosomal protein function (21), is completely missing in all mammalian MRP-L14 sequences (Fig. 1c). Thus, the functional roles of His or Gln at this position, as well as of the conserved 14-mer amino acid sequence, remain questionable.