Mammalian Mitochondrial Ribosomal Proteins N-TERMINAL AMINO ACID SEQUENCING, CHARACTERIZATION, AND IDENTIFICATION OF CORRESPONDING GENE SEQUENCES*

The integrity of healthy mitochondria is supposed to depend largely on proper mitochondrial protein biosynthesis. Mitochondrial ribosomal proteins (MRPs) are directly involved in this process. To identify mammalian mitochondrial ribosomal proteins and their corresponding genes, we purified mature rat MRPs and determined 12 different N-terminal amino acid sequences. Using this peptide information, data banks were screened for corresponding DNA sequences to identify the genes or to establish consensus cDNAs and to characterize the deduced MRP open reading frames. Eight different groups of corresponding mammalian MRPs constituted from human, mouse, and rat origin were identified. Five of them show significant sequence similarities to bacterial and/or yeast mitochondrial ribosomal proteins. However, MRPs are much less conserved in respect to the amino acid sequence among species than cytoplasmic ribosomal proteins of eukaryotes and bacteria.
Intact mitochondrial protein biosynthesis has been shown to be indispensable for the maintenance of mitochondrial DNA in yeast (1). Nearly all of the mitochondrial ribosomal proteins (MRPs) investigated so far are essential for proper mt protein synthesis (2). Knock-out mutants of yeast MRP genes lose their respiratory capacity and change to ρ⁻ or ρ⁰ mt genetic status by successive losses of mt DNA (1). In higher eukaryotes, the knowledge about comparable functions of MRPs is only rudimentary, since only a few MRPs have been characterized on the molecular level. The protein composition of mammalian mt ribosomes has been studied extensively (3)(4)(5). Some properties of mt ribosomes such as structure (6), binding of nucleotides and RNA (7)(8)(9)(10)(11), and interaction with different factors have been studied (12)(13)(14). However, only 3 of the approximately 80-100 different human MRPs have been described at the molecular level so far. MRL3, which is the EcoL3 counterpart in human mt ribosomes, was identified as an overexpressed r-protein in Mahlavu hepatomic cells (15). Later, it was postulated to be a true MRP by virtue of its sequence similarity to the corresponding yeast MRP YmL9 (16). MRPL12 was identified as a delayed-early response gene similar in sequence to the Escherichia coli L7/L12 r-protein (17). The metazoan mitochondrial counterpart of EcoS12 has been characterized in Drosophila, human, and mouse (18,19). In Drosophila a mutation of mt S12 causes abnormal behavior. This is the first case reported so far in which an MRP mutation affects the status of an animal (18). Diseases affecting mitochondria are known in humans, and are caused by nuclear mutations responsible for the loss of mt DNA as a secondary effect by a so far unknown mechanism (20,21). Mutations of MRPs are good candidates for factors affecting the mt genetic and/or physiological status.
To characterize mammalian MRPs and to compare their biochemical properties with those of their (essential) counterparts, e.g. of yeast, we identified several mammalian MRPs and their corresponding gene sequences. We used N-terminal sequence information obtained from purified rat MRPs to screen DNA data banks and to characterize identified MRPs of rat, human, and mouse.

EXPERIMENTAL PROCEDURES
Determination of Rat N-terminal MRP Peptide Sequences-Preparation of mitochondrial ribosomes from rat liver was accomplished according to Ref. 3. Proteins were extracted with acetic acid and lyophilized (22), and were separated by two-dimensional PAGE as described (23). Proteins from the second dimension gels were transferred to polyvinylidene difluoride membranes by Western blotting (24). N-terminal sequencing of blotted proteins was done as described (24).
Computing-Sequence searches were performed with the BCM Search Launcher program (25). First, the rat MRP N-terminal peptide sequences were compared with EST sequences using the "general protein sequence/pattern searches" and the "TBLASTN/dbest query versus 6-frame translation of dbest with Entrez & SRS links (NCBI/BCM)" subprogram. Nucleic acid sequences obtained were selected for overlap and extension, and a consensus sequence was established using the "pileup" program (26). The putative ORF (open reading frame) was localized using the "translate" program (26). The resulting peptide sequence was aligned to the rat N-terminal peptide sequence by the "bestfit" program (26). In the case of ORFs open at their 5′ and/or 3′ end, the first and the last 50 nucleotides, respectively, of the established consensus sequence were subjected to a second search using the "nucleic acid sequence search" (25). The search was performed with the "BLASTN/dbest with Repeat Masker and Entrez & SRS links (NCBI/UW/BCM)" subprogram. Newly identified nucleic acid sequences were aligned with the consensus sequence. The extended sequence was translated for detection of a complete ORF. Complete ORFs were analyzed using the "peptidesort," "peptidestructure," and "map" programs (26). Comparison of similar proteins from different species was performed using the "pileup" and "bestfit" programs (26).
* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Deduced protein sequences were compared with sequences from the SWISSPROT data base using the "wordsearch" program (26). Further, the sequences were analyzed by comparison to any possible ORF deducible from the genomic sequence of Saccharomyces cerevisiae (25). The ORFs were also compared with known yeast MRP sequences as listed in Ref. 2.
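The first search step described above (a peptide query against a six-frame translation of EST sequences) can be illustrated with a minimal sketch. It is not the TBLASTN program used in this work: it reports exact peptide matches only, whereas TBLASTN scores approximate matches. The function names and the toy sequences in the usage note are hypothetical.

```python
# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons (e.g. with 'N') become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_hits(dna, peptide):
    """Return (strand, frame offset, 0-based protein position) for each frame
    whose translation contains the peptide exactly (first occurrence per frame)."""
    dna = dna.upper()
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            pos = translate(seq[offset:]).find(peptide)
            if pos != -1:
                hits.append((strand, offset, pos))
    return hits
```

For the toy sequence `CATGAAATGGTAA`, `six_frame_hits("CATGAAATGGTAA", "MKW")` finds the peptide in the second forward frame; the reverse complement of the same sequence yields the corresponding hit on the minus strand.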

RESULTS
N-terminal Sequencing of Purified Rat MRPs and Computational Analysis-Several rat MRPs were purified from mt r-protein mixtures. The proteins were identified according to the two-dimensional map of rat MRPs (27), and their isoelectric points were determined (Table I). The proteins were blotted from two-dimensional gels onto polyvinylidene difluoride membranes and N-terminally sequenced. Twelve different N-terminal sequences were obtained (Table I). The sequences of MRP-L22 rat and MRP-L24 rat are almost identical, and thus they are expected to be two differently modified forms of the same protein, separated by the two-dimensional PAGE technique applied. The N-terminal sequences obtained were compared with EST data base sequences. For eight proteins, groups of corresponding EST sequences from human, mouse, and rat were identified. Consensus cDNA sequences were established by multiple sequence comparisons, and the ORFs deduced were characterized and compared with each other and with the determined N-terminal sequences of the mature rat MRPs (Fig. 1). N-terminal extensions of the deduced ORFs, as compared with the mature rat N-terminal peptide sequences, were postulated to be putative signal sequences for mt protein import. Putative signal peptides were further analyzed for general properties of mt import sequences according to Refs. 28 and 29.
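The way a putative import signal follows from the position of the mature N terminus within a deduced ORF can be sketched as follows. This is an illustration only, assuming that "x" in a sequenced peptide stands for an unidentified residue that matches any amino acid; the function names and the toy sequences are hypothetical and are not taken from Table I.

```python
def find_mature_start(orf, peptide, wildcard="x"):
    """Return the 1-based ORF position at which the sequenced peptide begins,
    or None if it does not occur. Wildcard positions match any residue."""
    for start in range(len(orf) - len(peptide) + 1):
        window = orf[start:start + len(peptide)]
        if all(p == wildcard or p == o for p, o in zip(peptide, window)):
            return start + 1
    return None


def misp_length(orf, peptide):
    """Length of the N-terminal extension preceding the mature N terminus."""
    start = find_mature_start(orf, peptide)
    return None if start is None else start - 1


# Toy example: the peptide begins at ORF position 10, so 9 residues precede it.
toy_orf = "MLRRSLAAVSPEQKTW"
toy_peptide = "SPxQK"
print(find_mature_start(toy_orf, toy_peptide), misp_length(toy_orf, toy_peptide))
```

By the same arithmetic, a mature N terminus that begins at position 28 of an ORF implies an N-terminal extension of 27 residues.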
MRP-L8-In the primary search using the obtained MRP-L8 rat peptide sequence as screening probe, 12 and 5 "primary hits" of mouse and human ESTs were found, respectively. The consensus cDNA sequence of MRP-L8 mouse was assembled by multiple searches from many different EST sequences (Table II). A complete ORF of 261 amino acid residues was identified. Positions 28-46 of this ORF correspond to the MRP-L8 rat N-terminal peptide (Table I, Fig. 1a, present report). A cleavable mitochondrial import signal peptide (MISP) of 27 amino acid residues is postulated for MRP-L8 mouse. Correspondingly, a cDNA for MRP-L8 human was assembled encoding an ORF of 261 amino acid residues (Table II, Fig. 1a, present report). The deduced MRP-L8 mouse and MRP-L8 human show a high degree of sequence identity to each other over their entire length. An MISP of 28 amino acid residues is postulated for MRP-L8 human in comparison to the rat N-terminal L8 peptide and in accordance with Refs. 28 and 29. Both the mouse and the human putative MISPs are highly hydrophobic; they contain only positively charged and no negatively charged amino acid residues, and very few hydroxylated amino acid residues. The arginine (R) residues in position −2 (MRP-L8 mouse) and −10 and −2 (MRP-L8 human) classify the peptides as R−2 (MRP-L8 mouse) and R−10/R−2 (MRP-L8 human), respectively, according to Ref. 29.
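The class labels used above can be illustrated with a short sketch that inspects the arginine positions upstream of the cleavage site. It assumes that position −1 is the last residue of the presequence and that a class is assigned solely by the presence of arginine at −2, −3, or −10; the criteria of Ref. 29 may involve further sequence context, so this is a simplification. The function name and the toy presequences are hypothetical.

```python
def classify_misp(presequence, positions=(10, 3, 2)):
    """Label a presequence by arginine at -10, -3 and/or -2 relative to the
    cleavage site (position -1 is the last residue of the presequence).
    Returns e.g. "R-2", "R-10/R-2", or "R-none"."""
    found = [
        f"R-{n}"
        for n in positions
        if len(presequence) >= n and presequence[-n] == "R"
    ]
    return "/".join(found) if found else "R-none"


# Toy presequences, not taken from Fig. 1.
print(classify_misp("MLAARLSRA"))     # arginine at -2 only
print(classify_misp("MLRAAALSLLRS"))  # arginine at -10 and at -2
print(classify_misp("MLSAALLK"))      # no arginine at the tested positions
```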
Significant sequence similarities of the mammalian L8 MRPs were detected by computer search with the yeast MRP YmL11 and bacterial r-proteins of the L10 family (Fig. 1a, Table III). However, the percentages of sequence similarity and sequence identity are low (Table III) and might be below the usual threshold of significance. The family of the bacterial L10 r-proteins itself is very heterogeneous, and a comparison of the citrus greening disease-associated bacterial L10 with the E. coli L10 r-protein shows only 45% similarity and 32.4% identity over a stretch of 153 amino acid residues. These values are very low, but nonetheless the membership of similar proteins from different bacterial species in the L10 r-protein family is supported, e.g. by the similar location of the corresponding genes within the same operons. Moreover, the L10 r-protein class is the only one for which the computer search picked up similarities to the mammalian MRP-L8s. Accordingly, we assign the mammalian MRP-L8 proteins as members of the L10 family of r-proteins.
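The similarity/identity percentages quoted here and in Table III were calculated with the "bestfit" program. How such values derive from an aligned stretch can be sketched as follows. This is not the bestfit algorithm: the alignment is taken as given, and "similar" is defined by a simple set of conservative residue groups chosen for illustration, whereas bestfit relies on a scoring matrix. All names and the toy alignment are hypothetical.

```python
# Illustrative conservative groups (an assumption, not the bestfit matrix).
SIMILAR_GROUPS = ("ILVM", "FWY", "KRH", "DE", "NQ", "ST", "AG")


def percent_similarity_identity(aligned_a, aligned_b):
    """Return (similarity %, identity %) over the ungapped columns of two
    pre-aligned sequences of equal length; '-' denotes a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if "-" not in (a, b)]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    identical = sum(a == b for a, b in pairs)
    similar = sum(
        a == b or any(a in g and b in g for g in SIMILAR_GROUPS)
        for a, b in pairs
    )
    return 100.0 * similar / len(pairs), 100.0 * identical / len(pairs)


# Toy alignment: 5 ungapped columns, 2 identical, 3 further conservative pairs.
print(percent_similarity_identity("MKLV-D", "MRIVAE"))
```

The returned pair follows the order used in Table III, similarity before identity.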
MRP-L22/24-Fourteen primary hits each were found for MRP-L22 human and MRP-L22 mouse. Additionally, a single EST sequence of rat origin was identified (Table II). The incomplete ORF deduced for MRP-L22 rat matches the N-terminal sequence of the mature MRP-L22 rat from positions 42 to 70 (Fig. 1b, present report). An N-terminal MISP of 41 amino acids is postulated for MRP-L22 rat. For MRP-L22 mouse, a complete cDNA consensus sequence of 826 base pairs was determined (Table II). An ORF of 201 amino acid residues was deduced from this cDNA, for which the N-terminal 80 amino acid residues are almost identical to the MRP-L22 rat protein. Further, MRP-L22 mouse shows approximately 80% identity to the deduced MRP-L22 human sequence over its entire length (Fig. 1b; see below). An MISP of 41 amino acid residues is postulated for MRP-L22 mouse. The ORF of MRP-L22 human was deduced from a consensus cDNA of 775 base pairs (Table II). MRP-L22 human consists of 206 amino acid residues, which are highly conserved as compared with the rat and mouse MRP-L22 sequences (Fig. 1b, present report). By comparison with the rat mature peptide sequence, an MISP of 46 amino acid residues is postulated for MRP-L22 human. According to Ref. 29, the putative MISPs of rat, mouse, and human MRP-L22 belong to the R-none class of MISPs. All three show the typical properties of MISPs such as many hydrophobic, positively charged, and hydroxylated amino acid residues.
FIG. 1. ... (Table I). x marks unidentified amino acids, although the positional number is valid. a, alignment of mammalian MRP-L8 protein sequences with the citrus greening disease-associated bacterial L10 (accession no. M94319, rplJ gene). b, alignment of mammalian MRP-L22 protein sequences. c, alignment of MRP-L25 protein sequences with the E. coli EcoL22 sequence (accession no. sw:rl22_ecoli). Section marks (§) in the alignment of MRP-L25 human and EcoL22 mark amino acid residues that are absolutely conserved among nearly all members of the EcoL22 r-protein family. d, alignment of mammalian MRP-L27 sequences with the yeast YmL27 sequence (accession no. S77888). e, alignment of the mammalian MRP-L28 sequences with the yeast YmL33 sequence (accession no. D90217), and the E. coli EcoL30 sequence (accession no. sw:rl30_ecoli). f, alignment of MRP-L31 sequences. g, alignment of the mammalian MRP-L32 sequences with the E. coli EcoL14 sequence (accession no. sw:rl14_ecoli), and the yeast YmL38 sequence (accession no. S38000). h, alignment of the mammalian MRP-S13 sequences.
In addition, genomic sequences were identified for MRP-L22 human in the data banks. The gene for MRP-L22 human was located on chromosome 22q11 cosmid clone 102 g9 (accession ... Fig. 2. The last 20 nucleotides of the consensus cDNA are mainly adenosines, which do not fit the corresponding positions of the genomic DNA. A perfect polyadenylation signal AATAAA was identified at positions 736-741 of the cDNA, and accordingly the deduced 3′ end of the consensus cDNA was assumed to represent the true 3′ end of the MRP-L22 human mRNA.
MRP-L25-Three, 14, and 17 primary hits were found in the EST data bases for rat, human, and mouse MRP-L25, respectively. A truncated ORF of 91 amino acid residues was deduced for MRP-L25 rat from a single EST. Amino acids 34-59 of this ORF match the N-terminal peptide of MRP-L25 rat obtained by amino acid sequencing (Fig. 1c, present report). The consensus cDNA sequences for MRP-L25 mouse and MRP-L25 human were assembled from many different EST sequences (Table II). The identity of some nucleotides could only be clarified by comparison of the deduced amino acid sequences, in order to avoid frameshifts that otherwise would cause total disagreement between the supposedly closely related mouse and human sequences. The incomplete ORF of MRP-L25 mouse spans 212 amino acid residues, which are in very good agreement with the truncated MRP-L25 rat (Fig. 1c, present report). The N-terminal 180 amino acid residues of MRP-L25 mouse also correspond to the respective N-terminal amino acid residues of MRP-L25 human (Fig. 1c, present report). The consensus cDNA sequence of 824 nucleotides for MRP-L25 human was assembled from three different ESTs (Table II). To avoid a frameshift as compared with the deduced amino acid sequence of MRP-L25 mouse, base pair 225 of EST H87659, which is an "n," was omitted. The complete ORF deduced spans 223 amino acid residues. A polyadenylation signal AATAAA is found 3′ to the stop codon.
MRP-L25 human and MRP-L25 mouse are closely related proteins except for their respective C termini (Fig. 1c). MISPs 33 amino acids in length are postulated for all three rat, mouse, and human MRP-L25s. The signal peptides belong to the R-none class according to Ref. 29.
Similarity of the human and mouse proteins to the E. coli L22 r-protein has been postulated (EST accession no. AA101598). More than 30 different L22 r-proteins were picked out from the data bases using the human or mouse MRP-L25 amino acid sequence as a screening probe. When MRP-L25 human is compared with EcoL22 at the amino acid residue level, a sequence similarity of only 42% and a sequence identity of 31.8% are detected, covering a region of 84 amino acid residues (Table III). However, when the complete MRP-L25 human sequence was compared with different members of the L22 protein family, several amino acids were identified that are identical among all of them (Fig. 1c). Thus, the affiliation of the mammalian MRP-L25 proteins to the EcoL22 r-protein family is confirmed.
MRP-L27-Four and 27 primary hits were found for human and mouse MRP-L27, respectively. Complete ORFs of 136 and 134 amino acid residues for MRP-L27 human and MRP-L27 mouse were deduced from consensus cDNAs (Table II, Fig. 1d, present report). Both deduced ORFs are quite similar to each other. The ORF of MRP-L27 mouse starts with an ATG, which is surrounded by the appropriate nucleotides that are common for eukaryotic translational start codons (data not shown). An in-frame stop codon precedes this translational start, thus making the N terminus of the MRP-L27 mouse ORF highly probable. The start codon of MRP-L27 human lacks one nucleotide of the appropriate consensus sequence for eukaryotic start codons. No in-frame stop codon is found 5′ of the translational start, since the 5′ end of the assembled consensus cDNA does not extend far enough in the 5′ direction. Nevertheless, we assume this to be the true start codon of MRP-L27 human due to the amino acid sequence similarity to MRP-L27 mouse downstream, and the complete disagreement of the deduced amino acid sequences upstream of the start codon. Thus, in comparison to the mature N terminus of rat MRP-L27, MISPs of 13 amino acid residues each are postulated for MRP-L27 human and MRP-L27 mouse (see Fig. 1d). Both MISPs are highly hydrophobic with few (two in human, one in mouse) positively charged and two hydroxylated (mouse) amino acid residues. Both signal peptides belong to the R-none class of signal peptides according to Ref. 29. The consensus cDNAs of human and mouse show polyadenylation signals ATTAAA closely located downstream of the respective stop codons. The nucleotide environments of these signals are highly conserved between the MRP-L27 mouse and MRP-L27 human consensus cDNAs. Sequence comparison revealed weak but significant sequence similarities of the mammalian MRP-L27 to the N-terminal portion of the yeast YmL27 MRP (Fig. 1d, Table III). Although the sequence similarities are not high, they are comparable to the values obtained for other mammalian MRPs similar to bacterial and yeast mitochondrial r-proteins (Table III). The yeast YmL27 shows no sequence similarity to any known r-protein (31), and thus the discovery of the mammalian counterparts of YmL27 defines the first MRP family that is not similar to known r-proteins from other sources.

TABLE III
Sequence comparison of similar ribosomal proteins from mitochondria and bacteria. Proteins in horizontal lines are compared to proteins in vertical columns. aa, extension of similar sequences in numbers of amino acid residues. %, numbers before the slash (/) give the similarity of the compared amino acid stretches in percentage, numbers after the slash (/) give the identity of the compared amino acid stretches in percentage, as calculated by the "bestfit" computer program (26). EcoL, E. coli r-protein of the large (L) subunit. YmL, yeast MRP of the large (L) subunit. The species of the citrus greening disease-associated bacterium is not further specified in the data base (accession no. M94319, rplJ gene).
MRP-L28 -One, 6, and 48 primary hits were found for rat, human, and mouse MRP-L28, respectively. For MRP-L28 rat , an incomplete ORF was identified. The last 11 amino acid residues of the ORF deduced are in complete agreement with the first 11 amino acid residues of the mature MRP-L28 rat protein, as determined by amino acid sequencing (Table I, Fig. 1e, present report). Furthermore, most of this ORF shows strong sequence similarities to the ORFs deduced for MRP-L28 human and MRP-L28 mouse , respectively (Fig. 1e). Interestingly, among the human MRP-L28 ESTs, two different groups of consensus cDNA sequences were identified and assembled. MRP-L28 human 1 (Table II, Fig. 1e) encodes an ORF of 162 amino acid residues. The consensus cDNA of MRP-L28 human 2, which is supported by four independent ESTs, is not complete (Fig. 1e). Strikingly, it contains a stop codon (*) followed by a frameshift (-) caused by a missing nucleotide in the cDNA coding for the putative signal peptide (see Fig. 1e). The stop codon as well as the frameshift are found in all of the four detected ESTs. Further, the four ESTs of MRP-L28 human 2 are all of fetal liver spleen origin, whereas the ESTs identified for MRP-L28 human 1 are products of mRNA isolated from different tissues such as retina, melanocyte, fetal liver spleen and fetal heart, and parathyroid tumor. Thus, irrespective of whether the MRP-L28 human 2 mRNA is a product of a pseudogene not being translated in vivo, it seems that for MRP-L28 human at least two different genes do exist that are transcribed in a tissue-specific manner.
The MRP-L28 mouse consensus cDNA codes for an ORF, which is quite similar in sequence to the MRP-L28 rat, MRP-L28 human 1, and MRP-L28 human 2, respectively (Fig. 1e). However, within the 5′ part of the consensus cDNA (Table II), a single frameshift is found. This frameshift is supported by six out of six ESTs identified for this region. The frameshift causes an alternative N terminus of MRP-L28 mouse without a translational start codon, which is not similar to the corresponding N termini of rat and human MRP-L28s (Fig. 1e). If the frameshift is not taken into account, the N terminus of MRP-L28 mouse corresponds well to the respective rat and human sequences (Fig. 1e). It might be speculated that the apparent frameshift is caused by sequencing errors, or that the MRP-L28 mouse cDNA is the MRP-L28 human 2 "homologue" rather than the MRP-L28 human 1 homologue. For all MRP-L28s (neglecting the frameshifts), MISPs of 34 amino acid residues are postulated. For MRP-L28 human 1 and MRP-L28 human 2, stop codons 5′ to the translational start support this assignment. In the case of rat and mouse MRP-L28s, there are no stop codons preceding the starting methionine, due to the short 5′ sequences of the respective consensus cDNAs (Table II). The 5′ encoded amino acid sequence is shown in the case of MRP-L28 rat (Fig. 1e), but is very unlikely to be a true part of the MISP. All four MISPs are characterized by a high proportion of hydrophobic and positively charged amino acid residues. With the exception of MRP-L28 human 2, they belong to the R−2 class of cleavable signal sequences according to Ref. 29. In MRP-L28 human 2, the arginine in position −2 is replaced by a histidine (Fig. 1e).
The MRP-L28 mouse and MRP-L28 human 1 sequences showed a weak but significant sequence similarity to the yeast YmL33 MRP and the EcoL30 r-protein, respectively. The regions of sequence similarity span the total length of the EcoL30 protein, and 49 amino acid residues of the N terminus of YmL33, and the middle part of MRP-L28 mouse, respectively (Fig. 1e, Table III). Interestingly, the overall size of the yeast MRP is considerably less than that of the mammalian MRPs. The EcoL30 protein itself is only two-thirds the size of the yeast YmL33. Thus, the putative "core" of these different r-proteins is much smaller than the mitochondrial representatives of unicellular and multicellular eukaryotes.
MRP-L31-Three, 11, and 28 primary hits were found for rat, human, and mouse MRP-L31, respectively. For MRP-L31 rat, an incomplete ORF of 127 amino acid residues was deduced from the cDNA lacking some amino acid residues of the putative MISP (Fig. 1f, Table III). Amino acid residues 18-40 correspond to the N-terminal amino acid sequence of the mature MRP-L31 rat obtained by amino acid sequencing. The MRP-L31 rat sequence is very similar to the deduced amino acid sequences of MRP-L31 mouse and MRP-L31 human (Table II, Fig. 1f, present report). MISPs of 31 and 32 amino acid residues, respectively, are postulated (Fig. 1f). The MISPs are highly hydrophobic and positively charged, and contain hydroxylated but no negatively charged amino acid residues. The (truncated) MRP-L31 rat MISP belongs to the R−2 class, whereas the mouse and human MRP-L31 MISPs belong to the R-none class of mt import peptides according to Ref. 29. No significant sequence similarity to any known protein was found. Thus, the mammalian L31 MRPs define a new class of MRPs.
MRP-L32-For the MRP-L32 proteins, 2, 10, and 38 primary hits were identified for rat, human, and mouse in the EST data bases, respectively. An incomplete ORF of 101 amino acid residues was deduced from a single rat EST sequence (Table III). From amino acid residues 2 to 59, this ORF corresponds to the amino acid sequence of the mature MRP-L32 rat determined by amino acid sequencing (Fig. 1g). The complete ORFs deduced for both the MRP-L32 mouse and MRP-L32 human proteins are quite similar except for their extreme N termini (Fig. 1g, present report). Both proteins show a very good sequence correspondence to the MRP-L32 rat sequence (Fig. 1g). MISPs of 30 and 64 amino acid residues are postulated for MRP-L32 mouse and MRP-L32 human, respectively. However, the elongated form of MRP-L32 human seems to be unlikely, since the surrounding of the second ATG codon perfectly matches the consensus sequence for the start of eukaryotic translation (as the mouse ATG does), whereas the first ATG does not. The MRP-L32 mouse start codon is preceded by an in-frame stop codon. Both the mouse and human N termini show general properties of MISPs, such as a high content of hydrophobic, positively charged, and hydroxylated amino acid residues, but no negatively charged residues. The mouse MISP belongs to the R−10 class according to Ref. 29. In the human sequence, the arginine (R) at position −10 is replaced by a histidine (H). However, it has not been shown so far that histidine can functionally replace arginine in MISP processing.
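Several start codon assignments in this section rest on an in-frame stop codon preceding the ATG. A minimal sketch of that check is given below; it assumes the three standard stop codons and a cDNA given as a plain string, and the function name and toy sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def upstream_inframe_stop(cdna, atg_index):
    """Return the 0-based index of the nearest stop codon that lies upstream
    of, and in frame with, the ATG at atg_index; None if there is none."""
    cdna = cdna.upper()
    if cdna[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    # Step back codon by codon in the reading frame of the ATG.
    for i in range(atg_index - 3, -1, -3):
        if cdna[i:i + 3] in STOP_CODONS:
            return i
    return None


# Toy cDNAs: the first carries TAA three codons upstream of the ATG.
print(upstream_inframe_stop("CCTAAGCAGCCATGGCT", 11))
print(upstream_inframe_stop("CCGCAGCCATGGCT", 8))
```

A result of None does not argue against a start codon; as noted for several consensus cDNAs above, the assembled 5′ sequence may simply be too short to contain a preceding stop codon.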
The mammalian L32 MRPs show significant sequence similarities as compared with E. coli r-protein L14 and the corresponding YmL38 MRP of yeast (Fig. 1g, Table III). The N termini of the latter two correspond to the postulated mature N termini of the mammalian L32 MRPs. Interestingly, the YmL38 lacks a cleavable MISP (2). Thus, the N terminus of the mature YmL38 corresponds to the N termini of the mature (i.e. after mt import and processing) mammalian MRP-L32s (Fig. 1g).
MRP-S13-Astonishingly, only three and one ESTs were found in the primary search for rat and mouse MRP-S13, respectively. Human ESTs corresponding to this MRP were identified using the mouse cDNA sequences as "screening probes." For MRP-S13 rat, an ORF was deduced that lacks an N-terminal start codon and an N-terminal extension as compared with the mouse and human MRP-S13s (Table II, Fig. 1h, present report). Although 50 different ESTs of rat origin were identified by using the extreme 5′ end of EST AI0455711r,c (see Table II) as screening probe, none of them extends more than 15 base pairs in the 5′ direction. Because the MRP-S13 rat ORF presented in Fig. 1h is preceded by three in-frame stop codons, and because the deducible amino acid residues at the very N terminus do not correspond to the respective MRP-S13 mouse amino acid sequence in the same position, we assume that the 5′ end nucleotide sequence of EST AI0455711r,c represents an intron that was not reverse transcribed during EST creation and/or sequenced. The mature MRP-S13 rat, as defined by comparison with the N-terminal sequence obtained by direct amino acid sequencing, is highly conserved relative to the respective mouse and human MRP-S13 sequences (Fig. 1h). For MRP-S13 mouse, an ORF of 200 amino acid residues was deduced from a consensus cDNA of 977 nucleotides (Table II). The pre-mRNA of MRP-S13 mouse contains at least three introns, two at positions 107/108 and 154/155, and a third intron of 80 nucleotides at position 363/364 of the consensus cDNA. This was deduced by comparison of EST sequences derived from incompletely spliced mRNA molecules with the mature consensus cDNA. At positions 925-930, a polyadenylation signal AATAAA was found, and 28 consecutive adenine residues mark the location of the poly(A) tail from position 943 onward. The deduced ORF is highly conserved as compared with the mature MRP-S13 rat sequence (Fig. 1h, present report).
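Locating polyadenylation signals and a terminal poly(A) run in a consensus cDNA, as reported above with 1-based positions, can be sketched as follows. The signal list (AATAAA and the variant ATTAAA mentioned for MRP-L27) and the minimum run length are illustrative choices; the function names and the toy sequence are hypothetical.

```python
import re


def find_polya_signals(cdna, signals=("AATAAA", "ATTAAA")):
    """Return sorted (first position, last position, signal) tuples,
    1-based and inclusive, for every occurrence of each signal."""
    cdna = cdna.upper()
    hits = []
    for signal in signals:
        # A lookahead pattern also reports overlapping occurrences.
        for match in re.finditer(f"(?={signal})", cdna):
            hits.append((match.start() + 1, match.start() + len(signal), signal))
    return sorted(hits)


def trailing_poly_a(cdna, min_run=10):
    """Return (1-based start, length) of a terminal run of A residues that is
    at least min_run long, else None."""
    match = re.search(r"A+$", cdna.upper())
    if match and len(match.group()) >= min_run:
        return match.start() + 1, len(match.group())
    return None


# Toy cDNA: signal at positions 5-10, poly(A) run of 15 from position 23.
toy = "GGCC" + "AATAAA" + "GTGTGTGTGTGT" + "A" * 15
print(find_polya_signals(toy), trailing_poly_a(toy))
```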
For MRP-S13 human, a consensus cDNA was assembled, and the corresponding genomic DNA was localized 3′ of the GnRH-II gene (Ref. 32; accession no. AF036329). The EST consensus cDNA sequence corresponds to that of the genomic DNA from nucleotide 3763 to nucleotide 4424 of the latter. Two short introns are covered by this region (data not shown). However, the genomic DNA sequence does not cover the entire MRP-S13 human sequence. Accordingly, the C terminus was completed by adding EST-derived consensus cDNA sequences (Table II).
For both the MRP-S13 mouse and the MRP-S13 human, an MISP of 27 amino acid residues is postulated. Although not identical in sequence, both peptides are quite similar in specific properties such as a high content of hydrophobic amino acid residues and a net positive charge. The MRP-S13 mouse MISP belongs to the R−2 class, and the MRP-S13 human MISP belongs to the R−3 class of signal peptides according to Ref. 29.
In general, the MISPs of the mammalian MRPs presented in this study are quite similar to each other. This conclusion is not based on the primary amino acid sequences, which are heterogeneous; instead, an analysis of the properties shows common features. The MISPs of the MRPs are between 27 and 46 amino acids in length. The only exceptions are the MRP-L27 MISPs of 13 amino acid residues. Nearly all of them show a structure prediction of an N-terminal α-helix joined to a C-terminal β-sheet. The putative MISP of MRP-L32 human of 64 amino acid residues shows the α-helix-β-sheet motif twice in a row. All MRP MISPs are highly hydrophobic, they have a net positive charge, and they contain fewer hydroxylated amino acid residues than is common for other MISPs (28).
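The general MISP properties compared in this paragraph (hydrophobic content, net charge, hydroxylated residues) can be tabulated with a short sketch. The residue classes are assumptions made for illustration: A, V, L, I, M, F, W, and C count as hydrophobic, K and R as positive, D and E as negative, and S and T as hydroxylated; histidine is treated as uncharged. The function name and the toy presequence are hypothetical.

```python
# Residue classes (assumptions for illustration; see the lead-in above).
HYDROPHOBIC = set("AVLIMFWC")
POSITIVE = set("KR")
NEGATIVE = set("DE")
HYDROXYLATED = set("ST")


def misp_properties(presequence):
    """Summarize length, hydrophobic fraction, net charge, and the counts of
    hydroxylated and negatively charged residues of a presequence."""
    if not presequence:
        raise ValueError("empty presequence")
    seq = presequence.upper()
    positive = sum(r in POSITIVE for r in seq)
    negative = sum(r in NEGATIVE for r in seq)
    return {
        "length": len(seq),
        "hydrophobic_fraction": sum(r in HYDROPHOBIC for r in seq) / len(seq),
        "net_charge": positive - negative,
        "hydroxylated": sum(r in HYDROXYLATED for r in seq),
        "negative": negative,
    }


# Toy presequence, not taken from Fig. 1.
print(misp_properties("MLRAALSLLK"))
```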
Altogether, sequences of 23 different mammalian MRPs have been identified by comparison with the N-terminal peptide sequences of purified rat MRPs determined by biochemical methods (Table I). The significance of the deduced ORF sequences is limited by the inaccuracy of EST sequencing results. However, although all the ORF sequences deduced need further confirmation by classical cDNA isolation and sequencing, the deduced amino acid sequences are reliable in terms of consensus cDNA assembling and sequence comparison with similar proteins of the same r-protein family. Thus, eight classes of mammalian MRPs were characterized in this work.

DISCUSSION
Our understanding of mt genetics linked to mutations in nuclear genes suffers from a lack of knowledge of the influence of nuclear-encoded proteins on mt maintenance and function. This is also a crucial point in the elucidation of molecular mechanisms for nuclear-inherited mitochondrial diseases. Nuclear-encoded MRPs are good candidates for proteins involved in mt genetics, as has been shown in yeast (1,2). In the present report, we describe the identification of eight groups of mammalian MRPs from rat, mouse, and human.
It might seem surprising that the numbers of ESTs picked from the data bases, at least as primary hits, vary so much among the different MRPs investigated. One could assume that ESTs, and the respective mRNAs, of different gene products that are present in stoichiometric amounts within the cell (the mt ribosome) would be represented in the data bases in similar ratios. In yeast, however, the expression of MRP genes is differentially regulated at the mRNA, translational, and protein stability levels. Thus, equal amounts of MRP mRNAs are not present (for a review, see Ref. 2). Additionally, it should be noted that ESTs derive from different healthy and tumor tissues. The degree of respiration and the mitochondrial content of different tissues differ remarkably, causing a different level of expression of nuclear genes for mt proteins. The automatic processing of mRNAs yielding ESTs may also influence the occurrence of individual ESTs by selective preferences for the oligonucleotides and reverse transcriptases used. Therefore, it seems rather unlikely that the relative amounts of MRP ESTs detected would mirror any (assumed) molecular ratios within an "ideal" cell.
By comparison of the deduced MRP ORFs with the rat N-terminal amino acid sequences of the mature MRPs isolated, we postulated several MISPs. These postulated peptides were further analyzed according to the criteria of common features for such import sequences (28,29). No consensus sequence in the sense of a canonical amino acid sequence has been found for MISPs, and it is unlikely that such a consensus will emerge. Nevertheless, in general, the mammalian MRP MISPs do fit the postulated properties such as a highly hydrophobic character, a net positive charge, and hydroxylated amino acid residues (28). The assignment of the mammalian MRP MISPs to various classes of substrates for the mt processing proteases is common to MRPs. This has been shown for more than 30 different yeast MRPs (2). Although the MISPs presented in this study are on average 30-40 amino acid residues in size, shorter ones such as the MISPs of the mammalian MRP-L27s (13 amino acid residues) are not unusual. In yeast, short MISPs (between 7 and 14 amino acid residues) have been identified for MRPs (2). The lack of hydroxylated amino acid residues in the mammalian MRP MISPs may be a specific property of this class of mammalian proteins imported into mitochondria.
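The presequence properties named above (net positive charge, hydrophobic character, hydroxylated residues) can be tallied directly from a peptide string. The following is a minimal sketch only, not the analysis used in this work: the function name, the residue sets (Lys/Arg as positive, Asp/Glu as negative, Ser/Thr as hydroxylated, and a simple hydrophobic set), and the example peptide are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Residue classes are illustrative assumptions, not criteria taken from the paper.
POSITIVE = set("KR")           # Lys, Arg
NEGATIVE = set("DE")           # Asp, Glu
HYDROXYLATED = set("ST")       # Ser, Thr
HYDROPHOBIC = set("AVLIMFW")   # a simple hydrophobic set


@dataclass
class PresequenceFeatures:
    length: int
    net_charge: int
    hydroxylated_fraction: float
    hydrophobic_fraction: float


def presequence_features(peptide: str) -> PresequenceFeatures:
    """Summarize simple composition features of a putative import presequence."""
    seq = peptide.strip().upper()
    if not seq:
        raise ValueError("peptide must not be empty")
    n = len(seq)
    positives = sum(residue in POSITIVE for residue in seq)
    negatives = sum(residue in NEGATIVE for residue in seq)
    hydroxylated = sum(residue in HYDROXYLATED for residue in seq)
    hydrophobic = sum(residue in HYDROPHOBIC for residue in seq)
    return PresequenceFeatures(
        length=n,
        net_charge=positives - negatives,
        hydroxylated_fraction=hydroxylated / n,
        hydrophobic_fraction=hydrophobic / n,
    )


if __name__ == "__main__":
    # Hypothetical peptide, invented for this example (not a sequence from this study).
    print(presequence_features("MLSRRLAASLRK"))
```

Such composition counts only flag candidate presequences; they do not predict processing sites or confirm import.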
Five of the different mammalian MRP classes show significant sequence similarities to r-proteins from other mt and/or bacterial sources. However, the calculated values for sequence similarities and identities are low as compared with most of the homogeneous classes of r-proteins from other sources (33). In this context it should be noted that a comparison of yeast MRPs to r-proteins of defined classes reveals that MRPs in general seem to be much more divergent (for a review, see Ref. 2). MRPs show N- and C-terminal and/or internal sequence elongations as compared with eubacterial, chloroplast, and eukaryotic cytoplasmic r-proteins. MRPs are less similar to other members of the same r-protein family and among species (2). Mammalian and yeast MRPs are as divergent from each other as mammalian MRPs and E. coli r-proteins are. This is in sharp contrast to rat and yeast cytoplasmic r-proteins, which on average share 60% identical amino acid residues (34). On the other hand, the specific large subunit r-proteins picked out from the data bases by comparison to mammalian large subunit MRPs point to a reduced but still reliable sequence conservation between mammalian MRPs and bacterial r-proteins. It should be mentioned that, in certain families of bacterial r-proteins, some of the members are so divergent that they are not assigned as similar by a direct sequence comparison, but only by "intermediates" from other sources (see "MRP-L8" under "Results"). Thus, mammalian MRPs also fit into these divergent classes of r-proteins.
Since mt ribosomes contain many more proteins than cytoplasmic ribosomes, it is not surprising that several MRPs were identified for which no similar proteins, e.g. in E. coli ribosomes, exist. These "new" MRPs define new classes of r-proteins. They may represent the "molecular excess" of proteins as compared with cytoplasmic ribosomes. In yeast the majority of MRPs are not similar to any other r-protein (2). However, only one of the mammalian MRPs that are not similar to E. coli r-proteins shares sequence similarity with a yeast MRP (YmL27/mammalian MRP-L27). This (i) may result from the still incomplete characterization of all yeast MRPs, and/or (ii) may be the consequence of further-reaching divergences of yeast and mammalian mt ribosomes in their respective protein compositions. It should be noted at this point that yeast and mammalian mt ribosomes differ strongly in their protein/RNA ratio, although they seem to possess the same total molecular mass (35). The mammalian MRP-L32s, MRP-L31s, and MRP-S13s seem to represent such additional proteins, as compared with E. coli and yeast mt r-proteins. Because of the stringent washing and purification methods applied, it can be excluded that these additional proteins represent mt translational factors loosely attached to the mt ribosome rather than true MRPs. The purification methods for yeast and rat MRPs are comparable, and no mt translational factor has so far been found as a contaminant of mt ribosome preparations (36). Only two-dimensional PAGE spots that appear reproducibly in stoichiometric amounts in repeated experiments are counted as true MRPs (27).
In general, MRPs are much less conserved from one species to another than cytoplasmic r-proteins. This finding raises questions concerning the molecular mechanisms that have allowed MRPs such a divergent evolution while keeping the mitochondrial protein biosynthesis machinery intact. Furthermore, is it possible to generalize results about MRPs to the same extent as has been done for bacterial and eukaryotic cytoplasmic r-proteins? Obviously, the use of yeast MRPs as a model system for mammalian MRPs suffers from the low conservation of sequences and proposed functions. A practical consequence is that it will not be possible to identify most of the mammalian MRP genes simply by comparing yeast MRP sequences with the increasing number of unknown mammalian EST and genomic DNA sequences. Peptide sequences obtained from mammalian MRPs are much more helpful for this purpose. Short peptide sequences correspond to DNA sequence data from the same species or close relatives closely enough to identify the corresponding genes unambiguously. This method has been successfully applied to yeast MRPs (24,36). Further, the correspondence of human cytoplasmic r-proteins separated by two-dimensional PAGE to sequenced r-protein genes has been proven (30). In this work, we have applied this approach for the first time to mammalian MRPs and their corresponding genes.
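The idea of matching a short peptide against DNA sequence data can be illustrated by translating a nucleotide sequence in all six reading frames and looking for the peptide. This is a minimal sketch under stated assumptions, not the data bank search used in this work: it requires an exact match, whereas real searches tolerate mismatches and sequencing errors; the function names and the toy sequences are hypothetical. The standard genetic code is used because MRPs are nuclear-encoded.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str, frame: int = 0) -> str:
    """Translate one reading frame; unknown codons become 'X', stops become '*'."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def find_peptide(peptide: str, dna: str) -> list[tuple[str, int, int]]:
    """Return (strand, frame, residue offset) for every exact match of the peptide."""
    pep = peptide.strip().upper()
    if not pep:
        raise ValueError("peptide must not be empty")
    seq = dna.strip().upper()
    hits = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            protein = translate(strand_seq, frame)
            pos = protein.find(pep)
            while pos != -1:
                hits.append((strand, frame, pos))
                pos = protein.find(pep, pos + 1)
    return hits


if __name__ == "__main__":
    # Toy EST-like sequence invented for this example: "GG" followed by codons for M K R L and a stop.
    toy_est = "GGATGAAACGTCTGTAA"
    print(find_peptide("KRL", toy_est))
```

A hit in a given frame suggests which ORF the peptide belongs to; residues upstream of the match in the same frame would then be candidates for a cleaved presequence.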