Expression of a micro-protein.

The smallest known open reading frame encodes the ribosomal protein L41, which in yeast is composed of only 24 amino acids, 17 of which are arginine or lysine. Because of the unique problems that might attend the translation of such a short open reading frame, we have investigated the properties and the translation of the mRNAs encoding L41. In Saccharomyces cerevisiae, L41 is encoded by two linked genes, RPL41A and RPL41B. These genes give rise to mRNAs that have short 5' leaders of 18 and 22 nucleotides and rather long 3' untranslated regions of 203 and 210 nucleotides, not including their poly(A) tails. The mRNAs are translated exclusively on monosomes, suggesting that ribosomes do not remain attached to the mRNA after termination of translation. Calculations based on the abundance of ribosomes and of L41 mRNA indicate that the entire translation event, from initiation through termination, must occur in approximately 2 s. Termination of translation after only 25 codons does not subject the mRNAs encoding L41 to nonsense-mediated decay. Surprisingly, despite the L41 ribosomal protein being conserved from the archaea through the mammalia, S. cerevisiae can grow relatively normally after deletion of both RPL41A and RPL41B.

Our conventional view of translation is based on mRNAs that can accommodate several ribosomes to form a polyribosome. During translation, their products will pass through a cavity in the large subunit of the ribosome, before emerging at the bottom of the subunit. This passage, originally identified by Yonath et al. (1) and recently analyzed at high resolution (2), will accommodate some 30-40 amino acids (3). Because nearly all proteins are synthesized as polypeptides larger than 50 amino acids, they will naturally emerge from the ribosome during translation and be available for the folding chaperones (4).
An exception to both these conventions is ribosomal protein L41. L41 was originally purified from the ribosomes of Saccharomyces cerevisiae and identified as a very small, very basic protein that appeared to have orthologues in the ribosomes of Schizosaccharomyces pombe and of man. The first 25 amino acids were sequenced by Edman degradation (5). Surprisingly, cloning of the gene encoding L41 revealed that these 25 amino acids comprised its entire open reading frame (6)! Seventeen of its 25 amino acids are Arg or Lys. Thus, L41 is not only the smallest but also the most basic eukaryotic protein. L41 is highly conserved in eukaryotes (7); indeed its mRNA is among the ten most abundant in a recent serial analysis of gene expression of human transcripts (8). L41 is present in certain archaea, e.g. Methanococcus jannaschii (9), but not in others. No clear orthologue has been found in eubacteria, although a protein with similar properties is present in certain thermophilic eubacteria, e.g. Thermus thermophilus (10). (BEWARE: the name L41 has also been applied to a different fungal ribosomal protein, mutant alleles of which can cause resistance to cycloheximide (11,12). In the official nomenclature that protein is now known as L42 (13).) In S. cerevisiae, L41 is encoded by two genes, RPL41A and RPL41B (6) (Fig. 1). There is a single base difference in the ORFs of the two genes, Ala-3 being encoded by GCC in RPL41A and by GCT in RPL41B. However, the two genes differ almost completely in both the 5'- and 3'-flanking regions. The mRNAs encoding L41 in S. cerevisiae have been reported to be about 325 nucleotides in length (6), suggesting that they contain substantial amounts of 5'- and/or 3'-untranslated sequence.
Because of the special features that might be apparent in the translation of such a short ORF, we have explored in more detail the mRNAs encoding L41 and their translation. We find that the mRNAs derived from the two genes both have short 5'-UTRs and unusually long 3'-UTRs. They are translated almost exclusively on single ribosomes. It is interesting that, despite their translation terminating after only 25 codons, the mRNAs encoding L41 are not subject to nonsense-mediated decay. Although L41 is found widely in nature, deletion of the two RPL41 genes does not prevent the growth of S. cerevisiae.
Northern Blot Analysis-Northern blot analysis was carried out as described previously (16). 32P-labeled RNAs were used as probes for RPL30, ACT1, and RPL41 mRNAs. Oligonucleotide probes specific for RPL41A and RPL41B mRNAs and for 18S rRNA and 25S rRNA were end-labeled with polynucleotide kinase and [γ-32P]ATP.
Rapid Amplification of cDNA Ends (RACE)-The RACE method was adapted from Frohman (17), using three adaptor primers referred to as QT, QO, and QI:
QT: CCAGTGAGCAGAGTGACGAGGACTCGAGCTCAAGCTTTTTTTTTTTTTTTT
QO: CCAGTGAGCAGAGTGACG
QI: GAGGACTCGAGCTCAAGC
3'-RACE-5 μg of total RNA (in a 2-μl volume) was primed with 40 ng of QT primer and reverse transcribed by 1 μl (200 units) of MMLV reverse transcriptase in a 10-μl reaction. The reaction mixture was diluted to 100 μl, and then two consecutive rounds of PCR amplification were carried out in a 50-μl PCR mixture (1 mM dNTPs, 1× PCR buffer, 1.5 mM Mg2+, 2.5 units Taq polymerase). For the first round PCR, 1 μl of diluted cDNA pool together with 25 pmol of QO primer and 25 pmol of gene-specific primer 1 were used. For the second round, 25 pmol of QI primer and 25 pmol of gene-specific primer 2 were used. The oligonucleotides used are shown in Fig. 1, lower panel.
5'-RACE-5 μg of total RNA was reverse transcribed using 12.5 pmol of gene-specific primer and MMLV reverse transcriptase in a 10-μl reaction. The 5' partial cDNA pool was diluted to 100 μl, purified, concentrated to 10 μl, added to terminal transferase components (0.5 mM ATP, 1× TdT buffer, 0.5 μl of terminal deoxynucleotidyl transferase) to a final volume of 20 μl, and incubated at 37°C for 20 min. The product was diluted to 100 μl, of which 1 μl was used for amplification by two rounds of PCR similar to 3'-RACE. The oligonucleotides used are shown in Fig. 1, lower panel.
* This work was supported in part by National Institutes of Health Grants GM25532 (to J. R. W.) and CA13330 (to the Albert Einstein Cancer Center). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18
Polyribosome Analysis-Yeast cells were grown in 50 ml of yeast extract-peptone-dextrose to mid-log phase and chilled by addition of crushed ice immediately following the addition of cycloheximide (50 μg/ml). The cells were harvested and washed twice in 0.1 M NaCl, 0.03 M MgCl2, 0.01 M Tris, pH 7.4, 50 μg/ml cycloheximide, 200 μg/ml heparin and resuspended in 0.5 ml LHB buffer (0.1 M NaCl, 0.03 M MgCl2, 0.01 M Tris, pH 7.4). Cells were lysed by vortexing with glass beads and the lysate was centrifuged twice for 15 min at 15,000 × g. The supernatant was layered onto a 7-47% sucrose gradient in TMN solution (0.05 M Tris acetate, pH 7.0, 0.05 M NH4Cl, 0.012 M MgCl2) and centrifuged at 4°C for 3 h at 39,000 rpm in a SW41 rotor. Gradients were then collected through an ISCO fractionator into Eppendorf tubes containing 0.1 ml of 10% SDS. Each fraction was extracted with hot phenol, and the RNA was collected by ethanol precipitation, fractionated on a denaturing agarose gel, and subjected to Northern analysis (16).
Primer Extension-The Primer Extension System-AMV Reverse Transcriptase (Promega) was used for primer extension analysis. 10 pmol of specific primer was labeled with [γ-32P]ATP (3000 Ci/mmol) using 10 units of T4 polynucleotide kinase, annealed to 10 μg of total RNA, and reverse transcribed using 1 unit of avian myeloblastosis virus reverse transcriptase in a 20-μl reaction at 42°C for 30 min. Primer extension products were electrophoresed on a denaturing 8% polyacrylamide gel containing 8 M urea. The gel was dried and exposed in a phosphorimaging cassette.
One-step Gene Disruption-The RPL41 genes were deleted using a PCR-based one-step gene disruption (18). A set of oligonucleotides containing 45 nt of RPL41A flanking sequence and 23 nt of HIS3 sequence was designed for PCR using the HIS3 gene in pRS303 as the template. An analogous set of oligonucleotides for the RPL41B and URA3 genes was used for PCR using the URA3 gene of pRS306 as the template. Amplified DNA fragments were purified and used to transform yeast cells, where they integrate by homologous recombination. Transformants were screened on selective plates.
Southern Blot Analysis-Total yeast DNA was digested with EcoRI, separated by electrophoresis on a 0.8% agarose gel, and transferred onto a Zeta-Probe blotting membrane. 32P-labeled DNA probes were prepared by random primer extension of a fragment containing either the RPL41A or RPL41B gene and flanking sequences within the two EcoRI cutting sites.

RPL41 mRNA Has Unusually Short 5'-UTRs and Unusually Long 3'-UTRs-To identify the ends of the RPL41 transcripts, we utilized the RACE method (17). cDNAs representing the region between a single point in an mRNA transcript and its 3'- or 5'-end were amplified using PCR. Because the coding sequences of the RPL41 genes are nearly identical, it was necessary to use gene-specific primers, oriented in the direction of the missing sequence. Extension of the partial cDNAs from the unknown end of the message back to the known region is achieved using primers that anneal to the preexisting or an appended poly(A) tail.
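The nesting of the adaptor primers listed in the methods can be checked with a short script. This is only an illustrative sketch: the sequences are copied from the primer list (QT with its line-break hyphen removed), while the function name and the dictionary keys are hypothetical and not part of the original protocol. It shows that QO sits at the 5' end of the QT adaptor and that QI lies further 3' within it, which is what allows the second PCR round to be nested inside the first.

```python
# Adaptor primers (5'->3') as listed in the methods, after Frohman (17).
Q_T = "CCAGTGAGCAGAGTGACGAGGACTCGAGCTCAAGCTTTTTTTTTTTTTTTT"
Q_O = "CCAGTGAGCAGAGTGACG"
Q_I = "GAGGACTCGAGCTCAAGC"


def describe_nesting(q_t, q_o, q_i):
    """Locate the outer and inner primers within the adaptor part of Q_T.

    Hypothetical helper for illustration only.
    """
    adaptor = q_t.rstrip("T")  # adaptor sequence without the oligo(dT) tail
    return {
        "adaptor_len": len(adaptor),
        "outer_start": adaptor.find(q_o),  # 0-based position of Q_O
        "inner_start": adaptor.find(q_i),  # 0-based position of Q_I
    }


layout = describe_nesting(Q_T, Q_O, Q_I)
print(layout)
```

Under these sequences the inner primer begins at the last nucleotide of the outer primer and runs to the end of the adaptor, so products of the first round that carry the full adaptor can be re-amplified with QI in the second round.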
The amplified cDNA fragments were excised from an agarose gel and sequenced (Fig. 1). Unique sequences were found, suggesting that there is a single site of initiation and termination for each gene. Both RPL41A and RPL41B mRNA have unusually short 5'-UTRs, 22 nucleotides in the case of RPL41A and 18 nucleotides for RPL41B. By contrast the 3'-UTRs are unusually long, ~210 and ~203 nt for RPL41A and RPL41B, respectively. A similarly long 3'-UTR has been observed for the human transcript encoding L41 (19). Because the mRNAs end at a series of A residues in the gene (underlined), the exact site of the cleavage that precedes the addition of poly(A) is indeterminate. With the addition of ~50 poly(A) residues, the sizes described in Fig. 1 would be consistent with a published size of 325 nt based on Northern analysis (6).
Although the signals for cleavage and polyadenylation in S. cerevisiae are less rigid than in mammals, three elements have been identified (reviewed in Refs. 20 and 21) and supplemented with an extensive computer analysis (22). Comparison of the presence of these signals in the two RPL41 genes is illuminating. An "upstream" or "efficiency" element, UAUAUA, is present twice in the A gene but not the B gene. A "positioning" element, AAUAAA, is present in the B gene but only as a variant, AAUCAA, in the A gene. The poly(A) site itself, generally Y(A)n, is present in the B gene but as the variant YG(A)n in the A gene. Termination at the latter may be enhanced by its (U)7 element that is reported to facilitate 3'-cleavage and poly(A) addition (22). Thus, the two genes each possess some but not all of the elements used to denote 3'-cleavage in yeast.
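A comparison of this kind can be sketched as a simple motif scan. The following is a minimal illustration, not the computer analysis of Ref. 22: the motifs are those named in the text, but the function name, the requirement of at least three A residues for Y(A)n, and the toy sequence are all assumptions made for the example. The toy sequence is invented and is not an RPL41 3'-UTR.

```python
import re

# Motifs named in the text, written as RNA. The "A{3,}" threshold for the
# poly(A) site is an assumption made for this sketch.
SIGNALS = {
    "efficiency element UAUAUA": "UAUAUA",
    "positioning element AAUAAA": "AAUAAA",
    "positioning variant AAUCAA": "AAUCAA",
    "poly(A) site Y(A)n": "[CU]A{3,}",
}


def find_signals(utr):
    """Return {signal name: [0-based start positions]} for a DNA or RNA string."""
    rna = utr.upper().replace("T", "U")
    hits = {}
    for name, pattern in SIGNALS.items():
        # A lookahead reports overlapping occurrences as well.
        hits[name] = [m.start() for m in re.finditer(f"(?={pattern})", rna)]
    return hits


# Invented toy sequence for illustration; NOT the real RPL41A or RPL41B 3'-UTR.
toy_utr = "GGUAUAUACCGGAAUAAAGGCAAAAGG"
hits = find_signals(toy_utr)
for name, positions in hits.items():
    print(name, positions)
```

Note that a simple pattern for Y(A)n also matches inside AAUAAA, so in practice the positions would need to be interpreted relative to the mapped 3'-ends.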
The non-coding sequences of the two genes diverge from position −3 upstream of the ORF and from position +8 downstream of the ORF. Indeed, their putative transcription factor binding sites, from positions −200 to −600, differ substantially (23). Yet, the sum of the effects of transcription factor binding sites, transcription initiation sites, and 3'-cleavage signals provides just the appropriate amount of mRNA to enable equimolar synthesis of L41 with the other ribosomal proteins (see below).
RPL41 mRNA Is Exclusively Translated on Single Ribosomes-To ask how RPL41 mRNA is translated, we carried out a polyribosome analysis. Yeast ribosomes were separated in a 7-47% sucrose gradient, and total RNA from each fraction was extracted and subjected to a Northern analysis (Fig. 2). The positions of 18S rRNA and the 25S rRNA indicate the 40S subunits and the 60S subunits, respectively. ACT1 mRNA, encoding a 478-amino acid protein, is primarily translated on higher order polyribosomes, most of which have run to the bottom of the gradient. RPL30 mRNA, encoding a 104-amino acid protein, is translated largely on dimer and trimer polysomes. By contrast, RPL41 mRNA, encoding its 25-amino acid protein, is translated exclusively on single ribosomes, although there is some forward spreading of the peak.
The most recent estimate, based on sensitivity to hydroxyl radicals, is that about 58 nucleotides of mRNA are protected by a translating 70S ribosome of Escherichia coli (24). The larger eukaryotic ribosome would probably protect a somewhat longer stretch. Thus, in mid-translation a single ribosome should occupy essentially the entire L41 ORF. When it reaches the termination codon, however, some 50 nucleotides should be available for the binding of a new 43S initiation complex. The few mRNAs with both a 43S initiation complex and an 80S ribosome might account for the forward spreading from the 80S peak observed in Fig. 2, lower panel. Finally, Fig. 2 suggests that the extensive 3'-UTR does not stably associate with a ribosome that has terminated translation. Nevertheless, it is possible that the long 3'-UTR of the RPL41 mRNAs facilitates the interaction of proteins bound to the 5'-cap with those bound to poly(A), as suggested in the "circular" polyribosome model recently proposed (25).
Small RPL41 Transcripts Are Not Subject to Nonsense-mediated Decay-Nonsense-mediated decay (NMD) is a highly conserved mechanism used to rid the cell of mRNAs that are likely to produce aberrant proteins because of premature stop codons introduced either by mutation or by errors in splicing (reviewed in Ref. 26). UPF1 is one of the genes essential for NMD. Deletion of UPF1 results in accumulation of aberrant mRNAs, such as the unspliced transcripts of the ribosomal protein gene, CYH2 (27) (Fig. 3). We asked whether the termination codon of either L41-encoding transcript, only 25 codons from the initiator, would invoke NMD. As shown in Fig. 3, it does not. There is no detectable difference in the level of L41 mRNA between strains carrying a UPF1 or a upf1::LEU2 allele, while the amount of pre-CYH2 mRNA increases substantially. Although this is perhaps to be expected because the L41 mRNAs are "natural", this result emphasizes the extraordinary subtlety that one must invoke to explain NMD. The 3'-UTRs of the two L41 transcripts do not appear to contain a sequence match to the putative downstream elements that activate NMD (28,29), although such an element has been identified in only a small fraction of yeast genes.
Is L41 Essential for Cell Growth?-Because most ribosomal proteins are essential, knockout experiments are usually conducted on diploid strains. On the other hand, in cases such as L41 where there are two genes, it is generally easier to analyze a double knockout by mating two single knockout strains and dissecting the resulting spores. However, in S. cerevisiae the two RPL41 genes are tightly linked on chromosome IV, separated by only 8 kb. Therefore, we started with a homozygous diploid strain, first disrupting RPL41A with a HIS3 marker (see "Materials and Methods"). Using this strain we disrupted RPL41B with URA3. We expected that half the URA3 disruptants would be on the same chromosome as the HIS3 disruptants. After the second transformation, several colonies that grew on a −His −Ura plate were subjected to PCR to identify colonies in which both the HIS3 and URA3 genes had integrated into the correct loci (data not shown). These double mutants were sporulated, and the resulting tetrads were analyzed. Unexpectedly, colonies with the genotype His+ Ura+ survived. A Southern blot (Fig. 4) as well as primer extension (see below) demonstrated that both RPL41A and RPL41B had been successfully deleted. The haploid double mutant grows well on a −His −Ura plate at 30°C as well as at 23°C or 37°C, showing that it is neither heat- nor cold-sensitive. As with most ribosomal proteins, the function of L41 is still unknown. Yet, it is surprising that this highly conserved protein seems almost entirely dispensable for growth. On the other hand, there is a report that L41 may be absent from the proteome of Caenorhabditis elegans (30), suggesting that it is not essential even in higher organisms. Unfortunately, the one high-resolution structure of the large subunit is from Haloarcula marismortui (2), an archaeon that appears not to have a version of L41.
Relative Amount of RPL41A and RPL41B Transcripts-To measure the contribution of the RPL41A and RPL41B genes to the cell, we made use of the fact that their mRNAs differ by 4 nt in the length of the 5'-UTR. Thus, the relative amount of the two transcripts can be assayed by primer extension, using a primer complementary to a sequence common to both ORFs. The primer used leads to a 105-nt band from the RPL41A transcript and a 101-nt band from the RPL41B transcript (Fig. 5). It is apparent that in wild type cells the ratio of RPL41A mRNA to RPL41B mRNA is about 2:1. The amount of RPL41A mRNA appears to be similar to that of RPL30, determined to be about 35 copies/cell (31), leading to an estimate of about 50 mRNAs/cell for mRNAs encoding L41. This is a useful value because the small L41 ORFs are omitted from many of the measurements of genome-wide transcription. In heterozygous diploid strains of genotype RPL41A/rpl41A or RPL41B/rpl41B, the amount of L41A or L41B mRNA is reduced ~50%, respectively, as expected. Interestingly, in a haploid ΔRPL41A strain the amount of L41B transcript increases about 35%, suggesting some measure of dosage compensation as has been shown for CRY2, encoding ribosomal protein S14 (32). No obvious change in the level of L41 mRNA was observed in a haploid ΔRPL41B strain, perhaps reflecting the relatively larger amount of RPL41A transcript that is normally present.

FIG. 3. ... 2 and 3). The blot was probed by γ-32P-labeled oligonucleotide probes specific for RPL41, RPS3, RPS30, CYH2 (also known as RPL28), and the snoRNA U3 as a loading control. CYH2 gives two bands: the upper one is the unspliced transcript that is subject to nonsense-mediated decay. The values on the right represent the average of four determinations of the ratio of the mRNA in a upf1 strain compared with a UPF1 strain (in all cases normalized to U3).

FIG. 4. Southern blot analysis of the RPL41 deletion strains.
Genomic DNAs of the indicated strains were extracted, digested by EcoRI, and separated on a 0.8% agarose gel, and the RPL41A and RPL41B genes were identified with a random primer probe made against the L41 ORF. The 1.2- and 3.1-kb bands in wild type represent the EcoRI fragments of RPL41A and RPL41B, respectively. The 2.2- and the 4.0-kb bands indicate the insertion of HIS3 into RPL41A and URA3 into RPL41B. D, heterozygous diploid; H, haploid.
FIG. 5. Primer extension analysis of RPL41 transcripts. 10 μg of total RNA from the indicated strains were used as template. A primer complementary to a sequence common to both RPL41A and RPL41B near the 3'-end of the ORFs was end-labeled and used for reverse transcription to give a 105-nt band from RPL41A transcripts and a 101-nt band from RPL41B transcripts. A primer specific for RPL30 was also labeled and included in the reverse transcription reactions as a control. It yields a 67-nt band. The gel was quantified by phosphorimaging analysis, and the L41 bands in each lane were normalized first to that of L30 and then to wild type. D, heterozygous diploid; H, haploid.
In the double knockout strain, no mRNA encoding L41 is detected.
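The estimate of about 50 L41 mRNAs per cell can be reproduced with simple arithmetic. The sketch below assumes, as stated in the text, that RPL41A mRNA is present at roughly the RPL30 level of about 35 copies/cell and that the RPL41A:RPL41B ratio is about 2:1; the function and parameter names are hypothetical.

```python
def total_l41_mrna(rpl41a_copies, a_to_b_ratio):
    """Total L41 mRNA copies/cell from the RPL41A copy number and the A:B ratio."""
    rpl41b_copies = rpl41a_copies / a_to_b_ratio
    return rpl41a_copies + rpl41b_copies


# Values taken from the text: ~35 copies/cell for RPL41A, A:B ratio of ~2:1.
total = total_l41_mrna(rpl41a_copies=35, a_to_b_ratio=2.0)
print(f"{total:.1f} copies/cell")
```

With these inputs the total is 52.5 copies/cell, in line with the rounded figure of about 50.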
Translation of L41-A rapidly growing yeast cell has about 200,000 ribosomes. To accommodate a doubling time of 100 min, it must synthesize 2000 ribosomes and 2000 molecules of L41 each minute (33). Our estimate (Fig. 5) of 50-60 mRNAs/cell encoding L41 means that each L41 mRNA is translated 30-40 times/min. Thus, the entire translational process, from initiation through termination, must occupy no more than about 2 s. Perhaps the extended 3'-UTR is necessary for the short L41 mRNA to circularize during translation through interaction of poly(A) binding protein and eIF4G (25).
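The reasoning behind the 2-s figure can be laid out as a short calculation. This sketch uses only the numbers given above and makes two assumptions explicit: one L41 molecule is needed per new ribosome, and each mRNA carries a single ribosome at a time, consistent with the monosome result. The function name is hypothetical.

```python
def seconds_per_translation(ribosomes_per_cell, doubling_time_min, mrna_copies):
    """Upper bound on the time for one round of translation of one L41 mRNA.

    Assumes one L41 per new ribosome and one ribosome per mRNA at a time.
    """
    l41_per_min = ribosomes_per_cell / doubling_time_min   # molecules needed per minute
    rounds_per_mrna_per_min = l41_per_min / mrna_copies    # translations per mRNA per minute
    return 60.0 / rounds_per_mrna_per_min


# 200,000 ribosomes, 100-min doubling time, 50-60 L41 mRNAs per cell (from the text).
for copies in (50, 60):
    t = seconds_per_translation(200_000, 100, copies)
    print(f"{copies} mRNAs/cell: {t:.1f} s per round")
```

For 50 and 60 mRNAs per cell this gives 1.5 s and 1.8 s per round, respectively, which is the basis for the statement that the whole process must take no more than about 2 s.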
The structure of the exit passage of the large ribosomal subunit has been solved to 2.4 Å resolution (2). The passage is relatively narrow and tortuous, lined almost entirely with RNA helices. It is remarkable that L41 traverses this passage despite a) its high concentration of positive charge that should interact electrostatically with the walls of the passage, and b) its short length that should preclude its interaction with chaperonins at the surface of the ribosome that could assist its exit (4).