Mechanism of premature translation termination on a sense codon

Accurate translation termination by release factors (RFs) is critical for the integrity of cellular proteomes. Premature termination on sense codons, for example, results in truncated proteins, whose accumulation could be detrimental to the cell. Nevertheless, some sense codons are prone to triggering premature termination, but the structural basis for this is unclear. To investigate premature termination, we determined a cryo-EM structure of the Escherichia coli 70S ribosome bound with RF1 in response to a UAU (Tyr) sense codon. The structure reveals that RF1 recognizes a UAU codon similarly to a UAG stop codon, suggesting that sense codons induce premature termination because they structurally mimic a stop codon. Hydrophobic interaction between the nucleobase of U3 (the third position of the UAU codon) and conserved Ile-196 in RF1 is important for misreading the UAU codon. Analyses of RNA binding in ribonucleoprotein complexes or by amino acids reveal that Ile–U packing is a frequent protein–RNA-binding motif with key functional implications. We discuss parallels with eukaryotic translation termination by the release factor eRF1.

Translation termination defines the lengths of all cellular proteins. Three stop codons (UAA, UAG, and UGA) signal the end of the mRNA open reading frame (ORF). A stop codon in the ribosomal A site is recognized by a bifunctional protein called a release factor (RF), which (i) recognizes stop codons and discriminates against sense codons, and (ii) catalyzes peptidyl-tRNA hydrolysis, releasing the peptide from the ribosome. Bacteria express two release factors: RF1 recognizes UAA/UAG codons, and RF2 recognizes UAA/UGA codons. A single eukaryotic release factor, eRF1, recognizes all three stop codons.
Accurate termination on stop codons is crucial for the cell and organism. Premature termination, either on sense codons or on premature stop codons arising from nonsense mutations, would result in accumulation of truncated proteins with compromised or toxic activities. Many genetic diseases are caused by premature stop codons (1,2), highlighting the deleterious effects of premature termination. In an intact ORF (i.e. no premature stop codon), however, near-stop codons may cause premature termination (3). Sense codons differing from the stop codon in the third-nucleotide (wobble) position are most promiscuous (4). A study measuring the activity of bacterial release factors on near-stop codons identified the UAU sense codon as a "hot spot" for RF1 (4). We recently confirmed these findings (5), showing that RF1 can catalyze release on a UAU sense codon almost as efficiently as on a UAA codon when RF1 is in excess (Fig. 1A). Affinity of release factors to some sense codons does not lead to predominant premature termination in vivo due to efficient decoding of sense codons by aminoacyl-tRNAs, but it is likely responsible for background levels of premature termination (3).
The structural mechanism of stop-codon recognition was elucidated by high-resolution crystal structures (6-9) and further investigated by molecular dynamics simulations (10). Release factors strongly discriminate against purines in the first position, so that no product can be detected when the ribosome encounters sense codons with A1 or G1 (i.e. A or G in position 1 of the A-site codon; Fig. 1A) (4,11,12). The structures showed that the Watson-Crick side of the first nucleotide faces the backbone of an α-helix of RF1 or RF2, so the strict requirement for pyrimidine in the first position is explained by base recognition by a rigid structural element of release factor. The second and third positions are limited to purines and interact with side chains of RF1 or RF2 (13-17). The third nucleotide is also sandwiched between the conserved 16S rRNA nucleotide G530 and a side chain of release factor: Ile-196 of RF1 or Arg-218 of RF2 (Escherichia coli numbering, unless noted otherwise). The Hoogsteen side of the third nucleotide (at atoms N6/O6 and N7) is stabilized by a conserved threonine (Thr-194 in RF1 or Thr-216 in RF2). In the eukaryotic 80S termination complex formed with eRF1, the third position of the stop codon similarly packs between a purine and Ile-62 (Homo sapiens) of the essential NIKS motif of eRF1 (18), and it interacts with Thr-58 (19,20).
In this work, we asked how a UAU sense codon is misread by RF1 as a stop codon, and why U in the third position is preferred over C, rendering UAU a more efficient "mis-terminator" than UAC. We determined a cryo-EM structure of the bacterial 70S ribosome complex that helps answer these questions. Our analyses suggest that the packing of U3 against Ile-196 of RF1 is critical for the preference of U over C. We find that the Ile-U packing is a prevalent motif in the ribosome and other protein-RNA structures, consistent with the role of Ile and other nonaromatic hydrophobic side chains in specific recognition of uridine.

Mechanism of termination on the UAU sense codon
We determined a cryo-EM structure of the E. coli 70S·RF1 complex formed in response to the UAU codon in the A-site at 3.7 Å average resolution (Fig. 1B and Table 1), with local resolution reaching ~3 Å in ribosomal functional centers (Fig. 1C), allowing near-atomic interpretation (Fig. 1, D and E). The overall conformations of release factor and the UAU codon are similar to those in canonical termination complexes formed with stop codons. We note one main difference between our structure and previous high-resolution 70S·RF1 structures from Thermus thermophilus and heterologous systems. In our E. coli structure, domain 1 of RF1 binds both the large-subunit ribosomal protein L11 and 23S rRNA of the L11 stalk, consistent with observations at lower resolutions for E. coli 70S·RF1 complexes formed on a stop codon (21). Specifically, hydrophobic Ile-29 of RF1 docks at the proline-rich region of L11 (aa Pro-21 to Pro-25), and Phe-35 binds near A1095 of 23S rRNA, similar to that seen for RF2 (6,8,22,23). In other high-resolution structures, however, domain 1 of RF1 is either unresolved (11,24-26) or binds near the L11 stalk without contacting it (7,9,23). In our structure, domain 1 is not as well resolved as other domains of RF1 (Fig. 1C), consistent with the idea that it is dynamic. The different positions of domain 1 in different structures highlight that interactions between domain 1 and the ribosome are dynamic and are likely important at early stages of RF1 binding or during RF1 dissociation, in keeping with functional interaction between RF1 and L11 (27-29).
Figure 1 (partial legend). B, cryo-EM structure of the 70S·RF1 complex formed on the UAU sense codon. The FSC curve is shown for the 70S·RF1 cryo-EM map (lower left). C, local resolution of RF1 in the cryo-EM map, determined using Blocres (76). RF1 is oriented similarly to the view shown in B. The map was sharpened by applying a B-factor of -100 Å² and is shown at 2σ, colored using a resolution scale ranging from 2.8 to 5.3 Å (left). D, cryo-EM map (mesh) in the peptidyl transferase center (PTC). E, cryo-EM map (mesh) in the decoding center (DC). In structural models, the large 50S ribosomal subunit is shown in cyan, small 30S subunit in yellow, RF1 in green, mRNA in dark blue, P-site tRNA in orange, and E-site tRNA in pink. Domains of RF1 are labeled in B and C.

ASBMB Award Article: Translation termination on sense codon

Density in the peptidyl transferase and decoding centers (Fig. 1, D and E) showed interactions similar to those found in canonical termination complexes with RF1 (7,9). Consistent with the catalytic activity of RF1 on the UAU codon, the catalytic GGQ loop (residues 233-235) is positioned next to the terminal nucleotide A76 of the P-site tRNA and is stabilized by interactions with A2602, which is essential for termination (Fig. 1D) (30-32). In the decoding center, the first two nucleotides of the UAU codon interact with the codon-recognition residues of RF1 similarly to stop-codon nucleotides. Unlike a UAU codon, a UAC sense codon inefficiently triggers peptide release, similar to sense codons that differ from the stop codon at their second position, e.g. UCA or UGG (4). This might appear surprising because the hydrogen-bonding valences of the amino group N4 of cytosine, placed similarly to the carbonyl oxygen O4 of uracil, could be satisfied by interactions with Thr-196 (its OH group becoming the H-bond acceptor) and the amide group of Gln-185 (9) or with an ordered water molecule (10). In fact, just as U3 in UAU mimics a G3 in the UAG stop codon, a C3 in UAC could mimic A3 in the UAA stop codon. We propose that the difference in hydrophobicity between cytidine and uridine makes termination on UAC less efficient than on UAU. Because C is substantially less hydrophobic than U (37-39), packing of C between Ile-196 and G530 would be less energetically favorable than the packing of U. The base-stacking energy of cytosine on guanosine is similar to that of uracil on guanosine (33-36), suggesting that the major discrimination between U and C in position 3 results from favorable hydrophobic packing of Ile-196 on U rather than on the less hydrophobic C.
In summary, our structure shows that recognition of the sense codon UAU by RF1 is similar to that of stop codon UAG. This suggests that other hot-spot sense codons likely undergo similar conformational rearrangements and are recognized by release factors similarly to stop codons, with which they have partial stereochemical resemblance. Structural analysis points at Ile-U packing being critical for making UAU a hot spot for mis-termination by RF1.

Ile-U interactions in protein·RNA complexes
The role of Ile-U packing in mis-termination by RF1 prompted us to investigate whether Ile-U is a common interaction employed in protein-RNA recognition. Nucleotide stacking is the major stabilizing interaction in secondary and tertiary structures of nucleic acids. In protein·RNA complexes, the energy of unstacking a nucleotide from its stacking partner(s) is usually compensated by interaction between the aromatic base of the nucleotide and protein side chain(s). The best-characterized interactions include stacking of nucleotides on aromatic side chains and on positively charged side chains (Ref. 40 and references therein). Because stacking interactions involve a hydrophobic energy contribution (nonpolar, nonelectrostatic, and solvent entropy) (34,41,42), nucleotides also stack on aliphatic hydrophobic side chains. Isoleucine is the most hydrophobic side chain, according to many hydrophobicity scales (43-46), and the uracil and adenine nucleobases are more hydrophobic than cytosine and guanine (37-39). Thus, uridine and adenosine are more likely than cytidine and guanosine to interact with isoleucine and other aliphatic side chains. Indeed, in computational simulations, U was the only nucleotide with a favorable free energy of binding to Ile (i.e. negative ΔΔG) in methanol, which is thought to represent the environment for nucleic acid-protein interfaces more accurately than water (47).
To test whether packing of isoleucine on uracil is among the preferred protein-RNA interactions, we calculated the number of stacking interactions between RNA nucleotides and aliphatic, aromatic, or charged protein side chains in high-resolution crystal structures of ribosomes, including the 2.4-Å resolution E. coli 70S ribosome (48), the 2.5-Å resolution T. thermophilus 70S ribosome (49), and the 3.0-Å resolution S. cerevisiae 80S ribosome (50). Collectively, these structures provide a large pool of protein-RNA interactions comprising ~29,000 amino acids and ~19,000 nucleotides. We normalized each type of stacking to the number of nucleotides of each type (i.e. number of amino acids per thousand nucleotides).
These data show that Ile does indeed prefer to pack on uridine (Fig. 3). Ile-A packing is also well represented, whereas Ile packed the least on the less hydrophobic C and G nucleotides. As expected, most of the packing interactions of nucleotides occur with the positively charged Arg. Aliphatic, aromatic, and positively charged side chains pack more frequently on U or A (collectively >50%) than on the less hydrophobic C or G. This preference is notable for Ile (72%), Pro (85%), Phe (70%), and Tyr (65%). Negatively charged Asp and Glu are the least represented among protein side chains packing on aromatic bases, as expected. Interestingly, however, 70S rescue complexes formed with ArfA and RF2 on truncated mRNAs employ the packing of Glu-30 of ArfA on G530 of 16S rRNA (51-55), suggesting functional roles for this underrepresented group of interactions. The carboxyl group of Glu-30 is stabilized by interactions with the side chain of Arg-213 of RF2 and with the 2′-OH of G530.

Figure 2 (partial legend). Thr-198 forms one of two possible hydrogen bonds with G3 (shown with the dashed and dotted lines). C, interactions of Oryctolagus cuniculus eRF1 with G3 of the UAG stop codon in the 80S ribosome (PDB code 3JAH (19)).
The specific affinity between U and Ile is also emphasized by amino acid-RNA affinity selection experiments, in which isoleucine selectively bound to UAU-containing RNA motifs (56,57). This specificity plays key functional roles in protein·RNA complexes. For example, the isoleucine-U interaction is critical for specific and efficient recognition of the bovine immunodeficiency virus transactivation-response element by the Tat protein (58). A crystal structure revealed that the side chain of Ile-79 of Tat packs on the aromatic ring of U10 and stabilizes the U10-A13:U24 base triplet (59). In Drosophila, Sxl regulates alternative splicing by specific recognition of a U-rich sequence in pre-mRNA, involving a uridine sandwiched between two isoleucines (Ile-U-Ile) (60). In archaeal RNase P, protein Rpp38 provides Ile-63 to stabilize the bulged U19 of the enzyme's RNA (61,62).

Implications for eukaryotic termination
Structural and biochemical work has yielded detailed insights into the mechanism of eukaryotic termination, but the mechanism of premature termination on sense codons in eukaryotes remains poorly understood. The UGG codon binds eRF1 (63-65), but termination activity was not detected in vitro (66). The sequences and structures of codon-recognition domains of eukaryotic and bacterial release factors are very different. Therefore, perhaps not surprisingly, structural recognition of stop codons by eukaryotic eRF1 also differs from that by bacterial release factors. For example, eRF1 recognizes the U-turn-like geometry of the stop codon (19,20,67) and a nucleotide downstream of the stop codon (68). However, recognition of the third nucleotide by eRF1 is remarkably similar to that by RF1. In the structures of 80S·eRF1 complexes (19,20,67), the third nucleotide is sandwiched between a purine (second base of stop codon) and Ile-62 of the universally conserved NIKS motif of eRF1 (Fig. 2C). Furthermore, the O6 atom of the guanosine of the UAG stop codon (19) hydrogen bonds with Thr-58. The similar structural environment of the third nucleotide in the eukaryotic and bacterial complexes suggests that UAU might also be a hot spot for termination in eukaryotes. Premature termination on UAU, however, is likely less pronounced than on bacterial ribosomes, due to stringent codon discrimination facilitated by GTPase eRF3 (69,70).

Preparation of the 70S·mRNA(UAU)·tRNAfMet·RF1 complex
C-terminally His-tagged E. coli RF1 was purified as described previously (5). 70S ribosomes were prepared from E. coli (MRE600), as described previously (5), and stored in the ribosome-storage buffer (20 mM Tris-HCl (pH 7.0), 100 mM NH4Cl, 12.5 mM MgCl2, 0.5 mM EDTA, 6 mM βME) at -80°C. Ribosomal 30S and 50S subunits were purified using a sucrose gradient (10-35%) in a ribosome-dissociation buffer (20 mM Tris-HCl (pH 7.0), 300 mM NH4Cl, 1.5 mM MgCl2, 0.5 mM EDTA, 6 mM βME). The fractions containing 30S and 50S subunits were collected separately, concentrated, and stored in the ribosome-storage buffer at -80°C. E. coli tRNAfMet was purchased from Chemical Block. RNA containing the Shine-Dalgarno sequence and a linker to place the AUG codon in the P-site and the UAU codon in the A-site (GGC AAG GAG GUA AAA AUG UAU AAAAAA) was synthesized by IDT.
The 70S·mRNA·tRNAfMet·RF1 complex was prepared by reconstitution in vitro. 30S subunits at 1 µM (all concentrations are specified for the final solution) were pre-activated at 42°C for 5 min in the ribosome-reconstitution buffer (20 mM Tris-HCl (pH 7.0), 100 mM NH4Cl, 20.5 mM MgCl2, 0.5 mM EDTA, 6 mM βME). After pre-activation, 0.9 µM 50S subunit with 12 µM mRNA and 5 µM tRNAfMet were added to the 30S solution and incubated for 15 min at 37°C. An equal volume of 40 µM RF1 was then added, resulting in the following final concentrations: ~0.45 µM 70S, 6 µM mRNA, 2.5 µM tRNAfMet, and 20 µM RF1. The solution was incubated for 15 min at 37°C and applied on cryo-EM grids at room temperature.
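The final concentrations quoted above follow from simple volume-weighted mixing. The sketch below is only an illustrative check of that arithmetic, not part of the published protocol; the function name `final_concentrations` and the dictionary layout are hypothetical, and concentrations are taken to be in µM as stated in the text.

```python
def final_concentrations(solution_a, solution_b, vol_a=1.0, vol_b=1.0):
    """Volume-weighted concentrations (µM) after combining two solutions.

    A component missing from one solution is treated as 0 µM there.
    """
    total = vol_a + vol_b
    names = set(solution_a) | set(solution_b)
    return {
        name: (solution_a.get(name, 0.0) * vol_a
               + solution_b.get(name, 0.0) * vol_b) / total
        for name in names
    }

# Concentrations before RF1 addition (µM), as given in the text.
ribosome_mix = {"50S": 0.9, "mRNA": 12.0, "tRNAfMet": 5.0}
# RF1 stock added as an equal volume (µM), as given in the text.
rf1_stock = {"RF1": 40.0}

final = final_concentrations(ribosome_mix, rf1_stock)
for name in sorted(final):
    print(f"{name}: {final[name]:.2f} µM")
```

With equal volumes every component is halved, which reproduces the 6 µM mRNA, 2.5 µM tRNAfMet, and 20 µM RF1 reported above; the ~0.45 µM 70S figure corresponds to the limiting 50S subunit.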

Cryo-EM and image processing
Holey-carbon grids (C-flat 1.2-1.3, Protochips) were glow-discharged at 20 mA with negative polarity for 30 s in a PELCO easiGlow glow-discharge unit. 1.5 µl of the 70S·mRNA·tRNAfMet·RF1 complex was applied to the grids. The grids were blotted for 3.5 s at blotting power 8 at 4°C and ~95% humidity and plunged into liquid ethane, using an FEI Vitrobot MK4. The grids were stored in liquid nitrogen.
A dataset of 1,065,147 particles was collected as follows. 3963 movies were collected using SerialEM (71) on a Talos Arctica (FEI) microscope operating at 200 kV equipped with a K2 Summit camera system (Gatan) with -0.7 to -1.7 µm defocus. Each exposure was acquired with continuous frame streaming with the exposure length of 80 frames per movie, yielding a total dose of 37.7 e⁻/Å². The nominal magnification was 22,000, and the corrected super-resolution pixel size at the specimen level was 0.944 Å. The frames for each movie were processed using IMOD (72). The movies were motion-corrected, and frame averages were calculated using frames 3-42 within each movie, using alignframes (IMOD), after multiplication with the corresponding gain reference. cisTEM (73) was used to determine defocus values for each resulting frame average and for particle picking. The stack and FREALIGN parameter file were assembled in cisTEM with binning of 1×, 3×, and 6× (box size of 480 for the nonbinned stack).
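As a quick sanity check on the acquisition numbers, the per-frame dose and the pixel sizes of the binned stacks can be derived as sketched below. This is an illustrative calculation only: it assumes the dose is spread uniformly over the 80 frames and that the nonbinned stack is sampled at 0.944 Å per pixel, neither of which is stated explicitly in the text, and the helper names are hypothetical.

```python
PIXEL_SIZE_A = 0.944     # Å per pixel for the nonbinned stack (assumed from the text)
TOTAL_DOSE = 37.7        # e-/Å^2 per movie (from the text)
FRAMES_PER_MOVIE = 80    # frames per movie (from the text)

def binned_pixel_size(pixel_size, binning):
    """Pixel size (Å) after binning by an integer factor."""
    return pixel_size * binning

def nyquist_limit(pixel_size):
    """Best resolution (Å) representable at a given pixel size."""
    return 2.0 * pixel_size

# Assumption: uniform dose per frame.
dose_per_frame = TOTAL_DOSE / FRAMES_PER_MOVIE
print(f"dose per frame: {dose_per_frame:.3f} e-/Å^2")

for binning in (1, 3, 6):
    px = binned_pixel_size(PIXEL_SIZE_A, binning)
    print(f"{binning}x binning: pixel {px:.3f} Å, Nyquist {nyquist_limit(px):.3f} Å")
```

Under these assumptions the 6×-binned stack cannot carry information beyond roughly 11.3 Å, and the 3×-binned stack beyond roughly 5.7 Å, which is why coarse alignment steps are typically run on binned data and finer steps on less-binned data.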
Data processing was performed essentially as described previously (74). FrealignX version 9.11 in FrealignX mode was used for all steps of refinement and reconstruction (75). The 6×-binned image stack (1,065,147 particles) was initially aligned to a ribosome reference (PDB code 5J4D (11)) without RF1 and E-tRNA, using three cycles of mode 3 (global search) alignment, including data in the resolution range from 30 to 300 Å. Subsequently, the 6×-binned stack was refined using mode 1 (refine) in the resolution ranges (sequentially): 30-300, 24-300, 18-300, and 15-300 Å (three cycles for each range). Using the 3×-binned image stack, the particles were successively aligned in mode 1 (refine) by gradually increasing the high-resolution limit to 12, 10, 9, 8, and 7 Å (three cycles for each resolution limit). In the last step, the unbinned (full-resolution) image stack was used to successively align particles against the common reference using mode 1 (refine; three cycles) at the resolution limit of 6 Å. The 3D density reconstruction was obtained using the 60% of particles with the highest scores. The map contained density for the P- and E-site tRNAs, mRNA, and RF1. The resolution of the resulting reconstruction was ~3.7 Å (Fourier shell correlation (FSC) = 0.143); local resolution for the codon-recognition domain of RF1 in the decoding center and the catalytic domain in the peptidyl transferase center reaches ~3 Å, allowing near-atomic interpretation of nucleotide and side-chain interactions (Fig. 1, C-E). Additional classification into 16 classes yielded three classes (87% of all particles) with the occupancy of RF1 similar to that in the initial map. The initial reconstruction was B-factor sharpened using B-factors of -150, -200, and -225 Å² in bfactor.exe (part of the FrealignX distribution) and used for model building and structural refinements. The B-factor of -100 Å² was also used to visualize lower-resolution details.
The FSC curve was calculated by FrealignX for even and odd particle half-sets (Fig. 1B). Blocres was used to assess local resolution of the unfiltered and unmasked volume using a box size of 56 pixels, a step size of 10 pixels, and a resolution criterion of FSC = 0.143 (76).
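For readers unfamiliar with the FSC = 0.143 criterion, the sketch below shows one common way to read a resolution off an FSC curve: find the first shell where the curve drops below the threshold and interpolate linearly. It is a generic illustration and not the FrealignX or Blocres implementation; the function name and the toy curve are hypothetical.

```python
import numpy as np

def resolution_at_threshold(spatial_freq, fsc, threshold=0.143):
    """Resolution (Å) at which the FSC first falls below `threshold`.

    spatial_freq: ascending spatial frequencies in 1/Å.
    fsc: FSC values for the same shells.
    Returns None if the first shell is already below the threshold
    or if the curve never crosses it.
    """
    spatial_freq = np.asarray(spatial_freq, dtype=float)
    fsc = np.asarray(fsc, dtype=float)
    below = np.nonzero(fsc < threshold)[0]
    if below.size == 0 or below[0] == 0:
        return None
    i = below[0]
    f0, f1 = spatial_freq[i - 1], spatial_freq[i]
    c0, c1 = fsc[i - 1], fsc[i]
    # Linear interpolation between the last shell at or above the
    # threshold and the first shell below it.
    f_cross = f0 + (c0 - threshold) * (f1 - f0) / (c0 - c1)
    return 1.0 / f_cross

# Toy curve (made-up values, not data from this study).
toy_freq = [0.1, 0.2, 0.3]
toy_fsc = [1.0, 0.5, 0.0]
toy_resolution = resolution_at_threshold(toy_freq, toy_fsc)
print(toy_resolution)
```

Linear interpolation is a simplification; real FSC curves are noisy near the threshold, so the reported value should be treated as an estimate.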

Model building and refinement
A recently reported cryo-EM structure of the E. coli 70S·ArfA·RF2 complex (55), excluding ArfA and RF2, was used as a starting model for structural refinement. The initial model of E. coli RF1 (domains 2-4) was extracted from the crystal structure of the 70S·RF1 complex (11), and domain 1 was obtained by homology modeling from T. thermophilus RF1 (9) using SWISS-PROT (77). Initial protein and ribosome domain fitting into cryo-EM maps was performed using Chimera (78), followed by manual modeling using PyMOL (79). The linker between domain 1 and domain 2 (aa 99-105) was not defined in the cryo-EM map and was not modeled.
The structural model was refined by real-space simulated-annealing refinement using atomic electron scattering factors in RSRef (80,81), as described previously (82). Secondary-structure restraints, comprising hydrogen-bonding restraints for ribosomal proteins and base-pairing (distance and co-planarity) restraints for RNA nucleotides, were implemented in CNS format (83). Refinement parameters in RSRef, such as the relative weighting of stereochemical restraints and experimental energy term, were optimized to produce the stereochemically optimal models that closely agree with the corresponding maps. Refinement was performed using B-factor sharpened maps: all-atom refinement (-200 Å²), then local refinement of P-tRNA, mRNA, RF1, and neighboring residues (-225 Å²) at the starting annealing temperature of 1000 K. In the final stages, the structures were refined using phenix.real_space_refine (84) against a B-factor sharpened map (-150 Å²) at 300 K, followed by a round of refinement in RSRef applying harmonic restraints to preserve protein backbone geometry (-150 Å² and 300 K). The refined structural model closely agrees with the corresponding maps, as indicated by the low real-space R-factor of ~0.19 (RSRef) and high correlation coefficient of 0.86 (PHENIX, CC around atoms). The resulting models have good stereochemical parameters, characterized by low deviation from ideal bond lengths and angles and a low number of protein-backbone and rotamer outliers, as shown in Table 1. Structure superpositions and comparisons were performed in PyMOL.

Structural analyses of protein side chain interactions with RNA nucleotides
The following ribosome structures were downloaded from RCSB for the analyses of amino acid-nucleotide interactions: E. coli 70S ribosome (PDB code 4YBB (48)), T. thermophilus 70S ribosome (PDB code 4Y4P (49)), and S. cerevisiae 80S ribosome (PDB code 4V88 (50)). PyMOL was used to calculate the number of side chains packed on RNA nucleotides (number of aa per 1000 nucleotides, Fig. 3). The following distance cutoff criterion was used: at least one side-chain nonhydrogen atom (i.e. any atom excluding the backbone atoms) within 3.7 Å of the following carbon atoms of the aromatic base of a nucleotide: C2, C4, or C5 (U or C); C4, C5, or C6 (A); and C2, C4, C5, or C6 (G). Selected amino acids were confirmed by visual inspection of the PDB structures in PyMOL, supporting the stringency of the selection criterion.
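The distance criterion and the per-1000-nucleotide normalization described above can be expressed compactly as sketched below. This is a plain-Python illustration rather than the PyMOL procedure used in the study: the function names, the set of backbone atom names, the rule that hydrogen names start with "H", and the toy coordinates and counts are all assumptions made for the example.

```python
import math
from collections import Counter

# Base carbon atoms tested for each nucleotide type (from the text).
BASE_CARBONS = {
    "U": ("C2", "C4", "C5"),
    "C": ("C2", "C4", "C5"),
    "A": ("C4", "C5", "C6"),
    "G": ("C2", "C4", "C5", "C6"),
}
BACKBONE_ATOMS = {"N", "CA", "C", "O", "OXT"}  # assumed backbone definition
CUTOFF_A = 3.7                                  # Å (from the text)

def packs_on_base(residue_atoms, base_type, base_atoms):
    """True if any side-chain heavy atom lies within the cutoff of a listed base carbon.

    residue_atoms, base_atoms: dicts mapping atom name -> (x, y, z) in Å.
    """
    targets = [base_atoms[n] for n in BASE_CARBONS[base_type] if n in base_atoms]
    for name, xyz in residue_atoms.items():
        # Skip backbone atoms and (by assumed naming) hydrogens.
        if name in BACKBONE_ATOMS or name.startswith("H"):
            continue
        if any(math.dist(xyz, t) <= CUTOFF_A for t in targets):
            return True
    return False

def per_thousand(contacts, nucleotide_totals):
    """Contacts per 1000 nucleotides for each (amino acid, base) pair."""
    tally = Counter(contacts)
    return {pair: 1000.0 * n / nucleotide_totals[pair[1]] for pair, n in tally.items()}

# Toy geometry (made-up coordinates): the CD1 atom sits 3.5 Å from C5.
toy_ile = {"CA": (0.0, 0.0, 0.0), "CD1": (0.0, 0.0, 3.5)}
toy_uracil = {"C2": (10.0, 0.0, 0.0), "C4": (10.0, 10.0, 0.0), "C5": (0.0, 0.0, 0.0)}
toy_packed = packs_on_base(toy_ile, "U", toy_uracil)

# Toy counts (made-up numbers, not results from the study).
toy_rates = per_thousand([("ILE", "U"), ("ILE", "U"), ("ARG", "G")], {"U": 250, "G": 500})
print(toy_packed, toy_rates)
```

In the toy case the backbone CA atom coincides with C5 but is ignored, so the contact is detected only through the side-chain CD1 atom, which mirrors the side-chain-only rule in the criterion.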

Structure accession codes
The cryo-EM map and PDB coordinates have been deposited in EMDB and the Protein Data Bank with accession codes EMD-7970 and 6DNC.