Evidence for Late Resolution of the AUX Codon Box in Evolution

Background: Protein biosynthesis requires accurate tRNA aminoacylation.
Results: Bacterial methionyl-tRNA synthetases (MRSs) vary in their ability to reject near-cognate tRNAIle transcripts containing the methionine-specifying CAU anticodon.
Conclusion: Given the degree of near-cognate discrimination among bacterial MRSs, aspects of genetic code accuracy likely were fixed relatively late in evolution.
Significance: This varied discrimination may reflect differing cellular needs for translational accuracy versus plasticity.

Recognition strategies for tRNA aminoacylation are ancient and highly conserved, having been selected very early in the evolution of the genetic code. In most cases, the trinucleotide anticodons of tRNA are important identity determinants for aminoacylation by cognate aminoacyl-tRNA synthetases. However, a degree of ambiguity exists in the recognition of certain tRNAIle isoacceptors that are initially transcribed with the methionine-specifying CAU anticodon. In most organisms, the C34 wobble position in these tRNAIle precursors is rapidly modified to lysidine to prevent recognition by methionyl-tRNA synthetase (MRS) and production of a chimeric Met-tRNAIle that would compromise translational fidelity. In certain bacteria, however, lysidine modification is not required for MRS rejection, indicating that this recognition strategy is not universally conserved and may be relatively recent. To explore the actual distribution of lysidine-dependent tRNAIle rejection by MRS, we have investigated the ability of bacterial MRSs from different clades to differentiate cognate tRNACAUMet from near-cognate tRNACAUIle. Discrimination abilities vary greatly and appear unrelated to phylogenetic or structural features of the enzymes or sequence determinants of the tRNA. Our data indicate that tRNAIle identity elements were established late and independently in different bacterial groups. We propose that the observed variation in MRS discrimination ability reflects differences in the evolution of genetic code machineries of emerging bacterial clades.

Aminoacyl-tRNA synthetases catalyze the attachment of amino acids to their cognate tRNAs for protein biosynthesis at the ribosome. Accurate recognition of cognate tRNA by each aminoacyl-tRNA synthetase is essential as tRNAs charged with noncognate amino acids may be used by the ribosome to incorrectly decode codons, introducing mutations into proteins (1). In general, there are 20 aminoacyl-tRNA synthetases, one for each of the standard amino acids (2). (Individual aminoacyl-tRNA synthetases are abbreviated using the standard one-letter amino acid abbreviation.) Methionyl-tRNA synthetase (MRS) attaches methionine to two distinct tRNA isoaccepting species: initiator tRNA fMet for decoding the initial AUG (start) codon and elongator tRNA Met for decoding internal AUG codons, respectively. All tRNA Met species contain a CAU anticodon. However, the single methionine codon is similar to one of the isoleucine codons (AUA). In fact, these two codons represent the only place in the genetic code where trinucleotides differing only in the type of purine in the wobble (3rd) position specify different amino acids. Solutions to this unique decoding problem vary widely across the tree of life. Higher eukaryotes use a tRNA UAU Ile whose U34 is post-transcriptionally modified to pseudouridine, allowing accurate decoding of AUA codons (3). Mitochondria sidestep the recognition problem by using a modified genetic code that assigns both AUA and AUG codons to methionine (4). Bacteria use a tRNA CAU Ile whose position 34 cytidine in the anticodon is post-transcriptionally modified by lysidine-tRNA synthase (TilS) to lysidine (L); tRNA LAU Ile then specifically decodes the AUA codon (5)(6)(7). In archaea, agmatidine is introduced into this same position instead of lysidine (8,9).
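The decoding problem described above can be made concrete with a small sketch. The Python below is illustrative only and is not part of the original study. It lists the four AUX codons under the standard code and under a mitochondrial variant of the kind described above (AUA reassigned to Met, as in the vertebrate mitochondrial code), and it derives the codon read by an unmodified anticodon through strict Watson-Crick pairing. The names `STANDARD_AUX`, `MITO_AUX`, and `codon_read_by` are hypothetical, and wobble pairing and modified bases such as lysidine or agmatidine are deliberately not modeled.

```python
# AUX codon box: the four codons that begin with A-U (standard genetic code).
STANDARD_AUX = {"AUU": "Ile", "AUC": "Ile", "AUA": "Ile", "AUG": "Met"}

# Mitochondrial variant described in the text: AUA is read as Met.
MITO_AUX = {**STANDARD_AUX, "AUA": "Met"}

# Watson-Crick partners for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codon_read_by(anticodon: str) -> str:
    """Return the codon (5'->3') paired by an unmodified anticodon (5'->3').

    Assumption: strict Watson-Crick pairing only. Wobble pairing and
    position-34 modifications (lysidine, agmatidine) are ignored.
    """
    return "".join(COMPLEMENT[base] for base in reversed(anticodon.upper()))


if __name__ == "__main__":
    for anticodon in ("CAU", "UAU", "GAU"):
        codon = codon_read_by(anticodon)
        print(anticodon, "pairs with", codon,
              "| standard:", STANDARD_AUX[codon],
              "| mitochondrial:", MITO_AUX[codon])
```

Under these assumptions the unmodified CAU anticodon pairs with AUG, which is why a tRNA Ile precursor transcribed with CAU resembles a methionine tRNA at the anticodon until position 34 is modified.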
In Escherichia coli, lysidine modification switches the amino acid identity of tRNA from methionine to isoleucine and its decoding capacity from AUG methionine codons to AUA isoleucine codons (5)(6)(7). This mechanism is consistent with the prior determination that MRS from E. coli (EcMRS) belongs to an aminoacyl-tRNA synthetase group that uses cognate anticodon nucleotides as dominant identity elements for aminoacylation (10). For such synthetases, the same anticodon residues are responsible both for aminoacylation identity and for decoding at the ribosome; in these enzymes, there is just a single code, the genetic code, which is functioning in translation.
The strength of this anticodon recognition element is illustrated by the determination that EcMRS can methionylate in vitro transcripts of the E. coli tRNA Ile major isoacceptor in which its GAU anticodon is changed to the methionine CAU (11). However, Mycoplasma penetrans MRS (MpMRS) rejects in vitro transcripts of M. penetrans tRNA CAU Ile lacking the lysidine 34 (L34) modification, whereas EcMRS aminoacylates the M. penetrans tRNA CAU Ile transcripts just as efficiently as its cognate tRNA Met transcripts (12). Our results therefore point out a difference in tRNA CAU Ile discrimination between the E. coli and M. penetrans methionylation systems. MpMRS utilizes residues in the acceptor stem, in particular an A3-U70 base pair, to discriminate against tRNA CAU Ile even without its lysidine modification (12).
Loosely defined, recognition of non-anticodon tRNA residues constitutes an operational code that matches amino acid with tRNA and is distinct from the genetic code (13). As our previous study compared tRNA CAU Ile discrimination between just two bacterial MRS enzymes (12), we sought to determine whether strong tRNA CAU Ile discrimination and the use of this operational code by MpMRS are the exception or the rule for bacterial MRSs. This question is of particular interest because the operational code dictated by the acceptor stem and discriminator base of the tRNA is proposed to be the ancestral recognition mechanism for early tRNA precursors (13)(14)(15).
As the isoleucyl-tRNA synthetase and MRS catalytic sites have evolved in conjunction with tRNA CAU sequences (16), there appear to have been some sequence motifs preferred by each of the synthetases. Acceptor stem nucleotides are tied with tRNA CAU Ile and tRNA CAU Met identity in bacteria; G3-C70 and A3-U70 base pairs are both commonly found in tRNA CAU Ile, whereas tRNA CAU Met and tRNA CAU fMet typically have C3-G70 base pairs (17). C4-G69 and C5-G68 also occur frequently in tRNA CAU Ile. Although these preferences are likely dictated by subtle constraints inherent in the active site of the two different enzymes, the degree of difference between tRNA CAU Ile and tRNA CAU Met varies significantly among species. E. coli and other gammaproteobacteria encode a tRNA CAU Ile with a C3-G70 base pair common also to its tRNA CAU Met and tRNA CAU fMet species (17). Thus the key 3-70 base pair used by MpMRS to reject tRNA CAU Ile is not used by EcMRS to differentiate its cognate from near-cognate tRNA CAU (12).
Variation in the zinc binding domain among MRS enzymes might account for tRNA CAU Ile discrimination differences as this domain is proposed to play a role in acceptor stem recognition (18,19). The zinc binding domain forms part of the connective polypeptide linking the halves of the Rossmann fold; depending on the organism, one or two small "knuckle" structures are present (20,21). Strikingly, MRSs of bacterial origin have only a single knuckle, whereas MRSs of the archaeal clade have two (19,22). Each knuckle that binds zinc typically consists of four cysteine residues in two CXXC motifs (18), although in some enzymes, the knuckle structure is generated without cysteine residues or coordination of zinc. Thus bacterial MRSs can first be divided as bacterial or archaeal in origin, and then they can be further delineated into two additional classes depending on whether they are predicted to bind one or no zinc ions (in the bacterial clade) or one or two zinc ions (in the archaeal clade) (19,22).
In this work, we investigate the distribution of tRNA CAU Ile discrimination in bacterial methionylation systems. We selected seven systems from different bacterial clades to explore a variety of phylogenetic and structural characteristics. We considered MRS enzymes of eubacterial origin as well as those thought to result from archaeal gene transfer (22). These species furthermore possess a diverse set of tRNA CAU acceptor stems. The seven enzymes also represent each of the four structural classes of MRS with respect to zinc occupancy in the connective polypeptide domain (19). Finally, the species vary with regard to the type of TilS (tRNA Ile lysidine synthetase) enzyme they encode: either the longer Type I TilS with an appended tRNA binding domain or the shorter Type II TilS (23,24). We observe in this set of MRS enzymes a stark variation in levels of tRNA CAU Ile discrimination with little correlation to the structural and phylogenetic features described above. Our conclusions support the hypothesis that tRNA CAU discrimination was established recently with respect to extant bacterial clades.
Transfer RNA Substrate Preparation-Transfer RNA CAU Ile and tRNA CAU Met sequences were obtained from the Lowe tRNA database (25). RNAs were generated by in vitro transcription of overlapping oligonucleotides (26) using T7 RNA polymerase, 5 mM NTP, 40 mM DTT, 250 mM HEPES·KOH (pH 7.5), 30 mM MgCl2, 2 mM spermidine, and 0.1 mg/ml bovine serum albumin at 37°C for 4 h. Transcripts were separated on denaturing PAGE, eluted from the gel using an Elutrap electroelution apparatus (Schleicher & Schuell), and refolded (80°C followed by gradual reduction of temperature in the presence of 1 mM MgCl2). Aminoacylation plateaus were used to calculate the concentration of active molecules for each tRNA preparation using 10 μM of the corresponding native enzyme (for tRNA Met samples) or EcMRS (for tRNA CAU Ile samples). The fraction of aminoacylatable tRNA (30-65% active, relative to A260 quantification of tRNA) was consistent with that of the E. coli tRNA Met transcripts we typically use.
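The active-fraction arithmetic behind the plateau measurement can be sketched as follows. This is a minimal illustration rather than the authors' analysis code: the function names and the example numbers are hypothetical, and the sketch assumes that the plateau level of charged tRNA and the A260-based total refer to the same reaction volume.

```python
def active_fraction(plateau_charged_pmol: float, total_trna_pmol: float) -> float:
    """Fraction of a tRNA preparation that can be aminoacylated.

    plateau_charged_pmol: aminoacyl-tRNA formed at the aminoacylation plateau.
    total_trna_pmol: tRNA present in the same reaction, from A260 quantification.
    """
    if total_trna_pmol <= 0:
        raise ValueError("total tRNA must be positive")
    if plateau_charged_pmol < 0:
        raise ValueError("charged tRNA cannot be negative")
    return plateau_charged_pmol / total_trna_pmol


def active_concentration(nominal_conc: float, fraction: float) -> float:
    """Concentration of chargeable tRNA, in the same units as nominal_conc."""
    return nominal_conc * fraction


if __name__ == "__main__":
    # Made-up example values, not data from this study.
    frac = active_fraction(plateau_charged_pmol=45.0, total_trna_pmol=100.0)
    print(f"active fraction: {frac:.2f}")
    print(f"active tRNA at 2.0 uM nominal: {active_concentration(2.0, frac):.2f} uM")
```

Using the active concentration rather than the A260-based concentration keeps substrate levels comparable between tRNA preparations that refold with different efficiencies.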
Aminoacylation Thus for these enzymes, methionylation of tRNA CAU Met was compared with tRNA CAU Ile by using higher concentrations of both enzyme and tRNA CAU Ile as described previously (12).
Phylogenetic Analysis-MRS protein sequences were aligned with T-Coffee using standard parameters, and gaps and ambiguously aligned regions were removed (29). Distance trees were calculated with the package PHYLIP, using the programs PROTDIST and NEIGHBOR, and using 1000 bootstraps to estimate robustness of the nodes (30). The overall topology of the tree was confirmed using maximum parsimony using the program PROTPARS in the PHYLIP package and by maximum likelihood using PHYML (31,32).

Selection of MRS Enzymes-Prior phylogenetic analysis has shown that bacterial MRSs belong to two distinct clades (17). Although many bacterial MRSs appear to be of direct bacterial origin, E. coli and some other species have MRS enzymes apparently resulting from a horizontal gene transfer from archaea. Interestingly, extant archaeal MRS enzymes discriminate against a different C34 modification (agmatidine) than bacterial MRSs (lysidine). On the other hand, eukaryotic cytoplasmic MRSs (arising from archaea) do not require tRNA CAU Ile discrimination as eukaryotes do not encode a tRNA CAU Ile (25). We therefore wondered whether the robust ability of EcMRS to aminoacylate tRNA CAU Ile (12) might also be observed in other bacterial MRS enzymes of archaeal origin because archaea use a different AUA/AUG decoding strategy. Although previous phylogenetic analyses placed numerous mycoplasma MRSs in the bacterial clade, they did not analyze the position of MpMRS (22). We performed our own phylogenetic analysis of MRS sequences, including those of enzymes assayed in this work. The overall tree architecture was consistent with the previously published work (22). MpMRS was placed in the bacterial clade (Fig. 1), and all the other species tested in this work correctly clustered with species from different families within their same bacterial phylum.
We thus selected three MRS examples from the bacterial clade and two from the archaeal clade to expand our initial observations on EcMRS (archaeal clade) and MpMRS (bacterial clade). The selected enzymes also represent the four different zinc binding domain classes (Fig. 2). From the bacterial clade, the examined species include MRS from the proteobacterium H. pylori (HpMRS), the GC-rich M. smegmatis (MsMRS), the firmicute S. pneumoniae (SpMRS1), and the opportunistic pathogen M. penetrans (MpMRS). From the archaeal clade, we selected MRS from E. coli (EcMRS), the spirochete B. burgdorferi (BbMRS), and the obligate anaerobe B. fragilis (BfMRS).
The genes for these enzymes were cloned, overexpressed in E. coli, and purified. The corresponding tRNA CAU Met and tRNA CAU Ile (supplemental Fig. S2)  aminoacylation initial rate was calculated for each MRS as a measure of discrimination efficiency. Enzymes displaying greater than 1000-fold difference in discrimination (as previously seen for MpMRS) were termed strong discriminators, whereas enzymes showing greater than 10-fold but less than 1000-fold difference were termed moderate discriminators. HpMRS, MsMRS, and SpMRS showed moderate discrimination, charging tRNA CAU Met more efficiently than tRNA CAU Ile by 20-, 40-, and 90-fold, respectively (Table 1). The aminoacylation profile of HpMRS provides an example of the moderately discriminating enzymes, with methionylation of tRNA CAU Ile observed using nanomolar enzyme concentration. This aminoacylation, although modest, is clearly above background and reflects multiple turnover of enzyme (Fig. 3). These results are consistent with tRNA CAU Ile discrimination being widely distributed throughout the bacterial clade.
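The classification scheme described above can be written as a short sketch. The Python below is an illustration under stated assumptions, not the authors' procedure: `classify_discrimination` and `REPORTED_FOLD` are hypothetical names, the label "nondiscriminating" for values of 10-fold or less is an assumption (the text defines only the strong and moderate classes), and a value of exactly 1000-fold, which the stated thresholds leave undefined, is assigned to the moderate class here.

```python
def classify_discrimination(fold: float) -> str:
    """Classify an MRS by its fold preference for tRNA Met over tRNA CAU Ile.

    Thresholds from the text: > 1000-fold is strong, > 10-fold and
    < 1000-fold is moderate.
    Assumptions of this sketch: exactly 1000-fold counts as moderate, and
    anything at or below 10-fold is labeled nondiscriminating.
    """
    if fold <= 0:
        raise ValueError("fold difference must be positive")
    if fold > 1000:
        return "strong"
    if fold > 10:
        return "moderate"
    return "nondiscriminating"


# Fold differences quoted in the text for the bacterial clade enzymes (Table 1).
REPORTED_FOLD = {"HpMRS": 20, "MsMRS": 40, "SpMRS": 90}

if __name__ == "__main__":
    for enzyme, fold in REPORTED_FOLD.items():
        print(f"{enzyme}: {fold}-fold -> {classify_discrimination(fold)}")
```

All three enzymes listed fall in the moderate class under these thresholds, in agreement with the description above.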
Among the enzymes tested, all bacterial-type MRSs could be classified as discriminating. Representative MRSs from major taxa including firmicutes and proteobacteria were able to reject near-cognate tRNA CAU Ile. The extent of discrimination varies widely, however, as the strongly discriminating MpMRS also falls within the bacterial clade. The zinc binding domain class does not correlate with the level of discrimination. MpMRS and HpMRS both have domains with one knuckle and one zinc ion (Fig. 2), but MpMRS is a strongly discriminating MRS with a 1600-fold difference in charging tRNA CAU Met over tRNA CAU Ile, whereas HpMRS shows only a moderate 20-fold level of discrimination.
Similarly, although the A3-U70 base pair of M. penetrans tRNA CAU Ile is a key identity element for discrimination by

Archaeal Clade MRSs Display Great Diversity in Near-cognate Discrimination-The archaeal clade MRSs tested exhibit a striking diversity in their ability to discriminate tRNA CAU Ile. Initial assays with BfMRS showed no aminoacylation of tRNA CAU Ile at enzyme concentrations that catalyzed efficient aminoacylation of tRNA CAU Met. As in previous work with MpMRS (12), we estimated the degree of discrimination using a higher concentration of enzyme and tRNA CAU Ile when compared with that used for the tRNA CAU Met sample (Fig. 4). Aminoacylation of tRNA CAU Ile was observed under these conditions, although multiple turnover of the enzyme was not achieved, and we estimate from initial rates of aminoacylation that BfMRS shows a 2000-fold level of discrimination. The strong discrimination observed for BfMRS is even higher than that seen for the bacterial clade MpMRS and clearly distinct from the moderate 60-fold discrimination seen for the archaeal clade EcMRS. This was an unexpected result as BfMRS has the same two-knuckle, one-zinc-ion configuration as EcMRS. Examination of the zinc binding domains does not suggest significant differences in the motifs or flanking sequences (Fig. 2).
Strikingly, the archaeal-type BbMRS exhibited only a 2-fold difference in aminoacylation efficiency between its cognate and near-cognate transcripts (Fig. 5). BbMRS is the only example of a nondiscriminating MRS identified in this work. This observation raises the critical question: How does B. burgdorferi prevent misinsertion of methionine at isoleucine codons?
Despite does have an unusual U6-U67 mismatch in its acceptor stem, but the significance of this pair is unclear. The evolutionary relationship of Bacteroides to other bacteria is somewhat ambiguous, and Bacteroides species comprise their own taxon. BfMRS clusters with the MRS of Cytophaga hutchinsonii and also some archaea such as P. abyssi but is phylogenetically distant from BbMRS and EcMRS and their associated clusters (22) (Fig. 1). It is unclear whether strong tRNA CAU Ile discrimination among the MRS of the archaeal clade is restricted to the Bacteroides taxon or whether it is more widely distributed. Clearly, bacterial or archaeal ancestry alone does not determine tRNA CAU Ile discrimination ability.

DISCUSSION
The results presented in this work indicate that the extent of tRNA CAU Ile discrimination by MRS varies widely in bacteria and is not defined by phylogenetic position or by the structural type of the zinc binding domain. Although we showed earlier that the 3-70 base pair is an important element for MpMRS tRNA CAU Ile rejection, its presence or absence is not an accurate predictor of discrimination across bacteria.
Each of the bacterial clade MRSs tested to date aminoacylates its cognate tRNA CAU Met at least 20-fold better than its corresponding near-cognate tRNA CAU Ile. In contrast, an example of nondiscrimination exists in the archaeal-type BbMRS. BbMRS clusters phylogenetically with eukaryotic MRSs that also have two zinc ions and do not require the same type of near-cognate discrimination, as eukaryotes lack tRNA CAU Ile. However, a single case of nondiscrimination is not sufficient to define a specific relationship between the zinc binding domain class and discrimination. Additional bacterial enzymes with two zinc ions should be assayed to see whether there is indeed a correlation. Furthermore, characterization of the full set of identity elements used by BbMRS is a priority as its cognate tRNA Met acceptor stem contains the unusual U3-A70 pair.
We have also examined the genetic background and biochemical environment of each species for putative selective pressures for robust tRNA CAU Ile discrimination. The need for acceptor stem-based discrimination in MpMRS could be tied to the high AT bias of the M. penetrans genome, one of the highest of any bacteria (33). The AUA isoleucine codon usage is only 0.4% for the E. coli genome but 2.1% for the M. penetrans genome (34). Increased usage of the AUA codon could potentially make M. penetrans more sensitive to occasional misreading of AUA codons by unmodified Met-tRNA CAU Ile , thus driving the need for a stronger MpMRS tRNA CAU Ile discrimination mechanism than required by E. coli. However, this hypothesis is not supported by the analysis of the five additional species tested here. The low GC content B. burgdorferi genome uses AUA codons at almost double the frequency of M. penetrans, but BbMRS does not discriminate (Table 1) (Table 1). Specifically, B. fragilis possesses a Type I TilS with the additional domain, but BfMRS strongly discriminates tRNA CAU Ile . Although the CAU anticodon has long been considered the dominant identity element for aminoacylation by MRS, we show that the presence of this single methionine-specifying anticodon is not sufficient to confer aminoacylation in all bacterial enzymes. Of the enzymes tested, only the archaeal clade BbMRS could be classified as nondiscriminating, with only 2-fold discrimination against tRNA CAU Ile . Even the well studied EcMRS aminoacylates its cognate tRNA 60-fold more efficiently than its near-cognate tRNA CAU Ile . Certainly, the in vitro conditions used here only begin to test the ability of organisms to prevent misacylation. 
In addition to bacterial modification of the wobble nucleotide to lysidine by TilS, other modifications not present in the transcripts used here, codon usage, and the relative concentrations of methionine versus isoleucine and tRNA Met versus tRNA CAU Ile are likely to influence translational accuracy at the level of tRNA aminoacylation. Adaptations to the decoding machinery may also enhance accuracy. For example, Mycoplasma mobile is one of several bacteria that lack TilS; AUA codons are read by tRNA UAU Ile, and wobble pairing to Met AUG codons is not observed (36). This suggests that M. mobile ribosomes have adapted to ensure accuracy through a mechanism independent of known anticodon base modifications.
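The AUA codon usage comparison made earlier in this Discussion rests on a simple frequency calculation, sketched below for a single coding sequence. This is an illustration only: `codon_fraction` is a hypothetical helper, the example sequence is made up, and genome-wide percentages such as those cited would require pooling the codon counts from all annotated coding sequences of a genome.

```python
from collections import Counter


def codon_fraction(cds: str, codon: str = "AUA") -> float:
    """Fraction of in-frame codons in a coding sequence that equal `codon`.

    The sequence is read in frame from its first base; DNA input (T) is
    converted to RNA (U), and any trailing partial codon is ignored.
    """
    seq = cds.upper().replace("T", "U")
    target = codon.upper().replace("T", "U")
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        raise ValueError("sequence contains no complete codon")
    return Counter(codons)[target] / len(codons)


if __name__ == "__main__":
    toy_cds = "ATGATAATTATAGGG"  # made-up sequence: AUG AUA AUU AUA GGG
    print(f"AUA usage: {100 * codon_fraction(toy_cds):.1f}% of codons")
```

Applied across all genes of a genome, the same calculation gives the per-genome AUA usage that the Discussion compares between species.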
However, despite the numerous means by which genetic fidelity is maintained, translational accuracy may be more plastic than previously thought. Misincorporation of amino acids into the proteome may provide a cellular advantage, particularly during conditions of oxidative or other growth stress (37). In particular, methionine is attached to noncognate tRNAs at about 1% of the level of tRNA Met aminoacylation in human (HeLa) cells, and misacylation increases up to 10-fold upon viral infection or exposure to reactive oxygen species-inducing agents (38). The mediator of this misacylation in human cells is MRS, and a wide range of noncognate tRNAs is mischarged. Similarly, EcMRS exhibits in vitro mischarging of the near-cognate tRNA CGU Thr and tRNA CCU Arg in an anticodon-dependent fashion; substitutions either to the tRNA CNU anticodons or to the EcMRS anticodon binding domain decrease misacylation (39).
The rules governing rejection of near-cognate tRNA are clearly not universal and may reflect a relatively late evolution of discrimination strategies. In this regard, several options appear possible. For example, the final assignment of AUG codons to methionine could be, in itself, a late decision in the development of the code. This seems unlikely given the almost universal utilization of AUG as an initiator codon translated by methionine. Alternatively, a shift in the distribution of isoleucine and methionine codons may have taken place later in evolution. In this case, the current situation in mitochondria could represent the ancestral state, which, due to a universal pressure, was independently resolved into the extant structure of the AUX codon box in different clades. This possibility would explain why the strategies to ensure proper tRNA CAU Ile discrimination vary so widely among extant species. Alternatively or additionally, the varying degree of cognate versus near-cognate discrimination displayed by bacterial MRSs may indicate species-specific requirements for plasticity of translational fidelity and relative adaptive ability that might be enabled by low level misacylation.