Operational RNA Code for Amino Acids in Relation to Genetic Code in Evolution*

The genetic code is based on the specific aminoacylation of transfer RNAs (tRNAs) by aminoacyl-tRNA synthetases (aaRS) (1, 2). This reaction links anticodon triplets in tRNAs with specific amino acids. The specificity of the reaction is governed by tRNA identity elements that are recognized by the aminoacylating enzymes (2). The universal distribution and conservation of tRNAs and aaRS imply that they preceded the origin of the three kingdoms of life, Bacteria, Archaea, and Eucarya (3–5). Significantly, nucleotide determinants other than the anticodon triplets are important for aminoacylation efficiency and specificity (6, 7). It is these nucleotides (making up an operational RNA code) that are now seen as important for maintaining a universal genetic code.

Background
Typically, aminoacylation occurs in two steps. First, the enzyme (E) condenses its cognate amino acid (AA) with ATP to form a tightly bound aminoacyl adenylate (AA-AMP), releasing pyrophosphate (PPi). The aminoacyl group is then transferred to the 3′-end of the tRNA, giving aminoacyl-tRNA (AA-tRNA) and releasing AMP. In this way, a specific nucleotide triplet (the anticodon) in the tRNA is physically connected, through the tRNA structure, with a particular amino acid.
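The two steps can be summarized as a pair of reactions. The notation E·AA-AMP for the enzyme-bound adenylate is our shorthand, and the net equation simply follows from adding the two steps:

```latex
\begin{aligned}
\mathrm{E} + \mathrm{AA} + \mathrm{ATP} &\rightleftharpoons \mathrm{E{\cdot}AA\text{-}AMP} + \mathrm{PP_i} \\
\mathrm{E{\cdot}AA\text{-}AMP} + \mathrm{tRNA} &\rightarrow \mathrm{AA\text{-}tRNA} + \mathrm{AMP} + \mathrm{E} \\
\text{net:}\quad \mathrm{AA} + \mathrm{ATP} + \mathrm{tRNA} &\rightarrow \mathrm{AA\text{-}tRNA} + \mathrm{AMP} + \mathrm{PP_i}
\end{aligned}
```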
Transfer RNAs usually consist of 76 nucleotides arranged in a cloverleaf structure with four major arms. The acceptor stem is a 7-bp helix that ends on the 3′-side with the universal tetranucleotide NCCA (positions 73–76), with the amino acid attachment site at A76. The dihydrouridine (D), TΨC, and anticodon stem-loops make up the other three arms (Fig. 1). The four arms are arranged in three dimensions into an L-shaped structure (8,9), in which the acceptor and TΨC stems stack together to form a 12-bp hairpin known as the minihelix (ending in the TΨC loop) (10). At right angles, the D and anticodon stems fuse to give a 10-bp helix with the D-loop at one end and the anticodon loop at the other. Thus, the triplet of the code and its cognate amino acid lie in distinct domains at opposite ends of the tRNA structure (Fig. 1).
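As a rough sketch of this bookkeeping, the Python snippet below encodes the four stems in the conventional 1–76 tRNA numbering and recovers the lengths of the two coaxial stacks. The position ranges are the standard numbering assumed for a 76-nt tRNA; counting the 26:44 pair in the D/anticodon stack is an assumption made here to reproduce the 10-bp figure; the names STEMS and base_pairs are hypothetical.

```python
# Conventional tRNA numbering (positions 1-76), assumed for a 76-nt tRNA.
# Each stem: (5'-strand positions, rule mapping a 5' position to its 3' partner).
STEMS = {
    "acceptor": (range(1, 8), lambda i: 73 - i),     # 1:72 ... 7:66
    "D": (range(10, 14), lambda i: 35 - i),          # 10:25 ... 13:22
    "anticodon": (range(27, 32), lambda i: 70 - i),  # 27:43 ... 31:39
    "TpsiC": (range(49, 54), lambda i: 114 - i),     # 49:65 ... 53:61
}

ANTICODON = (34, 35, 36)
DISCRIMINATOR = 73
CCA_END = (74, 75, 76)  # the amino acid is attached at A76


def base_pairs(stem: str) -> list[tuple[int, int]]:
    """Return the (5', 3') position pairs that make up one stem."""
    positions, partner = STEMS[stem]
    return [(i, partner(i)) for i in positions]


# Minihelix: acceptor stem stacked on the TpsiC stem.
minihelix_bp = len(base_pairs("acceptor")) + len(base_pairs("TpsiC"))
# D/anticodon stack: D stem + the bridging 26:44 pair (assumption) + anticodon stem.
d_anticodon_bp = len(base_pairs("D")) + 1 + len(base_pairs("anticodon"))

print(base_pairs("acceptor"))
print(minihelix_bp, d_anticodon_bp)
```

Real tRNAs with longer variable loops or extra D-loop nucleotides would first need to be aligned to this numbering.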
The minihelix domain, terminating in the CCA trinucleotide, also occurs as a regulatory element for replication of specific RNA genomes (11,12). In the ribosome, the anticodon-containing domain and the minihelix domain bind to distinct rRNAs (13). This observation raises the possibility that the two domains had separate origins. It has also been proposed that an ancient minihelix duplicated and gave rise to the anticodon-containing domain and the genetic code (14).
In bacteria, there is typically one aaRS for each amino acid. In eukaryotes, distinct nuclear-encoded cytoplasmic and mitochondrial enzymes carry out aminoacylation in their respective cellular compartments. Broadly speaking, the enzymes consist of two major domains. The historically older domain contains the catalytic site, with determinants for binding the minihelix portion of the tRNA. These catalytic domains are limited to two folds that define two families, known as classes I and II (15–19). (With rare exceptions, each class contains enzymes specific for 10 different amino acids.) Most of the structural evolution that gave rise to the two classes of synthetases took place before the first split of the universal tree of life, the tree based on analyses of 16S rRNA sequences (4,5,20). The synthetases also have a second major domain that, in many instances, interacts with the anticodon. The idiosyncratic structures of these domains, even among enzymes of the same class, suggest that the second domain was added later in evolution.
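For reference, the sketch below lists the conventional assignment of the 20 standard synthetases to the two classes (textbook assignments, not taken from this review), illustrating the 10 + 10 split. The helper name synthetase_class is hypothetical, and the class I LysRS exception discussed later is only noted in a comment.

```python
# Conventional class assignments by catalytic fold (amino acid 3-letter codes).
CLASS_I = {"Arg", "Cys", "Gln", "Glu", "Ile", "Leu", "Met", "Trp", "Tyr", "Val"}
CLASS_II = {"Ala", "Asn", "Asp", "Gly", "His", "Lys", "Phe", "Pro", "Ser", "Thr"}

# Rare exception (see the LysRS section): some organisms use a class I LysRS.
EXCEPTIONS = {"Lys": "class I LysRS in some archaea and bacteria"}


def synthetase_class(amino_acid: str) -> str:
    """Return the usual class ("I" or "II") of the aaRS for an amino acid."""
    if amino_acid in CLASS_I:
        return "I"
    if amino_acid in CLASS_II:
        return "II"
    raise ValueError(f"no canonical aaRS listed for {amino_acid!r}")


print(len(CLASS_I), len(CLASS_II), synthetase_class("Ala"))
```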

The Minihelix and an Operational RNA Code for Amino Acids
An obvious way for an aaRS to relate a specific amino acid to a nucleotide triplet is through direct recognition of the tRNA anticodon. However, alanyl-, seryl-, and leucyl-tRNA synthetases do not use the anticodon as the principal determinant for aminoacylation (1). For example, bacterial and eukaryotic cytoplasmic alanyl-tRNA synthetases have relied throughout evolution on a specific G3:U70 base pair in the acceptor stem to define the identity of tRNA Ala (21–25) (Fig. 2). The enzyme makes no physical contact with the anticodon (26). As a consequence, a minihelix or even smaller helices (e.g. microhelices of 7 bp) that contain a G3:U70 base pair are robust substrates for aminoacylation by bacterial, yeast, and human enzymes (10,24,25). Variants of these substrates containing natural and non-natural base analogs have been useful for evaluating the energetic contributions of the G3:U70 base pair (6, 27–29).
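A minimal sketch of the kind of check the alanine system implies, assuming a 7-bp acceptor stem given as two strands in conventional numbering. The function names and the toy sequences are hypothetical (they are not real tRNA Ala sequences), and actual recognition by AlaRS involves more than this single pair.

```python
def pair_at(five_prime: str, three_prime: str, i: int) -> str:
    """Return the i:(73-i) pair of a 7-bp acceptor stem, e.g. 'G:U'.

    five_prime  -- bases at positions 1-7, written 5'->3'
    three_prime -- bases at positions 66-72, written 5'->3'
    """
    if len(five_prime) != 7 or len(three_prime) != 7:
        raise ValueError("expected two 7-nt strands")
    if not 1 <= i <= 7:
        raise ValueError("acceptor-stem position must be 1-7")
    partner = 73 - i  # e.g. 3 pairs with 70
    return f"{five_prime[i - 1]}:{three_prime[partner - 66]}"


def has_alanine_identity_pair(five_prime: str, three_prime: str) -> bool:
    """True if the stem carries the G3:U70 wobble pair."""
    return pair_at(five_prime, three_prime, 3) == "G:U"


# Hypothetical stem (not a real tRNA sequence):
five_prime = "GCGAUCC"      # positions 1-7
three_prime_gu = "GGAUUGC"  # positions 66-72, with G3:U70
three_prime_gc = "GGAUCGC"  # same stem, but G3:C70
print(pair_at(five_prime, three_prime_gu, 3))
print(has_alanine_identity_pair(five_prime, three_prime_gc))
```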
These observations are mirrored by numerous examples of tRNA synthetases that charge microhelices based on the acceptor-stem sequences of their cognate tRNAs (6, 7, 10, 30–44). Thus, despite synthetase contacts with the anticodon (45), the acceptor stem often contains determinants sufficient for specific aminoacylation. The sequences and structures in RNA oligonucleotides that mimic the acceptor stem and confer specific aminoacylation constitute an operational RNA code for amino acids (46) (also referred to as the "second genetic code" (47,48)). These determinants typically consist of 1–3 bp plus the N73 "discriminator" base. The operational RNA code may have predated the genetic code and, according to some analyses, was its progenitor (6, 48–52).
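To make the idea of an acceptor-stem "signature" concrete, the sketch below extracts the 1:72, 2:71, and 3:70 pairs and the N73 base from a tRNA written in canonical 76-nt numbering. This is an illustrative assumption: the function name is hypothetical, the toy sequence is filler rather than a real tRNA, and which features actually matter differs from one synthetase to the next.

```python
def acceptor_signature(trna: str) -> dict[str, str]:
    """Extract 1:72, 2:71, 3:70 and N73 from a 76-nt tRNA (5'->3')."""
    if len(trna) != 76:
        raise ValueError("expected a 76-nt sequence in canonical numbering")

    def base(pos: int) -> str:
        return trna[pos - 1]  # 1-based tRNA position -> 0-based index

    signature = {f"{i}:{73 - i}": f"{base(i)}:{base(73 - i)}" for i in (1, 2, 3)}
    signature["N73"] = base(73)
    return signature


# Toy 76-nt sequence: filler A's with only the acceptor-end positions set.
toy = ["A"] * 76
for pos, nt in {1: "G", 2: "C", 3: "G", 70: "U", 71: "G", 72: "C",
                73: "A", 74: "C", 75: "C", 76: "A"}.items():
    toy[pos - 1] = nt
toy_trna = "".join(toy)

print(acceptor_signature(toy_trna))
```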

Barriers to Cross-domain Aminoacylations and Their Manipulation
The tyrosine and glycine systems illustrate how the position of acceptor-stem determinants for aminoacylation, but not the determinants themselves, has been conserved (1,2). For example, eubacterial TyrRSs do not aminoacylate eukaryotic cytoplasmic tRNA Tyr (53). Conversely, eubacterial tRNA Tyr cannot be aminoacylated by eukaryotic TyrRS. This domain specificity correlates with the change of the conventional G1:C72 base pair found in most tRNAs to C1:G72 in eukaryotic and archaeal tRNA Tyr sequences. The 1:72 base pair was shown to be important for aminoacylation of microhelices or tRNAs based on sequences of tyrosine acceptors in Bacillus stearothermophilus (55), the eukaryotic pathogen Pneumocystis carinii (56), the yeast Saccharomyces cerevisiae

* This minireview will be reprinted in the 2001 Minireview Compendium, which will be available in December, 2001. This work was supported by Grant 23562 from the National Institutes of Health and by a fellowship from the National Foundation for Cancer Research.
Remarkably, these aminoacylation barriers could be overcome by generating chimeric enzymes in which a 39-amino acid fragment of the eukaryotic enzyme was placed within the context of the eubacterial TyrRS (57). Conversely, incorporating the bacterial peptide fragment into the body of the human enzyme enabled the human enzyme to charge the bacterial substrate, while it lost the ability to charge human tRNA Tyr. These experiments show that the position of a determinant important for aminoacylation was conserved and that coadaptations in the cognate synthetase maintain specific recognition. A similar principle presumably operates with glycyl-tRNA synthetases (Fig. 3) (58,59). Thus, acceptor-stem positions important for aminoacylation have been conserved across phyla, and variations at these positions can account for the domain specificity of aminoacylation.
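The logic of the fragment-swap experiments can be caricatured with a toy model in which each TyrRS carries a single exchangeable "module" that reads the 1:72 pair. This is a deliberate simplification (real recognition is neither all-or-none nor confined to one fragment), all class and variable names are hypothetical, and only the outcomes stated above are checked.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyTyrRS:
    """Toy synthetase: `module` stands in for the swappable fragment."""
    body: str    # origin of the bulk of the enzyme
    module: str  # the 1:72 pair the fragment reads

    def charges(self, pair_1_72: str) -> bool:
        return pair_1_72 == self.module


BACTERIAL_TRNA_TYR = "G:C"   # conventional G1:C72
EUKARYOTIC_TRNA_TYR = "C:G"  # C1:G72 in eukaryotic tRNA Tyr

bacterial = ToyTyrRS(body="bacterial", module=BACTERIAL_TRNA_TYR)
eukaryotic = ToyTyrRS(body="human", module=EUKARYOTIC_TRNA_TYR)

# Swap the recognition fragment in both directions.
bacterial_with_euk_fragment = replace(bacterial, module=eukaryotic.module)
human_with_bact_fragment = replace(eukaryotic, module=bacterial.module)

for enzyme in (bacterial, eukaryotic, bacterial_with_euk_fragment,
               human_with_bact_fragment):
    print(enzyme.body, enzyme.module,
          enzyme.charges(BACTERIAL_TRNA_TYR), enzyme.charges(EUKARYOTIC_TRNA_TYR))
```

The point of the model is only that the enzyme body can stay the same while a small recognition element, tuned to the same conserved position, decides which tRNA is charged.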

A Variation That Suggests Relative Timing of Appearance of Synthetases and tRNAs
Unlike most species, the archaebacterium Methanococcus jannaschii lacks a gene coding for a class II lysyl-tRNA synthetase (60). Instead, aminoacylation of tRNA Lys is catalyzed by a class I enzyme (61). Phylogenetic analysis of the novel class I LysRS showed that its origin cannot be explained by a recent gene transfer event (62). Analysis of tRNA Lys sequences from all phylogenetic domains showed that tRNA Lys does not divide into two groups that follow the distribution of its two different aminoacylating enzymes (62). The coherence of the tRNA Lys sequences implies that the identity of this tRNA was established independently of (and probably before) the establishment of the two forms of LysRS (62). This situation is unlike that of glutaminyl- and asparaginyl-tRNA synthetases. These two aaRS appeared later in evolution as a result of duplications of the genes for glutamyl- and aspartyl-tRNA synthetases, respectively, and were laterally transferred across the phylogenetic tree (63,64).
Class I tRNA synthetases recognize the minor-groove side of the acceptor stem, whereas class II enzymes approach the tRNA from the major-groove side (18, 65–67). Many tRNA Lys species contain a G2:C71 base pair that is important for aminoacylation, and they can be aminoacylated by both class I and class II LysRS (68). This result implies that the two types of enzymes recognize opposite sides (and distinct atoms) of the same base pair. In contrast, tRNA Lys from the spirochete Borrelia burgdorferi (an organism that has a class I LysRS) contains a G:U base pair at position 2:71. This G:U base pair blocks aminoacylation of tRNA Lys by the class II Escherichia coli LysRS (68). Thus, the displacement of class II LysRS by its class I counterpart in spirochetes may have resulted from a subtle variation in the operational RNA code for lysine that blocked interaction with class II LysRS (68, 69) (Fig. 4). We suggest that the evolution of preexisting identity elements in ancestral tRNAs may have been one of the main evolutionary pressures for selection of emerging forms of aminoacyl-tRNA synthetases.
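A toy encoding of these observations, assuming the 2:71 pair alone decides compatibility (a simplification). Class I acceptance of G:U is inferred from B. burgdorferi relying on a class I LysRS rather than taken from a stated measurement, and the names are hypothetical.

```python
# Toy acceptance rules for the 2:71 pair of tRNA Lys (hypothetical simplification).
ACCEPTED_2_71 = {
    "class I LysRS": {"G:C", "G:U"},  # G:U acceptance inferred from B. burgdorferi
    "class II LysRS": {"G:C"},        # G:U blocks E. coli class II LysRS
}


def compatible_enzymes(pair_2_71: str) -> list[str]:
    """Return the LysRS classes that would charge a tRNA Lys with this 2:71 pair."""
    return sorted(enzyme for enzyme, accepted in ACCEPTED_2_71.items()
                  if pair_2_71 in accepted)


print(compatible_enzymes("G:C"))  # both classes
print(compatible_enzymes("G:U"))  # class I only
```

In this picture, a single G:C to G:U change in the tRNA is enough to remove one enzyme class from the set of usable partners, which is the kind of pressure the paragraph proposes.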

An Example Where Identity Element Variations Are Uncommon
In addition to the strong conservation of G3:U70 to mark a tRNA for aminoacylation with alanine (Fig. 2), aspartyl-tRNA synthetase (AspRS) has a widely conserved system for recognizing tRNA Asp through interactions with the anticodon triplet and the G73 discriminator base (1,70). A phylogenetic tree derived from AspRS sequences shows that its species distribution closely coincides with the canonical tree of life (71). (PheRS, LeuRS, and GluRS are the only others to show the same coincidence (64).) Thus, although an enzyme that recognizes a set of universal identity elements might be expected to transfer easily between species, the population of genes coding for AspRS has not been subject to lateral gene transfer across different phyla.
We propose that the lack of documented examples of such transfers may be related to the ability of AspRS to recognize the related tRNA Asn in certain organisms. For example, in archaea a canonical asparaginyl-tRNA synthetase is missing. Aminoacylation of tRNA Asn with asparagine is instead accomplished through an initial aspartylation of tRNA Asn catalyzed by AspRS. (This aspartylation is followed by a transamidation catalyzed by a separate enzyme (72).) Not surprisingly, archaeal tRNA Asn contains the important G73. To recognize tRNA Asn, archaeal AspRS has a modified recognition mechanism for the anticodon. In particular, it is insensitive to the base at position 36, the only anticodon position that differs between tRNA Asn and tRNA Asp (73).
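A toy sketch of this anticodon logic, assuming AspRS specificity reduces to the anticodon plus G73. Here "strict" is a hypothetical stand-in for an AspRS that reads all three anticodon positions (the behavior the text attributes to bacterial and eukaryotic enzymes), while "archaeal" ignores position 36. The anticodons GUC (tRNA Asp) and GUU (tRNA Asn) are written 5′→3′ as positions 34-35-36.

```python
ANTICODON = {"Asp": "GUC", "Asn": "GUU"}  # positions 34-35-36, written 5'->3'

# Which anticodon positions differ between tRNA Asp and tRNA Asn?
differing = [34 + k for k, (a, b) in enumerate(zip(ANTICODON["Asp"], ANTICODON["Asn"]))
             if a != b]


def aspartylates(enzyme: str, anticodon: str, n73: str) -> bool:
    """Toy rule: require G73, then compare the anticodon with that of tRNA Asp."""
    if n73 != "G":
        return False
    target = ANTICODON["Asp"]
    if enzyme == "strict":
        return anticodon == target          # reads positions 34, 35 and 36
    if enzyme == "archaeal":
        return anticodon[:2] == target[:2]  # position 36 is ignored
    raise ValueError(f"unknown enzyme type: {enzyme!r}")


print(differing)
for enzyme in ("strict", "archaeal"):
    print(enzyme,
          aspartylates(enzyme, ANTICODON["Asp"], "G"),
          aspartylates(enzyme, ANTICODON["Asn"], "G"))
```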
Bacterial and eukaryotic organisms do not require this mechanism of generating Asn-tRNA Asn , because they utilize a canonical AsnRS that probably arose as a duplication of an ancient AspRS. The canonical distribution of AspRS may occur in part because archaeal organisms require an enzyme of loose tRNA specificity. Thus, these organisms cannot utilize a bacterial or eukaryotic AspRS that would not aminoacylate tRNA Asn .
Thermus thermophilus, a eubacterium that contains both a normal AsnRS and an archaeal-type AspRS, represents the only example (so far) of lateral transfer of an archaeal AspRS into another kingdom. This situation could represent an isolated adaptation in Thermus against asparagine starvation (71). Under such conditions, Thermus would shift from the usual aminoacylation of tRNA Asn by AsnRS to a more complex pathway: charging tRNA Asn with aspartate using the archaeal-type AspRS and then converting Asp-tRNA Asn into Asn-tRNA Asn via a transamidase reaction (71). Indeed, the transamidase is found in T. thermophilus (74).
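The two routes to Asn-tRNA Asn can be sketched as composable functions on (amino acid, tRNA) pairs. This is a hypothetical model: ATP, enzyme subunits, and kinetics are ignored, and the function names are illustrative. The one design point it captures is that the indirect route is safe only because the transamidase converts Asp-tRNA Asn while leaving Asp-tRNA Asp untouched.

```python
Charged = tuple[str, str]  # (amino acid attached, tRNA identity)


def asn_rs(trna: str) -> Charged:
    """Direct route: AsnRS charges only tRNA Asn, with asparagine."""
    if trna != "Asn":
        raise ValueError("AsnRS charges only tRNA Asn")
    return ("Asn", "Asn")


def archaeal_type_asp_rs(trna: str) -> Charged:
    """Loose AspRS: aspartylates both tRNA Asp and tRNA Asn."""
    if trna not in ("Asp", "Asn"):
        raise ValueError("not a substrate of this AspRS")
    return ("Asp", trna)


def transamidase(charged: Charged) -> Charged:
    """Convert Asp-tRNA Asn to Asn-tRNA Asn; leave anything else unchanged."""
    if charged == ("Asp", "Asn"):
        return ("Asn", "Asn")
    return charged


direct = asn_rs("Asn")
indirect = transamidase(archaeal_type_asp_rs("Asn"))
print(direct, indirect, transamidase(archaeal_type_asp_rs("Asp")))
```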

Subtle Variations in Operational RNA Code May Be Essential to Maintain a Universal Genetic Code
At the time of emergence of the translational apparatus, the operational RNA code had the capacity to adapt to the problems of discrimination of increasingly large populations of RNA molecules. These populations extended beyond just tRNAs to cellular RNAs such as mRNAs that could potentially cross-bind and thereby inhibit a tRNA synthetase. Identity elements in tRNA acceptor stems mutated in different taxonomic groups, preventing cross-species aminoacylations of many different tRNAs. These aminoacylation barriers blocked genetic exchanges involving genes for tRNAs and their synthetases and were probably important to avoid disruption of the genetic code. In particular, if two distinct synthetases for the same amino acid are present (either via gene duplication or because of lateral gene transfer), then mutations can accumulate in one of them while the other is held fixed. It is understood that gross mischarging would be rapidly eliminated, but more subtle interactions with suppressor tRNAs or tRNAs containing infrequent codons could alter the amino acid or tRNA specificity and gradually introduce changes in the codon-amino acid relationships (75).
Disruption of the universal genetic code has happened in rare instances, but its overall conservation points to strong selection pressures against variation. In this context, the operational RNA code for aminoacylation of tRNA molecules would act as a strong deterrent against contamination of the code through nonspecific charging. This requirement for tight tRNA recognition might explain why most tRNA synthetases in bacteria and archaea are encoded by single-copy genes. With single-copy genes, the opportunity for contamination of the genetic code is greatly restricted. The situation in eukaryotes is essentially the same: a separate, distinct gene encodes the synthetase for each cell compartment, the cytoplasm and the mitochondria.

Operational RNA Code in Relation to Emergence of Eukaryotic Cell
The combination of archaebacterial and eubacterial species gave rise to the eukaryotic cell and generated organelles like plastids and mitochondria (76). The physiologic fusion of these species had to include integration of their systems for aminoacylation. The evolutionary solution to the duplicated synthetase-tRNA systems could have been determined by the interplay between mutations in the synthetases (and acceptor stem elements) that were being merged (54,64). Eventually, two compartments, mitochondria and cytoplasm, emerged that utilized the same genetic code.
For example, the genomes of contemporary eukaryotes encode two genes for the same enzyme activity (cytoplasmic and mitochondrial). Consequently, there is a significant possibility that two versions of the same enzyme end up in the same cellular compartment. If a misplaced mitochondrial enzyme recognized the same acceptor-stem elements as its cytoplasmic counterpart, the presence of two synthetases in the cytoplasm (targeted to the same acceptor stem) would give an opportunity to invade and alter the genetic code (see above). Distinct recognition elements for the mitochondrial form greatly diminish the likelihood of such an invasion. This consideration may account in part for why acceptor-stem elements of animal mitochondrial tRNAs differ more from their cytoplasmic counterparts than the same elements typically differ across taxonomic domains. Thus, although throughout evolution the G3:U70 base pair marks a tRNA for aminoacylation with alanine (Fig. 2), G3:U70 is often not found in animal mitochondrial tRNA Ala. (However, G3:U70 is commonly found in mitochondria of other eukaryotes.) Analysis of identity elements in other animal mitochondrial tRNA sequences revealed that tRNA Ala is not an isolated example.
The striking variations in animal mitochondrial tRNA identity elements may also be related in part to reduced genome complexity. In general, the set of tRNA genes in animal mitochondrial genomes is greatly reduced (E. coli contains 40 tRNA genes, whereas human mitochondria contain 22). This reduction means that the aaRS now have to discriminate among a smaller population of tRNA molecules. Perhaps, under these circumstances, the evolutionary pressure to maintain a given set of identity elements is reduced, because certain discrimination problems no longer exist.
In summary, the genetic code can be seen as having been preserved throughout evolution (across all taxonomic domains and in higher eukaryotes with their separate cell compartments) as a consequence of adaptations in the tRNA acceptor-stem identity elements that constitute an operational RNA code. These adaptations follow necessarily from two competing needs: on the one hand, to keep anticodon sequences fixed so that the code remains universal, and on the other, to accommodate the expansion and diversification of living organisms. This ancient RNA code, which may have started with the sequence-specific aminoacylation of minihelix-like precursors of tRNAs by ribozymes, has endured long after the genetic code was established because it offered a defense against invasions of the code by tRNA synthetases with amino acid-anticodon assignments that differed from those of the universal code. It also endured because of its capacity to respond to increasingly large and diverse populations of RNAs and the discrimination problems they presented.