Urzymology: Experimental Access to a Key Transition in the Appearance of Enzymes

Urzymes are catalysts derived from invariant cores of protein superfamilies. Urzymes from both aminoacyl-tRNA synthetase classes possess sophisticated catalytic mechanisms: pre-steady state bursts and significant transition-state stabilization of both amino acid activation and tRNA acylation. However, they have insufficient specificity to ensure a fully developed genetic code, suggesting that they participated in synthesizing statistical proteins. They represent a robust experimental platform from which to articulate and test hypotheses both about their own ancestors and about how they, in turn, evolved into modern enzymes. They help reshape numerous paradigms, from the RNA World hypothesis to protein structure databases and allostery.

The length of a minimal GlnRS (260 residues) is still roughly twice that of the putative GlnRS Urzyme, which is shown as a ribbon within the context of the full-length GlnRS.
Similar work established the functionality of tRNA acceptor stems (21,22) and microhelices (23), which could be acylated by intact cognate aaRS. Thus, contemporary aaRS and tRNAs both contain functional subsets that are 50–85% smaller than their full-length relatives. The distinction between these forms and their full-length relatives is both quantitative and qualitative. They have fewer domains and exhibit significant reductions in catalyzed rates.
The radical, directed protein surgery necessary to create Urzymes was motivated by the Rodin-Ohno (RO) hypothesis (24) (supplemental Figs. S2 and S3) that ancestral Class I and Class II aaRS coding sequences were originally complementary strands of the same gene. The only segments of the superfamilies that could be aligned antiparallel as suggested by Rodin and Ohno also turned out to be the invariant cores that position active site residues. The RO hypothesis thus predicted that these cores were intermediates in aaRS evolution, and hence should be catalytically active (18).
Excising Urzymes from full-length enzymes exposes many hydrophobic side chains. These residues must be identified and mutated to restore solubility. Computational methods identified side chains with the greatest newly generated solvent-accessible surface area, and suitable mutations were suggested by the Rosetta protein design program (25). The wild-type TrpRS Urzyme, with native sequences at its interfaces with the deleted segments, is active when fused to the anticodon-binding domain (15).
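The screening step described above, ranking exposed side chains by newly generated solvent-accessible surface area, can be sketched as follows. This is a minimal illustration, not the published pipeline: it assumes per-residue SASA values have already been computed for the full-length enzyme and for the excised core with a standard SASA calculator, and the residue numbers, SASA values, hydrophobic set, and 20 Å² cutoff are all hypothetical. Choosing replacement residues, the step handled by Rosetta, is not modeled.

```python
from dataclasses import dataclass

# Illustrative hydrophobic set (an assumption, not necessarily the
# criterion used when the Urzymes were designed).
HYDROPHOBIC = {"ALA", "VAL", "LEU", "ILE", "MET", "PHE", "TRP", "TYR", "CYS", "PRO"}


@dataclass
class Residue:
    number: int
    name: str          # three-letter residue code
    sasa_full: float   # SASA (A^2) in the full-length enzyme
    sasa_core: float   # SASA (A^2) after the excised segments are removed


def rank_newly_exposed(residues, min_gain=20.0):
    """Return (number, name, gain) for hydrophobic residues whose SASA rises
    by at least `min_gain` A^2 on excision, largest gain first."""
    hits = [
        (r.sasa_core - r.sasa_full, r)
        for r in residues
        if r.name in HYDROPHOBIC and r.sasa_core - r.sasa_full >= min_gain
    ]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [(r.number, r.name, round(gain, 1)) for gain, r in hits]


# Hypothetical input values, for illustration only.
example = [
    Residue(12, "LEU", 5.0, 95.0),
    Residue(40, "LYS", 30.0, 120.0),   # polar: skipped
    Residue(57, "PHE", 10.0, 60.0),
    Residue(88, "VAL", 40.0, 45.0),    # gain below cutoff: skipped
]
print(rank_newly_exposed(example))
```

The ranked list is only a shortlist of candidate positions; in the approach described above, a design program then proposes the actual substitutions.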
Physical principles motivating the choices that Rosetta makes for these extinct sequences may overlap those induced by selective pressures for stability (26), providing surrogates for sequence information missing from multiple sequence alignments of living organisms. Thus, protein design extends the study of protein evolution substantially closer to the origin of life.
My colleagues and I are fortunate to have begun by investigating aaRS Urzymes. Urzymes derived from the invariant structural cores of Class I TrpRS (17), LeuRS, and Class II HisRS (16) all have only ~15–25% of the total contemporary mass, yet they accelerate both amino acid activation and tRNA aminoacylation proportionately by ~10⁸-fold over the uncatalyzed rates (27).
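As a rough guide to what a ~10⁸-fold acceleration means energetically, transition-state theory relates a rate enhancement to a lowering of the activation free energy, ΔΔG‡ = RT ln(k_cat/k_uncat). The sketch below assumes the enhancement can be treated as a simple ratio of rate constants at 25 °C; the function name is illustrative, and the result is an order-of-magnitude estimate rather than a value reported in the cited work.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def barrier_lowering(rate_enhancement, temp_k=298.15):
    """Activation free-energy lowering (kcal/mol) implied by a rate ratio,
    using ddG = RT * ln(k_cat / k_uncat)."""
    return R_KCAL * temp_k * math.log(rate_enhancement)


for fold in (1e5, 1e6, 1e8):
    print(f"{fold:.0e}-fold -> {barrier_lowering(fold):.1f} kcal/mol")
```

For a 10⁸-fold enhancement this gives about 10.9 kcal/mol; each additional factor of 10 adds roughly 1.4 kcal/mol at room temperature.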
Their strong phylogenetic support (structural invariance) and high catalytic proficiencies afford a robust platform. They point both backward in time to yet more ancient antecedents and forward to the fully developed genetic code (Fig. 1). Thus, their catalytic activities establish a base camp for new, previously inaccessible experimental studies that identify and test intermediates in molecular evolution and allostery (13,27).

Urzyme Catalytic Activities Satisfy Complementary Tests of Authenticity
Three lines of evidence (pre-steady state burst size, sensitivity to mutation, and substrate binding affinity) reinforce the conclusion that the Urzymes themselves are the authentic sources of the observed catalytic activities. Most unexpected was that the TrpRS, LeuRS, and HisRS Urzymes all exhibit pre-steady state bursts comparable in magnitude with the catalyst concentrations, ruling out contamination by tiny amounts of full-length aaRS (16–18). Bursts established that rate-limiting product release (28) was a third fundamental link to contemporary enzymes, in addition to accelerating amino acid activation and tRNA aminoacylation (see supplemental Fig. S1).
Fig. 1 (caption fragment). Many modern enzymes likely preceded the last universal common ancestor and first "organism" (LUCA). B, methods appropriate for studying objects and processes along the timeline in A. Because urzymology connects the earliest genetic coding to the emergence of modern enzymes, it affords a powerful enabling technology for studying key transitions in the evolution of the genetic code.
Mutational and protein engineering experiments also support the authenticity of aaRS Urzyme catalytic activities. TrpRS and HisRS Urzymes, expressed as maltose-binding protein (MBP) fusions, are activated ~50-fold by tobacco etch virus protease cleavage. Further, four different HisRS Urzymes differing in the presence or absence of class-defining signature Motif 3 and a 6-residue N-terminal extension exhibit catalytic differences consistent with catalytic contributions of each module plus a significant (−1.6 kcal/mol) synergistic interaction between them (16). Finally, point mutations of active-site residues alter catalytic activity by an order of magnitude or more (16,17). None of these effects is consistent with activity from a contaminating catalyst.
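A synergistic interaction of this kind is what a double-mutant (thermodynamic) cycle yields from four constructs: neither module, each module alone, and both together. The sketch below assumes each construct's activity k is converted to a relative free energy G = −RT ln k; the function name and the activities in the example are hypothetical and are not the published HisRS data.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def coupling_energy(k_none, k_a, k_b, k_ab, temp_k=298.15):
    """Two-module interaction free energy (kcal/mol) from a double-mutant cycle.

    With G = -RT ln k for each construct, the interaction term is
    G_ab - G_a - G_b + G_none = -RT ln[(k_ab * k_none) / (k_a * k_b)].
    A negative value means the two modules together outperform the sum of
    their separate effects (synergy); zero means they act additively."""
    rt = R_KCAL * temp_k
    return -rt * math.log((k_ab * k_none) / (k_a * k_b))


# Hypothetical relative activities: the combined construct is 10-fold more
# active than the additive (multiplicative-rate) prediction of 1 * 2 * 5.
print(round(coupling_energy(1.0, 2.0, 5.0, 100.0), 2))
```

With these invented activities the interaction is about −1.36 kcal/mol, while an exactly additive set (k_ab = 10) returns zero.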
Steady-state kinetic parameters afford a third line of evidence for authenticity. Both TrpRS and HisRS Urzymes bind ATP tightly, but amino acid affinities are 10–100-fold lower than those of the full-length enzymes. The TrpRS Urzyme tryptophan Km is 1–2 mM, 500 times that of intact TrpRS (17). Weak cognate amino acid binding suggests that discrimination against similar, non-cognate amino acids is also weakened, as observed (13,15,17).

aaRS Urzymes: Low Specificity, High Proficiency Catalysts
aaRS Urzyme catalytic activities are dramatically higher than estimates for the uncatalyzed rates (Fig. 3A). Notably, Urzymes from both classes accelerate both activation and acylation 10⁵–10⁶-fold more than necessary to launch ribosome-independent protein synthesis (Fig. 3B). Further, the two classes achieve similar activities with similar masses, consistent with their joint requirement for protein synthesis. Thus, both classes appear to have achieved comparable proficiency increments as their entirely different domain architectures grew comparably in size.
Urzyme specificities are low. Spectra for Class I LeuRS and Class II HisRS Urzyme specificities (Fig. 3C) are similar and complementary. Both Urzymes activate a range of non-cognate amino acids. Nonetheless, each Urzyme exhibits a roughly 5-fold preference for amino acids from its own class. Unknown differences between aaRS Urzymes and the true ancestral forms may account for some of their promiscuity.
The TrpRS Urzyme provided a unique "molecular knockout" lacking both CP1 and the ABD, affording a baseline against which to determine the contributions of the two deleted modules. Neither CP1 nor the ABD alone restored any specificity (15); the specificity they confer results entirely from their energetic coupling.
Relying on allosteric interactions between genetic modules entirely absent from the Urzymes to enhance specificity helps resolve challenges (29–33) associated with the failure of rational point mutations in the amino acid-binding pocket to accomplish anything but a reduction in catalytic activity. Moreover, generating orthogonal synthetase-tRNA pairs appears to require pruning all amino acid-binding residues to alanine, then using random mutagenesis and selection to rebuild pockets that match the altered substrates, consistent with induced-fit mechanisms (34).
These partial results suggest that aaRS Urzymes could not support a canonical 20-amino acid alphabet. Low Urzyme specificity and the fact that Urzymes cannot utilize the tRNA anticodon for recognition form the first experimental basis for the conjecture of Woese (1,35) that the first coded proteins were statistical ensembles, without unique sequences.

Reduction and Recapitulation
"You have to deeply understand the essence of a product to be able to get rid of the parts that are not essential." (Jony Ive, quoted in Ref. 36) Urzymes afford a robust platform for reductionist experiments aimed at characterizing even simpler ancestral protein catalysts that look backward in time (37,39) and for recapitulating plausible intermediates to test possible evolutionary paths that look forward in time (15) (Fig. 1).
The ability to measure the subtle effects of modules as small as 6–20 amino acids (16) greatly enhances the resolution of modular deconstruction as a tool in protein science. Radical surgery of Class II aaRS afforded evidence that Motif 3, considered essential to catalytic activity because of its interactions with ATP, is dispensable, yet acts synergistically with modular additions elsewhere, including a 6-amino acid N-terminal extension to Motif 1 (15). The catalytic role of Motif 3 may be realized fully only when the Class II insertion domain is present between the Urzyme and Motif 3 (40), or in full-length HisRS. Further experiments are necessary to map the intermodular synergy.
The unprecedented radical protein surgery that gave rise to the Class I TrpRS and LeuRS Urzymes entailed removing one or more long internal peptides as well as conventional truncation of the C-terminal anticodon-binding domain. Deleting CP1, an internal subdomain, was non-trivial because the two remaining fragments had to be joined together without corrupting the active site. The LeuRS Urzyme entailed removing CP2, in addition to CP1. Removing internal segments was facilitated by the fact that the α-carbon atoms of their N- and C-terminal residues in full-length aaRS are separated by the length of a peptide bond (13).
Straightforward removal of CP1 from TrpRS and LeuRS and CP2 from LeuRS affords existence proofs that the reverse process, assimilation of a CP1 or CP2 ancestor, is a legitimate evolutionary operation accounting for these insertions into the Class I aaRS as they evolved. Further support for this conclusion arises because insertions into the Toprim domain (41), the closest contemporary homolog of Class I aaRS Urzymes, occur at the locations of Class I CP1 and CP2 insertions (Fig. 4) (42).
CP1 wraps nearly 360° around the TrpRS specificity α-helix, constraining it against the N-terminal crossover connection that binds ATP. Molecular dynamics simulations (17) suggested that CP1 might stabilize that helix, which reorients markedly during long trajectories in the presence of ATP but in the absence of tryptophan.
Full TrpRS specificity and tRNA^Trp aminoacylation activity (15) both require essentially complete interdomain synergy (also called epistasis (43–45)). The observed epistasis (15) implies that both modules participate in the decision to activate the amino acid present in the active site. Combinatorial mutagenesis (13) implicated the D1 Switch, a dynamic (46) packing motif (47), in mediating allosteric communication between domain movement and catalysis. The four-way energetic coupling between mutated D1 residues (13) makes quantitatively equivalent contributions to catalysis and specificity to those observed for the intermodular CP1×ABD energetic coupling (15).
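A four-way coupling energy generalizes the two-module cycle: in a complete 2ⁿ factorial design, each interaction term can be obtained by inclusion–exclusion over all subsets of the varied factors. The sketch below assumes a full table of free energies (for example G = −RT ln k) for every combination of present and absent mutations or modules; the function name, factor labels, and energies are synthetic, built with a known four-way term so the decomposition can be checked, and do not represent the D1 Switch data.

```python
from itertools import combinations


def interaction_terms(free_energy, factors):
    """Inclusion-exclusion (Moebius) decomposition of a full 2^n factorial.

    free_energy maps frozenset(present factors) -> free energy (kcal/mol).
    For every subset S of factors, returns
        E(S) = sum over T subset of S of (-1)^(|S|-|T|) * G(T),
    so E of one factor is its main effect, E of two factors is the usual
    double-mutant-cycle coupling, and E of all n factors is the n-way term."""
    terms = {}
    for size in range(len(factors) + 1):
        for subset in combinations(factors, size):
            s = frozenset(subset)
            total = 0.0
            for k in range(len(s) + 1):
                for t in combinations(s, k):
                    total += (-1) ** (len(s) - k) * free_energy[frozenset(t)]
            terms[s] = total
    return terms


# Synthetic data: additive main effects plus a -2.0 kcal/mol four-way term.
factors = ["A", "B", "C", "D"]
main = {"A": -0.5, "B": -1.0, "C": 0.3, "D": -0.2}
g = {}
for k in range(len(factors) + 1):
    for t in combinations(factors, k):
        g[frozenset(t)] = sum(main[x] for x in t) + (-2.0 if k == 4 else 0.0)

terms = interaction_terms(g, factors)
print(round(terms[frozenset(factors)], 3))  # recovers the synthetic four-way term
```

Because every lower-order term in the synthetic table is purely additive, all two- and three-way terms come out as zero and only the planted four-way term survives.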
Unexpectedly, the TrpRS Urzyme appears to have higher fitness at the two tasks required of aaRS, amino acid recognition and tRNA aminoacylation, than either of the two larger, potentially more advanced intermediate constructs. Urzymes thus appear to lie closer to the actual path of aaRS evolution. The negative epistasis suggests that evolutionary growth of contemporary aaRS must be subtler than simply accumulating either CP1 or ABD domains.

RO Hypothesis Redux
The RO hypothesis may have been ignored because it was not obvious how to test it. The relevant objects, ancestral Class I and II aaRS, are so remote that it was hardly evident that the hypothesis could be falsified (48). My colleagues and I articulated and verified bioinformatic (14) and biochemical (16–18, 27) predictions. The balanced, proportionate rate accelerations of both amino acid activation and tRNA aminoacylation by TrpRS and HisRS Urzymes confirmed the prediction that Class I and II segments consistent with antiparallel alignment should be active (27).
Bioinformatic predictions of sense/antisense coding ancestry were tested by excerpting a 94-residue Urgene from ~200 contemporary coding sequences of Class I TrpRS and Class II HisRS. Tyrosyl-tRNA synthetase (TyrRS) and prolyl-tRNA synthetase (ProRS) served as outgroups in rooting the respective trees. Codon middle bases formed base pairs in ~0.34 of all-by-all antiparallel alignments in all four cases, with a standard error of <0.0003, compared with a well-established value of 0.25 for the null hypothesis (supplemental Fig. S4). Middle-base pairing increased in independently reconstructed ancestral sequences for the two trees (supplemental Fig. S5) (14). Middle-base pairing of sense/antisense-related sequences thus appears to be a phylogenetic metric that persists far deeper into the past than do metrics for multiple sequence alignments for a single phylogenetic tree.
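The middle-base pairing statistic can be illustrated for a single pair of ungapped, in-frame coding sequences. In an antiparallel register, nucleotide j of one sequence faces nucleotide L−1−j of the other, so codon middle bases always face each other. This is a minimal sketch under that simplifying assumption; the function names and sequences are made up for illustration, and the codon alignment, gap handling, all-by-all comparisons, and statistics of the published analysis are not reproduced.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def middle_base_pairing(seq_a, seq_b):
    """Fraction of codon middle bases forming Watson-Crick pairs when seq_b
    is aligned antiparallel to seq_a.

    Both inputs are in-frame DNA of equal length. In the antiparallel
    register the middle base of codon i in seq_a faces the middle base of
    codon n-1-i in seq_b."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be in-frame and of equal length")
    n = len(seq_a) // 3
    mids_a = [seq_a[3 * i + 1] for i in range(n)]
    mids_b = [seq_b[3 * i + 1] for i in range(n)][::-1]
    pairs = sum(COMPLEMENT.get(x) == y for x, y in zip(mids_a, mids_b))
    return pairs / n


sense = "ATGCTGAAAGGCTTT"  # hypothetical 5-codon sequence
print(middle_base_pairing(sense, revcomp(sense)))
print(round(middle_base_pairing("ATGGCTAAA", "AAAAAAAAA"), 3))
```

A perfect reverse complement scores 1.0, whereas unrelated sequences with uniform base composition are expected to score about 0.25, the null value quoted above.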
Their sophistication and tuned catalytic activities argue that aaRS Urzymes had far simpler ancestors. The sense/antisense ancestry of the Urzymes suggests in turn that these simpler ancestors were also encoded on opposite strands of the same gene(s). Phylogenetic evidence, similar comparisons between two Class IC and two Class IIA aaRS sequence alignments (14), and analysis of amino acid activation by 46-residue ATP-binding sites coded by a designed sense/antisense gene (37) suggest that certain properties of these ancestors may be accessible via methods analogous to those of ancestral gene reconstruction (8,12). See Ref. 39 for additional details.

Urzymology-driven Paradigm Shifts
My colleagues and I expected that experimental studies of ancestral aaRS would identify novel perspectives on the origins of translation itself, and hence on contemporary molecular biology and biochemistry. In addition, unexpected new perspectives on phylogenetics/genomics and on the origins of protein folding, catalytic activity, specificity, and allostery will likely shift and enhance comprehensive paradigms in genetics and biophysics.

Codon-dependent Translation
The evolutionary history of the universal genetic code is, in a real sense, that of the two synthetase superfamilies and their cognate tRNAs. Support for the RO hypothesis argues against paradigms holding that the aaRS classes appeared independently, one after the other (49,50). Urzyme tRNA acylation activity opens to more detailed testing the proposal that early translation used an "operational RNA code" (51) vested only in the tRNA acceptor stem bases, the only parts of tRNA that can be recognized by aaRS Urzymes.
Sense/antisense coding projects further into the past than other metrics (14), implying that catalytic peptides responsible for activating amino acids co-evolved with tRNA from a very primitive state. Thus, contrary to the prevailing RNA World hypothesis, my colleagues and I restated (27) the proposal (52, 53) that genetic coding emerged from mutually catalytic RNA and peptides, using rudimentary stereochemical coding between the two biopolymers.

Phylogenetics/Genomics
Systematic protein structure classifications, SCOP (54) and CATH (55), fail to identify Urzymes of either aaRS class as ancestral forms (10). However, aaRS Urzymes represent plausible ancestors for a wide spectrum of contemporary proteins. The Rossmannoid superfamily (56), the biggest in the proteome (57), includes consensus homologs of Class I aaRS. My colleagues and I argued (58) that the 26 families in the HSP70/actin ATPase superfamily (Pfam clan CL0108) are ancient paralogs of Class II aaRS.
Class II aaRS Urzymes illustrate a distinct but related problem. One might expect their descendants to include a proportion of the contemporary proteome comparable with that (0.25–0.3) occupied by the Rossmannoid protein superfamily that includes Class I aaRS. The only proposed relative is the dual-function biotin protein ligase/repressor BirA (59). Although evidence for that homology is strong, it relies heavily on the C-terminal halves of the molecules that follow Motif 2. This essentially modern homology may also be misleading because the smallest Class II Urzymes lack both Motif 3 and C-terminal insertion domains, which may obscure more distant homologies based on the Class II Urzyme, for example, the HSP70/actin superfamily (58).
Validating the RO sense/antisense coding hypothesis (24) thus introduces new, potentially deep relationships between protein superfamilies. That current classifications miss these relationships highlights possible limitations in structural database annotations; accommodating Urzymes into evolutionary frameworks should therefore enhance inferences drawn from structural databases.

Protein Folding
Information carried by a gene is generally perceived to be unambiguous. However, for sense/antisense genes, the same information has two valid but quite different interpretations, resulting in two different folds, depending on which strand is read. The inverse duality in the genetic code (60) ensures that secondary structures coded by opposite strands of the same gene can both be amphipathic, consistent with forming globular ensembles that are, in a sense, "inside out" (Fig. 6 of Ref. 14).
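One facet of this duality can be checked directly from the standard genetic code: reverse complementation leaves a codon's middle base in the middle position but swaps T with A (and C with G), so codons with middle-base T, which encode hydrophobic residues, are read on the opposite strand as codons with middle-base A, which encode polar or charged residues. The sketch below is an illustration only; the helper names are hypothetical, and it does not reproduce the hydropathy analysis behind the cited work.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated in TCAG order for bases 1, 2, 3.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}
COMP = str.maketrans("ACGT", "TGCA")


def revcomp(codon):
    """Codon read on the opposite strand (reverse complement)."""
    return codon.translate(COMP)[::-1]


def by_middle_base(middle):
    """Sorted amino acids (and '*' for stop) encoded by codons with this middle base."""
    return sorted({CODE[c] for c in CODE if c[1] == middle})


# A few hydrophobic sense codons and what the antisense strand encodes.
for codon in ("CTG", "ATC", "TTT", "GTG"):
    anti = revcomp(codon)
    print(codon, CODE[codon], "->", anti, CODE[anti])

print("middle T:", by_middle_base("T"))
print("middle A:", by_middle_base("A"))
```

The effect is cleanest for middle-base T and A codons; middle-base C and G codons encode mixed sets of residues, so the sketch illustrates the tendency rather than a strict rule.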

Catalysis
A widespread consensus (61–63) holds that enzymic rate enhancements require strong bonds to the transition state and hence favorable enthalpies of activation. To minimize unfavorable entropy changes in forming such strong transition state interactions, one naively expects aaRS Urzymes to be properly folded proteins. We do not yet know whether Urzymes are fully folded. Recently, however, Hu (64) confirmed that alternative properly folded and molten globular forms of the same enzyme (65) achieve the same rate enhancements by different enthalpy/entropy compensations. Molten globular proteins can actually achieve higher transition state complementarity than more rigid, folded proteins. Dissociating catalysis from the requirement for folding suggests that a much broader range of early translation products may have been catalytically active, accelerating early stages of natural selection.

aaRS Modularity
The modularity of contemporary protein structures is a perplexing problem that intensifies challenges posed by protein evolution. aaRS enzymes exhibit unusually high percentages (5-fold more than other families (66)) of accreted sequences with new functionality, i.e. physiocrines, along the organismal phylogenetic trees. Physiocrines appear in aaRS from both classes. Their functions are unrelated to translation, ranging from endocrine regulation of cardiovascular development to immune system activities, as well as regulation of mammalian target of rapamycin (mTOR), IFN-γ, and p53 signaling (67). Urzymology introduces techniques needed to recapitulate the gain of such functions.

Specificity and Allostery
The most significant puzzle arising thus far from studies of urzymology is that enhanced specificity of contemporary TrpRS, relative to its Urzyme, involves negative epistasis, requiring allosteric energetic coupling between the ABD and the CP1 insertion via a switching element intrinsic to the Urzyme (13,15). Intramolecular epistasis may have developed while different modules present in contemporary aaRS functioned in trans. Some authors suggest that off-loading functions such as specificity to allosteric effects may be beneficial by making specificity a more robust property (68). The dynamic switching element responsible for this coupling is a widely conserved packing motif (47). How this "protoallosteric" motif (69) functions without the ABD and CP1 modules and how coupling might have emerged before modules became covalently joined remain outstanding questions.
Like knock-out mice, Urzymes are extensive molecular knock-outs that provide unique experimental baselines for helping to answer such questions. Their measurable catalytic rates facilitate multidimensional thermodynamic analysis of modular epistasis in mechanistic enzymology (13,15).

Protein Design
Recovering Urzymes by deleting non-essential protein masses has adverse effects on stability and solubility. Mutations, identified by Rosetta (60,61), compensate for these effects. Interfaces between Urzymes and more recently acquired modules can, in principle, also be redesigned. As proficient, relatively nonspecific catalysts, Urzymes can likely be engineered to acylate tRNAs with non-canonical amino acids (34,70) for industrial purposes.

Urzymes Have Measurable Activities
Most enzyme-catalyzed rates are within the same order of magnitude, irrespective of the uncatalyzed reaction rates (63). Roughly comparable enzyme-catalyzed rates appear to be a requirement for biology. An important implication is that catalytic activities of different enzymes have always been subject to such a constraint. The 10⁸-fold rate accelerations observed for Class I and II aaRS Urzymes (3,6) suggest that other Urzymes are likely also to have measurable catalytic rates.

Conclusions
In launching urzymology, my colleagues and I took three steps for which there was little, if any, precedent: (i) validating the Rodin-Ohno hypothesis; (ii) testing the evolutionary implication that the most highly conserved portions of enzymes (i.e. their invariant cores) probably have significant catalytic activities; and (iii) using protein design (Rosetta (25)) to compensate for protein mass lost on creating Urzymes, facilitating radical protein surgery. Previously, there was no way to formulate or test hypotheses about how simple, extinct ancestors came to resemble contemporary proteins. Urzymology now offers coherent paradigms that open unprecedented experimental access to mechanisms of very early protein evolution, as well as to novel and effective studies of contemporary mechanistic enzymology.
The first examples of urzymology introduced the ability to use high resolution modular engineering pro-actively, to address previously inaccessible questions. My colleagues and I developed Class I and II aaRS Urzymes to test the Rodin-Ohno hypothesis that ancestral forms of each class descended from opposite strands of the same gene. Unexpected results from that effort will likely change prevailing paradigms in several areas. In validating the hypothesis, my colleagues and I discovered that urzymology, because it connects the earliest genetic coding to the emergence of modern enzymes, affords a powerful enabling technology for studying a key transitional period in the evolution of the genetic code. aaRS Urzymes afford a base camp for probing even more primitive peptide catalysts and for recapitulating subsequent evolutionary steps leading to the emergence of full-length aaRS.