Metabolic origin of the fused aminoacyl-tRNA synthetase, glutamyl-prolyl-tRNA synthetase

About 1 billion years ago, in a single-celled holozoan ancestor of all animals, a gene fusion of two tRNA synthetases formed the bifunctional enzyme, glutamyl-prolyl-tRNA synthetase (EPRS). We propose here that a confluence of metabolic, biochemical, and environmental factors contributed to the specific fusion of glutamyl- (ERS) and prolyl- (PRS) tRNA synthetases. To test this idea, we developed a mathematical model that centers on the precursor–product relationship of glutamic acid and proline, as well as metabolic constraints on free glutamic acid availability near the time of the fusion event. Our findings indicate that proline content increased in the proteome during the emergence of animals, thereby increasing demand for free proline. Together, these constraints contributed to a marked cellular depletion of glutamic acid and its products, with potentially catastrophic consequences. In response, an ancient organism invented an elegant solution in which genes encoding ERS and PRS fused to form EPRS, forcing coexpression of the two enzymes and preventing lethal dysregulation. The substantial evolutionary advantage of this coregulatory mechanism is evidenced by the persistence of EPRS in nearly all extant animals.

Aminoacyl-tRNA synthetases (ARSs) are ancient, evolutionarily conserved enzymes that ligate specific amino acids to their cognate tRNAs for accurate decoding of genetic information during protein synthesis. Twenty ARSs, corresponding to the 20 amino acids, are present in organisms from bacteria to mammals. In most metazoans, nine ARSs and three nonsynthetase auxiliary proteins form the cytoplasmic multi-aminoacyl-tRNA synthetase complex (MSC) (1-6). In addition to their essential enzymatic activities, many ARSs exhibit noncanonical (or moonlighting) functions, distinct from their role in translation and contributing to pathophysiological processes, including development, angiogenesis, tumorigenesis, obesity, and inflammation (7,8).
Among the ARSs, glutamyl-tRNA synthetase (ERS; encoded by the EARS gene) and prolyl-tRNA synthetase (PRS; encoded by the PARS gene) are uniquely present as a single fused bifunctional protein, glutamyl-prolyl-tRNA synthetase (EPRS or GluProRS). EPRS is encoded by a fused gene in all known metazoans, with the unique exception of Caenorhabditis elegans (9,10). Mammalian EPRS resides in the MSC and catalyzes the ligation of both glutamic acid and proline to their respective tRNAs. The two synthetase domains are joined by a flexible linker generally containing one or more helix-turn-helix WHEP domains. The catalytic domains of EPRS are highly conserved; however, the WHEP domains show sequence divergence and duplications (and occasional losses) and are present in a variable number in animals (11). The linker is responsible for two of the noncanonical functions of EPRS. (i) It binds three other proteins (glyceraldehyde-3-phosphate dehydrogenase, ribosomal protein L13a, and NSAP1) in interferon-γ-activated myeloid cells to form the interferon-γ-activated inhibitor of translation (GAIT) complex that binds and silences the translation of multiple mRNAs encoding inflammation-related proteins (12-14), and (ii) it binds fatty acid transport protein 1 (FATP1) to facilitate long-chain fatty acid uptake for synthesis and storage of triglycerides in insulin-stimulated adipocytes (15).
ERS and PRS are products of two distinct genes in all bacteria, archaea, fungi, and plants, and thus EPRS likely results from an early gene fusion event, i.e. the merger of two previously distinct genes into a single transcription unit (16). The fusion of ERS- and PRS-encoding genes can be traced back to near the origin of animals. Recent sequencing of organisms near the base of metazoans revealed that ERS and PRS are unlinked in the unicellular ichthyosporean Sphaeroforma arctica but are linked to form EPRS containing two WHEP domains in the unicellular filasterean Capsaspora owczarzaki. Thus, the fusion event occurred in a unicellular opisthokont after the divergence from fungi about 1 billion years ago (11) (see Fig. 1). The advantage underlying the specific selection of the synthetase genes encoding ERS and PRS for fusion, and its maintenance in nearly all animals, is not known. We considered the possibility that a fused EPRS was required for structural integrity of the MSC. Small complexes of two to four ARSs are present in some archaea, trypanosomes, and fungi, all organisms lacking fused EPRS (17). Through a gradual process of accretion and occasional deletion of specific ARSs, a "mega" MSC consisting of eight or more ARSs (plus noncatalytic auxiliary proteins) appeared at about the same time as the linkage of ERS and PRS in metazoan animals or their unicellular animal-like ancestors (Fig. 1). However, the large MSC in C. elegans containing eight ARSs, including ERS but not PRS, suggests that a fused EPRS is not essential for MSC formation or integrity (18). Furthermore, a putative requirement for EPRS in structural integrity of the MSC does not clarify the specific selection of this ARS pair for the fusion. Here, we address an alternate hypothesis and provide support with a mathematical model that a confluence of metabolic, biochemical, and environmental factors contributed to the specific fusion of EARS and PARS genes to form the bifunctional gene, EPRS.
This work was supported by National Institutes of Health Grants P01HL029582, P01HL076491, and R01GM115476 (to P. L. F.). The authors declare that they have no conflicts of interest with the contents of this article. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. This article was selected as one of our Editors' Picks. This article contains supporting details of mathematical model development and Tables S1-S6.

Metabolic relationship between glutamic acid and proline
We considered the metabolic relationship between glutamic acid and proline, in which the former is the precursor of the latter, as a potential basis for the fusion event. We have previously reported the deep relationship between glucose catabolism and ARSs and their amino acid substrates (1). Nineteen of the 20 amino acids are derived from intermediates of either glycolysis or the citric acid cycle (histidine is the sole exception). Moreover, eight of the nine amino acid substrates of the ARS constituents of the MSC are derived from two intermediates of the citric acid cycle, α-ketoglutarate (α-KG) and oxaloacetate.
Importantly, glutamic acid is derived from α-KG and is the precursor of proline (along with glutamine and arginine) (Fig. 2). In certain organisms, proline can be generated from ornithine; however, ornithine is derived from arginine, which in turn is derived from glutamic acid (19,20). The synthesis of proline from glutamic acid is inhibited by proline-mediated negative feedback of pyrroline-5-carboxylate reductase (21,22). An important implication of the dependence of proline on glutamic acid is that an increased demand for proline will not only deplete cellular free proline but can also drain glutamic acid from the cellular pool, which will in turn affect cellular levels of glutamine and arginine. These observations led us to consider the unique metabolic relationship between glutamic acid and proline as the underlying force driving the fusion of their cognate ARSs.

Relationship between amino acids and their cognate ARSs
In organisms from bacteria to animals, there are multiple examples of depletion of an amino acid inducing up-regulation of the corresponding ARS. The elevated ARS level presumably enables more efficient utilization of the limiting amino acid, thereby providing a survival advantage to the cell when a particular amino acid is in short supply. For example, valine limitation induces the expression of ValRS (23). In prokaryotes, the induction is driven by a tRNA-mediated antitermination mechanism during ARS gene transcription (24-27). In Saccharomyces cerevisiae, when lysine is deficient, uncharged tRNA(Lys) binds GCN2, thereby increasing the translation of GCN4, a transcriptional activator of LysRS (28). Similar regulation has been reported in higher eukaryotes, but molecular mechanisms remain unclear. For example, expression of PheRS and MetRS is elevated in cultured animal cells grown in media deficient in phenylalanine and methionine, respectively (29, 30). In a related example, expression of EPRS increases in rat salivary glands when the expression of proline- and glutamic acid-rich proteins is induced by isoproterenol, presumably depleting the free amino acid pools (31).
In these examples, the response is specific, and the ARS gene is induced only by limitation of the cognate amino acid, not by general amino acid starvation. Thus, amino acids are primary regulators of their cognate ARSs in both prokaryotes and eukaryotes. We propose that the increase in demand for proline, and the consequent reduction in the cellular pools of proline and glutamic acid, would have markedly induced expression of PRS and ERS in those early organisms, further reducing the cellular pools of these amino acids. This vicious cycle would have had extremely harmful consequences for the cell and organism, and its avoidance conceivably was the driving force behind the fusion event.

Mathematical model of metabolism of glutamic acid and proline, their cognate ARSs, and EPRS
To test this hypothesis, we developed a mathematical model that captures the relationship between the cellular concentrations of free proline and glutamic acid and the protein levels of their corresponding ARSs in two systems, i.e. organisms with distinct EARS and PARS genes and organisms with a fused EPRS gene. Our dynamic model consists of a system of ordinary differential equations as shown below (for details, see the supporting information).
The ERS-PRS system is represented by Equations 1-4.
The EPRS system is represented by Equations 5-7.
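A minimal sketch of how a system of this kind could be integrated numerically is given here. It is not the authors' Equations 1-7: the functional forms (feedback inhibition of proline synthesis by proline, induction of each synthetase as its cognate amino acid falls) are assumed for illustration, and the parameters k2, Ki, K, kPsyn, and kdeg are hypothetical. Only the baseline values of k1, k3, k5, kEsyn, and α-KG are taken from the text and figure legends. The fused case is approximated by letting both activities share the E-regulated synthesis term.

```python
# Toy model of glutamic acid (E), proline (P), and their synthetases.
# All functional forms are assumptions, not the published equations.
BASE = {
    "aKG": 5.0,     # alpha-KG level (baseline quoted in the text)
    "k1": 0.10,     # alpha-KG -> E (baseline from the Fig. 7 legend)
    "k3": 0.10,     # E incorporation into protein (Fig. 7 legend)
    "k5": 0.10,     # P incorporation into protein (quoted in the text)
    "kEsyn": 0.10,  # ERS synthesis (Fig. 7 legend)
    "k2": 0.10,     # E -> P conversion (hypothetical)
    "Ki": 1.0,      # feedback inhibition constant for P (hypothetical)
    "K": 1.0,       # repression constant for synthetase induction (hypothetical)
    "kPsyn": 0.10,  # PRS synthesis (hypothetical)
    "kdeg": 0.10,   # synthetase degradation (hypothetical)
}


def derivatives(state, p, fused):
    """Right-hand side of the toy model; state = (E, P, ERS, PRS)."""
    E, P, ERS, PRS = state
    e_to_p = p["k2"] * E / (1.0 + P / p["Ki"])  # E -> P, inhibited by P
    dE = p["k1"] * p["aKG"] - e_to_p - p["k3"] * ERS * E
    dP = e_to_p - p["k5"] * PRS * P
    # Synthetase synthesis rises as the cognate amino acid is depleted.
    dERS = p["kEsyn"] / (1.0 + E / p["K"]) - p["kdeg"] * ERS
    if fused:
        # Fused case: both activities follow the E-regulated promoter
        # (assumes ERS and PRS start at the same level).
        dPRS = dERS
    else:
        dPRS = p["kPsyn"] / (1.0 + P / p["K"]) - p["kdeg"] * PRS
    return (dE, dP, dERS, dPRS)


def simulate(p, fused, state=(1.0, 1.0, 0.5, 0.5), dt=0.05, steps=20000):
    """Forward-Euler integration; returns the final (E, P, ERS, PRS)."""
    for _ in range(steps):
        d = derivatives(state, p, fused)
        state = tuple(x + dt * dx for x, dx in zip(state, d))
    return state


# Sweep the proline incorporation rate k5 over the values used in the text.
for k5 in (0.10, 0.11, 0.13, 0.15):
    params = dict(BASE, k5=k5)
    for label, fused in (("ERS-PRS", False), ("EPRS", True)):
        E, P, ERS, PRS = simulate(params, fused)
        print(f"k5={k5:.2f} {label}: E={E:.3f} P={P:.3f} ERS={ERS:.3f} PRS={PRS:.3f}")
```

Because the rate laws and several constants are placeholders, the printed values should not be expected to reproduce the published figures; the sketch only shows how the two regulatory architectures can be compared under the same parameter sweep.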

Effect of proline demand on steady-state levels of glutamic acid and proline and their ARSs
Intriguingly, proteome proline content, and presumably total proline incorporated into protein, undergoes an upward shift at the transition from the ichthyosporean S. arctica, which contains nonfused ERS and PRS, to the filasterean C. owczarzaki, the earliest known organism with fused EPRS (Table 1)

with a previous report (32). EARS and PARS genes are unfused in C. elegans (9), and a similar split was found in seven other nematodes (Table S6). Interestingly, the proteome proline content in nematodes (mean percentage, 4.83 ± 0.20) is much lower than in other animals (p < 0.005, Mann-Whitney test). These observations support the link between proline demand and the appearance of fused EPRS. The increase in proline content in animal proteomes might be a result of the advent of proline-rich proteins, such as collagen, which have played a vital role in the emergence of colonies and multicellular animals by serving as a support for cell-cell interactions (33). Actual proline utilization in metazoans is likely to be even higher than that reflected by proteome proline content because proline-rich collagen is the most abundant protein in animals. Importantly, collagen first appeared at about the time of the emergence of animals and the appearance of EPRS (Fig. 1) (34). We determined the influence of heightened proline demand by simulations in which k_5, the rate of incorporation of proline during protein synthesis, was increased. A 10% increase in k_5 (from 0.10 to 0.11) caused only a small perturbation in steady-state levels of amino acids and ARSs in both systems (Figs. 3 and 4, A and B). In contrast, a 30% increase in k_5 (to 0.13) markedly decreased glutamic acid and proline and increased [ERS] and [PRS], but only in the ERS-PRS system (Fig. 4, C and D). A 50% increase in k_5 (to 0.15) in the ERS-PRS system caused a catastrophic effect in which glutamic acid and proline concentrations declined rapidly to near-zero, with a concomitant dramatic induction of ERS and PRS (Fig. 4E). Importantly, the EPRS system was only modestly perturbed, maintaining glutamic acid and proline at only slightly diminished levels (Fig. 4F). These results suggest that organisms with a fused EPRS gene have a clear survival advantage in conditions of increased proline demand.
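The Mann-Whitney comparison of proteome proline content mentioned above could be computed along the following lines. This is a sketch only: it calculates the U statistic and an exact two-sided permutation p-value, which may differ from the procedure the authors used, and the numbers in the example are arbitrary placeholders, not values from Table 1 or Table S6.

```python
from itertools import combinations


def u_statistic(x, y):
    """Mann-Whitney U for sample x: pairs with xi > yj count 1, ties count 0.5."""
    return sum(
        1.0 if xi > yj else 0.5 if xi == yj else 0.0
        for xi in x
        for yj in y
    )


def exact_two_sided_p(x, y):
    """Exact two-sided p-value by enumerating every relabeling of the pooled data.

    Only practical for small samples, because the number of relabelings grows
    combinatorially with sample size.
    """
    pooled = list(x) + list(y)
    n1 = len(x)
    center = n1 * len(y) / 2.0  # expected U under the null hypothesis
    observed = abs(u_statistic(x, y) - center)
    hits = 0
    total = 0
    for idx in combinations(range(len(pooled)), n1):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Count relabelings at least as extreme as the observed split.
        if abs(u_statistic(gx, gy) - center) >= observed - 1e-12:
            hits += 1
    return hits / total


# Placeholder inputs (NOT measured proline percentages).
group_a = [1.0, 2.0, 3.0]
group_b = [4.0, 5.0, 6.0, 7.0]
print(u_statistic(group_a, group_b), exact_two_sided_p(group_a, group_b))
```

For larger samples, an asymptotic approximation or an established statistics package would be the more practical choice.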

Influence of reduced citric acid cycle flux on glutamic acid and proline
At the time of the origin of animals, the atmospheric O2 level was less than 1% of the current amount (35,36). Hypoxia inhibits the citric acid cycle, and thus organisms at the root of metazoan evolution would have had reduced citric acid cycle flux and consequently diminished levels of α-KG (37,38). In addition, hypoxia reduces α-KG levels by increasing its carboxylation to citrate by isocitrate dehydrogenase (39). Moreover, the glyoxylate shunt that bypasses α-KG was likely prevalent in organisms preceding the gene fusion event, further decreasing α-KG generation and the steady-state level of derived amino acids (Fig. 2) (1, 40). The effect of diminished α-KG was simulated as above. A 10% decrease (from 5.0 to 4.5) caused a major perturbation in the key synthetases and substrates in the ERS-PRS system only (Fig. 5, A-D). A 20% reduction in the α-KG level led to catastrophic depletion of glutamic acid and proline in the ERS-PRS system (and concomitant increases in ERS and PRS), but both were maintained at nearly half of the original level in the EPRS system (Fig. 5, E and F). We then simulated the effect of concurrent, modest increases in proline demand and reduced α-KG, conditions possibly more representative of the epoch of the gene fusion event. A 10% decrease in [α-KG] combined with a 30% increase in the rate of proline incorporation into protein (k_5) was catastrophic for the ERS-PRS system, but only modest reductions in glutamic acid and proline were observed in the EPRS system (Fig. 6). These analyses are consistent with a marked resilience of the EPRS system to altered environmental and metabolic stresses that was possibly a major advantage driving the fusion of ERS and PRS during early animal evolution.

Alternative solutions to metabolic stress conditions
We considered alternate systems that could also provide protection against the cellular stress conditions described in Fig. 6C. Increasing the rate of synthesis of glutamic acid from α-KG (Fig. 7A), reducing the rate of incorporation of glutamic acid into protein (Fig. 7B), and reducing the rate of synthesis of ERS (Fig. 7C) all enabled survival of the ERS-PRS system under high proline demand and low-level α-KG. These alternative systems are less efficient because the steady-state proline level is lower than that in the fusion system (Fig. 6D). Also, reducing the rate of incorporation of glutamic acid into protein (Fig. 7B) or the steady-state level of ERS by reducing its rate of synthesis (Fig. 7C) (or increasing its degradation rate; not shown) might be detrimental to cell viability because both would compromise protein synthesis. Because α-KG is required to maintain the optimal level of citric acid cycle flux, enhanced synthesis of glutamic acid from already limited α-KG might not be a viable solution.
We recognized that the fusion event, by combining the two genes behind the promoter driving ERS, might prevent the regulation of EPRS transcription by proline. We considered the alternate possibility that the EARS and PARS genes remain unfused, but PRS regulation by [P] is lost. The absence of negative regulation of PRS by [P] (termed ERS-PRSΔP system) can be represented by Equations 1-3 and 8.
Unexpectedly, this solution yielded steady-state levels of ERS and PRS and their cognate amino acids similar to those resulting from EPRS fusion, implicating negative regulation of PRS by [P] as a critically important factor (Fig. 7D). We analyzed the relative benefits of the EPRS and ERS-PRSΔP systems, compared with the ERS-PRS system, with respect to the maintenance of cellular proline and glutamic acid under a range of α-KG concentrations. Compared with ERS-PRS, the fused EPRS system led to higher levels of both amino acids at all concentrations of α-KG examined (Fig. 8, A and B). The ERS-PRSΔP system was more effective than ERS-PRS only at α-KG levels below 6. Moreover, at α-KG levels below 4.5, the ERS-PRSΔP system was more effective than EPRS. Thus, the EPRS system might be more versatile than the others: it not only thrives in high proline demand and low α-KG conditions, but it can also take advantage of conditions of relatively high α-KG. Nevertheless, there might be other factors that drove adoption of fusion instead of the alternative solution involving loss of regulation by proline while maintaining distinct EARS and PARS genes.

Discussion
The environmental conditions on Earth a billion years ago were extremely dynamic and contributed to an equally dynamic evolution of life forms, including the critical switch in size and complexity during the transition from unicellular to multicellular eukaryotes. We previously reported that linked EPRS first appeared in an ancestor of the unicellular holozoan, C. owczarzaki (11). Interestingly, this early ancestor of animals exhibited a life cycle during which it formed clusters of cells. In fact, C. owczarzaki is the earliest known example of aggregative multicellularity in a unicellular relative of a multicellular metazoan (41). Cell clusters might deprive interior cells of access to the environment, further depleting oxygen levels in an already oxygen-poor atmosphere. The transition to multicellularity necessitated the development of connective tissues as well as extracellular matrix (ECM) and ECM receptors. Consistent with its place in the transition to multicellularity, C. owczarzaki expresses genes encoding ECM-like domains and ECM receptors (42). Interestingly, the origin of collagens and collagen-like genes might have coincided with the sudden and dramatic increase in atmospheric oxygen that is required for conversion of proline to hydroxyproline, a major collagen constituent.
The environmental challenges present a billion years ago are no longer relevant, which raises the question of why extant metazoans maintain the fused EPRS gene. Although the current environment is far different from that present a billion years ago, tissue metabolic conditions might be determinative. For example, localized extremes in tissue hypoxia might provide an adaptive advantage to the fused protein. Alternatively, the RNA- and protein-binding functions of the WHEP domain-containing EPRS linker, required for the noncanonical activities of transcript-selective translation regulation and fatty acid uptake, might have contributed to its retention (13-15). The maintenance of fused EPRS in extant animals can also be explained by the constructive neutral evolution hypothesis. According to this theory, complexity in organisms emerges following accumulation of neutral mutations in groups of proteins (or protein subunits or domains) that do not change the functional property of the proteins, but the interdependence of the mutations prevents their deletion, and thus they are essentially unidirectional (43-45). The EPRS gene likely accrued neutral mutations during its billion-year evolution. Fixation of these mutations could have made ERS and PRS structures or functions of EPRS mutually dependent, thereby decreasing the fitness of animals following a fission event that disconnected the ERS and PRS genes and proteins. A potentially informative exception is the nematode C. elegans in which a fission event generated ERS protein with six C-terminal WHEP domains and PRS with a single N-terminal WHEP domain (46). The continued presence of the WHEP domains suggests that a noncanonical function was retained, possibly favoring the metabolic and functional hypotheses, rather than the constructive neutral evolution hypothesis, for retention of the linked EPRS in nearly all present-day animals.

Amino acid composition analysis
FASTA files of proteomes of multiple organisms were retrieved from the genome option of the National Center for Biotechnology Information data collection. Protein annotation lines that start with ">" were excluded because they do not contain amino acid sequence. Python code was written to calculate the percentage of each amino acid in the entire proteome. The ambiguously marked amino acids, if found in a protein sequence, were excluded from the calculations.
Figure 7. Alternate solutions to counter negative effects of increased proline utilization and reduced α-KG. A, k_1, the rate of α-KG conversion to E, was increased from 0.10 to 0.12. B, k_3, the rate of incorporation of E into protein, was decreased from 0.10 to 0.05. C, k_Esyn, the rate of synthesis of ERS, was reduced from 0.1 to 0.05. D, removal of negative regulation of PRS by [P]. Unless stated otherwise, other system parameters and outputs are as in Fig. 3.
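The proteome composition calculation described in this section can be sketched as follows. The original script is not shown, so the function name and the example records are hypothetical, and treating only the 20 standard one-letter codes as unambiguous (so that B, J, O, U, X, Z, and "*" are skipped) is an assumption about what "ambiguously marked" means.

```python
from collections import Counter

# The 20 standard amino acids; every other symbol is treated as ambiguous (assumption).
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def amino_acid_percentages(fasta_lines):
    """Return {amino acid: percentage} over all sequences in a FASTA stream."""
    counts = Counter()
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and annotation (header) lines
        for residue in line.upper():
            if residue in STANDARD_AMINO_ACIDS:
                counts[residue] += 1  # ambiguous symbols are left out of the total
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aa: 100.0 * counts[aa] / total for aa in sorted(STANDARD_AMINO_ACIDS)}


# Hypothetical two-protein "proteome" used only to exercise the function.
example = [">protein_1 hypothetical", "MPPE", ">protein_2 hypothetical", "PXEA"]
composition = amino_acid_percentages(example)
print(round(composition["P"], 2), round(composition["E"], 2))
```

For a real proteome, the same function can be applied to an open file handle, for example `amino_acid_percentages(open(path))`, where `path` points to a downloaded FASTA file.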