Toward a Systems Biology Perspective on Enzyme Evolution

Large superfamilies of enzymes derived from a common progenitor have emerged by duplication and divergence of genes encoding metabolic enzymes. Division of the functions of early generalist enzymes enhanced catalytic power and control over metabolic fluxes. Later, novel enzymes evolved from inefficient secondary activities in specialized enzymes. Enzymes operate in the context of complex metabolic and regulatory networks. The potential for evolution of a new enzyme depends upon the collection of enzymes in a microbe, the topology of the metabolic network, the environmental conditions

The last universal common ancestor (LUCA) had a few hundred genes encoding the machinery needed for replication, transcription, and translation; a few transporters and transcriptional regulators; an ATP synthase; and metabolic enzymes for synthesis of amino acids and nucleotides (1,2). In the ensuing 3.8 billion years, the protein repertoire of the LUCA expanded as new folds were discovered, domain fusions led to larger and more complex structures, and families and superfamilies emerged by duplication and divergence of ancestral genes (3). As this process occurred, the number of metabolic enzymes expanded, allowing microbes to take advantage of novel sources of nutrients and energy. Evolution of new enzymes continues in the present day as microbes adapt to the introduction of anthropogenic compounds such as antibiotics, pesticides, synthetic dyes, and explosives into the environment.
The evolution of novel enzymes plays out in the context of the metabolic and regulatory networks of the organism and the environment it inhabits. This minireview will take a systems biology perspective on the evolution of novel metabolic enzymes in microbes by duplication and divergence of an ancestral scaffold.

Enzyme Evolution Then and Now
The earliest enzymes were probably generalists, capable of catalyzing particular chemical transformations with broad substrate specificity. Duplication of genes encoding generalist enzymes followed by division of ancestral functions among the daughter genes would have generated enzymes with greater specificity (Fig. 1a). More specific enzymes typically have greater catalytic efficiency because specific substrate-binding sites can position the substrate in an optimal orientation relative to functional groups on the enzyme. Furthermore, evolution of specialist enzymes allows fluxes through metabolic pathways to be controlled more precisely.
At some point, possibly within one billion years after the origin of life, microbes had evolved a few hundred specialized enzymes that enabled efficient use of resources in their environmental niches. Since that time, the starting points for evolution of new enzymes have not been generalist enzymes, but promiscuous activities in specialized enzymes. Active sites of even very specific enzymes often bind molecules that bear some resemblance to their natural substrates, although usually with much lower affinity. Such active site intruders sometimes bind in an orientation that allows reactive functional groups on the enzyme to catalyze a chemical reaction. Catalysis of adventitious secondary reactions that are physiologically irrelevant is called "catalytic promiscuity." Promiscuous activities are generally orders of magnitude less efficient than well evolved activities because non-physiological substrates do not bind in optimal orientations with respect to catalytic groups. Furthermore, the full array of catalytic groups for effective catalysis of a promiscuous reaction may not be present at an active site that has evolved to serve a different function.
Although promiscuous activities are inefficient, they often accelerate reactions by several orders of magnitude relative to an uncatalyzed reaction. Consequently, they provide good starting places for evolution of new enzymes (4). Fitness can be enhanced by recruitment of a promiscuous enzyme to detoxify a toxin, to use a newly available compound to generate a useful metabolite such as NADPH or an intermediate in an existing metabolic pathway, or to allow the organism to take advantage of a novel source of nitrogen or phosphate. For example, Pseudomonas diminuta, Flavobacterium sp., and Agrobacterium radiobacter use phosphotriesterases to release phosphate from organophosphate insecticides; these enzymes are believed to have evolved in response to the introduction of synthetic insecticides in the 20th century (Ref. 5; see also the accompanying minireview by Elias and Tawfik (50)). Similarly, a number of bacteria have evolved a pathway for degradation of atrazine, a widely used anthropogenic herbicide, to access a novel source of nitrogen (6).
In some cases, a single novel enzyme cannot contribute to fitness, and multiple promiscuous enzymes must be patched together to form pathways that carry out an important function. Simultaneous recruitment of promiscuous activities to form a novel pathway may seem unlikely. However, microbes contain hundreds of enzymes (7), and each probably has a number of promiscuous activities. Thus, there is considerable potential for combining physiological and promiscuous activities into novel sequences. For example, a two-step serendipitous pathway can replace the function of transaldolase in Escherichia coli. Transaldolase is required for generation of erythrose 4-phosphate, a precursor of aromatic amino acids and pyridoxal 5′-phosphate (PLP). Unexpectedly, mutants lacking both transaldolase isozymes grow on xylose nearly as well as wild-type cells. In these cells, buildup of sedoheptulose 7-phosphate, the substrate for transaldolase, allows a promiscuous activity of phosphofructokinase to generate sedoheptulose 1,7-bisphosphate. (Because k_cat/K_m for promiscuous activities is very low, substantial conversion of substrates to products requires unusually high levels of substrates.) Sedoheptulose 1,7-bisphosphate is then cleaved to erythrose 4-phosphate and dihydroxyacetone phosphate by a promiscuous activity of fructose-bisphosphate aldolase (8). In this case, emergence of a novel pathway may have been facilitated by recruitment of two enzymes involved in glycolysis. However, such prior associations are not essential. A pathway that restores synthesis of PLP in a strain of E. coli lacking PdxB can be patched together when either yeaB or thrB is overexpressed (Fig. 2). In this case, the three enzymes had no pre-existing physical, genetic, or functional relationship.
The emergence of novel enzymes from promiscuous activities of specialized enzymes still occurs by gene duplication and divergence of function, but the process occurs in a more complex cellular context. Fig. 1b depicts an adaptation of the widely favored IAD (innovation-amplification-divergence) model (10,11) for evolution of a novel enzyme beginning from a promiscuous activity of a specialized enzyme. Evolution of a new enzyme starting from a promiscuous activity cannot occur unless the novel activity contributes to the fitness of the organism. In some cases, a promiscuous activity may be high enough to enhance fitness when the environment changes. However, many promiscuous activities are too inefficient to affect fitness unless a mutation increases either the level of expression of the gene or the level of the promiscuous activity. Elevation of promiscuous activities to physiological relevance by overproduction of an enzyme has been demonstrated in several recent projects using the ASKA Library (12), which contains every ORF from E. coli. High expression is achieved because the plasmid carrying each ORF is present in 20-30 copies, and expression is controlled by a strong promoter. The substantial overproduction of enzymes achieved in this system allows promiscuous enzymes to replace critical metabolic enzymes (13-15), confer resistance to an antibiotic (16), or facilitate an alternative route for synthesis of an important metabolite (9,17). A physiological process cannot generate such a high level of overexpression for most genes. Furthermore, some promiscuous activities are too low to be detected even with multicopy expression. For instance, glutamyl-phosphate reductase (ProA) has a very inefficient ability to reduce N-acetylglutamyl phosphate. However, overproduction of ProA does not restore growth of a strain lacking N-acetylglutamyl-phosphate reductase (ArgC) on glucose (18). In such cases, mutations that increase the level of the promiscuous activity must occur to raise the activity to the point at which it affects fitness. Mutations in ebg, which encodes an enzyme with low β-galactosidase activity, enhance activity by 9- to 42-fold and allow cells to grow on lactose, which cannot be utilized by the wild-type strain (19). Similarly, mutations in a gene encoding acetamidase give rise to enzymes that hydrolyze butyramide, phenylacetamide, or valeramide (20).
FIGURE 2. Recruitment of three promiscuous enzymes allows biosynthesis of PLP in a ΔpdxB strain when either yeaB or thrB is overexpressed. YeaB increases flux into the pathway; ThrB pulls material through the pathway and helps capture glycolaldehyde, which can be lost by diffusion through the membrane.
Once a promiscuous activity crosses the threshold for physiological relevance, selective forces will foster retention of genetic changes that increase the level of the initially inefficient activity. Such increases can be achieved by promoter mutations that increase gene expression, point mutations that increase the level of the new activity, or, most likely, gene duplication. Gene duplication is quite common in bacteria. Indeed, ~0.1% of bacteria have a duplication in any given gene even without selection (21). When an activity of the enzyme encoded by the duplicated gene is important for fitness, further amplification occurs readily by recombination between homologous sites in the duplicated region. (The driver for this process can be solely the need for the novel activity or the need for both activities.) At steady state, the number of copies depends upon the size and chromosomal location of the amplified region and the fitness costs and benefits associated with multiple copies of the genes within the amplified region (22).
The presence of multiple copies of a gene increases opportunities to explore sequence space and find mutations that increase the efficiency of the new enzyme. Furthermore, recombination between copies allows rapid exploration of combinations of mutations. Ultimately, when a sufficiently efficient enzyme can be encoded by a single gene, the fitness benefit due to extra copies will be outweighed by the burden of carrying the extra copies, and the extra copies will be lost, leaving two paralogous genes encoding enzymes with different functions.
In both cases depicted in Fig. 1, gene duplication resolves an adaptive trade-off between the two functions of the enzyme and allows each to be optimized. The difference lies in the nature of the ancestral enzyme. In Fig. 1a, the ancestor is a generalist enzyme in which both functions are important for fitness. In Fig. 1b, the ancestor is a specialized enzyme with a promiscuous activity that is initially irrelevant for function. The process by which a promiscuous activity rises to physiological relevance is complex, involving trade-offs between the original and promiscuous activities that are influenced by both the topology of the metabolic network and the environmental conditions.

Trade-offs between the Normal and Novel Functions of a Promiscuous Enzyme
Duplication of a gene encoding an enzyme whose promiscuous activity is far below the level needed to affect fitness will not provide selective pressure for retention of the duplicated gene. In such cases, mutations must increase the level of the promiscuous activity above a critical threshold (or to at least half that required to improve fitness so that gene duplication can push the total level of activity across the line). Increased activity can be provided by either promoter mutations that increase gene expression or mutations in the structural region of the gene that result in an increase in the promiscuous activity.
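The threshold argument above can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes that total promiscuous activity scales linearly with gene copy number (the simplification implied by the "at least half" remark), and the function names and the arbitrary activity units are hypothetical rather than taken from any cited study.

```python
def total_activity(per_copy_activity, copies=1):
    """Total promiscuous activity, assuming it scales linearly with copy number."""
    if per_copy_activity <= 0 or copies < 1:
        raise ValueError("activity must be positive and copies must be >= 1")
    return per_copy_activity * copies


def required_fold_increase(per_copy_activity, threshold, copies=1):
    """Fold improvement per gene copy still needed to reach the fitness threshold.

    Returns 1.0 when the threshold is already met.
    """
    return max(1.0, threshold / total_activity(per_copy_activity, copies))


# Hypothetical numbers in arbitrary units: the activity needed to affect
# fitness is 100, and one copy of the gene supplies 2.
single_copy = required_fold_increase(2.0, 100.0, copies=1)
duplicated = required_fold_increase(2.0, 100.0, copies=2)
print(f"fold increase needed with 1 copy:   {single_copy:.0f}")
print(f"fold increase needed with 2 copies: {duplicated:.0f}")
```

Under this linear assumption, a duplication halves the improvement that mutations must supply, which is why a variant that reaches only half the required activity can still become selectable once the gene is duplicated.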
Because most active sites align the substrate optimally with respect to catalytic groups, mutations that enhance a promiscuous activity might be expected to compromise the original activity. This supposition is supported by many examples of enhancements in promiscuous activities achieved by site-directed mutagenesis (23-25) and in vitro evolution (26). For example, a point mutation in alanine racemase decreases the original activity by 4 × 10^3 while increasing a promiscuous aldolase activity by 2.3 × 10^5 (24). Surprisingly, however, this is not always true. Khersonsky and Tawfik (27) reviewed 11 cases in which substantial increases in a promiscuous activity (10- to 10^6-fold) were achieved by mutations that caused only a small decrease in the original activity (<42-fold). Such cases might seem to be especially promising from an evolutionary standpoint, allowing emergence of a new activity while the original activity is maintained. However, the situation is often more complicated. In many circumstances, mutations that diminish the original activity of the enzyme can be tolerated, and indeed, a decrease in the original activity may sometimes be desirable.
Metabolic networks have evolved to be robust in the face of environmental perturbations. As a side effect, this robustness can enable enzyme evolution. The fitness cost of diminishing or even eliminating the activity of an enzyme is often surprisingly modest. Only 80 of the 227 metabolic enzymes involved in glucose metabolism, the TCA cycle, and synthesis of amino acids, nucleotides, and cofactors are essential for growth of E. coli on glucose. Isozymes, alternative enzymes, and broad specificity enzymes can often substitute for a missing enzyme (9). In some cases, alternative routes enabled by interconnecting pathways allow a defective step to be bypassed. Similarly, only 339 of 745 reactions in the Saccharomyces cerevisiae metabolic network are predicted to be active during growth on glucose (28). Of these, 77% are catalyzed by enzymes predicted to be dispensable for growth on glucose. Even more are dispensable in complex media when many synthetic pathways are unnecessary. These predictions were tested by generating 38 knock-out strains lacking predicted nonessential enzymes. Growth rates of the knock-out strains were affected modestly or not at all during growth on glucose and even less during growth on complex medium.
When the environment renders an enzyme dispensable, a substantial decrease in an original activity due to a mutation that enhances a promiscuous activity can be tolerated. A strain of E. coli lacking γ-glutamylcysteine synthetase (GshA) cannot synthesize glutathione (29) and cannot grow in the presence of arsenate because arsenate reductase requires glutathione as an electron donor. Pseudo-revertants obtained after treatment with N-methyl-N′-nitro-N-nitrosoguanidine contained two critical point mutations. One mutation abolishes feedback inhibition of glutamyl kinase (ProB); the mutant ProB synthesizes glutamyl phosphate even when proline is available. The other mutation inactivates glutamyl-phosphate reductase (ProA), the next enzyme in the pathway. In the absence of ProA, glutamyl phosphate formed by ProB reacts with cysteine to form γ-glutamylcysteine, thus replacing the activity of the missing GshA (Fig. 3a). The mutants are proline auxotrophs. However, in rich medium, ProA is dispensable. Thus, cells can survive when arsenate is present by sacrificing the ability to make proline. Such a sacrifice is not necessarily irreversible. Wild-type proA and proB could be restored during a process of duplication and divergence, as described above, or could be acquired by horizontal gene transfer from another microbe.
Enzymes that are nonessential under certain environmental conditions can only serve as a starting place for evolution of a novel enzyme if they are expressed. Thus, a mutation that causes constitutive expression of the gene may be a prerequisite for recruitment of an enzyme to serve a new function. Such mutations have been documented. Mutants of Enterobacter aerogenes capable of growth on xylitol as a novel carbon source constitutively express ribitol dehydrogenase, for which xylitol is a substrate (30). Mutants of Pseudomonas aeruginosa capable of growth on butyramide arose by alteration of a regulatory protein to permit constitutive expression of an amidase that had poor activity using butyramide as a substrate (31).
Even mutations that severely decrease the activity of an essential enzyme can be tolerated if a novel activity is also essential (18). A strain of E. coli that lacks ArgC (N-acetylglutamyl-phosphate reductase) cannot grow on glucose because it cannot synthesize arginine. Pseudo-revertants obtained after mutagenesis with N-methyl-N′-nitro-N-nitrosoguanidine harbored a point mutation in proA that was responsible for restoration of growth. The substrates for ArgC and ProA differ only in the presence of an acetyl group on the substrate for ArgC (Fig. 3). ProA(E383A) has a poor ability to catalyze both reactions. Consequently, the pseudo-revertant has a poor ability to synthesize both proline and arginine. However, the pseudo-revertant is more fit during growth on glucose than the ΔargC strain; it is better to have a poor ability to synthesize proline and arginine than to have a good ability to synthesize proline but no ability to synthesize arginine. Thus, the net effect of a trade-off between the original and novel activities on fitness determines whether a decrease in the original activity can be tolerated.
The previous discussion has emphasized that a decrease in the original activity of an enzyme does not always result in a decrease in fitness. However, a decrease in the original activity of an enzyme may actually be advantageous in some cases. When a single active site is being used to catalyze more than one reaction, each substrate acts as a competitive inhibitor of the other reaction (32). The ratio of flux toward the two products for a hypothetical case involving reactions of two substrates (A and B) is given in Equation 1 (assuming that the enzyme is not saturated with either substrate), where v_A and v_B are the fluxes toward the products formed from A and B.
\[ \frac{v_A}{v_B} = \frac{(k_{cat}/K_{m,A})\,[A]}{(k_{cat}/K_{m,B})\,[B]} \qquad \text{(Eq. 1)} \]
If k_cat/K_m,A is orders of magnitude higher than k_cat/K_m,B, substantial conversion of B can occur only if the concentration of B is orders of magnitude higher than the concentration of A. Alternatively, a mutation that decreases k_cat/K_m,A will increase the flux of the reaction involving B. In the example of ProA(E383A) discussed above (18), a decrease in the original activity may have been necessary to allow both substrates access to the active site so that comparable fluxes toward proline and arginine could be achieved.
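The flux ratio described above (each substrate's k_cat/K_m multiplied by its concentration, taken as a ratio) is easy to explore numerically. The following is a minimal sketch; the function name and every number in it are hypothetical values chosen only to illustrate the two routes to comparable fluxes, and none of them are measurements for ProA or any other enzyme.

```python
def flux_ratio(kcat_km_a, conc_a, kcat_km_b, conc_b):
    """Return v_A / v_B for two substrates competing for one active site.

    kcat_km_a, kcat_km_b: specificity constants (same units, e.g. M^-1 s^-1).
    conc_a, conc_b: substrate concentrations (same units, e.g. micromolar).
    """
    denominator = kcat_km_b * conc_b
    if denominator == 0:
        raise ValueError("substrate B must have nonzero k_cat/K_m and concentration")
    return (kcat_km_a * conc_a) / denominator


# Hypothetical enzyme: the native substrate A is favored 10^4-fold over B.
baseline = flux_ratio(1e6, 10, 1e2, 10)

# Route 1: a mutation lowers k_cat/K_m for A by 100-fold.
after_mutation = flux_ratio(1e4, 10, 1e2, 10)

# Route 2: substrate B accumulates to 10^4 times the concentration of A.
after_buildup = flux_ratio(1e6, 10, 1e2, 100_000)

print(f"baseline v_A/v_B:            {baseline:g}")
print(f"after mutation:              {after_mutation:g}")
print(f"after accumulation of B:     {after_buildup:g}")
```

In this toy case, either a mutation that weakens the original activity or a large buildup of the promiscuous substrate moves the ratio toward 1, which mirrors the two alternatives discussed in the text.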

The Importance of Epistasis
Epistasis refers to the interaction between genetic loci. Epistasis can occur between nucleotides in a gene and between genes in a genome. Epistasis has profound effects on the evolutionary potential of a promiscuous enzyme; not all promiscuous enzymes are equal in terms of evolvability, and neither are all microbes.
Sequence divergence between homologs in different microbes can affect the evolvability of a promiscuous enzyme by affecting the level of the promiscuous activity. Neutral drift, the accumulation of mutations that have no effect on the fitness of the organism, results in substantial changes in promiscuous activities (33). Enzymes with <30% sequence identity often have similar structures and kinetic parameters but different promiscuous activities. For example, the promiscuous N-acylamino acid racemase activities in o-succinylbenzoate synthases from Amycolatopsis sp., E. coli, and Bacillus subtilis, which have <29% pairwise sequence identity, vary by >4 orders of magnitude (34). Neutral drift can also increase protein stability, allowing the protein to accommodate later destabilizing mutations needed to confer a novel property (35,36).
Sequence divergence between homologs also affects evolvability by influencing the availability of an evolutionary trajectory for accumulation of mutations necessary to improve the activity. Many studies have shown that mutations that are beneficial in one context may be detrimental in another (37-40). A complete exploration of the 120 possible trajectories for accumulation of five mutations in TEM β-lactamase that together increase resistance to cefotaxime by 100,000-fold (41) revealed that only 18 are accessible due to epistatic effects; four of the five mutations fail to increase fitness in at least some sequence contexts. Such effects should strongly influence the evolvability of promiscuous activities in divergent enzymes. Indeed, some microbes will have the potential to evolve a new enzyme starting from a promiscuous activity, whereas others will not. In such cases, a critical novel enzyme might emerge from a promiscuous activity of a different enzyme, if one is available, leading to convergent evolution of a new enzyme from a different ancestor.
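The figure of 120 trajectories is simply the number of orders in which five mutations can accumulate (5! = 120), and a trajectory counts as accessible only if every step increases fitness. The sketch below enumerates the orders on a deliberately simple toy landscape. The mutation labels and the fitness function are hypothetical and are not the TEM β-lactamase data; the single epistatic rule (mutation "d" is harmful unless "a" is already present) is an assumption made purely for illustration.

```python
from itertools import permutations

MUTATIONS = ("a", "b", "c", "d", "e")  # hypothetical labels, not real substitutions


def toy_fitness(genotype):
    """Toy landscape: each mutation adds 1, except 'd' subtracts 1 without 'a'."""
    score = 0.0
    for mutation in genotype:
        if mutation == "d" and "a" not in genotype:
            score -= 1.0
        else:
            score += 1.0
    return score


def is_accessible(order, fitness_fn):
    """True if fitness strictly increases at every step along the trajectory."""
    genotype = frozenset()
    current = fitness_fn(genotype)
    for mutation in order:
        genotype = genotype | {mutation}
        following = fitness_fn(genotype)
        if following <= current:
            return False
        current = following
    return True


def count_accessible(mutations, fitness_fn):
    """Return (number of possible orders, number of accessible orders)."""
    orders = list(permutations(mutations))
    return len(orders), sum(is_accessible(order, fitness_fn) for order in orders)


total, accessible = count_accessible(MUTATIONS, toy_fitness)
print(f"{accessible} of {total} trajectories are accessible on the toy landscape")
```

Even one sign-epistatic interaction removes every order in which "d" precedes "a"; the far more restrictive result reported for TEM β-lactamase reflects several such interactions acting together.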
The potential for recruitment of promiscuous enzymes to serve new functions is also affected by intergenic epistasis. As described above, mutations that elevate a promiscuous activity to a physiologically relevant level sometimes compromise the original function of the enzyme. In such cases, microbes with isozymes that render the original activity of an enzyme dispensable have a greater potential for evolution of a novel activity. The topology of the metabolic network is also important, as it affects the potential for construction of alternative pathways that reroute metabolism around a block caused by loss of the original activity of an enzyme.
Epistasis also affects the evolution of novel pathways patched together from promiscuous enzymes. The complement of enzymes is different in different organisms (7), and by extension, the complement of promiscuous activities should also be different. Thus, microbes may patch together different novel pathways because they have different collections of promiscuous enzymes. For example, degradation of 4-nitrotoluene occurs by different pathways in Pseudomonas sp. strain 4NT (42) and Mycobacterium strain HL 4NT-1 (Fig. 4) (43), and degradation of pentachlorophenol occurs by different pathways in the Gram-negative bacterium Sphingobium chlorophenolicum and the Gram-positive bacterium Rhodococcus chlorophenolicus (44,45).

Summary
Divergence of function in duplicated genes has led to the emergence of large superfamilies of enzymes over the billions of years since the LUCA. Early diversification took place as generalist enzymes evolved to generate specialized enzymes; later diversification occurred by recruitment of promiscuous activities of specialized enzymes to serve new functions. The processes by which a promiscuous activity rises to physiological relevance are complex and depend upon the environmental conditions, the collection of available promiscuous activities, the topology of the metabolic network, and the relative importance of the original and novel functions of the enzyme. Evolution of a novel enzyme or pathway may have been possible only in some microbes and only under certain environmental conditions. Genes encoding novel enzymes may have then spread widely by horizontal gene transfer. In some cases, variations in these parameters may have led to convergent evolution of enzymes and/or pathways as microbes took advantage of different collections of promiscuous enzymes to evolve a common function.
Studies of enzyme evolution in the last 2 decades have been primarily protein-centric. This emphasis has arisen from the efforts of protein engineers to evolve efficient enzymes for biotechnology. Structural and mechanistic characterization of enzymes evolved in vitro has greatly enhanced our understanding of how point mutations lead to altered enzymatic properties. The protein-centric approach has also grown out of efforts to understand the emergence of mechanistically divergent superfamilies by mining information in genomic databases (46). The field of systems biology has emerged during the same time period, providing a higher level view of the interactions of proteins, nucleic acids, and small molecules in complex networks. The goal of this minireview was to bring together these two perspectives and to emphasize that understanding the processes by which enzymes evolve in nature requires consideration of the function of enzymes within the context of metabolic and regulatory networks, as well as the environmental conditions in which microbes attempt to survive and reproduce.
Our understanding of the evolution of enzymes in extant proteomes will always be limited by incomplete information about when and in what type of microbe a particular enzyme first emerged, the characteristics of its metabolic network, and the environment that allowed variation in the enzyme to improve the fitness of the microbe. However, we can learn a great deal about enzyme evolution in the context of complex systems by exploiting omic techniques. For example, microbes can be subjected to adaptive evolution under conditions in which their growth is impaired until variants with improved growth emerge. The whole genomes of variant strains can be sequenced so that mutations leading to improved phenotypes can be rapidly identified, and alternative approaches to improved fitness can be discovered (47,48). Transcriptomics, proteomics, and metabolomics can be used to assess changes in the overall system in response to mutations (49). These techniques will allow connections to be made between the effect of mutations on the kinetic characteristics of individual enzymes, changes in fluxes through metabolic pathways, perturbations of the regulatory network, and the resulting phenotype of the microbe: a systems biology perspective on enzyme evolution.