Reconstructed ancestral enzymes reveal that negative selection drove the evolution of substrate specificity in ADP-dependent kinases

One central goal in molecular evolution is to pinpoint the mechanisms and evolutionary forces that cause an enzyme to change its substrate specificity; however, these processes remain largely unexplored. Using the glycolytic ADP-dependent kinases of archaea, including the orders Thermococcales, Methanosarcinales, and Methanococcales, as a model and employing an approach involving paleoenzymology, evolutionary statistics, and protein structural analysis, we could track changes in substrate specificity during ADP-dependent kinase evolution along with the structural determinants of these changes. To do so, we studied five key resurrected ancestral enzymes as well as their extant counterparts. We found that a major shift in function from a bifunctional ancestor that could phosphorylate either glucose or fructose 6-phosphate (fructose-6-P) as a substrate to a fructose 6-P-specific enzyme was started by a single amino acid substitution resulting in negative selection with a ground-state mode against glucose and a subsequent 1,600-fold change in specificity of the ancestral protein. This change rendered the residual phosphorylation of glucose a promiscuous and physiologically irrelevant activity, highlighting how promiscuity may be an evolutionary vestige of ancestral enzyme activities, which have been eliminated over time. We also could reconstruct the evolutionary history of substrate utilization by using an evolutionary model of discrete binary characters, indicating that substrate uses can be discretely lost or acquired during enzyme evolution. These findings exemplify how negative selection and subtle enzyme changes can lead to major evolutionary shifts in function, which can subsequently generate important adaptive advantages, for example, in improving glycolytic efficiency in Thermococcales.

Deciphering the biochemical and genetic mechanisms by which new protein functions appear is a central goal in molecular evolution (1). In enzyme families, functionally diverse members share a common mechanism of catalysis (2), but have structural determinants that dictate different specificities and therefore different functions in metabolic pathways. Many observations support the hypothesis that different functions or specificities evolve through either promiscuous activities or bifunctionality, although promiscuity has also been proposed as an evolutionary vestige of ancestral function (3). In this respect, it is important to distinguish between promiscuity and bifunctionality. The former results from a secondary catalytic activity that is not physiologically relevant, whereas bifunctionality refers to the simultaneous existence of two catalytic activities both of which have physiological relevance (4). On the other hand, the mechanism and evolutionary modes whereby a bifunctional enzyme for substrates "A" and "B" becomes specific for "A" have been poorly explored. In principle, the specialization for "A" can be either the result of a positive selection for "A" or a negative selection against "B." Directed evolution experiments indicate that negative selection induces selectivity much more effectively (5) and two modes of substrate discrimination have been proposed: ground-state (loss of substrate affinity) and transition-state (loss of catalysis) and that such discrimination imposes an accuracy rate tradeoff (6). Furthermore, experimental microbial evolution has shown that new functionalities that enable adaptation often involve a small number of genetic alterations, in which a single change causing a large effect on the phenotype or fitness tends to appear first, followed by subsequent mutations that generate small enhancing effects (7). Nonetheless, a question that remains open is whether this mechanism could also act at the level of substrate specificity in enzymes.
Traditional biochemical studies regarding functional specificity in extant homologous enzymes often focus on the identification of key amino acid changes that would account for different functions in different families (8). However, during evolution most changes are expected to have a small or null effect on enzyme specificity, making the identification of truly critical residues a difficult problem to solve. This difficulty can be overcome if one considers the evolutionary history of extant members by employing ancestral sequence reconstruction (ASR) 5 (8). The availability of current methodologies for the reconstruction of ancestral sequences with low uncertainty, and specifically those of key ancestors responsible for the appearance of new functions, enable us to explore the mechanisms and relevant changes that gave rise to the functions of current proteins (9).
In this work we study ADP-dependent kinases, which are key enzymes in the glycolytic pathway of Euryarchaeotas in archaea. The members of this family have only two types of catalytic activity; as phosphofructokinases (PFK) and glucokinases (GK), both of which participate in fundamental steps in the glycolytic pathway. In archaea, these enzymes have been described mainly in the orders Thermococcales, Methanosarcinales, and Methanococcales as well as in higher eukaryotes (10). According to the Kyoto Encyclopedia of Genes and Genomes (62), 6 the genomes of Thermococcales have two copies of these genes (glkA and pfkC), which have been characterized experimentally as glucokinases (ADP-GK), highly specific for glucose (11), and phosphofructokinases (ADP-PFK), highly specific for fructose-6-P (12). Methanosarcinales genomes also have two genes pfkC and glkA, coding ADP-K1 and ADP-K2, respectively, but thus far there are no reports concerning the specificities of these enzymes. By contrast, the Methanococcales genomes have only one gene (pfkC) and the corresponding enzyme has been characterized to be bifunctional, suggesting that it satisfies both functions in vivo (13,14). These bifunctional enzymes have been described to be phylogenetically closest to the specific PFKs from Thermococcales (15) (Fig. 1).
Several structures of specific ADP-dependent kinases have been reported. Among them there are the specific ADP-GK from Thermococcus litoralis, Pyrococcus horikoshii, Pyrococcus furiosus, and mouse (PDB codes 1GC5, 1L2L, 1UA4, and 5CK7, respectively) and the specific ADP-PFK from P. horikoshii (PDB code 1U2X).
Inspection of the active site of the ADP-PFK from P. horikoshii highlights the importance of Ala-71 in substrate specificity, since replacement of this residue for glutamic acid enables the catalysis with Glc as substrate (16). However, to date it is not known if the ancestral trait was bifunctionality or fructose-6-Pspecific. Moreover, whether the change from bifunctional to specific (or the opposite) was acquired through positive or negative selection and whether it involved a ground-state or transition-state mechanism are still unanswered questions.
We studied five key ancestral enzymes of increasing historical age from the ADP-dependent kinase family to compare their activities with contemporary enzymes. This allows us to explore the mechanism underlying the change of protein func-tion during natural evolution, illuminating the evolutionary basis for substrate recognition. Also, we were able to reconstruct the evolutionary history of the substrate-binding affinity using a simple evolutionary model of discrete binary characters and also to identify the large-effect mutation that is the structural determinant responsible for the emergence of fructose-6-P specificity in the ADP-PFKs enzymes from Thermococcales.

Phylogeny of ADP-dependent enzymes and ancestral sequence reconstruction
We reconstructed the phylogeny of the ADP-dependent kinase family and performed ancestral protein resurrection to characterize the substrate specificity of key ancestors. The phylogeny of the ADP-dependent kinases was inferred using Bayesian inference on the basis of amino acid and nucleotide sequences ( Fig. 1 and supplemental Fig. S1). Furthermore, a maximum likelihood phylogeny based on amino acid sequences was also constructed for comparison (supplemental Fig. S2). All the methodologies employed resulted in the same general topology for the phylogeny (Fig. 1A), which is consistent with previous studies (14,15,17). Six groups can be identified: PFK from Thermococcales, GKs from Thermococcales, and GKs from Eukaryotas, bifunctional enzymes PFK/GK from Methanococcales as well as homologous proteins from the order Methanosarcinales here denominated ADP-K1 and ADP-K2. All of these groups, as well as the key nodes employed to perform the inference of ancestral sequences, present robust statistical support (supplemental Figs. S1 and S2). Five ancestral enzymes from the family were selected for ancestral inference and reconstruction. Each of them is located at a major branch point and positioned progressively deeper within the phylogeny; these ancestors are indicated in Fig. 1A and their corresponding sequences are shown in supplemental Fig. S3. AncMT was considered particularly interesting because it is the ancestor of both the bifunctional enzymes from Methanococcales and the specific fructose-6-P enzymes from Thermococcales. Characterization of this ancestor should prove whether bifunctionality or fructose-6-P specificity is the ancestral trait.
According to a variety of protein markers (18) and the tree time published for archaea (19) the divergence of Methanococcales from Thermococcales occurred about 3594 million years ago, whereas that of Methanococcales from Methanosarcinales occurred later, about 3468 million years ago (Fig. 1B). This inconsistency between the tree topology of the ADP-dependent enzymes and the phylogeny described for archaeal organisms can be explained by horizontal gene transfer (HGT) events. The appearance of the Thermococcales PFKs close to the bifunctional Methanococcales enzymes is the result of an HGT event that occurred relatively recently (despite the fact that Thermococcales organisms were the first to diverge). Such an event would explain the phylogenetic closeness between theses enzymes and allow us to hypothesize that ancMT was a bifunctional enzyme that gave rise to the fructose-6-P-specific enzymes present today in Thermococcales. Another HGT event could have occurred from Thermococcales toward Methanosarcinales explaining the presence of two genes in the latter group. 5 The abbreviations used are: ASR, ancestral sequence reconstruction; PFK, phosphofructokinase; GK, glucokinase; Glc, glucose; HGT, horizontal gene transfer; TlGK, glucokinase from T. litoralis; PhPFK, phosphofructokinase from P. horikoshii; r.m.s., root mean square. 6 Please note that the JBC is not responsible for the long-term archiving and maintenance of this site or any other third party hosted site.

Negative selection in the evolution of ADP-dependent kinases
However, in this case it is difficult to be certain given the lack of biochemical information concerning enzymes from the Methanosarcinales. Considering these likely HGT events it can be hypothesized that ancestors ancGK/PFK and ancMMT existed about 3594 and 3468 million years ago, respectively (19). Robustness analysis of the five reconstructed ancestral sequences showed that the probability distribution for ancGK/ PFK resulted in 54% of its residues with a posterior probability greater than 0.5. By comparison, the analogous posterior probability values for ancMMT, ancMT, ancT, and ancM were 86, 95, 96, and 95%, respectively (supplemental Fig. S4). According, with the objective of elucidating the evolutionary history of substrate usage, which led to the existence of extant bifunctional and fructose-6-P-specific enzymes, we also analyzed the robustness of residue positions that reside within or near the active site. All positions within 12 Å of the sugar were examined (supplemental Table S1). Indeed, 12 Å is a generously set threshold because it incorporates all positions that make up the active site including catalytic residues and their second-shell neighbors. The analysis shows that residues of the first-shell (out to 5 Å from the substrate) and second-shell (from 5 to 12 Å) have a high robustness in all the reconstructed sequences except for ancGK/PFK. The latter was analyzed using three different methodologies; maximum likelihood using protein sequences (20) and a Bayesian approach employing either protein or gene sequences (21), revealing that the reconstructed first-shell residues are not dependent on the method employed (supplemental Table S2). Although this ancestor shows a relatively weak robustness, we decided to analyze it because it represents the deepest node in the phylogeny and might give light about the evolutionary trajectory if it is considered along with the results obtained for the other ancestors, which have very robust statistics support. In the case of ancMMT, ancMT, ancT, and ancM, high robustness was observed for both the first and second shell residues, supporting the notion that the kinetic characteristics of the enzymes are not affected by uncertainties in the reconstructed sequences.

Kinetic characterization of extant and ancestral enzymes
In the order Thermococcales the extant GK and PFK ADPdependent kinases are highly specific for their substrates (Table  1). We have evaluated the GK from T. litoralis (TlGK), which has no detectable activity with fructose-6-P, and the PFK from P. horikoshii (PhPFK), which has no activity with glucose. In addition, TlGK and PhPFK were not significantly inhibited by fructose-6-P or glucose, respectively, even at concentrations of 50 mM, which emphasizes that these enzymes are neither able to bind nor perform catalysis on the inappropriate substrate. On the other hand, bifunctionality is present in both the Methanococcus jannaschii (MjPFK/GK) ( Table 1) (14) and Methanococcus maripaludis enzymes (13). To gain insight into changes in substrate specificity during the evolutionary history of this family we characterized the enzyme kinetics of the resurrected ancestral enzymes employing both potential substrates (Table  1). Because ancT saturation with glucose was not reached, only rough estimates for k cat and K m were obtained (see "Experimental procedures"), to have at least an approximate order of magnitude for these values. The results show that the ancestral family trait is bifunctionality, whereas the trait of fructose-6-P specificity in PFK from Thermococcales is the evolutionary novelty. Fructose-6-P specificity emerged in the transitions between ancMT and ancT ( Fig. 2A), whereas bifunctionality is retained in ancM (the last common ancestor of Methanococcales) (Fig. 2B). It is important to note that in the ancMT to ancT transition the enzyme suffers a large loss in glucose affinity (K m 31 mM to Ͼ1 M) without a corresponding change in catalysis (k cat 4 to ϳ15 s Ϫ1 ) ( Fig. 2A). Concomitantly, there appears an increase in fructose-6-P affinity and catalysis. This analysis leads to two possibilities: (i) a mixture of two mechanisms acting in series with negative selection via a ground-state discrimination mode (loss of affinity for glucose) and then a positive selection increasing both the catalytic rate and affinity for fructose-6-P or (ii) the loss of affinity for glucose is a side-product of

Negative selection in the evolution of ADP-dependent kinases
an increase in catalytic efficiency for fructose-6-P occurring only by positive selection. Another important result is that even though ancT is specific for fructose-6-P, it is also able to phosphorylate glucose at very high concentrations (K m Ͼ 1 M). This means that the GK activity has become a promiscuous activity and a vestige of its evolutionary past.
Continuing backwards in the evolutionary history of the use of glucose as a substrate it becomes clear that the affinity for glucose dates back to ancGK/PFK. As we mentioned before, this ancestor may raise some concerns about its reconstruction robustness. However, the kinetic parameters obtained for this ancestor fit perfectly well with the evolutive trajectory described by the other ancestors and then we included it in the analysis. From this ancestor onwards the K m for glucose increased up to ancMT, as did that for fructose-6-P. However, this decrease in affinity is accompanied by an increase in k cat from ancGK/PFK to ancMT for both substrates (Fig. 2). On the other hand, during the evolutionary trajectory from ancGK/ PFK to ancMT the efficiency constant (k cat /K m ) shows a slight decrease for fructose-6-P, whereas that for glucose decreases slightly between ancGK/PFK and ancMMT and then increases from ancMMT to ancMT, achieving approximately the same efficiency as for fructose-6-P (Fig. 2). However, after the dramatic loss in glucose affinity observed during the ancMT to ancT transition ( Fig. 2A), the kinetic parameters for fructose-6-P, k cat , and k cat /K m increased, reaching the same order of magnitude as those reported for the current PFK enzyme from P. horikoshii (PhPFK) ( Fig. 2A).
Interestingly, the transition from ancMT to ancT correlates with the possible event of horizontal gene transfer from Metha- Table 1 Kinetic

constants of extant and ancestral enzymes
The mean Ϯ S.E. were calculated from three independent measurements.

Fructose-6-P Glucose
Ϫ2c a ND, no activity detected. b Estimated from double-reciprocal plot. c Determined from initial slope of saturation curve.

Figure 2. Trace of the kinetic parameters of PFKs specificity en route to Thermococcales (A) and
Methanococcales (B) using glucose or fructose-6-P as substrates. Divergence times were estimated by the RelTime method implemented in Mega7, using the speciation events shown in Fig. 1B (ancGK/PFK 3.59 Ga and ancMMT 3.47 Ga) as calibration points. The estimated times were: ancMT 1.98, ancM 1.51, and ancT 0.93 Ga. Evolution of K m and k cat shows that the main change toward fructose-6-P specificity in Thermococcales occurred in the transition between ancestors ancMT (1.98 Ga) and ancT (0.93 Ga), and was mainly caused by a loss in glucose affinity together with a concomitant increase in fructose-6-P affinity, leading to a 6 ϫ 10 4 -fold increment in fructose-6-P specificity.

Negative selection in the evolution of ADP-dependent kinases
nococcales to the Thermococcales organism. This HGT event would have provoked a GK redundancy in the ancestor of Thermococcales, giving freedom for ancMT to specialize in fructose-6-P catalysis. On the other hand, during the evolutive pathway from ancMT to ancM and then to extant Methanococcales enzymes (Fig. 2B), bifunctionality was maintained. The k cat val-ues of ancM and MjPFK/GK enzymes for both substrates are very similar, whereas the specificity constant (k cat /K m ) of the ancestor in respect to the extant enzyme differ only by 18 or 2 times to fructose-6-P or glucose, respectively (Fig. 2B).

Structural bases for loss of glucose affinity
We determined the crystal structure of ancMT to shed light on the structural determinants involved in the dramatic loss of glucose affinity, which occurred en route to ancT. The ancestor was crystallized and the structure was determined to a resolution of 2.6 Å. The data collection and refinement statistics are given in Table 2. The structure shows that, like contemporary ADP-dependent enzymes, the ancestor is a monomer with a ribokinase-like fold (22). The overall structure shows the same organization seen in extant ADP-dependent kinases; the large domain has a Rossmann-fold (␣/␤/␣) architecture with a central 10-stranded ␤-sheet surrounded by 12 ␣-helices and three 3 10 helices and the small domain consists of seven ␤-strands and four ␣-helices. The asymmetric unit contains two monomers, chains A and B, which have open and semiclosed conformations, respectively (Fig. 3A). These two conformations probably imply that the kinetically relevant ligand-induced conformational changes reported for other members of the family are also applicable here (17,23). Although, ancMT crystallized in the presence of fructose-6-P and the product AMP, electron density was observed only for the latter (Fig. 3C). In this case, the phosphate group of AMP appears in two different positions, perhaps due to the lack of interactions resulting from the absence of the ␤-phosphate of ADP.
To identify the key residues responsible for sugar binding we modeled the ancestors ancMT, ancT, and ancM in the closed conformation in complex with Mg-ADP, using glucokinase from Pyrococcus furiosus (PDB code 1UA4) (23) and the crystal structure of ancMT reported here as templates. Docking assays with fructose-6-P and glucose were performed, which were subjected to an energy minimization protocol to allow for the

Negative selection in the evolution of ADP-dependent kinases
proper accommodation of the substrates (Fig. 4). Minimized structures show that the ancMT and ancM ancestors bound both substrates practically identically. In ancT, fructose-6-P is stabilized in a manner similar to that observed for the previous two ancestors involving several hydrogen bonds through its hydroxyl groups and a salt bridge between the phosphate moiety and Arg-198. However, the binding of glucose to ancT is stabilized by only three hydrogen bonds mediated by its hydroxyl groups, compared with interactions involving all such groups in the ancMT and ancM ancestors. The almost null affinity of ancT for glucose (K m Ͼ 1 M) can be explained by the absence of these interactions, which is due to the presence of residue Ala-76 instead of a glutamic acid (Glu-72 in ancMT and Glu-79 in ancM). This glutamic acid receives a hydrogen bond from the C1 hydroxyl of glucose but does not interact directly with fructose-6-P. It may also play a second role by stabilizing Arg-192 (in ancMT) in an appropriate conformation for its interaction with glucose. This is expected to be unnecessary in the case of fructose-6-P because of the formal charge on the phosphate moiety (Fig. 4). It is noteworthy that this single change is the main conserved difference found between the sugar-binding sites of bifunctional and specific ancestors. Moreover, reconstruction of ancestral proteins using gene sequences shows that only one base change in the second position of the codon is required to mutate from glutamic acid (GAA and GAG) to alanine (GCC, GCT, GCA, and GCG) (supplemental Table S3).
To evaluate the dynamics of protein ligand interactions, we performed a molecular dynamics simulation. The ancMT ancestor was simulated in complex with each sugar substrate and Mg-ADP to analyze the Van der Waals plus electrostatic interactions (nonbonding interaction energy) between active site residues and fructose-6-P or glucose. Molecular dynamics trajectories show that the sugar-binding site of ancMT displays mainly attractive interactions with fructose-6-P where Lys-159 and Arg-192 display the strongest interactions and that with Glu-72 the only repulsive one (Fig. 5). In stark contrast, the latter represents one of the strongest attractive interaction energies when glucose is the substrate (Fig. 5), whereas Arg-192 has a significant repulsive contribution. Although Arg-192 in docking assays was shown to be able to interact with glucose by a hydrogen bond, the molecular dynamics showed that over time the interaction is mostly repulsive. These results point to the fact that residue Glu-72 is essential for glucose, but not for fructose-6-P, binding. Therefore, it seems reasonable to assume that the E72A mutation alone is responsible for the transition from bifunctionality to fructose-6-P specificity, resulting in an evolutionary novelty in the ADP-dependent kinase family. Although this replacement has no major effect on the protein structure or on the ligand-binding site itself, it leads to an alteration of the hydrogen-bonding network that determines ligand binding and, as a consequence, to the overall energy landscape of these interactions. It is important to note that these results are based on theoretical homology modeling, docking, and classic molecular dynamics, so experimental evidence should be added to support these results.

Recreating the evolutionary pathway toward fructose-6-P specificity
Because Glu-72 has been pointed out to be the main factor responsible for glucose binding to the ancMT ancestral enzyme, it was a logical next step to generate a mutant (E72A) of ancMT, and to determine its structure ( Table 2) and kinetic properties. The structure of the mutant protein, crystallized in the presence of fructose-6-P and AMP, presents no significant structural modifications beyond the mutation itself, presenting

Negative selection in the evolution of ADP-dependent kinases
an r.m.s. deviation on C␣ atoms of 0.29 Å when compared with the wild type (supplemental Fig. S5). As in the case of the wildtype enzyme, the structure of the E72A mutant also shows an open and a semiclosed conformation for the two molecules in the asymmetric unit. Kinetic characterization of the E72A mutant shows no significant changes with respect to the kinetic parameters for fructose-6-P. Mainly, the catalytic efficiency decreases almost 2.5 times, implying that the increase in selectivity imposes a slight accuracy rate tradeoff. However, although the mutation did not affect significantly the kinetic parameters for fructose-6-P (Table 1) the mutation leads to a huge loss of glucokinase activity. Nonetheless, enzyme activity can be observed at very high glucose concentrations, exhibiting a behavior similar to that displayed by the ancT ancestor (Fig.  6). The linear portion of the initial velocity curve versus glucose concentration for the E72A mutant allowed us to determine a catalytic efficiency (k cat /K m ) of 8.1 ϫ 10 Ϫ2 M Ϫ1 s Ϫ1 (see "Experimental procedures"), a value that is 1,600 times lower than that observed for the wild-type ancMT (1.3 ϫ 10 2 M Ϫ1 s Ϫ1 ). A rough estimation of the k cat and glucose K m values from a doublereciprocal plot, showed a minor change in k cat , which dropped from 4.1 to ϳ0.3 s Ϫ1 , whereas the K m increased from 31 mM to over 1 M, showing that the loss of glucokinase activity is mainly due to a dramatic loss in glucose affinity caused by the E72A mutation, in agreement with the structural analysis described above (Figs. 4 and 5). However, the k cat of the E72A mutant is still lower than that of ancT indicating that additional changes en route from ancMT to ancT must have led to an improved catalytic rate. Taken together, the data suggest a mixture of negative and positive selection mechanisms in which the two effects are uncoupled. Feasibly, glucose affinity loss (negative selection with ground-state discrimination) occurred prior to the increase in affinity for fructose-6-P and the rate of catalysis in general (positive selection), which discards the loss of affinity for glucose as a side product of the increase in catalytic effi-ciency toward fructose-6-P during the transition from ancMT to ancT.

Evolutionary history of enzyme substrate affinity reconstructed as a discrete trait
Because fructose-6-P specificity is caused by a single mutation,theevolutionaryhistoryofsubstrateaffinitycouldbereconstructed using a simple evolutionary model such as the Markov k-state 1 parameter model (Mk1) of discrete binary characters (24). We performed the analysis using this model as implemented in the Mesquite software employing the maximum likelihood approach. We used the consensus tree obtained  . Glucose saturation curves for ancMT and its E72A mutant. Saturation curve for ancMT was fitted to the substrate inhibition equation. For the E72A mutant saturation was not observed even at glucose concentrations higher than 600 mM. Inset, activity of the E72A mutant with glucose as substrate in a different velocity scale. The activity shows a linear dependence on glucose concentration and the k cat /K m value was obtained from the slope, considering that glucose concentrations are below the K m value. Error bars represent S.E. from three independent measurements.

Negative selection in the evolution of ADP-dependent kinases
from the Bayesian phylogeny reconstruction based on protein sequences (Fig. 1A) together with the binary character defined as the ability to bind (or not) either substrate, at concentrations considered physiological (Ͻ50 mM). Specific GK and PFK activities of the Thermococcales group were assigned based on extant experimentally characterized enzymes, whereas Methanococcales enzymes were assigned as bifunctional according to that reported in the literature. In the Methanosarcinales group, there is no experimentally characterized enzyme to date. However, protein sequence alignment shows that the active site residues of the bifunctional PFK/GKs from Methanococcales are also conserved in ADP-K1s from Methanosarcinales, whereas those of the GKs from Thermococcales are conserved in ADP-K2s, suggesting that the former should be classified as bifunctional and the latter glucose specific. The data obtained from this analysis (Fig. 7) showed that the ancMT ancestor has a probability of 98.9% to bind glucose and 99.9% to bind fructose-6-P, in total agreement with the experimental bifunctional trait obtained from the kinetic characterization of this ancestor. This analysis also perfectly reconstructs the use of substrates by other ancestors; ancT (Fru-6-P specific), ancM (bifunctional), and ancMMT (bifunctional). However, ancGK/PFK showed a probability of 99.9% to bind glucose and a probability of only 10% for binding of fructose-6-P. In this case, the discrete binary character model presents a discrepancy with respect to the experimental characterization, but it should be borne in mind that this ancestor also presents the lowest posterior probabilities in the ASR procedure. To analyze the robustness of the reconstruction based on the maximum likelihood of discrete characters, we also employed Bayesian inference with BayesTrait version 2.0 (25) using one rate of change (q10 ϭ q01). This analysis showed essentially the same results as those obtained by maximum likelihood, with the only difference being the value of the posterior probability of ancGK/PFK for fructose-6-P, whose mean value was 0.44 (supplemental Table S4). These analyses show the uncertainty in the inference of the use of fructose-6-P in this ancestor, which can be attributed to the node-depth. In any case, the analysis of discrete traits is in full agreement with the robust posterior probabilities obtained for the reconstruction of the ancestor using ASR and with the experimental data obtained for the characterized ancestors.

Discussion
Directed evolution experiments performed with enzymes presenting promiscuous activities have provided important hints regarding evolutionary, structural, and mechanistic relationships within enzyme superfamilies (3). However, changes in function along natural lineages employing ASR have been less thoroughly explored. Here we present such a study in which we focus on enzyme substrate specificity in a family of key enzymes involved in glycolytic metabolism in archaea.
The ASR field is still developing and there is still no overall agreement about optimal reconstruction methods. Sometimes many sequence positions have alternative predictions, all with relatively high probabilities, in which case the posterior proba-

Negative selection in the evolution of ADP-dependent kinases
bility difference between the first and the second most likely amino acid at that position can be considered a measure of robustness. Alternatively, a robust reconstruction may be considered to be one that was independent of the method used. Regarding the sequence reconstructions presented in this study, ancMMT, ancM, and mainly ancMT and ancT showed a high level of robustness (supplemental Table S1) providing confidence that the experimental results obtained for them represent a genuine description of these ancestors. On the other hand, ancGK/PFK although showing robust results for residues that form the first shell of contacts to the sugar substrate showed uncertainties mainly within the second shell (supplemental Table S2), which may be relevant for enzymatic activity (26). Nevertheless, experimental results on ancGK/PFK are consistent with both the other ancestors and extant enzymes.
By characterizing ancestral enzymes we have been able to show that the fructose-6-P-specific enzymes (PFKs) found in Thermococcales originated from a PFK/GK bifunctional ancestor. Hence, the bifunctional activity of enzymes from the extant Methanococcales represents the preservation of this ancestral trait to the present day, whereas the evolutionary novelty is the appearance of fructose-6-P specificity that emerges during the transition from ancMT to ancT. The resolution of the crystallographic structure of ancMT together with molecular modeling and molecular dynamics of the ancestral proteins showed that a single mutation (E72A) at the active site of ancMT is the principal continuant responsible for the loss in glucose affinity. It is evident that discrete mutations, in this case just one, can cause large changes in enzyme function, which may affect the fitness of an organism and thus provide certain metabolic advantages in particular ecological environments. A similar situation, where a shift in function was driven primarily by only two amino acid changes, was found in the study by Harms et al. (27) on ligand specificity in the steroid receptors. Furthermore, Heilbron et al. (28) using a hypermutator strain of the bacterium Pseudomonas aeruginosa showed that rare mutations of large effect have a strong influence on fitness changes.
The topology of the tree describing ADP-dependent kinase evolution, and particularly regarding why PFKs and GKs from Thermococcales are not phylogenetically close, can be better understood looking at the change in substrate specificity observed between the ancMT and ancT ancestors. It has been previously suggested (15) that HGT of an ADP-dependent gene from a Methanococcales to a Thermococcales occurred at some point during their evolutionary history. Redundancy of ancMT GK activity in Thermococcales (which already have a glkA gene) would have given the required freedom for a bifunctional gene to specialize toward Fru-6-P specificity, presumably starting the mutational pathway via the E72A mutation. This would explain, on the one hand, the presence of specific PFKs in Thermococcales and on the other, the distant phylogenetic relationship of these genes with the GKs from the same group. Nonetheless, there is no direct evidence of HGT from Methanococcales to Thermococcales, either by organization of neighboring genes or codon usage, which may be due to the long time elapsed because these events occurred. However, there is evidence that HGT has occurred on other occasions (15) within the ADP-dependent kinase family; the gene of ADP-K2 from Methanothrix thermophile (Methanosarcinal) has a large difference in codon usage compared with its own genome, but presents a great similarity with that of GKs from Thermococcales. Furthermore, evidence shows the importance and high rate of HGT events occurring during the evolution of prokaryotes in general (29).
The kinetic parameters of the resurrected enzymes are consistent with those for extant enzymes, showing an increase in catalytic efficiency on evolving toward the current forms. This is in overall agreement with other studies of ancestral enzyme reconstruction (30), although there are examples in which ancestral enzymes show greater efficiency than their extant counterparts (31). Both cases agree with current theories for the evolution of enzyme catalysis (32) and recently reviewed by Newton et al. (33), which propose that changes in the kinetic parameters depend on the function and metabolic context in which the enzyme works. Considering the above, ADP-dependent kinases from Methanococcales are bifunctional enzymes with low catalytic efficiencies, in accordance with the fact that these organisms obtain their energy mainly from methanogenesis, whereas glycolysis/gluconeogenesis participates in glycogen metabolism, which could be an environmental advantage when carbon sources are scarce (34). On the other hand, in Thermococcales, glycolysis would be a major source of energy, in accordance with the presence of ADP-dependent specific enzymes with higher catalytic efficiencies.
The emergence of a specialist enzyme (ancT) from a bifunctional one (ancMT) by negative selection is due to the loss of glucose affinity resulting from the E72A mutation (groundstate discrimination mode). It is likely that the loss of glucokinase activity by the E72A mutation provided the required freedom for the appearance of subsequent mutations that increased both the affinity and catalytic rate when using fructose-6-P as substrate. From these results it is not possible to deduce if the specialist ancT was fixed in the population by genetic drift or positive selection. However, the latter would make more sense if we consider a positive selective pressure toward more efficient glycolysis in these organisms leading to an increase in catalysis and affinity for fructose-6-P after the appearance of the E72A mutation.
Our findings are also consistent with studies of experimental microbial evolution, which reveal that a modest number of mutations have a great effect on the phenotype or fitness (7), shedding light on the classic debate about the size distribution of the mutational effects. Here we present a clear example where the evolutionary shift in function was driven by only a single amino acid substitution through negative selection, which caused a 1600-fold change in the ancestral specificity of the protein by a ground-state discrimination mode. This in turn, could provide the freedom for the specialization and increased efficiency of the enzyme and possibly provide an adaptive advantage through a gain in glycolytic efficiency in Thermococcales. Although there is agreement regarding the role of promiscuity as a starting point for the appearance of new protein function, there have been few insights into the mechanisms and mutational paths that lie behind the process of functional divergence. Here we were able to reconstruct the evolutionary history of substrate affinity using a simple evolutionary Negative selection in the evolution of ADP-dependent kinases model of discrete binary characters and identify a large-effect mutation responsible for a critical event that is characteristic in the transformation of a specialized enzyme from a broad specificity progenitor. This allows substrate usage under physiological conditions in the family of ADP-dependent kinases to be considered as traits that are lost or acquired in a discreet manner, which can be considered as a punctuated equilibrium mechanism operating at the enzymatic function level.

Alignment and inference of phylogeny
The Protein BLAST Server was used to extract sequences present in the non-redundant protein sequence database (nr) using a PSI-BLAST algorithm with 3 iterations and ADP-GK from T. litoralis as template (UNIPROT codes of corresponding sequences used are shown in supplemental Table S5). A multiple sequence alignment was constructed based on three-dimensional and secondary structure constraints using Promals3D (35), and then misaligned positions were corrected by visual inspection in Multiseq from VMD (36). To obtain the multiple sequence alignment of the gene sequences, the protein sequence alignment was fixed and backtranslated into DNA using the genes of each protein using the RevTrans server (37). The Bayesian inference of phylogeny was performed with MrBayes version 3.1.2 (38). For the protein sequences we used Cprevϩ IϩG as a fixed rate model, which showed a posterior probability of 1.0. For the gene sequences, a GTRϩIϩG model was used according to JMoldelTest (39). For Bayesian inference, the number of generations was set to 1 ϫ 10 6 , sample frequency to 100, with 2 runs and 4 chains for each run, and a temperature parameter of 0.2. The average standard deviation of the split frequencies was less than 1 ϫ 10 Ϫ4 . The maximum likelihood inference was performed with PhyML 3.0, using protein sequences with a CprevϩIϩG model.

Ancestral sequence reconstruction
For the Bayesian approach we used a modified method described by Hall (21) using gene sequences. Ancestral sequences were inferred through a hierarchical Bayesian approach implemented in MrBayes 3.1.2 with model and parameters used for phylogeny inference (40). Each target node was constrained and the probabilities for each nucleotide (or amino acid) were calculated at each position of the alignment, where the nucleotide (or amino acid) with highest posterior probability was selected. The maximum likelihood ASR inference was performed with PALM 4.4 (41), using the Lazarus tool according to that described by Hanson-Smith et al. (42). Sequence gaps were inferred using a model of binary evolutionary traits with one rate of change, considering only the presence of gaps or characters in the alignment (21). The positions that were most likely to be a gap for each ancestor were eliminated from the sequence, which produced ancestral sequences with different residue numbers (e.g. Glu-72 in ancMT is found at the same alignment position as Glu-79 in ancM and Ala-76 in ancT).

Gene synthesis, protein expression, and purification
The protein sequences for the ancestors inferred by the hierarchical Bayesian approach using gene sequences were codonoptimized for expression in Escherichia coli and genes were synthesized by GENSCRIPT (Piscataway, NJ), then cloned into a modified pET-28b vector and verified by DNA sequencing. E. coli BL21(DE3) cells were transformed and grown in LB broth containing 35 g/ml of kanamycin at 37°C until the A 600 reached ϳ0.8. Expression of the recombinant proteins was induced with 1 mM isopropyl ␤-D-thiogalactopyranoside overnight. Cells were harvested by centrifugation, re-suspended in binding buffer (50 mM Tris-HCl, pH 7.6, 500 mM NaCl, 20 mM imidazole, and 5 mM MgCl 2 ), and disrupted by sonication. After centrifugation (18,514 ϫ g for 45 min), ancestors ancMT, ancM, and ancT were heated at 70°C during 10 min, chilled on ice, and centrifuged at 18,514 ϫ g for 45 min before loaded onto a nickel-nitrilotriacetic acid affinity column (HisTrap HP, GE Healthcare, UK). Proteins were eluted with a linear gradient between 20 and 500 mM imidazole and fractions presenting enzyme activity were pooled. After the nickel-nitrilotriacetic acid affinity column, ancMMT and ancGK/PFK were dialyzed against buffer 25 mM Tris-HCl, pH 7.8, and 50 mM NaCl and chromatographed in an anionic exchange column HiTrapQ (GE Healthcare, UK). Purified proteins were stored at 4°C in the presence of 5% glycerol. Enzyme purity was analyzed by SDS-PAGE stained with Coomassie Blue. Before crystallization, ancMT and its mutant were dialyzed against 25 mM Hepes/NaOH, pH 7.8, 5 mM MgCl 2 , concentrated by filtration and supplemented with ligand.

Reagent concentrations
Glucose and fructose-6-P were titrated with ADP-dependent glucokinase from T. litoralis and ADP-dependent PFK from P. horikoshii, respectively. ADP was titrated with the GK assay using ADP-GK from T. litoralis. Glucose-6-P was titrated with glucose-6-P-dehydrogenase from Leuconostoc mesenteroides and NAD ϩ under conditions in which the co-substrate concentration was at least 20-fold over the compound titrated. The protein concentration was determined using the Bio-Rad protein assay (Bio-Rad) with bovine serum albumin (BSA) standard (Thermo Scientific) as reference.

Determination of enzyme activity and kinetic parameters
PFK activity was assayed spectrophotometrically as described previously (43). The enzyme preparation was mixed with reaction buffer containing 10 mM ADP, 15 mM MgCl 2 , 25 mM PIPES, pH 6.5, 0.2 mM NADH and variable concentrations of fructose-6-P. ␣-Glycerophosphate dehydrogenase, triosephosphate isomerase, and aldolase from rabbit (purchased from Sigma) were used as auxiliary enzymes.
GK activity was measured spectrophotometrically by following NAD ϩ reduction at 340 nm coupled with glucose-6-P oxidation. The assay was carried out in the presence of 10 mM ADP, 15 mM MgCl 2 , 25 mM Hepes, pH 7.8, 0.5 mM NAD ϩ and glucose. Glucose-6-P dehydrogenase from L. mesenteroides was used as an auxiliary enzyme.
All enzyme activities were determined at 40°C using a photodiode array spectrophotometer (Hewlett Packard/Agilent Negative selection in the evolution of ADP-dependent kinases 8453, Santa Clara, CA). An extinction coefficient of 6.22 mM Ϫ1 cm Ϫ1 for NADH was used for specific activity (U/mg) determination (U ϭ mol/min). The data were analyzed using nonlinear regression with GraphPad Prism version 5.0 (GraphPad Software Inc., San Diego, CA), and fitted to the Michaelis-Menten or substrate inhibition equations.
In the event that saturation was not reached, the k cat /K m value was determined from the initial linear portion of the saturation curve. In the Michaelis-Menten equation at low substrate concentrations ([S] Ͻ Ͻ K m ) the velocity exhibits a linear dependence on substrate concentration whose slope is (k cat / K m ) ϫ [E], so the k cat /K m value can be obtained by dividing the slope by the enzyme concentration. A rough estimation of the k cat and K m values for glucose of the ancT and E72A mutant were obtained by double-reciprocal plot.

Crystallization, data collection, and structure determination
Crystallization screening was performed using a Honey-bee963 robot (genomic Solution) in Greiner CrystalQuick plates using a 1:1 ratio of protein to precipitant. Optimization rounds were carried out manually by the hanging drop method resulting in good quality crystals of ancMT and its E72A mutant. These were obtained for both proteins using a drop of 2 l of protein stock solution (8 mg/ml of protein in 25 mM Hepes/NaOH, pH 7.8, buffer, 30 mM MgCl 2 , 20 mM fructose-6-P, and 20 mM AMP) plus 2 l of the reservoir solution comprising 18% PEG 3350 and 0.15 M NaI. Crystals appeared after 3 days at 18°C, the crystals were briefly transferred to glycerol as cryoprotective solution and flash-frozen in liquid nitrogen. The X-ray diffraction data were collected at the Diamond Light Source using the beamline I04-1. The data were indexed, integrated, and scaled with XDS (44), and merged using Aimless from the CCP4 package (45). The structure of the ancMT in complex with AMP was solved by molecular replacement using the program Phaser (46), with the PFK structure from P. horikoshii split into two domains (major and minor) as searches models (PDB code 3DRW). The structure of the E72A mutant was solved using the structure of ancMT as the search model. Structure refinement was carried out using Phenix (47) and COOT for model building (48). The AMP ligands were automatically placed using COOT, and water molecules were located with a combination of Phenix and COOT. The quality of the model was evaluated using Molprobity (49). Full statistics for data collection and refinement are summarized in Table 2.

Molecular modeling, docking, and molecular dynamics
For homology modeling two template structures were employed. One was the GK from P. furiosus in its closed conformation (PDB code 1UA4), which presents AMP and glucose in its active site. The second template was the structure for ancMT solved here. In the latter case the structure was first divided into its two domains, which were then structurally aligned independently to their homologues in PDB code 1UA4. These two structures were then used as templates for modeling the enzyme in the closed conformation. Fifty models were constructed with MODELLER 8 (50). From these, the best 10 models based on the DOPE potential were chosen, and their quality evaluated with PROSA2003 (51), Procheck (52), and VERIFY_3D (53).
Docking assays with glucose and fructose-6-P were performed with AutodockVina 1.0 (54), with protonation states according to the pH values used in the kinetic experiments. The protonation state of the ionizable residues was calculated using the web server Hϩϩ (55). After this procedure, partial charges were derived with the Gasteiger method using AutoDockTool (56). Docking results with low interaction energies and the phosphoryl acceptor hydroxyl oriented toward the GXGD motif were selected since the aspartic acid is considered the catalytic base for the phosphate group transfer in all the Ribokinase superfamily members (57). Subsequently, the Mg-ADP complex was included into the resulting docked structures according to the procedure described by Merino et al. (58) and then both structures were subjected to energy minimization using a standard protocol in NAMD 2.7 (59) with the CHARMM27 force field. Proteins were placed in a water box of TIP3 waters that extended at least 12 Å beyond any macromolecular atom. The system was neutralized with NaCl to a final concentration of 0.1 M and 10,000 iterations were performed without restrictions. The equilibration of the system was performed in an isobaric isothermal assembly (NPT) at a temperature of 310 K, 1 atmosphere of pressure (1.01325 bar), PME cutoff of 12.0, Langevin piston pressure control, Langevin temperature control, and using a time step of 2 fs. After 10 ns of equilibration a 30-ns long simulation was produced for each system and the nonbonding energy between protein and ligand was analyzed using the NAMDenergy plugin of VMD (60).

Reconstruction of ancestral discrete traits
The reconstruction of ancestral states as discrete traits was performed by maximum likelihood using the Mesquite 2.75 software. To discriminate between the MK1 and AsymmMK models, the likelihood ratio test was used. For all traits evaluated the LR was not higher than the p parameter of the 2 distribution with 1 degree of freedom, so the MK1 model was selected. The consensus tree obtained from the Bayesian phylogeny reconstruction using protein sequences was used for analysis.
Bayesian inference was performed using the software Bayes-Traits (61) using the MCMC methodology. Traits evaluated were mapped onto 500 trees obtained using Bayesian inference in MrBayes 3.1 with protein sequences. The Bayes factor was used to discriminate between a 1 (q01 ϭ q10) or 2 rate (q01 q10) model, the former being most appropriate. Analyses were performed with 10 million iterations, with burn-in of 100,000 and a sampling frequency of 1,000. The ratedev parameter was set so that the acceptance parameter was maintained between 20 and 40%, whereas the sequence of Homo sapiens was used as out-group to root the trees. Parameter files and Bayes factor calculations were evaluated with the Tracer software.