Enzyme Promiscuity: Engine of Evolutionary Innovation

Catalytic promiscuity and substrate ambiguity are keys to evolvability, which in turn is pivotal to the successful acquisition of novel biological functions. Action on multiple substrates (substrate ambiguity) can be harnessed for performance of functions in the cell that supersede catalysis of a single metabolite. These functions include proofreading, scavenging of nutrients, removal of antimetabolites, balancing of metabolite pools, and establishing system redundancy. In this review, we present examples of enzymes that perform these cellular roles by leveraging substrate ambiguity and then present the structural features that support both specificity and ambiguity. We focus on the phosphatases of the haloalkanoate dehalogenase superfamily and the thioesterases of the hotdog fold superfamily.

In the 1990s, a series of studies on the evolution of catalysis in protein fold families helped define contemporary understanding of enzymes as potentially promiscuous catalysts; the analyses of these enzyme superfamilies suggested that certain folds showed higher variability than expected with regard to the chemistries that can be catalyzed or the substrates that can be acted on (1-11). To summarize, the current model holds that enzyme families grow as a result of gene duplication coupled with the acquisition of an advantageous new function. Because the backbone folds, and thus the catalytic scaffolds, are inherited, so is the chemical trait that underlies the intrinsic catalytic functions of all family members. In enzyme families, evidence can be found for low level intrinsic activity associated with one or more extant members, co-existing with the high level of activity unique to the subject enzyme (see, for instance, the enolase and alkaline phosphatase enzyme superfamilies (12,13)). The ability to carry out such alternate chemistry is termed catalytic promiscuity. The plausible link between catalytic promiscuity and evolvability has been explored in previous publications (for recent coverage and reviews of this topic, see Refs. 14-17).
The most commonly encountered observation of promiscuity involves the catalysis of one type of chemistry with many different substrates. Jensen (18) referred to this trait as "substrate ambiguity," and this is the name we will use. Herein, we examine the selective advantage associated with activity toward multiple substrates by highlighting specific examples of enzymes for which the level of substrate ambiguity runs high to fulfill specific roles in the cell. We use as examples enzymes from the haloalkanoate dehalogenase (HAD) superfamily and the thioesterases of the hotdog fold superfamily. In addition, we dissect the architectures of enzymes from these families to discover underlying structural sources of specificity and substrate ambiguity.

Screening to Assess Substrate Ambiguity
In vitro enzyme activity measurements carried out with a structurally diverse library of potential substrates allow one to generate a substrate specificity profile for the enzyme of interest. However, the molecules contained in the chemical library that is used in an in vitro substrate profile analysis to assess enzyme function may never be encountered by the enzyme in the cell, although they might be common metabolites. This may be due to the local concentration of the substrate or the location of the enzyme in the cell with respect to the substrate. In such a scenario, there is a lack of selective pressure to evolve specificity away from the screened substrates. Thus, the observed substrate range can be quite broad. Substrate specificity profiles have been reported for numerous enzyme classes including phosphatases (19-23), thioesterases (24-26), peptide racemases (27), and lipases/phospholipases (28). The substrate structures can be surprisingly varied; for instance, single enzymes can utilize long and short-chain acyl-CoA substrates (24), apolar and anionic lipids (28), or various length phosphosugars and nucleotides (20).
Although a number of substrates may yield activity, there is often a somewhat narrow substrate range. For instance, as part of a recent bioengineering effort, 62 putative thioesterases were functionally screened by in vivo fatty acid titers (25). When compared with native levels or overexpression of endogenous thioesterases, the introduced enzymes showed increased production of saturated, unsaturated, and branched short-chain fatty acids. From the in vivo results, six enzymes were screened in vitro against a broad range of acyl-CoAs to determine their substrate preferences. The results suggest that most thioesterases exhibit substrate ambiguity. However, they show greater activity toward acyl-CoAs downstream in their respective metabolic pathways than for precursor acyl-CoAs. This slight narrowing of specificity range is advantageous as it prevents depletion of upstream acyl-CoAs in the same path as the subject enzyme.
Broader specificity against different classes of substrates (sugars and nucleotides) was observed in a genome-wide study, in which substrate specificities of all 23 soluble Escherichia coli HAD superfamily members were characterized in vitro against a set of 80 representative phosphorylated metabolites (19). The results demonstrate that HAD superfamily members possess the capacity to hydrolyze a wide range of phosphorylated compounds such as sugars, nucleotides, organic acids, coenzymes, and small phosphodonors. The enzymes assessed primarily act as phosphatases with overlapping but non-identical substrate profiles. Notably, the phylogenetic tree of the HAD superfamily proteins computed from their sequences shows no congruence to the functional relationships exhibited by them in vitro, suggesting a lack of direct relationship between homology and function in the HAD superfamily.
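The idea of "overlapping but non-identical substrate profiles" can be illustrated with a simple set-overlap measure. The following is a minimal sketch, not the analysis used in the cited study: the enzyme names and substrate sets are hypothetical placeholders, and the Jaccard index is merely one convenient way to score how much two in vitro profiles share.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: substrates shared by both enzymes divided by
    substrates hydrolyzed by at least one of them."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(profiles: dict) -> dict:
    """Score every enzyme pair; keys are (name_1, name_2) in sorted order."""
    return {
        (x, y): jaccard(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    }


# Hypothetical profiles (illustrative only, not data from the screen).
profiles = {
    "enzyme_A": {"glucose-6-P", "fructose-6-P", "UMP"},
    "enzyme_B": {"UMP", "CMP", "GMP"},
    "enzyme_C": {"glucose-6-P", "fructose-6-P", "UMP"},
}

for pair, score in pairwise_overlap(profiles).items():
    print(pair, round(score, 2))
```

A score of 1 means identical profiles and 0 means no shared substrate. Comparing such functional scores with sequence-based distances is one way to look for the lack of congruence described above.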
Orthogonal to the genome-wide study discussed above, Bastard et al. (30) demonstrated the use of an integrated strategy for exploring the functional diversity of a previously undescribed enzyme family, DUF849 (see Fig. 1 for general reaction and representative substrates). Based on the activity of the family member Kce, they propose a general chemical reaction (condensation of a β-keto acid with acetyl-CoA to produce a CoA ester and acetoacetate) potentially catalyzed by the family and a list of possible substrates. Then, they select, produce, and screen a set of representative enzymes against all substrates in vitro in a high throughput manner. Out of 322 representative enzymes, they were able to overexpress 124 proteins and test their activities against 17 substrates. Although the type of chemistry observed was not varied, their results showed that overall 20% of the enzymes in the family (25 out of 124 screened) catalyze reactions on five or more substrates and that two subfamilies displayed substrate ambiguity where the size and steric and electrostatic nature of the substrate varied considerably to include a range of aliphatic, polar, and cationic groups, all with moderate efficiency (kcat/Km ~10⁴ M⁻¹ s⁻¹). Analysis for specificity determinants showed that in one of the subfamilies (denoted G1) showing marked substrate ambiguity, a small cap domain covers the active site that interacts with the varied substrates (see "Architectures for Tunable Substrate Specificity").
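The "five or more substrates" criterion is a simple count over an enzyme-by-substrate activity matrix. The sketch below shows one way such a tally could be made; the toy matrix and the function name `count_ambiguous` are hypothetical and are not taken from the cited work. Only the figures 25 and 124 come from the text.

```python
def count_ambiguous(activity: dict, min_substrates: int = 5) -> int:
    """Count enzymes with detectable activity on at least `min_substrates`
    substrates. `activity` maps enzyme name -> list of 0/1 activity calls."""
    return sum(1 for calls in activity.values() if sum(calls) >= min_substrates)


# Hypothetical activity calls (1 = active, 0 = inactive), for illustration only.
toy_screen = {
    "e1": [1, 1, 1, 1, 1, 0],
    "e2": [1, 0, 0, 0, 0, 0],
    "e3": [1, 1, 1, 1, 1, 1],
    "e4": [0, 0, 1, 1, 0, 0],
}
print(count_ambiguous(toy_screen))  # e1 and e3 qualify

# Fraction reported in the text: 25 of the 124 enzymes screened.
percent_ambiguous = 100 * 25 / 124
print(round(percent_ambiguous, 1))
```

The reported fraction works out to roughly 20%, matching the rounded value quoted above.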

Promiscuity as an Advantage
The impact of substrate ambiguity at the cellular level has been estimated using a genome-scale metabolic network model for analysis of E. coli metabolism to involve ~37% of metabolic enzymes, which in turn catalyze at least 65% of the metabolic reactions (31). Functionally, substrate ambiguity can provide benefits to the cell via a number of mechanisms: 1) proofreading, 2) scavenging of nutrients, 3) removal of antimetabolites, 4) balancing of metabolite pools, and 5) establishing system redundancy. Illustrations of some of these functions from the thioesterase and HAD superfamilies are described below, and the structural basis is discussed and elaborated in the following sections.

Proofreading
A classic proofreading function is provided by a member of the thioesterase family, EntH. Organisms such as E. coli synthesize the siderophore enterobactin to harvest iron from the environment (32). Enterobactin is synthesized from 2,3-dihydroxybenzoate (2,3-DHB) and L-serine through the iterative action of the nonribosomal peptide synthetases EntB and EntF. The phosphopantetheinyl transferase EntD catalyzes the transfer of pantetheine phosphate from CoA to the Ser residue of the 2,3-DHB carrier domain of EntB and of the peptidyl carrier domain of EntF to generate the two holo-enzymes. The bifunctional EntE catalyzes the adenylation of 2,3-DHB and the subsequent aroyl transfer to the pantetheine thiol of holo-EntB. Substrate ambiguity of EntD and EntE can derail enterobactin biosynthesis by leading to the formation of misacylated holo-EntB by two means: 1) when CoA is limited, endogenous acyl-CoA may substitute as a substrate for EntD (33), and 2) if 2,3-DHB is limited, a carboxylate metabolite may substitute as a substrate for EntE (34) (Fig. 1).
[Figure caption: Schematic of the mechanism of each superfamily is depicted (left) together with representative substrates (right). In the case of the HAD superfamily, two chemistries (C-P and O-P bond cleavage reactions) are shown. Note that phosphonoacetaldehyde hydrolase is a specific enzyme; thus, only its primary substrate is depicted.]
Although lack of specificity causes this problem, substrate ambiguity can rescue the pathway by enabling a proofreading function for EntH. EntH can prevent EntD-catalyzed mischarging of EntB through hydrolysis of endogenous acyl- or aroyl-CoAs (e.g. numerous aromatic hydroxylated benzoyl-CoAs and hydroxylated phenylacetyl-CoA are substrates with kcat/Km > 1 × 10⁴ M⁻¹ s⁻¹). In addition, it can release holo-EntB from misacylated holo-EntB by recognition and hydrolysis of the acyl/aroyl unit of the EntB thioester substrate (kcat/Km ~1.5 × 10⁵ M⁻¹ s⁻¹) (Fig. 1) (33). As discussed under "Architectures for Tunable Substrate Specificity," the mechanism that allows the broad substrate range of thioesterases such as EntH is the open active-site pocket with affinity provided by the pantetheine moiety of CoA.

Removal of Antimetabolites
Catabolic 5′-nucleotidases catalyze the hydrolysis of nucleoside monophosphates, and thus, play a major role in regulating the free nucleotide levels in living cells. With differences in substrate specificity, localization, oligomerization state, and size, seven different 5′-nucleotidases have been characterized in humans. Among these, the HAD superfamily member cytosolic 5′-nucleotidase III-like protein (cN-IIIB) from humans and Drosophila has been recently described and has been shown to have broad activity against multiple substrates (35). This enzyme efficiently dephosphorylates CMP, UMP, GMP, AMP, and notably, the modified purine 7-methylguanosine monophosphate (m⁷GMP, Fig. 1) (kcat/Km = 8.7 × 10⁴ and 6.5 × 10⁴ M⁻¹ s⁻¹ for CMP and m⁷GMP, respectively). During the degradation of all eukaryotic mRNAs, 7-methylguanosine nucleotides are generated from the mRNA cap structure. This can lead to accumulation of m⁷GMP in the cytosol and, if left at such high concentrations, can subsequently be incorporated into nucleic acids. The broad substrate preference of cN-IIIB reduces the concentrations of cytosolic m⁷GMP, thereby avoiding its undesirable buildup in the cell.
Housekeepers are also used in bacteria for the protection of cells from the adverse effects of non-endogenous metabolites. For example, YjjG, a nucleotide phosphatase from E. coli, protects the organism against non-canonical pyrimidine derivatives (36). Using in vitro activity assays, YjjG has been shown to dephosphorylate a wide range of potentially mutagenic nucleotides such as 5-fluoro-2′-deoxyuridine, 5-fluorouridine, 5-fluoroorotic acid, 5-fluorouracil, 5-aza-2′-deoxycytidine, and 5-bromo-2′-deoxyuridine, thus preventing their incorporation into DNA and RNA (Fig. 1). The housekeeper cN-IIIB exemplifies the mechanism of substrate ambiguity allowed by the binding determinants provided by domain insertion (see "Architectures for Tunable Substrate Specificity"). YjjG is likely to acquire substrate ambiguity via a mechanism similar to that of cN-IIIB (although no structure of YjjG is available, it is homologous (29% identity) to BT2271 (Protein Data Bank (PDB) ID 3QNM), which has a similar domain insert to cN-IIIB).

Balancing Metabolite Pools
An example of such a "balancing act" is the relaxed substrate specificity observed for YciA, which carries out hydrolysis of cellular acyl-CoA thioesters. The broad observed substrate range (kcat/Km = 1.1 × 10⁷ and 1.4 × 10⁶ M⁻¹ s⁻¹ for isobutyryl-CoA and lauroyl-CoA, respectively), together with the gene context, suggests a role for YciA and its orthologs in recycling CoA and balancing fatty acyl-CoA pools for membrane remodeling. As in the thioesterase EntH, subsequent YciA structure determination revealed that substrate recognition is directed at the thioester pantetheine moiety and not the acyl moiety (37), thus accounting for the observed promiscuity, as well as its tight regulation by strong CoA feedback inhibition (24) (Fig. 1).

Mechanisms of Specificity and Promiscuity
Several molecular mechanisms of achieving catalytic promiscuity and substrate ambiguity have been proposed and reviewed (38-40). Among these, the functional plasticity of popular folds (i.e. superfolds) is most intimately tied to evolution. Functional diversity within superfamilies follows a distribution where some superfamilies display significant diversity, whereas most superfamilies do not (41). It has been suggested that superfolds can accommodate a large number of sequences because of their inherent stability and tolerance to extensive mutations (42). Also, the superfolds corresponding to these functionally diverse superfamilies tend to have binding clefts in a common location, with a significant propensity to bind similar substrates despite no other clear indication of a common ancestor (43). Thus, protein families and superfamilies with common binding sites (by definition designated as a superfold) are prone to be functionally diverse and possess the potential to show substrate ambiguity.
A recent study by Tawfik and colleagues (44) looks into the tradeoff between fold stability and the ability to take on new functions via informatics and extends the study experimentally to TEM-1 β-lactamase. A major finding is that enzymes are more amenable to evolution when the active site is composed of flexible loops juxtaposed to, but separated from, a highly ordered core scaffold. Families capable of supporting multiple functions are expected to have a lower percentage of residues comprising the integral part of the protein scaffold. Thus, the core fold can be elaborated by swapping out residues through evolution to act on new substrates.
Another overarching finding has been the association of conformational flexibility with substrate ambiguity (45) and acquisition of new specificities (46). In recent studies, it has been shown that the flexibility of active-site loops leads to differential conformational sampling and active site reshaping, allowing alternative substrates to be utilized (47,48). The examination of a designed aldolase has further indicated that backbone flexibility would enable exploration of potential sites for introducing catalytic residues (49). A conformational ensemble approach has been used to describe promiscuous enzymes as lowering the energy barriers to conformational rearrangements needed for substrate ambiguity (40, 50).
As described above, the intrinsic chemical activities are indicative of the common origin of a superfamily. The more tailored activities may arise after a gene duplication event, leaving catalytic hints of their origin that can be observed in their substrate specificity profile. A recent study by Daughtry et al. (51) analyzed the evolution of two bacterial proteins from the HAD superfamily implicated in the lipopolysaccharide biosynthetic pathways, namely 2-keto-3-deoxy-D-manno-octulosonate-8-phosphate (KDO8P) phosphohydrolase and 2-keto-3-deoxy-9-O-phosphonononic acid (KDN9P) phosphohydrolase (Fig. 2). Each enzyme was observed to have activity against both substrates, but to be more active toward the biological substrate (kcat/Km ~10⁴ and 10² M⁻¹ s⁻¹, respectively, for native substrate versus the other). The fact that KDO8P phosphohydrolase possesses promiscuous activity toward KDN9P and that it has a comparatively broad biological range suggests it as the KDN9P phosphohydrolase ancestor. The acquisition of a Glu/Lys pair that facilitated KDN9P binding and the corresponding advantage associated with sustaining an expanded substrate range accelerated the evolution of the KDN9P phosphohydrolase lineage. In support of the idea of an enzyme successfully maintaining multiple functions above baseline catalysis, it was found that although the deletion of this amino acid pair in KDN9P phosphohydrolase removed activity toward KDN9P, the enzyme still retained full KDO8P hydrolase activity. This is a classic example of how a native, unused activity can be fine-tuned under evolutionary pressure and become fixed in the population.
The presence of common binding sites within superfolds can give rise to novel chemistry within a superfamily. For example, most members of the HAD superfamily catalyze hydrolysis of phosphate monoesters (i.e. P-O bond cleavage), through covalent catalysis by a nucleophilic Asp residue. Members have extended the existing binding site to gain new functionality, yielding phosphonate hydrolase activity (i.e. C-P bond cleavage, Fig. 1) (52). In phosphonoacetaldehyde hydrolases, the addition of a Lys residue (contributed by a domain insertion near the active site (schematic in Fig. 3, top)) results in the formation of a Schiff base, which provides the electron sink for catalysis of C-P bond cleavage.
As in this example and in the example above of the DUF849 family, a potential mechanism for expanding the functional repertoire of a superfold is via incorporation of additional domains, which may or may not exist independent of the primordial fold. Structurally, the HAD superfamily consists of the conserved active site-bearing Rossmann fold (called the core domain) embellished with a variety of inserted domains (commonly referred to as cap domains). In some cases, these cap domain insertions have expanded the substrate range of HAD superfamily enzymes by providing extensive locations that can contribute toward substrate binding energy. For example, a HAD superfamily member from E. coli, NagD, shows multiple substrate use across pathways (20). The genome context of NagD within the nagBACD operon implicates it in the hydrolysis of N-acetylglucosamine 6-phosphate or glucosamine 6-phosphate, intermediates in the biosynthesis of N-acetylglucosamine, a critical component of cell wall assembly. In vitro, NagD dephosphorylates glucosamine 6-phosphate with moderate efficiency (kcat/Km = 10³ M⁻¹ s⁻¹) but not N-acetylglucosamine 6-phosphate (Fig. 1). An alternative role is implied by the occurrence of several NagD homologs as fusions between NagD and MutT domains. Typically, MutT proteins (members of the Nudix enzyme family) convert 8-oxo-dGTP to 8-oxo-dGMP to block its incorporation into DNA. Thus, the NagD domain in a NagD-MutT fusion protein may perform nucleotide monophosphate hydrolysis. A focused biochemical screen shows a broad substrate preference for NagD toward UMP, CMP, GMP, AMP (kcat/Km from 2 × 10³ to 3 × 10⁴ M⁻¹ s⁻¹), and other sugar phosphate metabolites. Although a substrate-bound structure is not available, by homology to other HAD superfamily phosphatases, it is likely that in NagD, the open nature of the active site together with the loop provided by the cap domain allows for a wide sampling of substrates for dephosphorylation.
In another example from the HAD superfamily, differential interactions between the cap and substrate can result in activity against several substrates (schematic in Fig. 3). Human cytosolic 5′-nucleotidase II (cN-II) participates in the regulation of purine nucleotide pools by catalyzing the dephosphorylation of 5′-nucleotide monophosphates. cN-II is allosterically activated by several phosphate-containing metabolites and is known to interfere with the phosphorylation-dependent activation of some nucleoside analogs used in cancer treatments, thus displaying broad in vivo activity. In x-ray crystal structures of cN-II with either dGMP or UMP, different interactions between cap and substrate are present. In the cN-II-UMP complex, Arg-202, Asp-206, and His-209 form hydrogen bonds with the substrate. In addition, Tyr-210 makes van der Waals interactions with the ribose moiety. In the cN-II-dGMP complex, Tyr-210 moves out of the active site to accommodate the larger base and His-209 no longer forms a hydrogen bond with the deoxyribose (53).

Architectures for Tunable Substrate Specificity
Enzyme superfamilies known to contain one or more members that display substrate ambiguity are becoming commonplace. In this final section, we examine the architectures of two superfamilies for which substrate ambiguity is an inherent trait, namely the HAD superfamily and the thioesterases of the hotdog fold superfamily. Additionally, these enzymes exemplify two different ways to promote promiscuity.
The hotdog fold thioesterase superfamily is a functionally diverse family of evolutionarily related proteins, which share a common α + β-fold (54). The core tertiary structure consists of a 5-turn α-helix cradled by a curved, 5-7-stranded antiparallel β-sheet. The functional unit is a dimer, with the subunit interface joining the two sheets into a continuous β-sheet (55). The two active sites are located at opposite ends of the interfaced sheets. The vast majority of the family members target thioester substrates containing CoA or holo-ACP (acyl carrier protein modified with a phosphopantetheine group). The thioesterases form the largest functional subfamily and will therefore be the focus of our discussion. A narrow channel, which leads from the protein surface to the site of catalysis, binds the phosphopantetheine moiety using desolvation and weak electrostatic forces to augment the two hydrogen bonds derived from protein backbone amide groups (Fig. 4) (37, 55, 56). These nonspecific interactions ensure that tight binding of the phosphopantetheine moiety is preserved despite the divergence in sequence in this region. The adenosine-3′,5′-diphosphate unit of the CoA thioester and the ACP domain of the holo-ACP thioester are accommodated at the protein surface, which varies to promote promiscuity toward these two classes of thioester substrates (56).
[Figure caption (fragment): ... , ligand in yellow) show the open active-site cavity at the acyl group, which is a major determinant of promiscuity. In B, van der Waals surfaces (protein core in gray, protein cap in blue, ligand in yellow) show the shape complementarity between the substrate and the active site, restricting access of bulk solvent to the nucleophile and cofactor. In C, binding residues provided by the cap domain insertion (blue ribbon) allow alternate binding modes for two different substrates. This provides a fertile seedbed for the acquisition of new substrates.]
The site of thioester cleavage is located at the juncture of the phosphopantetheine channel and the alkyl/aryl-binding site. In the thioesterases displaying a very restricted substrate range, the alkyl/aryl-binding site is an enclosed pocket (for example, fluoroacetyl-CoA (57, 58) and 4-hydroxybenzoyl-CoA thioesterases (54,55,59)). In contrast, for thioesterases displaying a modest (60) or expansive substrate range, like hTHEM2 and hTHEM4 (61,62) and E. coli YciA (24,37), the pocket is partially open or leads to solvent, respectively. Notably, the fact that catalytic efficiency and substrate ambiguity can co-exist is exemplified by the thioesterase YciA where kcat/Km values range from 3 × 10⁵ to 8 × 10⁷ M⁻¹ s⁻¹ for CoA thioester substrates as diverse as acetyl-CoA, phenylacetyl-CoA, and myristoyl-CoA (24).
A popular mechanism for expanding the functional diversity in a protein family is via domain fusions. In ~10% of observed fusion events, domains are inserted into an existing fold, leading to a discontiguous domain (49). Such an insertion architecture is observed in the HAD superfamily of phosphotransferases where the conserved Rossmann fold core domain is tethered to a dynamic cap domain in ~85% of its members (with the exception of the integral membrane ATPases) (63). The catalytic site, located in a shallow depression on the surface of the core domain, binds the substrate phosphate group and the Mg²⁺ cofactor. The association with the inserted cap domain (see "Mechanisms of Specificity and Promiscuity" above) is stabilized through interaction with the substrate leaving group, and substrate-induced cap closure is linked to catalytic turnover (64-67). The structural mechanisms for specificity and substrate ambiguity can be broken into three types: 1) restriction of substrate shape, 2) interdomain movement, and 3) multiple sites of enzyme-substrate interaction.
The cap domain insertion point and insert length lead to a natural classification of the superfamily into cap types C0 (minimal cap) and C1 and C2 (large caps) (68). Comparison of different enzymatic functions (using enzyme commission numbers) associated with the HAD superfamily suggests significant overlap in functions being catalyzed by members across different cap domain classes (43). A functional similarity network inferred from the functional data shows that HAD superfamily members with no or minimal insert (cap type C0) lie at the center of the network, implicating C0 as the primordial HAD superfamily member. Consistent with this concept and with few exceptions, the C0 members characterized to date show a narrow substrate range. An example of such a C0 HAD superfamily member is D-glycero-D-manno-heptose-1,7-bisphosphate phosphatase (GmhB) in the S-layer glycoprotein and lipid A biosynthetic pathways. GmhB hydrolyzes only the C-7 phosphoryl group (not C-1) of D-glycero-D-manno-heptose 1,7-bisphosphate (Fig. 1), and the E. coli enzyme discriminates between the β- and α-anomers of the sugar by 2 orders of magnitude (kcat/Km = 7 × 10⁶ M⁻¹ s⁻¹ and 7 × 10⁴ M⁻¹ s⁻¹, respectively) (44,45). Similarly, histidinol phosphate phosphatase (hisB), Scp1, and Dullard have a narrow substrate range wherein they dephosphorylate histidinol phosphate and phosphoserine peptides in an efficient manner (kcat/Km = 4 × 10⁸, 2 × 10⁴, and 3 × 10⁴ M⁻¹ s⁻¹, respectively) (46-48). The lack of broad substrate range in C0 members may be attributed to the fact that in addition to binding interactions between enzyme and substrate, the major determinant of specificity is that only a molecule that fills the active-site cavity to occlude the catalytic center (nucleophilic Asp and Mg²⁺ cofactor) from bulk solvent can act as substrate (Fig. 4). A priori, this architecture limits the chemical space occupied by possible substrates.
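The "2 orders of magnitude" discrimination quoted for GmhB follows directly from the ratio of the two kcat/Km values. The sketch below works through that arithmetic and, as an added illustration, converts the ratio into an apparent difference in transition-state stabilization using the standard relation ΔΔG = RT ln(ratio). The temperature of 298.15 K is an assumption made here for illustration (the assay temperature is not given in the text), and the function names are hypothetical.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1


def orders_of_magnitude(eff_preferred: float, eff_other: float) -> float:
    """log10 of the ratio of two catalytic efficiencies (kcat/Km)."""
    return math.log10(eff_preferred / eff_other)


def ddg_kj_per_mol(eff_preferred: float, eff_other: float,
                   temp_k: float = 298.15) -> float:
    """Apparent free-energy difference RT*ln(ratio) in kJ/mol.
    temp_k is an assumed temperature, not a value from the source."""
    return R * temp_k * math.log(eff_preferred / eff_other) / 1000.0


# kcat/Km values (M^-1 s^-1) quoted in the text for E. coli GmhB.
beta_anomer = 7e6
alpha_anomer = 7e4

print(orders_of_magnitude(beta_anomer, alpha_anomer))       # about 2
print(round(ddg_kj_per_mol(beta_anomer, alpha_anomer), 1))  # about 11.4 kJ/mol
```

Under the assumed temperature, a 100-fold preference corresponds to roughly 11 kJ/mol, on the order of a few hydrogen bonds' worth of differential binding energy.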
In the case of peptidic substrates, the peptide sequence space is narrowed to gain this shape complementarity. That is, the preferred substrates of Scp1 and Dullard include a Pro residue in the substrate two residues from the phosphoserine. The structure of the Scp1-peptide complex shows that the bound peptide makes a kink imparting near exact shape complementarity, and thus, specific binding.
Domain insertions also produce the possibility of variable conformations between domains. In the HAD superfamily, one example demonstrates how fixing the interdomain distance can lead to specificity by controlling the active-site volume of a phosphatase. The HAD superfamily member BT2127 from Bacteroides thetaiotaomicron is similar to β-phosphoglucomutase in sequence (42% similar), overall fold, and characteristic active-site residues. Surprisingly, in vitro substrate activity screening reveals that BT2127 is neither a mutase nor an organophosphate hydrolase but rather an inorganic pyrophosphatase (kcat/Km = 1.6 × 10⁴ M⁻¹ s⁻¹) (29). Structural analysis of BT2127 reveals that in contrast to β-phosphoglucomutase, BT2127 does not take on an opened cap conformation. Strikingly, Glu-47 from the cap of BT2127 coordinates to the Mg²⁺ ion in the core domain, promoting a closed conformation (43). Thus, substrate discrimination in this case is based on active-site size restrictions imposed by the cap domain. As discussed above, conformational variability is one of the structural principles underlying the evolution of new specificities.
Intuitively, domain inserts can lead to high specificity as is observed in the HAD superfamily member phosphoserine phosphatase (19). However, numerous C1 and C2 members show substrate ambiguity with a broad substrate range. Thus, such large domain insertions can act as a fertile seedbed for engineering new activities via substrate ambiguity. As described above, in HAD superfamily members with large insertions, such as cN-II (Fig. 3), the insertions have expanded the substrate range of HAD superfamily enzymes by providing extensive locations that can contribute binding energy (19-22). The lax substrate discrimination can be attributed to the multiple positions that exist at the cap surface for substrate binding groups, as well as to the use of desolvation of nonpolar surfaces and diffuse π-interactions with aromatic side chains as the principal sources of substrate-binding energy. This allows us to speculate that the mechanistic potential to catalyze different chemistries is inherent in the Rossmann fold and that the evolutionary potential for functional expansion is inherent in the accessorizing domain. The design principle is an extreme example of that put forth by Tawfik and colleagues (44) (and discussed above) wherein the cap domain donates loops that are completely separate from those of the core scaffold.
Structure-function relationships that delineate specificity versus substrate ambiguity have yet to be fully explored. The analysis of activity against multiple substrates by enzymes on a superfamily-wide scale through approaches such as high throughput screening should yield valuable data for the identification of specificity and promiscuity determinants. In addition to providing evolutionary insight, promiscuous enzymes may yield good starting material for protein engineering and synthetic biology approaches.