Understanding Enzyme Superfamilies

Prior to the discovery in 1990 that mandelate racemase (MR) and muconate-lactonizing enzyme (MLE) are structurally similar enzymes that catalyze different overall reactions (1), structurally related enzymes were assumed to catalyze identical chemical reactions but, perhaps, with distinct substrate specificities. For example, all of the members of the serine protease superfamily were known to catalyze the same chemistry, hydrolyses of peptide bonds, although their peptide substrates varied. As described by Craik and Perona in the previous minireview (2), evolutionary accommodation of these differences in substrate specificity can result in major reorganization of the associated structures.
In this minireview, we discuss four recently discovered enzyme superfamilies in which an alternate theme predominates; within each of these superfamilies, the member proteins share a common structural scaffold but catalyze different overall reactions. For each of the superfamilies described, the active sites are contained within a single homologous domain. Although they represent several distinct family folds, the enzyme functions in each superfamily are related to their respective structural scaffolds in the same way; the proteins within each superfamily utilize a common mechanistic strategy for lowering the free energies of the rate-limiting transition states in the reactions they catalyze. The existence of several examples of such superfamilies lends further credence to the principle that the evolution of new catalytic activities involves the incorporation of new catalytic groups within an active site while retaining those groups necessary to catalyze the partial reaction common to all of them (3-5). As a consequence, the range of catalytic functions that can be accommodated by a single structural scaffold is considerably broader than had been previously suspected. Further, the diversity of function that each superfamily represents allows an economy in the number of unique protein folds required to support life and, as a result, undoubtedly has "simplified" the course of metabolic evolution.

The Enolase Superfamily: Abstraction of the α-Protons of Carboxylic Acids
We recently described the enolase superfamily, the members of which catalyze at least 11 different chemical reactions, including racemization, epimerization, and both syn and anti β-elimination reactions involving water, ammonia, or an intramolecular carboxylate group as leaving group (5).
Despite broad differences in substrate structures and the overall reactions they catalyze, all of the reactions of the enolase superfamily are initiated by a common partial reaction, metal-assisted, general base-catalyzed abstraction of the α-proton of a carboxylate anion to generate a stabilized enolate anion intermediate (Reaction 1). However, the fate of the intermediate (protonation in the case of racemization and epimerization reactions and vinylogous β-elimination in the others) must be determined by the different functional groups that "surround" the intermediate in the active site. The common partial reaction is thermodynamically difficult: the pKa values of the α-protons in the substrates range from 29 to 32 whereas the pKa values of the conjugate acids of the active site bases accepting the protons are ≤7. The enzyme active sites must destabilize the enzyme-substrate complex and/or stabilize the enzyme-intermediate complex so that the free energy of the transition state for α-proton abstraction can be lowered sufficiently to be consistent with the observed kcat values.
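The size of this thermodynamic barrier can be estimated from the pKa mismatch alone. The following is a back-of-the-envelope sketch rather than part of the original analysis: it assumes 25 °C, treats the proton transfer as a simple equilibrium between two acids, and uses the standard relation ΔG° = 2.303·RT·ΔpKa. The function name and the choice of temperature are illustrative assumptions.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def proton_transfer_dg(pka_donor, pka_acceptor_conj_acid, temp_k=298.15):
    """Standard free energy (kcal/mol) for moving a proton from a donor acid
    to a base, estimated from the pKa difference alone."""
    delta_pka = pka_donor - pka_acceptor_conj_acid
    return math.log(10) * R_KCAL * temp_k * delta_pka


# pKa values quoted in the text: alpha-protons 29 to 32, active site bases <= 7.
# Using 7 for the base gives a lower bound on the free energy cost.
for pka in (29, 32):
    dg = proton_transfer_dg(pka, 7)
    print(f"alpha-proton pKa {pka}, base pKa 7: about {dg:.1f} kcal/mol uphill")
```

Under these assumptions the unassisted proton transfer is uphill by roughly 30 to 34 kcal/mol, which illustrates why differential stabilization of the enolate intermediate is required to account for the observed kcat values.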
The members of the enolase superfamily can be assigned to subgroups based upon primary sequence alignments: the MR subgroup, the MLE subgroup, and the enolase subgroup; coincidentally, high resolution x-ray structures are available for these three enzymes (MR, Protein Data Bank entry 1mnr (6); MLE I; enolase, Protein Data Bank entry 1ebh (7)). Each is composed of an N-terminal domain (β3α4) of ~140 residues that precedes a β/α barrel domain (TIM barrel) of ~200 residues. In all three enzymes, the analogous domains have globally similar structures, supporting the hypothesis that these enzymes are related by divergent evolution.
The three subgroups are distinguished by 1) the identities and sequence contexts of the three carboxylate ligands for the essential divalent metal ion; 2) the number and identities of the general base catalysts that initiate the reactions by abstraction of the α-proton; and 3) the identities of the electrophilic catalysts that stabilize the enolate anion intermediate by electrostatic and/or hydrogen-bonding interactions.
In the MR, MLE, and enolase structures, the observed differences in the carboxylate metal ion ligands may be correlated with differing coordination geometries of their various substrates to the divalent metal ion (5). The observed differences in the identities and numbers of the general basic catalysts, one in the active sites of the enolase subgroup, MLEs, and galactonate dehydratase and two in the active sites of the remaining characterized members of the superfamily, allow stereospecific, stereoselective, or stereorandom formation of the stabilized intermediate followed by partitioning of the intermediate to products via differing mechanisms.
This minireview will be reprinted in the 1997 Minireview Compendium, which will be available in December, 1997. This is the second article of two in the "Minireview Series on Enzyme Superfamilies." This work was supported by National Institutes of Health Grants GM-40570 and GM-52594 (to J. A. G.). Fig. 1 was produced using the MidasPlus program (26) from the Computer Graphics Laboratory, University of California, San Francisco, supported by National Institutes of Health Grant P41-RR01081.
The abbreviations used are: MR, mandelate racemase; MLE, muconate-lactonizing enzyme; NAL, N-acetylneuraminate lyase; DHDPS, dihydrodipicolinate synthase; VOC, vicinal oxygen chelate; TIM, triosephosphate isomerase.
The differences in the overall reactions can be rationalized explicitly in the context of the β/α barrel structure that contains the residues directly involved in catalysis. The active sites are located, as expected, at the C-terminal ends of the barrel domains in these proteins. Each functional group involved in catalysis is presented on a separate structural unit within the barrel domain, either at the C-terminal end of a β-sheet strand or in the loop that connects a β-sheet with the following α-helix (Fig. 1). So organized, the structures of β/α barrels provide an important evolutionary advantage: the ability to deliver functional groups from each of the eight possible octants surrounding the bound substrates and intermediates. The functional groups, along with the associated secondary structural elements, can evolve independently, thereby allowing new catalytic activities to be generated while retaining the common partial reaction.
Recognition and description of this structure-function paradigm provide a rational basis for understanding the chemistry of all of the superfamily members. Comparative analysis of these proteins has led to important new observations that would have been difficult to obtain from the focused study of each enzyme alone, e.g. the prediction of unknown function for an open reading frame (4) and recognition of additional functions for a previously characterized enzyme (8).

The N-Acetylneuraminate Lyase Superfamily: Schiff Base-dependent Aldolases, Dehydratases, Decarboxylases
The N-acetylneuraminate lyase (NAL) superfamily presents a different mechanistic strategy in which the structural scaffold evolved to utilize a protonated Schiff base as an "electron sink" (Fig. 2). At least one of the steps in the mechanisms of each of the substantially different overall reactions represented by the NAL superfamily takes advantage of this electron sink: NAL (aldol condensation, Fig. 2A) ...
High resolution structures have been described for NAL (9) and DHDPS (10). Each of the enzymes in the NAL superfamily is composed of a single β/α barrel domain. A lysine residue conserved in all members of the family (Lys-165 in NAL) is located within the sixth strand of β-sheet; the ε-amino group is positioned in the "floor" of the active site depression at the C-terminal end of the barrel. This residue forms a Schiff base with the α-keto group of the substrate.
Although the TIM barrel-type scaffold associated with the NAL superfamily differs substantially from that of the enolase superfamily, examination of the sequences of the NAL superfamily members in the context of the three-dimensional structures of NAL and DHDPS reveals that many highly conserved residues in the superfamily are located at the C-terminal ends of other β-sheets in the β/α barrel, the same pattern observed in the enolase superfamily. These residues are likely responsible for the different Schiff base-dependent reactions catalyzed by the members of the superfamily. The identities and importance of these residues have not been investigated, but their proximity to the active site suggests that the general strategy used by nature to evolve the members of the enolase superfamily may also have been used in the evolution of the NAL superfamily.
Also, in parallel with the enolase superfamily, identification of the common mechanistic strategy associated with the NAL superfamily architecture offers important insights into the reactions catalyzed by as yet uncharacterized members of the superfamily. One superfamily member, the mosA gene product (Rhizobium meliloti), functions in rhizopine biosynthesis, a biosynthetic pathway that has not been fully characterized (11). Whereas studies of a mosA⁻ mutant suggest that the MosA protein catalyzes methylation of a hydroxyl group (12), this predicted function is unexpected since it cannot be readily associated with a mechanism involving a protonated Schiff base functioning as an electron sink, e.g. the use of α-ketobutyrate as a methyl group donor is unprecedented in enzymology.

The Crotonase Superfamily: Coenzyme A Hydratases/Isomerases/Dehalogenases/Hydrolases
The members of the crotonase superfamily (13,14) catalyze several reactions of coenzyme A thioesters that require stabilization of an oxyanion intermediate. An enolate anion is generated by abstraction of the α-proton of the thioester (Fig. 3A) in reactions resulting in β-addition/elimination of water (crotonase and carnitine racemase), 1,3-proton transfer (3,2-trans-enoyl-CoA isomerase), and carbon-carbon bond formation (naphthoate synthase, intramolecular addition of the enolate anion to a carboxyl group). An enolate anion is also generated by nucleophilic aromatic addition in the reaction catalyzed by 4-chlorobenzoyl-CoA dehalogenase (Fig. 3B).
In addition, an enzyme that hydrolyzes a thioester, β-hydroxyisobutyryl-CoA hydrolase, was recently discovered to be a member of the crotonase superfamily (15). Whereas this reaction involves formation of a tetrahedral alkoxide anion (Fig. 3C) rather than the enolate anion common to the other members of the superfamily (Fig. 3, A and B), kinetic competence in each reaction type is dependent on stabilization of their respective intermediates. Conversion of a thioester to either type of intermediate involves a large change in the proton affinity of the carbonyl oxygen (pKa of the conjugate acid ≤ 0) as it is converted to either an enolate anion (pKa of the conjugate acid, ~11) or the oxygen of an alkoxide anion (pKa of the conjugate acid, ~13). Thus, a common strategy can be used to stabilize both types of intermediates: a significant increase in hydrogen bond strength between the thioester carbonyl oxygen (hydrogen bond acceptor) and active site proton donors as the enzyme-substrate complex is converted to the enzyme-intermediate complex via a transition state that is likely to be similar in structure and energy to the intermediate (16-18).
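To put the "large change in the proton affinity" into numbers, the quoted pKa values can be converted into a shift in pKa units and an equivalent free energy. This is an illustrative sketch only: it assumes 25 °C, takes the thioester value (≤ 0) as exactly 0 so the results are lower bounds, and the function name is hypothetical. The figures describe the change in intrinsic basicity of the oxygen, not the energy of any particular hydrogen bond.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def basicity_shift(pka_before, pka_after, temp_k=298.15):
    """Return (delta_pKa, equivalent free energy in kcal/mol) for a change
    in the pKa of the conjugate acid of the carbonyl oxygen."""
    delta_pka = pka_after - pka_before
    return delta_pka, math.log(10) * R_KCAL * temp_k * delta_pka


# Approximate pKa values quoted in the text for the two intermediate types.
for label, pka in (("enolate anion", 11), ("alkoxide anion", 13)):
    d_pka, dg = basicity_shift(0, pka)
    print(f"thioester -> {label}: {d_pka} pKa units, about {dg:.1f} kcal/mol")
```

Both intermediates show a shift of more than 10 pKa units, which is consistent with the argument that a single oxyanion-stabilizing strategy can serve either reaction type.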
High resolution structures have been described for 4-chlorobenzoyl-CoA dehalogenase (19) and crotonase (20). These proteins are not β/α barrels; instead, they represent a "spiral fold" composed of two distinct domains, with the active site located entirely in the N-terminal α + β domain. In each enzyme, the thioester carbonyl group is located in an "oxyanion hole" in which it is hydrogen-bonded to two peptide N-H groups. These interactions provide the structural basis for significant stabilization of the enolate anion or tetrahedral alkoxide intermediates as charge is localized on the oxygen.
One interesting difference between crotonase and the dehalogenase is that the critical catalytic steps occur at different distances from the thioester functional group in the structures of their respective substrates; hence, the functional groups mediating the β-elimination and nucleophilic aromatic substitution reactions are located on different secondary structure elements in the enzyme active sites. The syn β-elimination reaction catalyzed by crotonase is apparently general acid-general base catalyzed by a pair of glutamate residues, Glu-144 and -164, with Glu-164 appropriately positioned to initiate the β-elimination by abstraction of the α-proton. Using the same basic architecture, the nucleophilic aromatic substitution reaction catalyzed by the dehalogenase uses a spatially different carboxylate group, that of Asp-145, as a nucleophile to form an arylated enzyme intermediate; this, in turn, is hydrolyzed using His-90 as a general basic catalyst to yield the final product (21).
The reactions catalyzed by carnitine racemase and 3,2-trans-enoyl-CoA isomerase are also initiated by abstraction of the α-proton of the thioester substrate; the sequence alignment predicts that the active sites of these enzymes contain homologs of Glu-164 in crotonase. Interestingly, the reaction attributed to naphthoate synthase is also initiated by abstraction of the α-proton of a thioester of the aliphatic carboxylate group, but the sequence alignment fails to identify a homolog of Glu-164 in crotonase, suggesting another evolutionary refinement of the structural solution. Also, the sequence alignment suggests that the active site of the thioester hydrolase contains a homolog of Glu-164, but not Glu-144, in crotonase and of Asp-145, but not His-90, in the dehalogenase (alignment not shown). The mechanisms of the reactions catalyzed by these enzymes are unexplored, so a detailed molecular understanding of evolution of catalytic function in this superfamily cannot yet be specified.
That the substrate for each of the enzymes in the crotonase superfamily is a coenzyme A thioester could imply that substrate binding and not chemistry is the critical factor in the evolution of this superfamily. However, the diversity in the overall reactions catalyzed and the accompanying variation in the identities of the general acidic and basic catalysts that mediate those chemistries reflect a common structural solution to the problem of stabilization of reactive oxyanionic intermediates using an "oxyanion hole." Thus, although all of the superfamily members bind a thioester substrate, a mechanistic analysis of the reactions supports our conclusion that the difficult chemical step of stabilization of a reactive intermediate has dominated the evolution of this family.

The Vicinal Oxygen Chelate (VOC) Superfamily
The only structure currently available for the vicinal oxygen chelate (VOC) superfamily is that of 2,3-dihydroxybiphenyl 1,2-dioxygenase (24). Each polypeptide of the tetrameric protein is an eight-stranded β-sheet structure composed of two domains approximately equal in size. Although the sequence identity is less than 20% for the two domains, their three-dimensional structures are nearly superimposable. However, only the C-terminal domain contains ligands for the Fe²⁺ (two His and one Glu). Both glyoxalase I and the fosfomycin resistance protein are approximately one-half the size of the dioxygenase, and sequence alignments suggest that these proteins possess two histidines and one glutamate that function as the ligands for the divalent metal ions.
The common mechanistic strategy mediated by the similar structures of the VOC superfamily is the ability to exploit metal ions to facilitate their respective catalytic functions. Interestingly, each of the three currently identified members of the superfamily utilizes a different metal ion for catalysis. In the dioxygenase, the Fe²⁺ likely coordinates the vicinal hydroxyl groups of the catechol substrate, binds the dioxygen cosubstrate, and participates in redox chemistry that facilitates the ring cleavage that accompanies oxidation. In glyoxalase I, the Zn²⁺ likely provides electrostatic stabilization of the enediolate intermediate involved in the 1,2-proton transfer reaction by direct coordination. In the fosfomycin resistance protein, the Mn²⁺ likely binds to the oxirane oxygen, thereby acting as a Lewis acid catalyst that facilitates the nucleophilic attack of glutathione on an oxirane carbon. Whereas the identities and catalytic functions of the divalent metal ions are distinct, the domains that contain the metal ion ligands are homologous. Thus, in the VOC superfamily nature evidently selected a protein framework that binds a divalent metal that can be used to stabilize complexes with vicinal anionic oxygen atoms, thereby offering yet another general strategy for the evolution of different reaction mechanisms.

Summary
We have described four superfamilies of enzymes where the members within each catalyze different overall reactions using broadly varied substrates. Within each superfamily, the different overall reactions are facilitated by a common mechanistic strategy that can be rationalized in the context of the structural scaffold. Although space limitations prevent their discussion, a number of additional superfamilies have been described in which these principles appear to obtain. Analyses of the relationships between structure and function in all of these superfamilies suggest two general conclusions regarding the evolution of new catalytic activities.
1) Nature discovered that chemistry, and not binding specificity, is the dominant factor in the evolution of new enzymatic activities. New enzymatic activities evolve by duplication of the gene for a preexisting enzyme that provides a structural strategy for a mechanistically difficult chemical step. As a result, related enzymes can differ broadly in the identity of the overall reactions they mediate as well as in substrate specificity.
2) The catalytic activity of a newly sequenced but uncharacterized open reading frame cannot necessarily be inferred from the overall reactions catalyzed by homologous enzymes. Rather, the chemical step common to the superfamily scaffold must be identified and correlated with conserved structural features.
These conclusions should be useful in developing new strategies for solving problems of current interest in mechanistic enzymology and structural biology as well as in the emerging disciplines of bioinformatics. For example, determination of the principles that govern the structure-function correlations for any particular superfamily will likely play an important role in assigning catalytic function to unknown sequences. Finally, by providing a more contextual basis for understanding both the rapid rates and mechanisms of enzyme-catalyzed reactions, superfamily analysis has the potential to offer insights that cannot be obtained even from the most elegant studies of a single enzyme.
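One way to picture the first step of such an assignment is to ask whether the catalytic residues of a characterized superfamily member have counterparts at the aligned positions of an unknown sequence. The sketch below is a toy illustration under stated assumptions: the two aligned sequences are invented strings (not real enzyme sequences), the residue numbers are hypothetical, and the function names are illustrative. Residue identity at an aligned position is only a first hint about the shared partial reaction, not a functional assignment.

```python
def map_reference_positions(aligned_ref):
    """Map 1-based residue numbers of the ungapped reference sequence
    to 0-based alignment columns ('-' marks a gap)."""
    mapping = {}
    residue_number = 0
    for column, char in enumerate(aligned_ref):
        if char != "-":
            residue_number += 1
            mapping[residue_number] = column
    return mapping


def check_catalytic_residues(aligned_ref, aligned_query, catalytic_positions):
    """For each catalytic residue number of the reference, return
    (reference residue, query residue in the same column, identical?)."""
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    columns = map_reference_positions(aligned_ref)
    report = {}
    for pos in catalytic_positions:
        col = columns[pos]
        ref_res, query_res = aligned_ref[col], aligned_query[col]
        report[pos] = (ref_res, query_res, ref_res == query_res)
    return report


# Invented toy alignment, for illustration only.
toy_ref = "MKE-LDKHK"
toy_query = "MRELLDAHK"
# Hypothetical catalytic residues of the reference: Glu-3, Lys-6, His-7.
for pos, (ref_res, query_res, same) in check_catalytic_residues(
        toy_ref, toy_query, [3, 6, 7]).items():
    status = "conserved" if same else "not conserved"
    print(f"reference {ref_res}{pos} aligns with query {query_res}: {status}")
```

In this toy case two of the three hypothetical catalytic residues are retained and one is replaced, the kind of pattern that would suggest a shared partial reaction combined with a different overall reaction.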
Note Added in Proof: The structure of human glyoxalase I has been reported (27); the domain structure as well as the identities and positions of the metal ion ligands are homologous to those reported for 2,3-dihydroxybiphenyl 1,2-dioxygenase (24).