The structure of 3-methylaspartase from Clostridium tetanomorphum functions via the common enolase chemical step.

Methylaspartate ammonia-lyase (3-methylaspartase, MAL; EC ) catalyzes the reversible anti elimination of ammonia from L-threo-(2S,3S)-3-methylaspartic acid to give mesaconic acid. This reaction lies on the main catabolic pathway for glutamate in Clostridium tetanomorphum. MAL requires monovalent and divalent cation cofactors for full catalytic activity. The enzyme has attracted interest because of its potential use as a biocatalyst. The structure of C. tetanomorphum MAL has been solved to 1.9-Å resolution by the single-wavelength anomalous diffraction method. A divalent metal ion complex of the protein has also been determined. MAL is a homodimer with each monomer consisting of two domains. One is an alpha/beta-barrel, and the other smaller domain is mainly beta-strands. The smaller domain partially occludes the C terminus of the barrel and forms a large cleft. The structure identifies MAL as belonging to the enolase superfamily of enzymes. The metal ion site is located in a large cleft between the domains. Potential active site residues have been identified based on a combination of their proximity to a metal ion site, molecular modeling, and sequence homology. In common with all members of the enolase superfamily, the carboxylic acid of the substrate is co-ordinated by the metal ions, and a proton adjacent to a carboxylic acid group of the substrate is abstracted by a base. In MAL, it appears that Lys(331) removes the alpha-proton of methylaspartic acid. This motif is the defining mechanistic characteristic of the enolase superfamily of which all have a common fold. The degree of structural conservation is remarkable given only four residues are absolutely conserved.

The facile synthesis of optically pure chiral compounds remains one of the holy grails of organic chemistry. One approach is to use stereospecific or highly stereoselective enzyme catalysts to control, for example, the stereochemical course of addition reactions (1,2) or the selection of specific enantiomers of the substrate in kinetic resolutions. However, the exquisite substrate selectivity of enzymes is also one of their principal limitations, their catalytic activity often being restricted to a few substrates. Rational redesign of protein catalysts has had several high-profile successes but has so far failed to open up large areas of organic chemistry to biocatalysis. An alternative approach that shows considerable promise is to couple rational design with directed evolution methods to select enzyme activity (e.g. Altamirano et al. (3)). In such examples enzyme residues are chosen for random mutagenesis on the basis of function (e.g. substrate binding or catalysis), and the required enzyme activity is selected using a suitable screen. Of course, the ability to assign function to enzyme amino acid residues is greatly aided by a three-dimensional structural model.
One enzyme that shows considerable potential for use in organic synthesis is 3-methylaspartic acid ammonia-lyase (MAL). MAL is a 45.5-kDa enzyme found on the main catabolic pathway for glutamate in Clostridium tetanomorphum (4) and a number of other anaerobic microorganisms. MAL catalyzes the reversible anti elimination of ammonia from L-threo-(2S,3S)-3-methylaspartic acid to give mesaconic acid (Fig. 1). MAL also catalyzes the stereo- and regioselective addition of ammonia to several derivatives of mesaconic acid to form a limited number of homochiral substituted aspartic acids (1,5). Access to these synthetically useful compounds by conventional synthesis is extremely difficult and not well developed. The accessible range of synthetic homochiral substituted aspartic acids could potentially be extended using engineered MAL. However, engineering of MAL is greatly hindered by the lack of structural information and the absence of homologues in the Protein Data Bank.
The mechanism of MAL has attracted some interest. It was postulated that MAL belonged to the enolase superfamily of enzymes on the basis of distant sequence homology (6) and the requirement of MAL for two metal ion cofactors (Mg2+ and K+) (7). The enolase superfamily of enzymes catalyzes a wide variety of transformations, including racemization, β-elimination of water, and cycloisomerization, all of which are initiated by a common metal-assisted, general base-catalyzed abstraction of the α-proton of a carboxylic acid to generate a stabilized enolate anion intermediate (8). This enolate intermediate can be partitioned into different products by suitable modification of active site residues as demonstrated with mandelate racemase (9).
As an alternative to this mechanism, and on the basis of chemical modification studies, it had been proposed that Ser 173 of MAL is dehydrated post-translationally to dehydroalanine (9,10). The unusual dehydroalanine prosthetic group, which is absent in other members of the enolase superfamily, could function as a Michael acceptor in the deamination reaction (11,12). Strong support for a similar post-translational modification in the related enzyme histidine ammonia-lyase (histidase) has been provided by x-ray crystallography (13). In histidase, an internal Ala-Ser-Gly tripeptide (residues 142-144) undergoes intrachain cyclization accompanied by dehydration of Ser 143 to form a 4-methylidene-imidazol-5-one moiety. Interestingly, Ser 173 of 3-methylaspartase is also present in an Ala-Ser-Gly tripeptide.
* This work was supported by a Wellcome Trust and Office of Science and Technology Joint Infrastructure Fund award to St. Andrews. Synchrotron access was provided through BM14 UK. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
A fuller understanding of the enzyme specificity and mechanism clearly requires a three-dimensional structure. Here we report the structure of MAL from C. tetanomorphum. The structure of the protein from a different organism has been determined and is to be published shortly (14). The structure establishes that MAL belongs to the enolase superfamily, having the same α/β-barrel topology characteristic of the family, and suggests that Lys 331 is the general base that abstracts the C-3 proton of 3-methylaspartate. It is evident from the crystal structure of MAL that Ser 173 is unmodified. The active site of MAL is similar in architecture to enolase. Both use two metal ions and a catalytic base during catalysis. A sequence alignment based on structural similarity emphasizes the tolerance of the enolase-type reaction for sequence substitution. This strongly suggests that directed evolution experiments with this superfamily, where the first key common chemical step is preserved, should be successful.

MATERIALS AND METHODS
Protein Expression, Crystallization, and Structure Determination-MAL from C. tetanomorphum was purified and crystallized as described previously (15). Crystals were grown by sitting drop vapor diffusion against a well solution of 20-25% (w/v) polyethylene glycol 6000, 100 mM sodium acetate, pH 7.0, 25 mM Tris-HCl, pH 7.0, as precipitating solution and using 16-22% ethylene glycol as additive.
Se-methionine-labeled protein was expressed in the methionine auxotroph Escherichia coli B834 following the protocol of Doublie (16). Diffraction data were measured from a shock-cooled crystal that was cryoprotected by soaking in mother liquor containing 20% (w/v) glycerol. Se-Met crystals belonged to space group P2₁2₁2₁ with cell parameters of a = 67.2 Å, b = 109.9 Å, c = 110.2 Å, α = β = γ = 90°. The wavelengths were chosen corresponding to the maximum modulus of f′ (inflection point), maximum of f″ (peak), and minimum modulus of f′ (remote). Data were indexed with MOSFLM (17) and reduced with SCALA (18). We used the program SHAKE AND BAKE (19,20) to locate the selenium atoms from the peak wavelength (anomalous signal). Initial phasing attempts using CCP4 MLPHARE (21) with just the peak wavelength showed the solvent boundary. We then included all three wavelengths and located the noncrystallographic twofold axis. Further map improvement using solvent flattening and noncrystallographic averaging to the 1.9-Å resolution limit of the data collected at the remote wavelength produced a superb experimental map. The map was traced using WARP-ARP (22) and manual intervention. The structure was refined by a combination of O (23) and REFMAC5 (24). The Mg2+ complex was obtained by growing the crystals in the presence of 100 mM MgCl2. Full statistics of the refinement and data collection are given in Table I and Table II.
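The cell parameters quoted above can be checked for plausibility with a Matthews coefficient estimate. The following is a minimal sketch, not part of the original methods. It assumes four asymmetric units per P2₁2₁2₁ cell, two monomers per asymmetric unit (as described under Overall Structure), a monomer mass of 45.5 kDa (the Se-Met protein will be slightly heavier), and the conventional 1.23 factor for converting the coefficient to a solvent fraction. All function names are illustrative.

```python
# Sketch: Matthews coefficient for the Se-Met crystal form.
# Assumptions: Z = 4 asymmetric units per P2(1)2(1)2(1) cell, two monomers per
# asymmetric unit, monomer mass 45.5 kDa, and the conventional 1.23 factor
# for estimating solvent content from the Matthews coefficient.

def orthorhombic_cell_volume(a, b, c):
    """Volume (cubic angstroms) of a cell whose angles are all 90 degrees."""
    return a * b * c


def matthews_coefficient(cell_volume, asym_units_per_cell,
                         chains_per_asym_unit, chain_mass_da):
    """Crystal volume per unit of protein mass (cubic angstroms per dalton)."""
    return cell_volume / (asym_units_per_cell * chains_per_asym_unit * chain_mass_da)


def solvent_fraction(vm):
    """Approximate solvent fraction from the Matthews coefficient."""
    return 1.0 - 1.23 / vm


volume = orthorhombic_cell_volume(67.2, 109.9, 110.2)
vm = matthews_coefficient(volume, 4, 2, 45500.0)
print(f"cell volume = {volume:.0f} A^3")
print(f"Vm = {vm:.2f} A^3/Da, solvent fraction = {solvent_fraction(vm):.2f}")
```

Under these assumptions the estimate comes out near 2.2 Å³/Da with roughly 45% solvent, which is in the range typically seen for protein crystals and is consistent with a dimer in the asymmetric unit.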
Sequence Alignments, Structure Superpositions, and Modeling Studies-The BLAST server (25) (accessed via the Internet) was used to identify similarity in prokaryotic 3-methylaspartases. The DALI server (26) was used to identify structurally similar proteins to MAL. Superposition of members of the enolase superfamily was performed and optimized using LSQMAN (Uppsala Software Factory). The structures of enolase-type enzymes were retrieved from the Protein Data Bank (27). The model of the substrate complex was closely based on the enolase substrate complex (28). Each of the two domains of MAL was superimposed separately onto the corresponding domain of enolase. The substrate was modeled using SYBYL (Tripos Inc., St. Louis, MO) and docked to the divalent metal ion in the same bidentate manner seen for the enolase substrate.

RESULTS AND DISCUSSION
Overall Structure-MAL is a dimeric enzyme. Both monomers are found in the asymmetric unit and are related by a 2-fold axis. They are effectively identical with a root mean square deviation between the 412 Cα atoms of each monomer of 0.35 Å. The principal differences between the monomers occur in a loop at residue 28 and a rigid body movement of residues Ala 170-Ala 310. The differences in these regions are consistent with crystal contacts. The monomer can be further decomposed into two domains: a smaller N-terminal domain and a C-terminal α/β-barrel (Fig. 2). The domains are linked by a short helix (Met 150-Tyr 158) flanked by two strands. The C-terminal end of the barrel is partially obscured by two loops from the N-terminal domain. The result is the formation of a large open cleft (~20 Å long, 16 Å wide, and 19 Å deep) with the loops from the N-terminal domain on one side and the C-terminal end of the barrel on the other side (Fig. 2). The overall shape of the molecule resembles a squat cylinder with a long arm attached to the cylinder rim. The arm is bent so that it reaches over the middle of the cylinder.
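The root mean square deviation quoted above is the square root of the mean squared distance between equivalent Cα atoms after superposition. The sketch below shows only that final arithmetic. It assumes the two coordinate sets are already superimposed and paired atom for atom, and the three-atom coordinates are invented purely for illustration, not taken from the MAL structure.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two paired lists of (x, y, z) points.

    Assumes the two sets have already been superimposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("coordinate lists must not be empty")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical toy coordinates (angstroms): monomer B is monomer A shifted by 0.35 in y.
monomer_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
monomer_b = [(x, y + 0.35, z) for x, y, z in monomer_a]
print(f"RMSD = {rmsd(monomer_a, monomer_b):.2f} A")
```

In practice the superposition itself (here performed with LSQMAN according to the methods) determines which rotation and translation minimize this quantity; the function above would be applied to the fitted coordinates.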
N-terminal Domain-The smaller N-terminal domain, Met 1-Glu 163, wraps around the outside of the main domain (Fig. 2). It starts with a three-stranded anti-parallel β-zigzag. At the end of the first strand, a compact structural unit (Asp 19-Gln 45) sits high above the entrance to the C-terminal α/β-barrel domain. Continuing our analogy of cylinder and arm, this region represents the hand. The last strand of the meander continues with a long loop (residues Val 72-Leu 84) extending toward the entrance to the active site. This loop is highly conserved among the different MAL sequences and contains the conserved Arg 80. The peptide chain comes back and forms a helix (Leu 85-Gly 103), which is packed against the zigzag. The helix is broken in the middle by a very highly conserved Pro 99. The bend induced by the Pro 99 allows the helix to fit closely to the first two strands of the zigzag.
α/β-Barrel Domain-The main C-terminal domain, Ile 164-Gly 407, is an 8-fold α/β-barrel. The topology of the barrel is ββαα(βα)₆. The inner β-barrel is not entirely parallel, with the second strand being antiparallel to the other strands, and the direction of the first helix is reversed with respect to the other helices. There are additional secondary structural elements on the outside of the barrel that do not belong to the core barrel fold. Ser 173, the residue postulated to be modified to dehydroalanine, is located on the first β-strand of the barrel. The side chain points toward the helix against which the β-strand stacks. The electron density is entirely consistent with a Ser side chain, and the main chain shows no unusual structural features, which might be expected if unsaturation were present. Mass spectrometry of the protein confirms that this residue in solution is Ser. The conserved Cys 361 is oxidized in all structures (Se-Met, native, and metal ion) despite the presence of antioxidants in all protein solutions. The Fo − Fc electron density maps show three clear peaks distributed around the sulfur atom. At the resolution of our study it is impossible to absolutely distinguish between single oxidation with static disorder or multiple oxidation with no disorder. However, a single oxygen at three positions, each with a third occupancy, refines with a substantially lower B-factor than the connecting sulfur (2.5 versus 13 Å²). Three oxygen atoms, each with full occupancy, refine to approximately the same B-factor as the sulfur (17 versus 15 Å²). This suggests complete oxidation to sulfenic acid. Mass spectrometric data (not shown) show a +16 m/z peak for a tryptic digest of protein in solution immediately after purification. This suggests that this residue is unusually sensitive to oxidation. Two residues in the barrel, His 194 and Val 391, adopt disallowed φ/ψ backbone conformations. The electron density clearly indicates in both cases that the residues are in these conformations. His 194 is part of the metal binding site and is discussed below. Val 391 is part of a loop at the dimer interface.
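The mass shift mentioned for the oxidized cysteine can be related to the number of added oxygen atoms with simple arithmetic. The sketch below is illustrative only. It assumes a singly charged peptide ion (the charge state is not given in the text), uses the monoisotopic mass of oxygen, and applies an arbitrary matching tolerance of 0.5; the function names are hypothetical.

```python
# Sketch: relate an observed mass shift to the number of oxygens added to a cysteine.
# Assumptions: singly charged ion by default, monoisotopic oxygen mass,
# and an arbitrary matching tolerance of 0.5 m/z units.

OXYGEN_MONOISOTOPIC_MASS = 15.9949  # Da

CYS_OXIDATION_STATES = {
    1: "sulfenic acid (Cys-SOH)",
    2: "sulfinic acid (Cys-SO2H)",
    3: "sulfonic acid (Cys-SO3H)",
}


def expected_shift(n_oxygens, charge=1):
    """Expected m/z shift for n added oxygen atoms at the given charge state."""
    return n_oxygens * OXYGEN_MONOISOTOPIC_MASS / charge


def assign_oxidation(observed_shift, charge=1, tolerance=0.5):
    """Return (n_oxygens, name) for the matching state, or None if nothing matches."""
    for n_oxygens, name in CYS_OXIDATION_STATES.items():
        if abs(observed_shift - expected_shift(n_oxygens, charge)) <= tolerance:
            return n_oxygens, name
    return None


print(assign_oxidation(16.0))
```

Under the singly charged assumption, a +16 shift corresponds to one added oxygen, whereas two or three added oxygens would appear near +32 or +48.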
Dimer Structure-The two molecules in the asymmetric unit make extensive contacts, burying 4700 Å² of surface area, 62% of which is apolar (values from www.biochem.ucl.ac.uk/bsm/PP/server/). This extensive contact area is consistent with solution scattering results (not shown), which indicate that the molecule exists as a dimer in solution. The dimer has dimensions of 76 × 60 × 60 Å. Particularly striking is that the clefts seen in each monomer connect to form a 60-Å-long channel in the dimer. There are a number of contacts between the monomers. These contacts occur between the outer surfaces of the last, first, and second helix of the barrel and the β-strands and loops at the N terminus of the N-terminal domain. Of particular note is the packing of the first strand of the N-terminal domain (residues 7-14) against the last helix (residues 390-404) of the α/β-barrel.
Metal Binding Site-The divalent metal ion binding site is located in the C-terminal mouth of the barrel. It is formed by three carboxylic acids, Asp 238, Glu 273, and Asp 307. Each contributes a single oxygen atom as ligand (Fig. 2). Three water molecules complete the normal octahedral (six-coordinate) coordination of Mg2+. In the metal-free structure the water molecules fill the metal coordination site, and the side chains adopt slightly different conformations. In the dimer the divalent metal ion binding site from each monomer sits at almost opposite ends of the large channel. His 194 hydrogen bonds to Asp 238, presumably to assist in the stabilization of the enolate.
MAL Is a Member of the Enolase Superfamily-Structure comparisons identify MAL as a member of the enolase superfamily. This family carries out a diverse series of chemical reactions with one step in common, the abstraction of a proton α to a carboxylic acid function. MAL is structurally most similar to muconate-lactonizing enzyme (29) and mandelate racemase (9) (Z score above 20) but less similar to enolase (28), glucarate dehydratase (30), and o-succinylbenzoate synthase (31) (Z score below 10). A sequence alignment based on structural superposition is shown in Fig. 3. Despite the structural similarity, only four residues are conserved, and the sequence identity of MAL to the other members of the family is below 20%. This strikingly illustrates the tolerance of this fold for amino acid substitution. Two of the conserved residues, Asp 238 and Glu 273, are ligands to the divalent metal ion. The other two are Pro 275 and Gly 352. These residues fulfill structural roles in the protein and are chemically inert. In the structural superposition of MAL with enolase, the divalent metal ion binding site is effectively identical. This supports and extends the point made by Babbitt et al. (6) and developed by Hasson et al. (29) that the chemistry carried out by the enzyme depends only on the metal ion to which the carboxylic acid of the substrate binds. In addition to the recognition of the carboxylic acid, the metal ion stabilizes the enolate intermediate that develops after abstraction of the proton, in effect lowering the pKa of the α-proton. This stabilization may be assisted by His 194, which hydrogen bonds to Asp 238, and His 236, which hydrogen bonds to Glu 273. His 194 occupies the same position as Lys 164 in mandelate racemase (9) and Lys 213 in glucarate dehydratase (30), residues which are proposed to perform the same function. The enzyme can then use an appropriately positioned base to remove the α-proton. The enzyme can change the position and nature of the other catalytic residues to achieve catalysis (9,29).
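The two quantities discussed above, the number of absolutely conserved positions and the pairwise sequence identity, can both be read off a structure-based alignment. The following is a minimal sketch of that bookkeeping. The three short sequences are invented toy data, not the Fig. 3 alignment, and the function names are hypothetical.

```python
def conserved_columns(alignment, gap="-"):
    """Return indices of columns where every sequence has the same non-gap residue."""
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for index, column in enumerate(zip(*alignment)):
        residues = set(column)
        if len(residues) == 1 and gap not in residues:
            conserved.append(index)
    return conserved


def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned positions where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


# Hypothetical toy alignment; real input would be the structure-based alignment rows.
toy_alignment = ["MDKE-PG", "ADRELPG", "SDQEVPG"]
print(conserved_columns(toy_alignment))
print(f"{percent_identity(toy_alignment[0], toy_alignment[1]):.1f}%")
```

Note that identity is computed here over ungapped aligned pairs only; other conventions (for example, dividing by the length of the shorter sequence) give different percentages, so the convention should be stated when comparing against a threshold such as 20%.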
The Active Site Location in MAL-An alignment of various MAL sequences shows that conserved residues cluster around the metal binding site and the open cleft. Comparison with the other members of the superfamily, particularly the structure of enolase with its substrate (28), confirms that these residues constitute the active site. The muconate-lactonizing enzyme study (29) confirmed the predictions based on the enolase structure (28) that the principal catalytic base, which abstracts the proton α to the carboxylic acid, is located close to the metal binding site in the same spatial orientation in all enolase-type enzymes. Hasson et al. (29) further showed by structural superposition that the nature of the principal catalytic base varies (Asp, which acts on nearby His in mandelate racemase (9), and Lys in the others). In MAL, Lys 331 occupies the same spatial relationship relative to the metal ion site seen in the other enzymes, and therefore we assign it as the principal catalytic base. This assignment supports the inference drawn from the fold of the enzyme that MAL proceeds by an enolase-type mechanism.
Mechanistic Implications of This Study-We have been unable to obtain a convincing electron density for a substrate complex, and thus we have had to model in the substrate to the enzyme active site. This is relatively straightforward given the enolase substrate complex structure. It is clear that on substrate binding the protein cleft closes up and that in our structures we have an open conformation. The cleft is closed as the two domains move toward each other hinging on the loop connecting them (Asn 159-Ala 166). A "closed" model can be constructed by superimposing each domain of MAL separately onto the closed enolase structure. The enolase substrate 2-phospho-D-glycerate and β-methylaspartate have the carboxylic acid and the proton α to that function in common. There is a further similarity as enolase catalyzes the elimination of water and MAL catalyzes the elimination of ammonia (Fig. 4). In our model the carboxylic acid of the side chain of β-methylaspartic acid is bound in a bidentate manner to the Mg2+ ion (Fig. 4). Lys 331 is positioned to abstract the proton α to this carboxylate (the 3S proton). The β-methyl group of the MAL substrate (same location as the hydroxyl in the enolase substrate) is bound in an open pocket formed by residues Gln 172, Phe 170, Leu 384, and Tyr 356. This binding pocket is quite large, consistent with the observation that the enzyme can tolerate re-
Fig. 3 legend (beginning not recovered): ... (29), mandelate racemase (MR) (9), o-succinylbenzoate synthase (OSBS) (31), and glucarate dehydratase (Glucarate_DH) (30). The sequence alignment is based on structural superposition of the Protein Data Bank coordinates for each enzyme. The secondary structural elements are shown for each enzyme. Strikingly only the four residues highlighted in red are conserved in all enzymes. A star (*) denotes those residues in MAL for which a role in catalysis has been inferred (metal ion binding, substrate recognition, or main base). His 194 from MAL is marked with a triangle. Lys 164, the second base from mandelate racemase, is marked with a filled circle.