Crystal Structure and Mechanistic Implications of N2-(2-Carboxyethyl)arginine Synthase, the First Enzyme in the Clavulanic Acid Biosynthesis Pathway

The initial step in the biosynthesis of the clinically important β-lactamase inhibitor clavulanic acid involves condensation of two primary metabolites, D-glyceraldehyde 3-phosphate and L-arginine, to give N2-(2-carboxyethyl)arginine, a β-amino acid. This unusual N-C bond forming reaction is catalyzed by the thiamin diphosphate (ThP2)-dependent enzyme N2-(2-carboxyethyl)arginine synthase. Here we report the crystal structure of N2-(2-carboxyethyl)arginine synthase, complexed with ThP2 and Mg2+, to 2.35-Å resolution. The structure was solved in two space groups, P2₁2₁2₁ and P2₁2₁2. In both, the enzyme is observed in a tetrameric form, composed of a dimer of two more tightly associated dimers, consistent with both mass spectrometric and gel filtration chromatography studies. Both ThP2 and Mg2+ cofactors are present at the active site, with ThP2 in a “V” conformation as in related enzymes. A sulfate anion is observed in the active site of the enzyme in a location proposed as a binding site for the phosphate group of the D-glyceraldehyde 3-phosphate substrate. The mechanistic implications of the active site arrangement are discussed, including the potential role of the aminopyrimidine ring of the ThP2. The structure will form a basis for future mechanistic and structural studies, as well as engineering aimed at production of alternative β-amino acids.

ble synthesis of clavulanic acid is rendered difficult by the high lability and density of functionalization around the bicyclic core (3,6). Consequently clavulanic acid is produced by fermentation and there is considerable interest in its biosynthetic pathway.
ThP2-dependent enzymes generally employ the cofactor both to effect nucleophilic attack at a carbonyl group and to stabilize the carbanion subsequently formed by proton transfer or decarboxylation. The reactions catalyzed by ThP2 enzymes may be subclassified as either non-oxidative, such as those of yeast pyruvate decarboxylase (PDC) (23) and benzoylformate decarboxylase (24), or oxidative, such as those of pyruvate oxidase (25,26). More recently, several ThP2 enzymes that catalyze atypical reactions have been identified, including CEAS and 2-hydroxyphytanoyl-coenzyme A lyase, an enzyme involved in the degradation of phytanic acid, which is derived from the side chain of chlorophyll (27).
In addition to its key role in clavam biosynthesis, CEAS is of mechanistic interest because of the unprecedented manner in which a ThP2-dependent enzyme is used in an N-C bond forming reaction, proposed to proceed via an α,β-unsaturated acyl-ThP2 intermediate. Here we report the crystal structure of CEAS complexed with ThP2 and Mg2+ and discuss the mechanistic implications arising.

EXPERIMENTAL PROCEDURES
Protein Expression and Purification-The ceas gene was PCR amplified and subcloned directly as an NdeI/BamHI fragment into the pET24a(+) vector (Novagen). Primers used were: forward, 5′-acgctcatatgtcccgtgta-3′; reverse, 5′-ggacatgagcggtctcccgactccgtcagtacc-3′. Following amplification, the integrity of ceas was confirmed by DNA sequencing. The ceas/pET24a(+) construct was transformed into Escherichia coli BL21(DE3) and grown at 37°C in 2TY medium containing 30 μg/ml kanamycin. Protein production was induced at an A600 of 0.8 by addition of isopropyl-β-D-thiogalactopyranoside to a final concentration of 1.0 mM, following reduction of the temperature to 30°C. CEAS was purified by sequential use of Q-Sepharose and phenyl-Resource chromatography. Size exclusion chromatography (Superdex S200) was used as a final step to yield CEAS of >95% purity by SDS-PAGE analysis. Electrospray ionization mass spectrometry (ESI-MS) revealed that the mass of the isolated CEAS was consistent with loss of the N-terminal methionine from the predicted amino acid sequence (observed, 60,776 Da; calculated without N-terminal methionine, 60,776 Da). N-terminal amino acid sequence analysis (observed, SRVSTAPSGK) confirmed the identity of the protein and the loss of the N-terminal methionine. The activity of purified CEAS was confirmed by high performance liquid chromatography assay and comparison with known synthetic standards.
Selenomethionine-substituted Protein Expression-Selenomethionine-substituted CEAS was produced using a metabolic inhibition protocol and LeMaster media supplemented with 50 mg/liter L-selenomethionine. Selenomethionine incorporation was >95% by ESI-MS.
Native Molecular Weight Determination-The "native" molecular weight of CEAS was determined by size exclusion chromatography (Superdex 200 HR). Calibration was carried out using cytochrome c (12 kDa), chymotrypsin (25 kDa), ovalbumin (43 kDa), bovine serum albumin (67 kDa), aldolase (158 kDa), and blue dextran (2000 kDa) (Gel Filtration Calibration kit, Amersham Biosciences). An elution volume parameter (Kav) was calculated for each of the calibration proteins and a calibration curve was constructed. By calculating Kav for CEAS the native molecular weight was established.
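The calibration described above can be illustrated with a minimal sketch. It assumes the usual definition Kav = (Ve − V0)/(Vt − V0) and a linear fit of Kav against log10 of molecular mass; the text does not state which fitting procedure was used. The standard masses are those listed in the text, whereas the column volumes and every elution volume below are hypothetical values chosen only for illustration.

```python
import math

def kav(ve, v0, vt):
    """Partition coefficient Kav = (Ve - V0) / (Vt - V0)."""
    return (ve - v0) / (vt - v0)

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# HYPOTHETICAL column parameters in ml (not taken from the paper).
V0 = 8.0   # void volume, as would be reported by blue dextran
VT = 24.0  # total bed volume

# name: (mass in kDa from the text, HYPOTHETICAL elution volume in ml)
STANDARDS = {
    "cytochrome c": (12.0, 18.6),
    "chymotrypsin": (25.0, 17.3),
    "ovalbumin": (43.0, 16.0),
    "bovine serum albumin": (67.0, 15.0),
    "aldolase": (158.0, 13.3),
}

def calibrate(standards, v0, vt):
    """Fit Kav against log10(mass) for the calibration proteins."""
    xs = [math.log10(mass) for mass, _ in standards.values()]
    ys = [kav(ve, v0, vt) for _, ve in standards.values()]
    return fit_line(xs, ys)

def native_mass_kda(ve, slope, intercept, v0, vt):
    """Invert the calibration line to estimate a native mass in kDa."""
    return 10 ** ((kav(ve, v0, vt) - intercept) / slope)

slope, intercept = calibrate(STANDARDS, V0, VT)
# 12.5 ml is a HYPOTHETICAL elution volume for an unknown sample.
print(native_mass_kda(12.5, slope, intercept, V0, VT))
```

Blue dextran is used here only to define the void volume, which is why it is left out of the fit.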
Crystallization-Crystallization conditions were initially sought by the vapor diffusion method, using commercially available crystallization screens. Hanging drops of 2 μl of protein (10 mg/ml), containing a 3-fold excess of thiamin diphosphate, and 2 μl of well solution were suspended at 17°C over a well solution containing 1.6 M ammonium sulfate and 0.1 M HEPES, pH 7.4. Under these conditions, crystals of two different morphologies grew over a period of 2 weeks. The crystals belonged to two different space groups; however, the shape of the crystals was not a determinant of the space group. The majority of the crystals belonged to space group P2₁2₁2₁, with unit cell dimensions a = 117.7 Å, b = 127.3 Å, c = 196.8 Å. The remainder of the crystals belonged to space group P2₁2₁2, with unit cell dimensions a = 123.1 Å, b = 187.2 Å, c = 198.6 Å.
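As a rough plausibility check on the two crystal forms, the sketch below computes a Matthews coefficient from the unit cell dimensions given above. This calculation is not reported in the text. It assumes four general positions for each space group, the four and six molecules per asymmetric unit stated in the structure solution and refinement sections, the 60,776-Da monomer mass from ESI-MS, and the common approximation that the solvent fraction is 1 − 1.23/Vm.

```python
def matthews_coefficient(a, b, c, n_sym_ops, n_mol_asu, mass_da):
    """Vm in cubic angstroms per dalton for an orthorhombic cell."""
    volume = a * b * c  # all cell angles are 90 degrees
    return volume / (n_sym_ops * n_mol_asu * mass_da)

def solvent_fraction(vm):
    """Approximate solvent fraction (assumes a typical protein density)."""
    return 1.0 - 1.23 / vm

MONOMER_DA = 60776.0  # ESI-MS mass quoted in the text

# (a, b, c) in angstroms and molecules per asymmetric unit, from the text.
# Both space groups have four general positions.
FORMS = {
    "P212121": ((117.7, 127.3, 196.8), 4),
    "P21212": ((123.1, 187.2, 198.6), 6),
}

for name, (cell, n_mol) in FORMS.items():
    vm = matthews_coefficient(*cell, n_sym_ops=4, n_mol_asu=n_mol,
                              mass_da=MONOMER_DA)
    print(name, round(vm, 2), round(solvent_fraction(vm), 2))
```

Both forms give a Vm of roughly 3 Å³/Da, within the range commonly seen for protein crystals, so the stated molecule counts are at least self-consistent with the cell volumes.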
Crystallographic Data Collection and Structure Solution-Crystals were cryocooled by plunging into liquid nitrogen and x-ray data were collected at 100 K using a nitrogen stream. Cryoprotection was accomplished by sequential transfer into solutions of 1.8 M ammonium sulfate containing 10% (v/v) glycerol and then 25% (v/v) glycerol. Data from selenomethionine-substituted crystals were collected on beamline BM14 at the European Synchrotron Radiation Facility, Grenoble, France, using a MarCCD detector. All other data were collected on beamlines 14.1 and 14.2 of the Synchrotron Radiation Source, Daresbury, UK. The data were processed using MOSFLM (28) and the CCP4 suite of programs (29).
The selenomethionine-substituted crystal used for the structure solution belonged to the P2₁2₁2₁ space group. Using SHELXD (30), 40 selenium positions were determined, corresponding to four molecules per asymmetric unit, excluding the N-terminal methionine, which was shown to be absent by N-terminal sequencing and ESI-MS. Phasing and density modification were performed using CNS (31). The phases had a figure of merit of 0.38 over the resolution range to 2.35 Å, which rose to 0.94 following density modification. Data collection, phasing, and refinement statistics are shown in Table I.
Refinement-An initial model was built using the program O (32). Refinement of this model against the selenomethionine-substituted data (remote wavelength) using a strict non-crystallographic symmetry relationship was performed using CNS. One cycle of simulated annealing followed by grouped B-factor refinement brought the Rfree to 30.5%. Five percent of the reflections were excluded for calculation of Rfree. Further rounds of refinement using REFMAC5 (33), including individual B-factor refinement and the addition of solvent molecules and ThP2 cofactor, brought the conventional R-factor to 15.7% and the Rfree to 19.8%. Non-crystallographic symmetry restraints were used throughout refinement. The current model contains residues 12-572 for all four molecules in the asymmetric unit, excluding residues 182-184 for which the electron density was unclear. In total the final model contains
The 2.35-Å native dataset was solved by rigid body refinement using REFMAC5; further rounds of refinement brought the R-factor to 15.7% and the Rfree to 19.5%. The P2₁2₁2 crystal form was solved by molecular replacement with the 2.35-Å selenomethionine structure using CNS. There were six molecules per asymmetric unit. Refinement of these data was performed using REFMAC5.
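For readers less familiar with the statistics quoted above, the following sketch shows how a conventional R-factor and an Rfree can be computed once 5% of the reflections have been set aside. It assumes the standard definition R = Σ||Fo| − |Fc||/Σ|Fo| with the amplitudes already on a common scale. It is not the procedure implemented in CNS or REFMAC5, and the amplitudes are synthetic values generated purely for illustration.

```python
import random

def r_factor(f_obs, f_calc):
    """R = sum(||Fo| - |Fc||) / sum(|Fo|), amplitudes on a common scale."""
    num = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    den = sum(abs(fo) for fo in f_obs)
    return num / den

def split_free_set(n_reflections, free_fraction=0.05, seed=0):
    """Randomly flag a fraction of reflection indices as the free (test) set."""
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    n_free = round(n_reflections * free_fraction)
    free = sorted(indices[:n_free])
    work = sorted(indices[n_free:])
    return work, free

# SYNTHETIC amplitudes: "calculated" values are the "observed" ones
# perturbed by up to 20%, standing in for an imperfect model.
rng = random.Random(1)
f_obs = [rng.uniform(10.0, 500.0) for _ in range(2000)]
f_calc = [fo * rng.uniform(0.8, 1.2) for fo in f_obs]

work, free = split_free_set(len(f_obs))
r_work = r_factor([f_obs[i] for i in work], [f_calc[i] for i in work])
r_free = r_factor([f_obs[i] for i in free], [f_calc[i] for i in free])
print(r_work, r_free)
```

In a real refinement only the working set drives the model, so Rfree typically sits a few points above the conventional R-factor, as in the values reported here. The synthetic data above do not reproduce that gap because no model is being fitted.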

RESULTS AND DISCUSSION
Crystallization-Diffraction-quality CEAS crystals were obtained in the presence of ThP2, using ammonium sulfate as precipitant, whereas poor quality crystals that did not diffract x-rays were obtained in the absence of ThP2 under a variety of conditions. The useful crystals grew in two morphologies, rods and many-sided polyhedra. The majority of the crystals were of the polyhedral form. The crystals belonged to two different space groups, P2₁2₁2₁ and P2₁2₁2, with differing cell dimensions. Both space groups were observed for the polyhedral-shaped crystals, although as yet only the P2₁2₁2₁ space group has been observed for the rod-shaped crystals.
Structure Solution-The structure of CEAS has been determined to 2.35-Å resolution in the P2₁2₁2₁ crystal form using the multiwavelength anomalous dispersion method, and the P2₁2₁2 form was subsequently solved to 2.45-Å resolution by molecular replacement. In both crystal forms a similar tetrameric arrangement of CEAS molecules was observed (Fig. 2). In the P2₁2₁2₁ form the tetramer was composed of the four molecules in the asymmetric unit, whereas the P2₁2₁2 form contained one tetramer in the asymmetric unit and one dimer that formed a tetramer by association about a crystallographic 2-fold rotation axis.
Monomer Structure-Each CEAS subunit consists of three domains, labeled α, β, and γ as in the structure of PDC (Fig. 3) (38). The overall structure and fold of CEAS are similar to those of PDC (Fig. 4), as reflected in a root mean square deviation of 1.95 Å over 372 Cα atoms (66% of CEAS residues). The α domain (residues  is the most similar to its equivalent in PDC (root mean square deviation 1.50 Å over 150 Cα atoms, 83% of CEAS residues in this domain), whereas the β domain (194-376) is the least similar (root mean square deviation 1.63 Å over 109 Cα atoms, 59.9% of CEAS residues). The most substantial difference between the α domains is the complete absence in CEAS of the second α helix of PDC. The lower similarity of the β domains is mainly because of differences at each end of the central β-sheet (Fig. 4). At one end CEAS lacks the inserted antiparallel β-strand of PDC, having a loop over connection between strands β11 and β12 that maintains the parallel nature of the β-sheet, whereas at the other end CEAS both lacks one β-strand and possesses an additional α-helix (α9). The β domain was found to be the most flexible domain of PDC (38), which may be connected to the tetrameric rearrangement that occurs with that enzyme (see below). The γ domain of CEAS (377-572), which contains the majority of the residues that bind the ThP2, has a substantially similar backbone arrangement around the active site to that of PDC, with the differences being largely because of differing lengths of secondary structural elements.
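The root mean square deviations quoted above are computed over matched Cα atoms after superposition. A minimal sketch of the final step is given below. It assumes that the two coordinate sets have already been superposed and paired by an external alignment program; the superposition itself is not shown, and the coordinates used are made-up values, not CEAS or PDC data.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between paired, pre-superposed atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# MADE-UP coordinates (angstroms) for three matched C-alpha atoms.
model_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model_b = [(0.0, 1.0, 0.0), (3.8, -1.0, 0.0), (7.6, 1.0, 0.0)]
print(rmsd(model_a, model_b))
```

Because the value depends on which atoms are matched, an RMSD is only meaningful alongside the number of atoms it covers, which is why the text reports both.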
CEAS has only 16% sequence identity to PDC. However, the relative orientation of the three domains of each is very similar, which is presumably because of the necessity of maintaining such an arrangement to form the tightly bound dimer, essential for composition of the active site.
Oligomerization-The observation of a tetramer in the crystal form is in agreement with gel filtration chromatographic analysis, which showed CEAS to exist in two oligomeric solution states. The calculated approximate native molecular masses of 240 and 130 kDa correspond to tetrameric and dimeric forms of CEAS (60,776 Da after N-terminal methionine loss). Relative peak areas suggest that, under the conditions of analysis, these forms exist in approximately equal quantities (Fig. 5).
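The assignment of the two gel filtration species follows from dividing each apparent native mass by the monomer mass. The short sketch below uses only the values quoted above; the rounding to the nearest whole number of subunits is an illustrative simplification.

```python
MONOMER_KDA = 60.776  # monomer mass after N-terminal methionine loss

def subunit_count(native_kda, monomer_kda=MONOMER_KDA):
    """Return the raw mass ratio and the nearest whole number of subunits."""
    ratio = native_kda / monomer_kda
    return ratio, round(ratio)

for native in (240.0, 130.0):
    ratio, n = subunit_count(native)
    print(native, round(ratio, 2), n)
```

The ratios come out close to 4 and 2, consistent with the tetrameric and dimeric forms described in the text.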
The tetramer consists of two tightly bound dimers (root mean square deviation 0.19 Å over all Cα atoms), which are more loosely bound together to form a tetramer. The tetramer has a size of ~115 Å on the longest dimension and 60 Å on the shortest. Monomer-monomer contacts occur exclusively between α and γ domains, whereas dimer-dimer contacts to form the tetramer mainly feature the β domain. Internally, each tightly bound dimer buries a surface area of ~7400 Å², whereas each monomer buries a surface area of 1950 Å² with the closest monomer of the other dimer and 460 Å² with the further monomer. Altogether, the dimer-dimer interaction buries a surface area of ~4600 Å². Each dimer is stabilized by a large number of hydrogen bonds. By contrast, there are very few hydrogen bonds at the dimer-dimer interface, although there are a considerable number of crystallographic water molecules located in the sizeable cavity between the dimers, at the center of the tetramer. Although these water molecules may provide substantial stabilization, the greatly differing surface areas of interaction mean that the tetramer is best described as a dimer of dimers.
It is not yet known for CEAS whether large conformational changes occur in the tetramer on substrate binding, as in yeast PDC, where an allosteric substrate activation process takes place involving an approximate 30° movement of the dimers with respect to each other (39). This rotation is asymmetrical with respect to the tetramer and results in a separation of one pair of β domains, whereas the other pair remain associated much as in the starting tetramer. The assembly of the CEAS tetramer is most similar to that of PDC from Zymomonas mobilis (ZmPDC) (40), which does not behave in such a manner and in which the tetramer is symmetrically arranged. By comparison to yeast PDC the dimers of the ZmPDC tetramer are rotated ~30° with respect to each other, but about the central point of the tetramer. In contrast to yeast PDC, this tetrameric arrangement is much more closely associated, hindering movement, and for ZmPDC has been assigned as the activated conformation. Based on this comparison it seems likely that CEAS would not undergo tetrameric rearrangement, although further investigation will be needed to confirm this supposition.
ThP2 Binding Site-The ThP2 binding region is situated across two subunits of the closely associated dimer, as with related enzymes such as PDC. The γ domain of one subunit and the α domain of the other subunit contribute to binding, with two ThP2 binding regions per dimer. ThP2 binding is facilitated by a combination of hydrogen bonding and formation of a complex with a single Mg2+ ion. The Mg2+ ion is coordinated octahedrally by the side chains of Asp-463 and Asn-490, the backbone carbonyl of Thr-492, two diphosphate oxygens of ThP2 and a water molecule, also bound to the carbonyl oxygen of Val-488 (Fig. 6). This environment is similar to the characteristic ThP2/Mg2+ binding motif identified as a conserved sequence in all ThP2 enzyme structures (41). This sequence is found entirely in the γ domain and contributes to the binding of the diphosphate region of the ThP2. Further hydrogen bonding of the diphosphate oxygens occurs directly to the backbone nitrogens of Phe-413, Leu-495, and Gly-465, all from the γ domain. The aminopyrimidine "end" of the ThP2 is associated with non-conserved regions of both the same γ domain and the α domain of the other monomer in the dimer. It is anchored via hydrogen bonds between N-4′ and the carbonyl oxygen of Ser-43, and between N-1′ and the side chain of Glu-57 of the α domain of the other monomer.
The ThP2 cofactor is observed in a "V" conformation (Fig. 6), bent around the hydrophobic side chain of Phe-438. Structural information on other ThP2-dependent enzymes indicates that a single hydrophobic residue is always found in a similar position, possibly aiding distortion of ThP2 into this high potential energy state not normally witnessed outside a protein environment (23). The type of residue carrying out this function is specific to an individual enzyme, and evidence from mutation studies on yeast PDC suggests that it may have an important effect on mechanism (42).
Potential Glyceraldehyde-3-P and Arginine Binding Sites-In all the refined structures, a sulfate anion could be identified in the electron density at a distance of ~6 Å from the nucleophilic C-2 carbon of ThP2. It is likely that this sulfate occupies the binding position of the phosphate of glyceraldehyde-3-P. The sulfate is bound at one oxygen by the side chains of Arg-414 and His-415, at another by Tyr-271, and at a third by His-120 from the other monomer in the dimer (Fig. 7). The fourth sulfate oxygen is not bound directly to the protein and is located at a distance of 4.8 Å from the nucleophilic C-2 of ThP2. This distance is ideal for a fully extended four-bond link between these two atoms as might be expected from the reaction intermediate formed by ThP2 and glyceraldehyde-3-P (Fig. 8).
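The statement that 4.8 Å suits a fully extended four-bond link can be checked with simple geometry. The sketch below models the link as a planar all-trans zigzag and is an illustration only. It assumes that the link runs from C-2 of ThP2 through C-1, C-2, and C-3 of glyceraldehyde-3-P to the phosphate ester oxygen, and it uses textbook idealized values (C-C 1.52 Å, C-O 1.43 Å, bond angle 109.5°) rather than anything measured from the structure.

```python
import math

def extended_chain_distance(bond_lengths, bond_angle_deg=109.5):
    """End-to-end distance of a planar all-trans zigzag chain.

    Each bond advances along the chain axis by l*sin(angle/2) and
    alternates above and below it by l*cos(angle/2).
    """
    half = math.radians(bond_angle_deg) / 2.0
    x = 0.0
    y = 0.0
    for i, length in enumerate(bond_lengths):
        sign = 1.0 if i % 2 == 0 else -1.0
        x += length * math.sin(half)
        y += sign * length * math.cos(half)
    return math.hypot(x, y)

# ASSUMED idealized bond lengths in angstroms: three C-C bonds, one C-O bond.
LINK = [1.52, 1.52, 1.52, 1.43]

print(extended_chain_distance(LINK))
```

With these assumed values the end-to-end distance comes out a little under 5 Å, close to the 4.8 Å observed between the free sulfate oxygen and C-2 of ThP2.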
A similar phosphate binding site has been identified in the crystal structure of transketolase complexed with erythrose 4-phosphate (43). The phosphate is anchored in a pocket of two arginine side chains, a histidine and a serine. The phosphate is bound further from the C-2 of ThP2 than in CEAS, at a distance of ~9 Å, which fits with the larger size of the substrate over the glyceraldehyde-3-P of CEAS. The structure of yeast acetohydroxyacid synthase also reveals a phosphate bound close to the C-2 of ThP2 (44). However, the phosphate ligands differ and the authors propose this as a binding site for the carboxyl group of the pyruvate substrate.
The active site cavity surrounding the proposed glyceraldehyde-3-P binding position is largely hydrophobic, containing the side chains of Ile-410 and Leu-495. The exceptions in this region are Gln-121 and Ser-436. It is also possible that, in the presence of substrate, the side chain of Arg-36 would move into a position capable of participating in the reaction. Overall, however, the CEAS active site is remarkable for its apparent lack of polar residues positioned to assist catalysis. By comparison, some of the active site residues with acidic/basic side chains proposed to have a functional role in PDC are replaced in CEAS with neutral amino acids, e.g. His-115 by Gln-121, and Glu-477 by Ile-496. It is therefore difficult to assign distinct mechanistic functions to particular residues in the CEAS active site and it appears that CEAS must rely heavily on ThP2 for general acid/base catalysis.
Analysis of the external region of the active site channel reveals a number of residues that may facilitate arginine binding. Furthest from the ThP2, Tyr-298 and Asp-113 could potentially bind the positively charged guanidino group. Nearer the proposed phosphate binding site, Arg-303 and Asp-301 may participate in arginine binding.
Mechanistic Discussion-In their assignment of the function of CEAS, Khaleeli et al. (12) proposed an outline mechanism for the condensation of D-glyceraldehyde-3-P with L-arginine, involving initial nucleophilic attack of the ThP2 ylide on the aldehyde of D-glyceraldehyde-3-P. Subsequent elimination and tautomerization then allows loss of the hydroxyl function at the C-2 position. Prior labeling studies imply that addition of arginine to glyceraldehyde-3-P occurs with retention of configuration at C-2 (45,46), suggesting an elimination-addition process during formation of the C-N bond. In the final step, the acyl-ThP2 intermediate must be hydrolyzed to release the N2-(2-carboxyethyl)arginine product.
The absence of residues capable of general acid/base catalysis at the active site of CEAS suggests that the 4′-amino group of the aminopyrimidine ring of ThP2 plays a key role. The structure shows that the suggested binding/polarizing site for the phosphate of D-glyceraldehyde-3-P could position the aldehyde of this substrate close to the nucleophilic carbon of ThP2.
Considerable work on the early stages of ThP2 reaction mechanisms has shown that formation of the "activated" ThP2, by deprotonation at C-2, occurs via an intramolecular proton transfer to the 4′-amino nitrogen of the aminopyrimidine ring (38,47). The stabilized negative charge at C-2 can then effect a nucleophilic attack on a substrate carbonyl group.
Following such a nucleophilic attack by ThP2 at C-1 of glyceraldehyde-3-P (Fig. 8), proton transfer from C to O, via the amino group of ThP2, may occur to give a C-1 anion. Studies on deamino ThP2 analogues have indicated that the 4′-amino group of ThP2 then stabilizes the negatively charged oxygen formed at the α-carbon position (48-50), possibly via partial protonation. The proposed role for the 4′-amino group is supported by freeze-quench crystallographic studies on ThP2-substrate adducts with transketolase (51). Although the geometry of the adduct does not seem to allow satisfactorily for such stabilization, the change in hybridization from sp2 to sp3 during transition state formation allows for a more favorable interaction between the oxygen and the 4′-amino group.
FIG. 6. The ThP2 binding site. ThP2 is shown as a ball-and-stick representation in yellow, surrounded by its ligands in the active site of CEAS. An mFo − DFc (33) difference electron density map, calculated after random model perturbation and refinement with ThP2 omitted to remove model bias, is shown contoured around the ThP2 at 3.0σ in red. A 2mFo − DFc electron density map is shown contoured around the CEAS residues at 1.0σ in blue.
After formation of the enamine/carbanion, two distinct reaction pathways appear to occur in the mechanisms of most ThP2-dependent enzymes: either the enamine/carbanion is protonated, as in PDC, or it is not, such as in transketolase, where it reacts with another substrate molecule. This fundamental difference in mechanism is reliant on the presence of a proton donor close to the enamine/carbanion position. In PDC, the Glu-477 residue has been shown to fulfill this function, via a bridging water molecule (52). In benzoylformate decarboxylase, His-281 has been shown to perform the same role, possibly with the aid of Ser-26 (53). There appears to be no such proton donor present in CEAS where an isoleucine occupies an equivalent position to Glu-477 of PDC. It therefore seems reasonable that with CEAS the carbanion subsequently reacts to eliminate the C-2 hydroxyl group to give an enol/enolate.
The three positively charged residues binding the glyceraldehyde-3-P phosphate are bound to no other negatively charged atom, and the consequent polarization of glyceraldehyde-3-P would promote phosphate elimination. This elimination would give an α,β-unsaturated acyl-ThP2 species, which could act as a Michael acceptor for L-arginine. Possible binding sites for arginine could place the nucleophilic amino group in a suitable position for such an addition. Finally, hydrolysis of the acyl-ThP2 complex would leave the β-amino acid product, N2-(2-carboxyethyl)arginine. Although details of the steps, including proton transfers, are uncertain, it seems likely that the aminopyrimidine ring of ThP2 plays a key role in some of the steps, possibly including elimination of both water and phosphate, and addition of arginine.