Structural and Enzymatic Analysis of MshA from Corynebacterium glutamicum

The glycosyltransferase termed MshA catalyzes the transfer of N-acetylglucosamine from UDP-N-acetylglucosamine to 1-l-myo-inositol-1-phosphate in the first committed step of mycothiol biosynthesis. The structure of MshA from Corynebacterium glutamicum was determined both in the absence of substrates and in a complex with UDP and 1-l-myo-inositol-1-phosphate. MshA belongs to the GT-B structural family, whose members have a two-domain structure with both domains exhibiting a Rossmann-type fold. Binding of the donor sugar to the C-terminal domain produces a 97° rotational reorientation of the N-terminal domain relative to the C-terminal domain, clamping down on UDP and generating the binding site for 1-l-myo-inositol-1-phosphate. The structure highlights the residues important in binding of UDP-N-acetylglucosamine and 1-l-myo-inositol-1-phosphate. Molecular models of the ternary complex suggest a mechanism in which the β-phosphate of the substrate, UDP-N-acetylglucosamine, promotes the nucleophilic attack of the 3-hydroxyl group of 1-l-myo-inositol-1-phosphate while at the same time promoting the cleavage of the sugar nucleotide bond.

Members of the Actinomycetales play diverse roles of importance to humans. On the positive side, Streptomyces species produce over two-thirds of the clinically useful antibiotics of natural origin (1) and Corynebacteria species are heavily utilized in industrial synthesis (1,2), whereas on the negative side many Actinomycetales are pathogens; for example, Mycobacterium species are the causative agents of tuberculosis (3) and leprosy, infecting millions worldwide each year (3,4). An intimate knowledge of the biochemistry common to all Actinomycetales allows their utilization as chemistry toolboxes while mitigating their pathogenic nature.
For most organisms, maintenance of the appropriate reducing environment in the cell is required for proper cellular function, and this is usually achieved through the synthesis and cellular balance of low molecular weight thiols such as glutathione (5). Although glutathione is the predominant thiol in Gram-negative bacteria and eukaryotes, in the Actinomycetales the major thiol is instead mycothiol (MSH), 1-D-myo-inosityl-2-(N-acetyl-L-cysteinyl)amido-2-deoxy-α-D-glucopyranoside (6). The synthetic pathway for MSH is believed to be conserved in all Actinomycetales species and requires at least four enzymes (MshA through MshD, Fig. 1). The first enzyme, MshA, catalyzes the transfer of N-acetylglucosamine from UDP-N-acetylglucosamine (UDP-GlcNAc) to 1-L-myo-inositol 1-phosphate (1-L-Ins-1-P) to produce 3-phospho-1-D-myo-inosityl-2-acetamido-2-deoxy-α-D-glucopyranoside (GlcNAc-Ins-P) (7,8). An as yet undiscovered phosphatase is proposed to dephosphorylate GlcNAc-Ins-P to produce GlcNAc-Ins (7). GlcNAc-Ins is then deacetylated (MshB) and subsequently cysteinylated (MshC) at the amino group to produce Cys-GlcN-Ins, which is then acetylated (MshD) to produce AcCys-GlcN-Ins, or MSH (9-13). In Mycobacterium smegmatis, mutants blocked in MSH biosynthesis exhibit enhanced sensitivity to cellular stress reagents, including hydrogen peroxide and rifampicin (14,15), whereas in Mycobacterium tuberculosis (Erdman strain) mutants blocked in MSH production were not viable (16). Of the four genes, mshA and mshC were found to be critical to the production of MSH and therefore to the viability of the organism (16,17). Interruption of mshB or mshD was either complemented by a promiscuous cellular activity or the product of interrupted synthesis acted as a poor analog of MSH (14,18). Therefore MshA and MshC are important potential drug targets for treatment of tuberculosis and other Actinomycetales infections.
The three-dimensional structures of both MshB (19) and MshD (20,21) have been determined; however, there have been no reports of structures for MshA or MshC.
Glycosyltransferases (GTs), such as MshA, can be grouped based on sequence into what is currently a total of 90 distinct families (CAZy, Carbohydrate-Active Enzymes data base) (22,23). GTs can also be grouped based on structure, with most GTs determined to date having one of two folds, termed GT-A and GT-B (24). Recently two new GT folds have been described for CAZy families GT51 (25) and GT66 (26). Finally, GTs can be termed "retaining" or "inverting" based on the configuration of the sugar nucleotide-derived carbohydrate in the final product. There is no absolute correlation between the fold (GT-A or GT-B) and the stereochemical outcome of the reaction, as examples of both inverting and retaining GTs have been found in both fold classes; however, all members of a CAZy family are expected to have the same fold (23,27). Based on sequence similarity MshA is grouped into CAZy family GT4, all members of which have been found to be retaining glycosyltransferases with the GT-B fold. GT4 is the second largest CAZy data base family (6563 CAZy entries at the time of this writing) and has a diverse set of donor and acceptor molecules. Only recently have a limited number of GT4 family members been structurally characterized, including WaaG, which transfers glucose from UDP-glucose onto L-glycero-D-mannoheptose II; AviGT4, which is involved in avilamycin A biosynthesis; and PimA, a phosphatidylinositol mannosyltransferase (28,29).
Here we present the structure of unliganded MshA from Corynebacterium glutamicum and its complex with UDP and 1-L-Ins-1-P. Enzymatic studies demonstrate that UDP-GlcNAc and 1-L-Ins-1-P are substrates for CgMshA, and that the product is GlcNAc-Ins-P, as was previously determined for MshA from crude lysates of Mycobacterium smegmatis (7). Initial velocity studies indicate a sequential mechanism with UDP-GlcNAc almost certainly binding first followed by 1-L-Ins-1-P. The structural and kinetic data are most consistent with a mechanism involving the β-phosphoryl group of the substrate, UDP-GlcNAc, acting as the general base and not a mechanism involving a covalent enzyme-N-acetylglucosamine intermediate. The crystal structure highlights residues involved in binding and catalysis and serves as a starting point for inhibitor design.

EXPERIMENTAL PROCEDURES
Expression-Overexpression vectors containing the CgMshA gene were transformed into Rosetta2 cells (Invitrogen) and grown in 8-ml overnight cultures of MDG media (30). These cultures were then made 10% in glycerol and stored at −80°C in 2-ml aliquots. A 100-ml starter culture of MDG media was inoculated with one 2-ml storage stock and grown at 37°C to late-log phase. This was used to inoculate 4 liters of ZYP5052 autoinduction media in six 2-liter baffled flasks (30). The cultures were grown at 23°C with 300 rpm shaking and were harvested at saturation (36-48 h, A600 > 12). All growth media contained 30 μg/ml chloramphenicol and 200 μg/ml kanamycin.
Purification-Frozen cell paste was resuspended in three times its volume of buffer A (100 mM TEA, pH 7.8, 200 mM (NH4)2SO4, 10% glycerol, 15 mM imidazole), and lysozyme was added to 1 mg/ml. After incubation at 4°C for 30 min the cell slurry was disrupted by sonication and was spun at 25,000 × g for 30 min. The supernatant was applied to a nickel-Sepharose HP (GE-Healthcare) column equilibrated against buffer A. Bound proteins were eluted with a gradient of 15-300 mM imidazole over 10 column volumes. Fractions with greater than 20% of the peak activity were pooled, made 1 M in (NH4)2SO4, and applied to a Phenyl-Sepharose HP (GE-Healthcare) column equilibrated against buffer B (20 mM TEA, pH 7.8, 1 M (NH4)2SO4, 10% glycerol, 0.5 mM EDTA, 1 mM β-mercaptoethanol). Bound proteins were separated with a 1 to 0 M (NH4)2SO4 gradient over 10 column volumes, with properly folded CgMshA eluting between 500 and 300 mM (NH4)2SO4. Pooled fractions were concentrated by ultrafiltration (Amicon) to a concentration of 20-40 mg/ml and stored at −80°C.
Crystallization and Phasing-CgMshA was crystallized by vapor diffusion under silicon oil (Fisher) utilizing 96-well round bottom assay plates stored open to room humidity at 18°C. Crystals of APO CgMshA with a hexagonal bipyramidal morphology were obtained from drops containing 2 μl of protein (CgMshA:pET28a(+), 15 mg/ml, 400 mM (NH4)2SO4, 10% glycerol, 0.5 mM EDTA, 1 mM β-mercaptoethanol) with 2 μl of precipitant (20% polyethylene glycol 4000, 100 mM Tris, pH 8.5, 200 mM Li2SO4). Crystals were soaked in a stabilization solution of 40% polyethylene glycol 4000, 100 mM Tris, pH 8.5, 200 mM Li2SO4, prior to vitrification by plunging in liquid nitrogen. Data were collected on an MSC R-AXIS IV++ image plate detector using CuKα radiation from a Rigaku RU-H3R x-ray generator and processed using MOSFLM and SCALA (31,32). The space group was determined to be P3₁ with approximate cell dimensions of a = b = 79.7 Å and c = 148.4 Å. There is a molecular dimer per asymmetric unit with a solvent content of 59%. The structure of APO CgMshA was determined by SIRAS using a single mercury-derivatized crystal, which was prepared by soaking a crystal for 2 h in the stabilization solution with the addition of saturated p-chloromercuribenzoate. Heavy atom positions and initial phases were determined using PHENIX (33). The solvent-flattened SIRAS map was of sufficient quality for ARP/WARP (34) to autobuild a majority of the structure. Iterative rounds of model building within the molecular graphics program COOT (35) followed by refinement against the data using REFMAC (36) were used to build the remainder of the structure. The final rounds included TLS refinement.
A second crystal form of CgMshA was obtained in drops that contained 2 μl of protein (CgMshA:pET29a(+), 40 mg/ml, 400 mM (NH4)2SO4, 10% glycerol, 0.5 mM EDTA, 1 mM β-mercaptoethanol, 10 mM UDP-GlcNAc) combined with 2 μl of precipitant (0.8 M sodium citrate, pH 5.5). This crystal form was cryoprotected by soaking in 1.2 M sodium citrate, pH 5.5, 10% glycerol, and 200 mM (NH4)2SO4 prior to vitrification by plunging in liquid nitrogen. For the UDP·1-L-Ins-1-P dataset the (NH4)2SO4 was replaced with 50 mM 1-L-Ins-1-P, and the crystals were soaked for 1 h. Data were collected at Brookhaven National Laboratory at beamline X25 and processed using MOSFLM (32) and SCALA (31). The space group was determined to be I422 with approximate cell dimensions of a = b = 223.9 Å and c = 125.0 Å. This crystal form has two molecules per asymmetric unit with a solvent content of 71.5%.
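The solvent contents quoted for the two crystal forms can be checked roughly with Matthews' approximation. The sketch below is not the authors' calculation. It assumes the untagged monomer mass of 45,669 Da reported in the Results (the affinity tag is ignored), the number of general positions of each space group (3 for P3₁, 16 for I422), and the conventional factor of 1.23 Å³/Da that relates the Matthews coefficient to the protein volume fraction. The function names are illustrative.

```python
import math

def unit_cell_volume(a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Volume (A^3) of a unit cell from edge lengths (A) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1.0 - ca**2 - cb**2 - cg**2 + 2.0 * ca * cb * cg)

def matthews(volume, n_symops, mol_per_asu, mw_da):
    """Return (Matthews coefficient in A^3/Da, estimated solvent fraction)."""
    vm = volume / (n_symops * mol_per_asu * mw_da)
    # 1.23 A^3/Da is the usual approximation for the protein partial volume
    return vm, 1.0 - 1.23 / vm

MONOMER_MW = 45669.0  # Da, untagged monomer (assumption: tag mass ignored)

# APO form: P3(1), hexagonal axes (gamma = 120), 3 general positions, dimer per ASU
vm_apo, solv_apo = matthews(
    unit_cell_volume(79.7, 79.7, 148.4, gamma=120.0), 3, 2, MONOMER_MW)

# Ligand-bound form: I422, 16 general positions, two molecules per ASU
vm_udp, solv_udp = matthews(
    unit_cell_volume(223.9, 223.9, 125.0), 16, 2, MONOMER_MW)

print(f"P3(1): Vm = {vm_apo:.2f} A^3/Da, solvent = {100 * solv_apo:.1f}%")
print(f"I422:  Vm = {vm_udp:.2f} A^3/Da, solvent = {100 * solv_udp:.1f}%")
```

Under these assumptions the estimates come out near 59% and 71%, in line with the values reported for the two crystal forms.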
Phasing of the UDP dataset utilized the molecular replacement program AMoRe (37) and the APO form of CgMshA as the model. Searches with the complete APO structure or with either of the N- or C-terminal domains did not produce a convincing molecular replacement solution. However, utilizing a dimerized N-terminal domain produced two solutions that were significantly above other solutions despite (or possibly because of) the fact that both of the molecular dimers in this crystal form are acted upon by crystallographic symmetry. Molecular replacement searches for the missing C-terminal domain after placement of the N-terminal domains were unsuccessful. The recently determined structure of PimA (PDB ID: 2GEJ) was used as a guide for the structure of a closed GT-B:GT-4 glycosyltransferase family member (29). Utilizing electron density maps phased with only the two placed N-terminal domains, and the PimA structure as a guide, there was enough secondary structure visible in a skeletonized map to manually place both C-terminal domains, which were then successfully refined into their correct positions with rigid body refinement. Molecular constraints for UDP and 1-L-Ins-1-P were calculated using the PRODRG server (38). Refinement of this crystal form utilized REFMAC with the final rounds incorporating TLS refinement (36).
DLS measurements were performed with a 4 mg/ml CgMshA solution in 250 mM (NH4)2SO4, 50 mM TEA, pH 7.8, at 25°C on a DynaPro MS/X instrument (Protein Solutions). Data collection and deconvolution were performed using the DYNAMICS 6.2.05 software (Protein Solutions).
Production of 1-L-Ins-1-P-The gene for inositol-1-phosphate synthase from the hyperthermophile Archaeoglobus fulgidus (Af_INO) was cloned as described previously (39). Briefly, the Af_INO gene was PCR-amplified from genomic DNA and ligated into the pET23a(+) vector utilizing NdeI and HindIII restriction sites. The resultant plasmid was transformed into Escherichia coli strain BL21(DE3)/pLysS, and the cells were grown and protein was expressed by autoinduction in the same fashion as for CgMshA (see above), except that the antibiotic was ampicillin (100 μg/ml), and the cells were grown at 37°C for 12-18 h. Cells were sonicated in three times the weight/volume of buffer A (50 mM Tris, pH 8.5), and spun at 25,000 × g to remove cell debris. The supernatant, contained in a steel vessel, was heated in an 80°C water bath with constant stirring for 5 min. The vessel was then placed on ice and cooled to 4°C as quickly as possible with constant stirring. The precipitated proteins were removed by centrifugation (25,000 × g). The supernatant was applied to a 2.6 × 12 cm Poros-HQ anion exchange (POROS MEDIA) column equilibrated with buffer A. Bound proteins were eluted with a 0 to 0.25 M (NH4)2SO4 gradient over ten column volumes. Fractions corresponding to peak concentrations of Af_INO, as judged by SDS gels, were pooled, made 1 M in (NH4)2SO4, and applied to a 2.6 × 12 cm phenyl-Sepharose (GE-Healthcare) hydrophobic exchange column equilibrated with buffer A plus 1 M (NH4)2SO4. Bound proteins were eluted with a 1.0-0 M (NH4)2SO4 gradient in buffer A over ten column volumes, pooled based on purity by SDS gels, and dialyzed overnight in buffer A. Purified Af_INO was concentrated to 10 mg/ml by Amicon ultrafiltration, snap frozen in liquid nitrogen, and stored at −80°C.
1-L-Ins-1-P was synthesized using Af_INO and glucose 6-phosphate. The starting solution was 125 mM glucose 6-phosphate, 0.625 mM ZnCl2, 1.25 mM NAD+, and 3.6 mg of Af_INO in 50 mM Tris, pH 7.5 (NH4OH). The reaction was held at 85°C with a heat block, and every 45 min the NAD+ was increased by 0.5 mM and another 2 mg of Af_INO was added. The progress of the reaction was monitored utilizing the CgMshA/pyruvate kinase/lactate dehydrogenase assay with substoichiometric 1-L-Ins-1-P (potential from reaction, 0.20 mM) to NADH (0.25 mM). The completed reaction (~2-3 h) was diluted to 100 mM (taking into account additions during the reaction) and stored at −80°C to be used directly in assays for MshA activity.
Measurement of Enzymatic Activity-The production of UDP was measured using a coupled assay system and monitoring the reaction at 340 nm. Standard conditions were 30 nM enzyme, 50 mM TEA, pH 7.8, 200 μM NADH, 500 μM phosphoenolpyruvate, 10 mM MgCl2, 20 units of pyruvate kinase (ammonium sulfate suspension), and 55 units of lactate dehydrogenase (ammonium sulfate suspension) in a 1-ml reaction at 25°C. Except for the enzyme, all components were mixed in the cuvette and allowed to equilibrate for 2 min. Reactions were initiated by the addition of enzyme. The amount of coupling enzymes was sufficient to not limit the rate of reaction and was required to minimize a lag in the assays.
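In this coupled assay each UDP released leads to the oxidation of one NADH, so the slope of the absorbance trace at 340 nm converts directly to a reaction rate. The following minimal sketch illustrates that conversion. It assumes a 1-cm path length and the commonly used NADH extinction coefficient of 6220 M⁻¹ cm⁻¹; the slope value and the function names are hypothetical and are not taken from the paper.

```python
NADH_EXT_COEFF = 6220.0  # M^-1 cm^-1 at 340 nm (standard literature value)

def rate_from_absorbance(delta_a340_per_min, path_cm=1.0):
    """Convert the rate of A340 decrease (per min) to UDP formation in uM/s.

    Assumes 1 NADH oxidized per UDP formed (pyruvate kinase /
    lactate dehydrogenase coupling).
    """
    molar_per_min = delta_a340_per_min / (NADH_EXT_COEFF * path_cm)
    return molar_per_min * 1e6 / 60.0

def turnover_number(rate_um_per_s, enzyme_nm):
    """Apparent turnover (s^-1) from a rate in uM/s and enzyme concentration in nM."""
    return rate_um_per_s / (enzyme_nm * 1e-3)

# Hypothetical slope of 0.14 absorbance units per minute at 30 nM enzyme
rate = rate_from_absorbance(0.14)
turnover = turnover_number(rate, enzyme_nm=30.0)
print(f"rate = {rate:.3f} uM/s, turnover = {turnover:.1f} s^-1")
```

With the illustrative slope chosen here, the apparent turnover is about 12.5 s⁻¹ at 30 nM enzyme.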
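Data Analysis-To determine the basic kinetic parameters for each substrate, initial velocity plots at saturating concentrations of one substrate were fit to the Michaelis-Menten equation (Equation 1),

$$v = \frac{V_{max} A}{K_a + A} \quad \text{(Eq. 1)}$$

where $V_{max}$ is the maximal velocity, $A$ is the concentration of the varied substrate, and $K_a$ is the Michaelis constant for substrate A. When both substrates were varied, intersecting initial velocity plots were fit to Equation 2,

$$v = \frac{V_{max} A B}{K_{ia} K_b + K_a B + K_b A + A B} \quad \text{(Eq. 2)}$$

where $A$ and $B$ are the concentrations of the two substrates, $K_a$ and $K_b$ are their respective Michaelis constants, and $K_{ia}$ is the dissociation constant for substrate A.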
Data described with single variable equations were analyzed using KaleidaGraph (Synergy Software). Data described using equations with multiple independent variables were analyzed using GraFit (Erithacus Software).
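The two rate laws used in the data analysis can be written out as short functions. This is a sketch under stated assumptions: the two-substrate function uses the standard rate equation for a sequential mechanism, which is assumed here to be the form meant by Equation 2, and the fit is a simple double-reciprocal (Lineweaver-Burk) linear regression rather than the nonlinear fitting performed in KaleidaGraph or GraFit. The function names, concentrations, and parameter values are hypothetical.

```python
def michaelis_menten(a, vmax, ka):
    """Single-substrate rate law: v = Vmax*A / (Ka + A)."""
    return vmax * a / (ka + a)

def sequential_bi(a, b, vmax, ka, kb, kia):
    """Standard sequential two-substrate rate law (assumed form of Equation 2)."""
    return vmax * a * b / (kia * kb + ka * b + kb * a + a * b)

def lineweaver_burk_fit(conc, rates):
    """Least-squares line through (1/A, 1/v); returns (Vmax, Ka).

    slope = Ka/Vmax and intercept = 1/Vmax on the double-reciprocal plot.
    """
    xs = [1.0 / s for s in conc]
    ys = [1.0 / v for v in rates]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    vmax = 1.0 / intercept
    return vmax, slope * vmax

# Noise-free synthetic data (hypothetical values, concentrations in uM)
conc = [50.0, 100.0, 200.0, 400.0, 800.0]
rates = [michaelis_menten(s, vmax=10.0, ka=200.0) for s in conc]
vmax_fit, ka_fit = lineweaver_burk_fit(conc, rates)
print(f"Vmax = {vmax_fit:.2f}, Ka = {ka_fit:.1f} uM")
```

At a saturating concentration of the second substrate the two-substrate expression reduces to the single-substrate form, which is why the one-substrate fits are carried out at saturation. Double-reciprocal regression weights low-concentration points heavily when data are noisy, so direct nonlinear fitting is generally preferred for real measurements.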
Isolation of GlcNAc-inositol 3-Phosphate-CgMshA (80 μg) was added to a 1-ml solution containing 100 mM TEA, pH 7.8, 100 mM 1-L-Ins-1-P, and 100 mM UDP-GlcNAc. The production of UDP was monitored using a high-performance liquid chromatography-based assay and a UV detector (260 nm). The separation of compounds was accomplished using a 1-ml Mono Q ion-exchange column (GE-Healthcare) with the following programmed gradient: 0-5 min (0% B), 20 min (35% B), 25-30 min (100% B), 33 min (0% B), where Buffer A is 20 mM ammonium bicarbonate and Buffer B is 600 mM ammonium bicarbonate, at a flow rate of 1 ml/min. Retention times for UDP-GlcNAc and UDP were 12 and 20 min, respectively. When the reaction had run to completion (~5 h), the entire reaction was injected and fractionated using the above high-performance liquid chromatography method. Fractions (1 ml) were treated with alkaline phosphatase and then assayed for inorganic phosphate using a malachite green phosphate assay kit (Bioassay Systems). Fractions 9-12 were found to contain phosphate but did not correlate to a peak at 260 nm. These fractions were pooled and lyophilized. The lyophilized white powder was analyzed by 1H NMR, 31

RESULTS AND DISCUSSION
Expression and Purification-Initial screens of full-length clones of MshA from Mycobacterium tuberculosis, Streptomyces coelicolor, Nocardia farcinica, and C. glutamicum as N-terminally hexahistidine-tagged constructs all resulted in high overexpression, but only the MshA from C. glutamicum (CgMshA) resulted in large amounts of soluble protein. Typically, yields of soluble, active CgMshA using autoinduction media were 200 mg/liter. After nickel-nitrilotriacetic acid affinity chromatography, purification on Phenyl Sepharose hydrophobic exchange media was critical to separating active, monodisperse protein from inactive, polydisperse protein. Ammonium sulfate (>100 mM) was required to maintain activity, whereas protein kept in a minimal ionic strength buffer was subject to degradation and loss of activity. The initial construct of MshA utilized an N-terminal thrombin-cleavable 6× His tag to facilitate purification. Attempts at thrombin cleavage resulted in nonspecific degradation, so the N-terminal tag was not removed. The N-terminally tagged protein was utilized for the initial structure determination of unliganded MshA. A second, C-terminally tagged MshA was used to determine the structure of MshA in complex with UDP and 1-L-Ins-1-P.
Enzymatic Characterization-From the initial velocity plots for 1-L-Ins-1-P and UDP-GlcNAc, both substrates exhibited Michaelis-Menten kinetics, and no curvature was seen in the Lineweaver-Burk linear replots (supplemental Fig. S1). The kinetic parameters for each substrate were determined from fits of the data to Equation 1. A value of 12.5 ± 0.2 s⁻¹ was determined for the kcat, and the Michaelis constants for 1-L-Ins-1-P and UDP-GlcNAc were 240 ± 10 and 210 ± 20 μM, respectively. The Km values are consistent with those determined previously for MshA from Mycobacterium smegmatis crude lysates (7). To determine whether MshA proceeds through a sequential or ping-pong mechanism, an initial velocity plot was made by varying both substrates. The plot shows intersecting lines, and the data fit well to Equation 2, consistent with a sequential mechanism (supplemental Fig. S2). The pseudo-disaccharide phosphate monoester product of the reaction was purified using ion-exchange chromatography. 31P and 1H NMR spectroscopy of the product was consistent with GlcNAc-Ins-P (supplemental Figs. S3 and S4). A coupling constant of 3.3 Hz was determined for the anomeric proton, confirming that the product retains the alpha sugar configuration (inset, supplemental Fig. S4).
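The reported kcat and Michaelis constants can be combined into apparent second-order rate constants (kcat/Km) for each substrate. The sketch below is an illustration rather than an analysis from the paper. It takes the Michaelis constants as micromolar, applies the single reported kcat to both substrates, and propagates the quoted uncertainties as if they were independent, which is an assumption. The function name is hypothetical.

```python
import math

def catalytic_efficiency(kcat, kcat_err, km_um, km_err_um):
    """Return (kcat/Km in M^-1 s^-1, propagated uncertainty).

    Uncertainties are combined in quadrature, assuming the errors in
    kcat and Km are uncorrelated.
    """
    eff = kcat / (km_um * 1e-6)  # convert Km from uM to M
    rel = math.sqrt((kcat_err / kcat) ** 2 + (km_err_um / km_um) ** 2)
    return eff, eff * rel

# Values quoted in the text: kcat = 12.5 +/- 0.2 s^-1
eff_ins, err_ins = catalytic_efficiency(12.5, 0.2, 240.0, 10.0)  # 1-L-Ins-1-P
eff_udp, err_udp = catalytic_efficiency(12.5, 0.2, 210.0, 20.0)  # UDP-GlcNAc

print(f"1-L-Ins-1-P: {eff_ins:.3g} +/- {err_ins:.2g} M^-1 s^-1")
print(f"UDP-GlcNAc:  {eff_udp:.3g} +/- {err_udp:.2g} M^-1 s^-1")
```

Both values fall in the range of roughly 5 × 10⁴ to 6 × 10⁴ M⁻¹ s⁻¹, so under these assumptions the two substrates are used with similar apparent efficiency.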
Structure homolog searches using SSM and the closed UDP·1-L-Ins-1-P complex yielded a number of other members of the GT-B family. The closest structural homologs were other members of the GT-4 family: the α1,3-glucosyltransferase WaaG (SSM Z-score of 11.5, r.m.s.d. of 2.10 Å, and 21% sequence identity) and the phosphatidylinositol mannosyltransferase PimA (Z-score of 9.4, r.m.s.d. of 2.14 Å, and 26% sequence identity) (28,29). A slightly lower score was observed for the only other GT-4 family member with a known structure, an enzyme involved in avilamycin A biosynthesis, AviGT4 (Z-score of 7.4, r.m.s.d. of 2.85 Å, and 21% sequence identity) (28). Structure homolog searches using SSM and the open APO form of the enzyme yielded no structurally homologous structures with >70% structural overlay, indicating that the open form of the enzyme is a newly observed conformation between the N- and C-terminal domains in GT-B family members.
MshA Oligomeric State-MshA is composed of 418 residues with a monomer molecular weight of 45,669. Analysis of CgMshA by gel filtration chromatography and dynamic light scattering yielded apparent molecular weights of 97,700 and 111,000, respectively, suggesting that CgMshA is a dimer in solution. There is a conserved dimer interface in the two crystal forms. In the P3₁ crystal form (APO form) there is a molecular dimer in the asymmetric unit, whereas in the I422 crystal form (UDP·1-L-Ins-1-P ternary complex) there are two monomers per asymmetric unit that, when acted upon by the crystallographic symmetry, yield molecular dimers with interfaces identical to that observed in the P3₁ crystal form. Superposition of the two N-terminal domains that form the dimer from the two crystal forms yields an r.m.s.d. of 0.53 Å (405 common Cαs). The CgMshA APO-form dimer, when viewed perpendicular to the molecular 2-fold, has the shape of a slightly bent rod with dimensions of 40 × 40 × 135 Å (Fig. 2B). The dimer interface is entirely composed of residues from the N-terminal domain, wherein the dimer interface and the C-terminal domain interact on opposite sides of the N-terminal domain. The dimer interface is confined to α2, α3, α4, and β4/α2 and their 2-fold symmetrically related counterparts (Fig. 2C). A total of 1488 Å² is buried per subunit within the dimer interface and includes both polar and non-polar interactions. A comparison of various Actinomycetes MshA orthologs suggests that the dimer interface is most likely conserved (supplemental Fig. S5). Some of the conserved interactions are a hydrophobic interaction between α2 and α3′ (Leu-93 and Leu-119′), and also α4 with the β4/α2 loop (Ile-156, Pro-72′, and Leu-76′), and a bifurcated polar interaction between the side chains of Gln-84 and Gln-160′. The active site is located between the N- and C-terminal domains and is >30 Å from the dimer interface at its closest points. 
However, three of the four residues that coordinate the phosphate of 1-L-Ins-1-P (Lys-78:β4/α2 loop, Tyr-110:α3, and Arg-154:α4) are located on the opposite face of structural elements used to construct the dimer interface (see below). Dimerization therefore might stabilize these structural elements and contribute secondarily to enzyme catalysis.
Donor Substrate Binding-Crystallization of CgMshA in the presence of UDP or UDP-GlcNAc resulted in a tetragonal crystal form (Table 1). The structure of the binary complex was determined by molecular replacement using the individual domains as search models (see "Experimental Procedures"). There are two subunits per asymmetric unit, with the CgMshA molecular dimer created by crystallographic symmetry. The two subunits are essentially equivalent with an r.m.s.d. of 0.31 Å over 392 residues. Similarly, the two subunits of the APO crystal form are essentially equivalent with an r.m.s.d. of 0.37 Å over 391 Cα residues. Comparisons of the APO and ternary complex will therefore be confined to the subunits with the best crystallographic statistics. Data collected on cocrystals with UDP or UDP-GlcNAc resulted in similar maps showing only UDP bound to the active site (Fig. 3A, data not shown). Soaking crystals grown with UDP-GlcNAc in high concentrations of UDP-GlcNAc (50 mM) neither degraded the diffraction pattern nor resulted in any observable UDP-GlcNAc binding, suggesting that the closed form of the protein, as held by crystal contacts, is not free to exchange sugar nucleotide. The absence of bound sugar nucleotide arises either from minor contaminating UDP in the UDP-GlcNAc preparation or from slow hydrolysis of UDP-GlcNAc over the lifetime of the crystals, combined with the lack of free exchange of the bound nucleotide with free UDP-GlcNAc.
Upon binding nucleotide there is a large conformational change bringing the binding sites for sugar nucleotide and 1-L-Ins-1-P into close proximity (Fig. 3, B and C, and supplemental   Electron density for UDP suggests that the pyrophosphate is in two conformations, most likely due to the lack of the anchoring interactions of GlcNAc (Fig. 3A). In addition, residual electron density adjacent to the α-phosphate may be the result of residual UDP-GlcNAc or an unknown small molecule binding at this position with low occupancy. The majority of the residues that interact with UDP are located in the C-terminal domain, with the uracil moiety forming polar interactions with the main-chain atoms of Arg-294 (α9/β11) and the side chain of Cys-262, the ribose with Glu-324 (α10), and the pyrophosphate with the main-chain amides of Leu-320, Val-321 (α10), and Arg-231 along with the side chains of Arg-231 and Arg-236 (β9/α7) (Fig. 4, A and B, and supplemental Fig. S7). Upon sugar nucleotide binding and domain rotation, residues from the N-terminal domain contribute interactions.  are disordered in the APO state but become ordered in the binary complex. In the newly ordered state, Pro-16 and Gly-17 lie against the face of the uracil moiety, Asn-15 hydrogen bonds with the uracil (O2), and the backbone amide of Gly-23 is hydrogen bonded to the pyrophosphate moiety. A single interdomain hydrogen bond is formed between the backbone carbonyl of Gly-17 and the backbone amide of Gly-264.
Figure legend (partial): CgMshA monomer in complex with UDP and 1-L-Ins-1-P (closed conformation). UDP and 1-L-Ins-1-P are shown as sticks colored by atom type. C, molecular dimer of CgMshA after domain reorientation. Electron density for 1-L-Ins-1-P was sufficient only to fit in one subunit but was modeled in the dimer shown here for illustrative purposes. D, stereo illustration of DYNDOM rotation axis, relating the N- and C-terminal domains before and after binding of nucleoside. The APO structure is illustrated by the green trace, whereas the N- and C-terminal domains of the ternary complex are colored blue and cyan, respectively. α12 is colored maroon in both structures to help in orientation, whereas the hinge residues (196-197 and 386-392) are colored red.
There are three relevant structures of related glycosyltransferases with bound sugar nucleotides. The structures of WaaG and OtsA were determined with UDP-2-deoxy-2-fluoroglucose (UDP-2FGlc) (28,44), and PimA was determined with GDP-mannose (GDP-man) (29). The binding modes of UDP-2FGlc to WaaG and OtsA were essentially identical, whereas the binding of GDP-man to PimA resulted in an alternate conformation for the pyrophosphate-sugar moiety, such that the faces of 2FGlc and mannose are perpendicular to each other, but binding in a similar "bent back" conformation relative to the pyrophosphate. Because UDP-2FGlc and UDP-GlcNAc share a common nucleoside base and sugar stereochemistry, the UDP-2FGlc structures were used to construct a model of UDP-GlcNAc binding to CgMshA (Fig. 4, A and B). As in all retaining glycosyltransferases, the sugar moiety is bent back over the pyrophosphate moiety, exposing the anomeric sugar nucleotide carbon to nucleophilic attack. In this conformation the sugar 2-amino and 4-hydroxy groups would form hydrogen bonds with the β-phosphate and α-phosphate, respectively. In this binding mode, the GlcNAc molecule would have steric collisions with the β12/α10 loop, especially with Ser-317, which projects into the active site and interacts with the side chain of Glu-316 in the UDP binary complex. In the APO structure this loop is in a slightly different conformation and Ser-317 is positioned away from the UDP binding site. Therefore, in the UDP-GlcNAc model the β12/α10 loop was modeled based partially on its conformation in the APO state and utilizing main-chain conformations seen for this loop in OtsA and WaaG binary complexes with UDP-2FGlc (28,44). In the molecular model the C3,C4-diol moiety of the sugar plugs into the β12/α10 loop, forming hydrogen bonds between the 4-OH and the main-chain amides of Gly-319 and Phe-318, and between the 3-OH and the main-chain amide of Ser-317 and the side chain of Glu-316. In addition, the 6-OH forms hydrogen bonds with the side chains of His-133 and Asn-171 in a conserved interaction seen in the structure of OtsA with UDP-2FGlc (44). 
The acetyl group of UDP-GlcNAc would lie in a slot bordered by the β12/α10 loop and the 1-L-Ins-1-P binding site (see below), with potential interactions with the side chains of Asn-315, Thr-134, Phe-235, and Arg-231.
Acceptor Substrate Binding-Crystals of the binary complex could be soaked in high concentrations (50 mM) of 1-L-Ins-1-P without damaging the diffraction pattern. Examination of electron density maps after most of the structure was fit suggested that only one of the subunits contained significant electron density for 1-L-Ins-1-P, so only that subunit will be discussed (Fig. 3A). There were no significant structural changes between the binary and ternary complexes (data not shown). The B-factors for 1-L-Ins-1-P average 84 Å², nearly twice those of the surrounding residues (~45 Å²), suggesting <100% occupancy and perhaps a less than optimal binding site for 1-L-Ins-1-P. Despite this, the structure of the ternary complex highlights several key features required for binding of the acceptor substrate. The side chains of Lys-78, Arg-154, Thr-134, and Tyr-110 all form hydrogen bonds with the phosphate moiety of 1-L-Ins-1-P, with Tyr-110 in an allowed but disfavored phi-psi conformation (phi = 76, psi = 151) (Fig. 4, A and B, and supplemental Fig. S7). In the APO structure and the UDP binary complex this site is occupied by a sulfate ion, which may be one reason enzyme activity is stabilized by ammonium sulfate. The inositol moiety interacts with the β1/α1 loop and the N-terminal end of α1. There are polar interactions to the 3-OH (Met-24 N), 4-OH (Asn-25), and 5-OH (His-9 and Asp-20 O), whereas the side chains of Met-24 and Arg-231 form walls adjacent to the faces of the inositol. In the UDP·1-L-Ins-1-P structure the 3-OH is within hydrogen bonding distance (2.95 Å) of one of the oxygens of the β-phosphate and, therefore, is well positioned to participate in glycosidic bond formation.
Based on the structure of the UDP·1-L-Ins-1-P complex, three major binding determinants for 1-L-Ins-1-P are proposed. First, the large number of contact points with the phosphate moiety, both general electrostatic and hydrogen bonding interactions, is proposed to provide a large positive binding energy and limit the possible orientations of the inositol. Second, Arg-231 is important in delineating the substitution and conformation of the sugar. Arg-231 is in two conformations in the APO state and becomes ordered in the binary complex through its interactions with the β-phosphate of UDP-GlcNAc. The side chain of Arg-231 lies against the face of the inositol and is ~4.5 Å from the 1-L-Ins-1-P. Its position against the face requires that the 3, 4, 5, and 6 substituents of the ligand be in equatorial positions. Finally, the 2-OH of 1-L-Ins-1-P is the only substituent in an axial position and points into a small pocket created by the side chains of Met-24, Tyr-110, Thr-134, and UDP-GlcNAc. The axial nature of the 2-OH allows a closer approach of the 3-OH to the β-phosphate-GlcNAc bond.
Mechanistic Analysis-Determination of the APO and UDP·1-L-Ins-1-P ternary complex structures in conjunction with the modeling of the CgMshA·UDP-GlcNAc structure permits analysis of potential enzyme mechanisms. First, the open and closed conformations are most likely distinct low energy states and not an artifact of crystallization, because in each crystal form the two copies of the protein per asymmetric unit are in similar conformations despite having dissimilar crystal contacts. In the APO structure the binding sites for UDP and 1-L-Ins-1-P are ~30 Å from each other and, therefore, require a large domain reorientation to bring the reactants into close proximity. In addition, a portion of the β1/α1 loop is disordered in the APO structure, and therefore several of the binding determinants between CgMshA and the inositol moiety are either absent or oriented incorrectly. Binding of UDP-GlcNAc to the C-terminal domain presents an alternative surface to the N-terminal domain, resulting in the stabilization of a closed form of the enzyme through interactions of the β1/α1 loop with the pyrophosphate and uracil moieties. Therefore, binding of UDP-GlcNAc and domain reorientation result in the structuring of the β1/α1 loop and the completion of the binding site for 1-L-Ins-1-P. These structures suggest that CgMshA proceeds by an ordered mechanism with UDP-GlcNAc binding first and 1-L-Ins-1-P binding second.
Chemical mechanisms proposed for glycosyltransferases share many of the features proposed for glycosidases. Glycosidases can be described as glycosyltransferases where the acceptor is a water molecule and, like glycosyltransferases, can be defined as inverting or retaining depending on the stereochemical outcome at the anomeric carbon. The involvement of an oxocarbenium ion transition state in the chemical mechanisms of glycosyltransferases and glycosidases is well supported. Significant normal α-deuterium secondary isotope effects have been reported for inverting and retaining glycosyltransferases (45-47) and glycosidases (48,49), consistent with C1-O bond cleavage occurring prior to attack by the acceptor sugar or water. For inverting glycosidases and glycosyltransferases, the proposed single displacement mechanism is well established (50). There is also structural and kinetic evidence that retaining glycosidases follow a double displacement mechanism with participation of a covalent enzyme-substrate intermediate (49,51).
Not surprisingly, prior to any structural evidence, retaining glycosyltransferases were proposed to utilize a double displacement mechanism. However, as the first structures were solved, it became apparent that there was not a readily identifiable nucleophile in the active site to support the formation of a covalent intermediate (44,52,53). In the absence of direct evidence of a viable covalent intermediate, an alternative mechanism, termed SNi "internal return," has been proposed, invoking an oxocarbenium ion-like transition state with phospho-sugar bond breakage and glycosidic bond formation occurring in a concerted, but necessarily asynchronous, manner on the same face of the sugar (52,53). This mechanism was first proposed nearly 30 years ago to describe anomalous results seen in the solvolysis of glucopyranosyl fluorides, where a kinetically unimolecular reaction (SN1-like) resulted in retention of configuration of the sugar (SN2-like) (54).
Two recent reports, however, provide indirect support for the double displacement mechanism. Using a mutant form of the retaining glycosyltransferase LgtC, Lairson and coworkers (55) were able to isolate a covalently bound adduct. However, the adduct was unexpectedly formed on the neighboring residue, some 9.0 Å away from the anomeric carbon. A second report from Monegal and Planas describes the chemical rescue of a mutant form of a retaining α3-glycosyltransferase by sodium azide (56). The product of the chemical rescue is the inverted version of the sugar azide, which would be consistent with the first step in a double displacement mechanism. Although these findings provide evidence that retaining glycosyltransferases can accommodate parts of a double displacement mechanism, their support is inconclusive.
Similar to previous reports of retaining glycosyltransferases, the structure of CgMshA does not support a double displacement mechanism. In the CgMshA UDP-GlcNAc model, residues His-133:Thr-134 form a cap over the anomeric carbon on the β-face (closest distance, 5.5 Å; non-polar interactions). There are no side chains in proximity to assist in a double displacement mechanism (Fig. 4B). It should be noted, however, that large conformational changes are observed in the conversion of the APO to the nucleoside-bound structure (domain-domain reorientation, β12/α10 loop), and it cannot be ruled out that somewhere along the reaction coordinate a potential protein nucleophile would make a close approach to the β-face of UDP-GlcNAc.
The CgMshA UDP-GlcNAc model complex is consistent with an SNi mechanism where nucleophilic attack by the hydroxyl of 1-L-Ins-1-P and departure of UDP occur on the same face. In such a mechanism the transition state is highly dissociative, with the donor having significant oxocarbenium ion character (Fig. 4C). In the UDP-GlcNAc model the 3-OH of 1-L-Ins-1-P approaches UDP-GlcNAc from the β-face and is 2.3 Å from the phosphoryl oxygen and 2.5 Å away from the anomeric carbon. There are no protein groups near enough to the 3-OH to act as a general base, consistent with the formation of a cyclic intermediate involving the 3-OH, the β-phosphate, and the anomeric carbon (Fig. 4, B and C). The conformation of UDP-GlcNAc, with the carbohydrate located over the phosphates in a bent-back conformation, appears to be a key factor in catalysis. This conformation may promote the significant oxocarbenium ion character in the donor through the formation of electrostatic interactions between the phosphates and the sugar and through the formation of hydrogen bonds between the 2-NH and 5-OH of UDP-GlcNAc and the β- and α-pyrophosphate oxygens, reducing the electron withdrawing effects of these polar substituents. In addition, in the bent-back conformation the β-side of the anomeric carbon can be presented to incoming substrates without the steric constraints that would have been imposed by the β-phosphoryl oxygens. The oxocarbenium ion-like nature of the transition state can assist in acid-base chemistry by raising the pKa of the β-phosphate such that it can act as an active site base, abstracting a proton from the approaching 3-OH of 1-L-Ins-1-P. The dissociative nature of the transition state may be further stabilized by Arg-261, which forms an ion pair with the β-phosphate.
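The kind of geometric check behind distances such as the 2.3 Å and 2.5 Å quoted above can be sketched in a few lines of Python. The coordinates below are placeholders chosen only so that the two quoted distances are reproduced; they are not taken from the CgMshA model, and the atom labels and the `contacts_within` helper are hypothetical names introduced for illustration.

```python
import math

def distance(a, b):
    """Euclidean distance (in Å) between two (x, y, z) coordinates."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def contacts_within(center, atoms, cutoff):
    """Return labels of atoms lying within `cutoff` Å of `center`, sorted by name."""
    return sorted(name for name, xyz in atoms.items()
                  if distance(center, xyz) <= cutoff)

# Placeholder coordinates (Å); NOT from the deposited structure or the model.
o3_inositol = (2.5, 0.0, 0.0)         # 3-OH oxygen of 1-L-Ins-1-P
atoms = {
    "C1_GlcNAc": (0.0, 0.0, 0.0),     # anomeric carbon of the donor sugar
    "O_beta_phosphate": (2.5, 2.3, 0.0),
    "distant_atom": (12.0, 0.0, 0.0), # stand-in for any group too far to act as a base
}

d_c1 = distance(o3_inositol, atoms["C1_GlcNAc"])
d_op = distance(o3_inositol, atoms["O_beta_phosphate"])
near = contacts_within(o3_inositol, atoms, cutoff=3.5)
```

With real coordinates, the same neighbor search around the 3-OH is one way to confirm that no protein side chain lies close enough to serve as a general base; the 3.5 Å cutoff used here is an assumed, commonly used hydrogen-bonding distance, not a value from the text.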
A comparison of the active site architecture of CgMshA with other retaining glycosyltransferases suggests that the bent-back conformation of the donor molecule and the interaction of the β-phosphate with a positive charge (typically an arginine or metal ion) are conserved features of retaining glycosyltransferases (50,57).
Mycothiol is the major low molecular weight thiol in Actinomycetes such as the human pathogen Mycobacterium tuberculosis, and therefore the enzymes involved in its biosynthesis are important drug targets. The structure determination of CgMshA, which catalyzes the first step in mycothiol biosynthesis, as APO and in complex with UDP and UDP·1-L-Ins-1-P highlights residues that are critical to the binding of substrates and catalysis and suggests an ordered binding mechanism. The structure of the CgMshA-UDP·1-L-Ins-1-P complex is noteworthy because very few retaining glycosyltransferases have been visualized with bound acceptor, and it adds credence to a proposed SNi "internal return" mechanism instead of a covalent intermediate/double displacement mechanism. In addition, these structures shed light on the structural flexibility of a large subset of glycosyltransferases. The large 97° interdomain motion upon binding sugar nucleotide depicted here for CgMshA highlights the flexibility of the interdomain "hinge" region of GT-B glycosyltransferases and widens the possible dynamic range other family members may undergo during substrate binding. Due to the lack of sequence conservation between or even within CAZy families, it is difficult to predict the degree of structural motions for individual family members, but the structural data so far indicate that interdomain flexibility is an important feature of sugar nucleotide recognition in the GT-B family and needs to be taken into consideration in the design of inhibitors.