Structure and Mechanism of GumK, a Membrane-associated Glucuronosyltransferase*

Xanthomonas campestris GumK (β-1,2-glucuronosyltransferase) is a 44-kDa membrane-associated protein that is involved in the biosynthesis of xanthan, an exopolysaccharide crucial for this bacterium's phytopathogenicity. Xanthan also has many important industrial applications. The GumK enzyme is the founding member of the glycosyltransferase family 70 of carbohydrate-active enzymes, which is composed of bacterial glycosyltransferases involved in exopolysaccharide synthesis. No x-ray structures have been reported for this family. To better understand the mechanism of action of the bacterial glycosyltransferases in this family, the x-ray crystal structure of apo-GumK was solved at 1.9Å resolution. The enzyme has two well defined Rossmann domains with a catalytic cleft between them, which is a typical feature of the glycosyltransferase B superfamily. Additionally, the crystal structure of GumK complexed with UDP was solved at 2.28Å resolution. We identified a number of catalytically important residues, including Asp157, which serves as the general base in the transfer reaction. Residues Met231, Met273, Glu272, Tyr292, Met306, Lys307, and Gln310 interact with UDP, and mutation of these residues affected protein activity both in vitro and in vivo. The biological and structural data reported here shed light on the molecular basis for donor and acceptor selectivity in this glycosyltransferase family. These results also provide a rationale to obtain new polysaccharides by varying residues in the conserved α/β/α structural motif of GumK.

* This work was supported by Agencia Nacional de Promoción Científica y Tecnológica Grant PICT 1-11703 and Universidad de Buenos Aires (Argentina) Grant UBACyT X-193. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact. The on-line version of this article (available at http://www.jbc.org) contains supplemental Table 1  Nacional de Investigaciones Científicas y Técnicas (CONICET), Argentina. 2 Silvina R. Salinas is the recipient of a doctoral fellowship from Agencia Nacional de Promoción Científica y Tecnológica. 3 Patricia L. Abdian is a member of CONICET. 4 To whom correspondence should be addressed: Fundación Instituto Leloir.
duced by the phytopathogen Xanthomonas campestris. Xanthan is involved in X. campestris virulence toward a substantial number of economically and agriculturally important plants (12). This polysaccharide also has a wide range of potential applications and functions (13). X. campestris GumK (β-1,2-glucuronosyltransferase), a membrane-associated protein that is part of the biosynthetic machinery for xanthan, is specifically responsible for the addition of a GlcA residue from UDP-GlcA during the formation of the pentasaccharide subunit of xanthan (Fig. 1) (14,15).
Despite the synthetic utility and industrial/medical importance of GTs, many details of enzyme structures and mechanisms remain elusive. In particular, no structural information is available on xanthan-specific GTs, and little is known about the GTs involved in the synthesis of other polysaccharides (16,17). GTs are difficult to characterize because the proteins are often membrane-associated, unstable, present at very low concentrations, and difficult to express.
Herein we describe the structure of GumK in the presence and absence of UDP. We focused on the molecular contacts that anchor the donor molecule to the protein, including kinetic analyses of mutant proteins and the in vivo effects of these mutations on X. campestris polysaccharide production. Also, we mutated residues that could be directly involved in the catalytic mechanism of GumK. Determination of the catalytic mechanism and of the specific contacts with substrates could result in strategies for the exploitation of GTs as unique synthetic catalysts in the creation of unnatural polysaccharide variants.

EXPERIMENTAL PROCEDURES
Protein Purification and Crystallization-GumK protein with a C-terminal LEHHHHHH tag was expressed from plasmid pETHisKC and purified as described previously (14). Purified protein was concentrated to 20 mg/ml in storage buffer (400 mM NaCl, 0.05% Triton X-100, 50 mM Tris-HCl, pH 8.0) by ultrafiltration. The protein concentrate was stored at 4°C until use. Crystals of the native and D157A mutated form of GumK were grown at 20°C using the hanging drop vapor diffusion method, as described previously (18). For the UDP-GumK or UDP-GumK D157A complexes, crystals of GumK were soaked for 0.5-8 h in crystallization solution plus UDP-GlcA (10 or 100 mM) at 20°C. Unfortunately, GlcA was readily hydrolyzed from UDP-GlcA during these soaking experiments in the native form of GumK, and we could only see the position of UDP bound to GumK. Furthermore, we were unable to co-crystallize GumK or mutant D157A in the presence of UDP-GlcA.
Data Collection and Phasing-Heavy atom soaks were carried out in crystallization buffer (35% polyethylene glycol 3350, 0.1 M Tris-HCl, 0.2 M Li₂SO₄, 0.1 M CsCl, pH 8.2) supplemented with 10 mM K₂PtCl₄ for 2 h. Single crystals of native and platinum derivative GumK were drawn out of the crystallization drop and frozen in liquid nitrogen. All data sets were collected at 110 K. Crystallization buffer was used as the cryoprotectant. A two-wavelength MAD data set (peak = 1.0718 Å and inflection = 1.0722 Å) was collected to 2.0 Å resolution from a platinum derivative at beamline X12C, National Synchrotron Light Source, Brookhaven National Laboratories (Brookhaven, NY) on a modified ADSC Q210 detector. Reflection intensities were integrated using MOSFLM, merged with SCALA, and reduced with Truncate (19). Statistics are shown in Table 1. The crystal belonged to space group P6₅22, in which the asymmetric unit comprised one GumK molecule. Platinum sites were found by using SHELX, and the positions, B-factors, and occupancies were refined by using Sharp (20), with the four platinum positions identified after six rounds of refinement and inspection of log likelihood gradient residual Fourier maps. Density modification was performed by using DM, and solvent flattening was performed by using Solomon (19). This processing resulted in a readily interpretable electron density map.
Single UDP-GlcA-soaked crystals (native and D157A mutated forms) were frozen in liquid nitrogen. Complete data sets of the GumK-UDP complex were collected in beamline DO3B-MX1, Laboratorio Nacional de Luz Sincrotron (Campinas, Brazil), at a wavelength of 1.427 Å. Statistics are shown in Table 1.
Model Building, Refinement, and Validation-Model building of the native GumK was performed with ARP/wARP. Refinement was carried out with Refmac. For nondefined regions, manual building was performed with Coot (21) alternated with Refmac. At the beginning of the analysis, a fraction of the data sets (5%) was set aside for Rfree calculations. To determine the structure of the GumK-UDP complex, ARP/wARP was used with the native GumK structure as the starting model, iterated with Refmac refinement. For nondefined regions, manual building was performed with Coot. Surface electrostatic potentials were calculated using the Adaptive Poisson-Boltzmann Solver (APBS) program (22) and visualized with Pymol (DeLano Scientific LLC) (available on the World Wide Web).
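As a minimal illustration of the cross-validation step described above, the following Python sketch sets aside 5% of the reflections as a free set and computes the conventional R value, R = Σ||Fo| − |Fc|| / Σ|Fo|. It is not the procedure implemented in Refmac; the function names, the random seed, and the amplitudes are hypothetical and serve only to show the arithmetic.

```python
import random


def r_factor(f_obs, f_calc):
    """Crystallographic R = sum(| |Fo| - |Fc| |) / sum(|Fo|)."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def split_free_set(n_reflections, fraction=0.05, seed=0):
    """Return (work_indices, free_indices).

    The free set is excluded from refinement and used only for Rfree.
    The seed is an arbitrary choice made for reproducibility.
    """
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    n_free = round(n_reflections * fraction)
    free = sorted(indices[:n_free])
    work = sorted(indices[n_free:])
    return work, free


# Hypothetical structure-factor amplitudes (not data from this study).
f_obs = [100.0, 80.0, 60.0, 40.0, 20.0]
f_calc = [90.0, 85.0, 60.0, 45.0, 20.0]
print(round(r_factor(f_obs, f_calc), 4))

work, free = split_free_set(1000)
print(len(work), len(free))
```

Rfree is the same expression evaluated over the free indices only, whereas the working R is evaluated over the reflections used in refinement.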
Soluble compounds were removed by three washes with deionized water, and Man-Cel-P-P-lipid was recovered by three extractions with 1:1 chloroform/methanol. The apparent Km for the donor substrate, UDP-GlcA, was determined at a fixed concentration of the acceptor Man-Cel-P-P-lipid (500 μM) because of the limited supply of this substrate. The Km for the acceptor was determined at UDP-[14C]GlcA concentrations (specific activity 30 Ci/mol) estimated to be at least 10 times greater than the Km. Incubations were performed at 20°C for 2 min in 100-μl reaction volumes with 2% Triton X-100. After the incubation, the radioactive glycolipid product was recovered by organic solvent extraction as described (14). Assays were performed in triplicate.
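The kinetic parameters referred to above follow the Michaelis-Menten relation v = Vmax·[S]/(Km + [S]). The Python sketch below shows one way an apparent Km could be estimated from initial-rate data, using the Hanes-Woolf linearization ([S]/v plotted against [S]). The fitting method actually used in this work is not stated here, so the approach, the function names, and the noise-free synthetic data are assumptions made purely for illustration.

```python
def michaelis_menten(s, vmax, km):
    """Initial rate predicted by the Michaelis-Menten equation."""
    return vmax * s / (km + s)


def fit_hanes_woolf(s_values, v_values):
    """Estimate (Vmax, Km) from the line S/v = S/Vmax + Km/Vmax.

    Ordinary least squares on (S, S/v): slope = 1/Vmax, intercept = Km/Vmax.
    """
    y_values = [s / v for s, v in zip(s_values, v_values)]
    n = len(s_values)
    mean_x = sum(s_values) / n
    mean_y = sum(y_values) / n
    sxx = sum((x - mean_x) ** 2 for x in s_values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(s_values, y_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


# Synthetic, noise-free rates generated from hypothetical parameters
# (Vmax = 2.0, Km = 25.0, arbitrary units); not measurements from this study.
substrate = [5.0, 10.0, 20.0, 50.0, 100.0, 200.0]
rates = [michaelis_menten(s, 2.0, 25.0) for s in substrate]
vmax_fit, km_fit = fit_hanes_woolf(substrate, rates)
print(round(vmax_fit, 6), round(km_fit, 6))

# Fraction of Vmax reached when the fixed substrate is at 10 x Km.
print(round(michaelis_menten(10 * 25.0, 1.0, 25.0), 3))
```

The last line illustrates why a concentration at least 10 times the Km is treated as near-saturating: the rate is then about 91% of Vmax, so the second substrate contributes little to the apparent Km of the first.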
Site-directed Mutagenesis-Mutations were introduced into the cloned gumK gene by using the QuikChange™ site-directed mutagenesis kit (Stratagene) with the appropriate primers. Mutations were confirmed by sequence analysis. The mutated gumK genes were expressed in Escherichia coli and purified as described previously for GumK (14). Purified mutated proteins were stored at 4°C until use. For in vivo complementation assays, the open reading frames of mutated GumK were cloned in the wide host range plasmid pBBRprom (14). Plasmids were denoted pBBRSK for the pBBRprom derivative expressing wild-type GumK and pBBRSK/mutation (e.g. pBBRSK/D157A) for pBBRprom derivatives expressing mutated GumK. Protein expression in complemented strains was verified with Western blotting using polyclonal antibodies raised against GumK.

X-ray data collection and refinement statistics for wild-type GumK
Values for the highest-resolution shell are shown in parentheses.

RESULTS AND DISCUSSION
Structures of GumK and of the GumK-UDP Complex-X. campestris GumK is the founding member of family GT70 (see the CAZy data base on the World Wide Web), which is composed of phytopathogenic bacterial glucuronosyltransferases involved in exopolysaccharide biosynthesis. The x-ray structure of GumK was solved with a two-wavelength MAD experiment (Table 1). There was only one protein molecule in the asymmetric unit. The final 1.9-Å structure included residues 13-385 and 480 water molecules (Protein Data Bank code 2HY7). The final native GumK structure had a crystallographic R value of 0.18 and an Rfree of 0.20. The GumK-UDP complex was solved at 2.28 Å resolution by molecular replacement (MR), using the native structure as a model. The Rfree was 0.216 for the final structure (Protein Data Bank code 2Q6V). Fig. 2 shows that GumK is a two-domain molecule with an overall size of ~50 × 50 × 65 Å. The N-domain is formed by residues 13-201 and the final C-terminal α-helix, Cα7 (residues 362-380), a feature observed in other GT-B enzymes (25-28). This domain is composed of 10 α-helices surrounding a core of eight mostly parallel β-strands (Fig. 2, A and B). The C-domain is formed by residues 210-361 and consists of a core of six β-strands shielded by six α-helices.
The β-strands and α-helices of both domains are ordered as in a typical Rossmann fold (29) and exhibit high structural homology (r.m.s. deviation = 2.02 Å over 88 Cα atoms), which confirms that, despite the low sequence homology between these domains, the same fold is adopted. The N- and C-domains are joined by a linker (residues 202-208) between the eighth β-strand and the first α-helix of the C-domain. This interdomain linker, together with the loop connecting Cα6 and Cα7 (residues 353-361), defines the floor of the cleft between the two domains. The cleft is ~20 Å deep and 15 Å across at its widest point. The dimensions of the cleft suggest that the enzyme crystallized in an "inactive," open state. Recently, it was shown that a large relative rotation between the N- and C-domains is necessary for catalytic activity in GT-B MshA (30). This interdomain flexibility has also been observed or predicted for other members of the GT-B superfamily (26,31-33). These motions of 10-25° are believed to convert the enzyme to an "active," closed conformation, bringing critical residues from the N- and C-terminal domains together into a catalytically active conformation. We will study whether this type of movement is required for GumK activity.
Fig. 3 shows the position of UDP in its binding pocket. This pocket is located on the C-terminal face of the cleft, in a positively charged surface. The UDP-binding pocket is an α/β/α motif defined by Cα3, Cβ4, and Cα4 and the linkers between them (Figs. 2B and 3D). This structural motif is highly conserved throughout the GT-B superfamily (6,34) as an alternative way to coordinate the negative charge of the phosphates in the nucleotide-sugar. Despite the low sequence homology, the degree of structural conservation with other GT-B superfamily members was evident upon calculation of the structural homology of the C-terminal globular domain.
The GumK-UDP structure exhibited marked superposition with family 5 glycogen synthase from Agrobacterium tumefaciens (Protein Data Bank code 1rzu; r.m.s. deviation = 3.8 Å) and with family 4 lipopolysaccharide core biosynthesis α-1,3-glucosyltransferase (Protein Data Bank code 2iv7; r.m.s. deviation = 3.3 Å) (26,35).
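For reference, the r.m.s. deviation values quoted above are root-mean-square distances over paired atoms. The Python sketch below computes this quantity for two coordinate sets that are assumed to be already superposed and matched atom by atom; the superposition step itself (for example, a least-squares fit) is not shown, and the coordinates are hypothetical rather than taken from any Protein Data Bank entry.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z).

    Assumes the structures are already superposed and that the i-th atom
    in coords_a is paired with the i-th atom in coords_b.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical C-alpha positions in angstroms (illustration only).
domain_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
domain_b = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
print(rmsd(domain_a, domain_b))
```

Because the value depends on which atoms are paired, an r.m.s. deviation is only meaningful together with the number of atoms compared, as in the "88 Cα" figure given for the two GumK domains.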
The most notable contacts in the GumK-UDP complex include hydrogen bonds between the main-chain amide NH of Met231 and α-phosphate O1 and O2 and between the main-chain amide NH of Met306 and the β-phosphate O1. The phosphates are also coordinated by hydrogen bonding between Lys307 NH2 and β-phosphate O2 and O3 and between Tyr292 OH and O1β and O2β (Fig. 3C). Mutation of Lys307 and Tyr292 had marked effects on both the Km of UDP-GlcA and the Vmax (Table 2), indicating the importance of these contacts in the interaction with the negatively charged phosphates of the donor molecule. The ribose is bound by hydrogen bonds between its 2′- and 3′-hydroxyls and Gln310 NH2. The Q310A mutation substantially increased the Km of UDP-GlcA, despite having a minor effect on the Vmax. Finally, the uridine is bound by contacts between O4′ and the main-chain amide NH of Glu272 and between the main-chain carbonyl CO of Met273 and N3′.
The hydrogen bonds and atoms involved in UDP binding are detailed in Table 3. All of these interactions seem to have a cooperative effect on the binding of UDP. Mutations Y292A and K307A had marked effects on the Km of UDP-GlcA and on the catalytic efficiency (kcat), whereas other mutations, such as E272A and E272D, had smaller effects on the Km and kcat (Table 2). Despite the influence of some contacts on the efficiency of the enzyme, individual contacts proved to be relevant but not essential for binding the substrate. Interestingly, Lys307, the contact that had the most pronounced effect on the kcat of GumK, is located in the conserved Cα4 helix and is one of the residues that coordinate the phosphates (Table 3 and Fig. 4).
Fig. 3, C and D, shows the restrictions to which the donor substrate is exposed while entering the binding pocket. From the architecture of the α/β/α UDP-binding motif, it is clear that no purine-based nucleotide would fit in the narrow pocket created by the Cα3 and Cα4 helices, specifically because of the hydrogen bonding between Glu272 and Met273 and the uridine. All residues that contact the UDP are conserved in the GT70 family (Fig. 4), which indicates that, as expected, binding of the donor substrate is conserved. For biotechnological applications, a relaxed or even a changed specificity could be very useful for the synthesis of novel polysaccharides. We speculate that a change in the specificity of GumK is possible by mutation of residues in the conserved Cα4 helix, where some of the most important contacts for binding the donor molecule are located.
Proposed Membrane Association Site-Subcellular localization experiments showed that GumK associates with the cell membrane in X. campestris (14). The electrostatic surface potential for GumK reveals a polar protein with a positively charged N-domain (theoretical pI 9.97) and a negatively charged C-domain (theoretical pI 6.20).
A cluster of basic and hydrophobic residues (in helices Nα2 and Nα4 and the linker region between Nα4 and Nβ4) lies at the tip of the N-terminal domain (Fig. 5A). The side chains of residues Arg58, Lys60, Arg86, Arg95, Arg96, Arg100, and Arg108 are solvent-exposed, forming an arginine cluster surrounded by hydrophobic residues. This arrangement suggests possible involvement of the region in membrane interactions. A model for other GTs proposes a mixed hydrophobic-electrostatic interaction between an equivalent basic region in the N-terminal domain and the membrane (36-38). In this model, there is a first contact between the negatively charged membrane and the positive charges of a cluster of basic residues. Afterward, the contact is strengthened by the interaction of the membrane lipids with the hydrophobic residues.
Indirect evidence for this model was observed for GumK. When E. coli BL21(DE3)/pETHisKC cells were cultured in LB medium in the presence of added NaCl (250 mM), a substantial fraction (~50%) of the protein became soluble. The purified soluble GumK fraction retained activity during in vitro enzymatic assays. This result might indicate that a hypothetical first electrostatic interaction was interrupted, leaving soluble, properly folded GumK. Furthermore, the location of the basic patch is consistent with the proposed acceptor binding site. Membrane association in this region would bring the middle cleft closer to the membrane surface, where the soluble UDP-GlcA donor is coupled to the membrane-anchored acceptor glycolipid, Man-Cel-P-P-lipid (Fig. 5B). The degree of this interaction and the relative importance of individual residues of the basic cluster, together with surrounding hydrophobic residues, are currently being investigated in our laboratory.
In Vivo Analysis of GumK Mutant Activities-The biosynthesis of bacterial polysaccharides is a complex process that involves several enzymes and transport proteins. In X. campestris, it is very difficult to measure intermediate glycolipids during the synthesis of xanthan, because they are present in very low amounts and do not accumulate in GT mutants. A simple way of assessing the effect of GumK mutations in vivo is to measure the amount of polysaccharide produced in complementation assays with a XcK (gumK⁻) mutant. This kind of analysis provides a powerful means of detecting minor levels of activity in GumK mutants that may have gone undetected in previous in vitro assays (14,39). Fig. 6 shows the relative percentage of xanthan production in XcK expressing mutated GumK compared with the XcK mutant complemented with the wild-type gumK gene (XcK/pBBRSK). It is worth noting that mutations K307A and Y292A, which affect residues involved in the coordination of the negative charge of the phosphates, show a marked effect, with ~25% xanthan production compared with XcK/pBBRSK. Mutation of other residues responsible for the binding of the ribose (Q310A) or the uracil (E272A/D) shows a lesser effect. Taken together, these results imply that mutations affecting the kinetic parameters of one of the enzymes in the biosynthetic machinery of xanthan have a quantifiable effect on the entire polysaccharide production system. Given that xanthan is a key virulence factor for X. campestris (15,40), it is not surprising that the key contacts and catalytic residues are strongly conserved (Fig. 4).
Asp157 Is a Key Residue in the GumK Catalytic Mechanism-To identify the catalytic residue, all acidic residues lying in the catalytic cleft that could act as the general base (Asp157, Glu192, Asp207, and Asp234) were mutated. Mutations E192A, D207A, and D234A showed no effect on GumK activity either in vitro (Table 2) or in vivo (data not shown). A very interesting result of the complementation experiments described above was the lack of activity of GumK mutants D157A/N/E, indicated by the complete absence of xanthan production in strains XcK/pBBRSKD157A, XcK/pBBRSKD157N, and XcK/pBBRSKD157E (Fig. 6). This absence of activity was also verified in the in vitro assays (Table 2). The lack of activity in Asp157 mutants after removal of the charge (Asp to Asn) or a change in the length of the side chain (Asp to Glu), in both in vivo and in vitro assays, implicates Asp157 as the catalytic residue. To check for potential folding errors, we crystallized and solved the structure of mutant D157A as apoprotein (Protein Data Bank code 3CUY; supplemental Table 1). The r.m.s. deviation for all residues between native GumK and mutant D157A is 0.28 Å, showing that the mutant has not undergone structural changes. Moreover, wild-type strain FC2 carrying plasmid pBBRSKD157A, pBBRSKD157N, or pBBRSKD157E showed a complete absence of xanthan production, indicating that the mutant proteins are capable of interfering with the normal xanthan biosynthetic machinery (supplemental Fig. 1). This result suggests the formation of a multienzyme complex or the modulation of GumK activity by oligomerization (41).
The position of Asp157 is structurally equivalent to the position of Asp100 in the β-glucosyltransferase BGT from T4 phage or of Glu95 in α-1,3-fucosyltransferase FucT from Helicobacter pylori, acidic residues that are responsible for the deprotonation of the acceptor substrate (28,42). The N terminus of GumK displays a deep pocket or tunnel of 560 Å², defined by the loops connecting Nβ1 to Nα1 (residues 22-30) and Nβ2 to Nα2 (residues 51-55). Basic and aromatic residues, such as Arg, Lys, Tyr, and Phe, are present at the boundaries of this tunnel, in line with the general features of carbohydrate-binding motifs (43,44) (see the CAZy site on the World Wide Web).
A model of the binding of the acceptor Man-Cel-P-P-lipid can be constructed based on the location of the catalytic base and the donor substrate, as well as the shape and orientation of the active site cleft. The shape of PP-Cel-Man is complementary to the shape of the cleft (Fig. 5B). Asp157 is located immediately below this N-terminal hydrophobic pocket, where the glycolipid acceptor could be accommodated. The carboxylate of Asp157 could deprotonate the 2-OH of the mannose residue. The side chain of Asp157 is also positioned immediately adjacent to the putative location of the anomeric carbon of glucuronic acid in the GumK-UDP complex. Upon deprotonation of the C2-OH group by Asp157, the acceptor nucleophile can attack the anomeric position of UDP-GlcA to form a new glycosidic bond with an inverted configuration (Fig. 7). The side product UDP dissociates at the same time. According to this hypothesis, the anomeric carbon is located between the acceptor nucleophile, the C2-OH of mannose, and the leaving group UDP, a geometry that is consistent with an in-line displacement mechanism (45). Indirect evidence supporting this hypothesis is that GumK showed hydrolytic activity toward UDP-GlcA after a 1-h incubation in the absence of acceptor. Under the same conditions, mutant D157A was unable to hydrolyze UDP-GlcA, even after a 24-h incubation (supplemental Fig. 2). In an attempt to find the GlcA portion of bound UDP-GlcA, we performed soaking experiments with crystals of the D157A mutant in the presence of UDP-GlcA. The position of the UDP ligand was exactly the same as in wild-type GumK (Protein Data Bank code 3CV3). Unfortunately, the position of the GlcA moiety was not observed (data not shown), suggesting that the molecular motion of this portion of the molecule in the "open" conformation of GumK prevents its observation by crystallographic methods. NMR experiments will be carried out in the future to study this point.
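The in-line geometry invoked above can be expressed as a simple angle criterion: in a direct displacement, the attacking oxygen, the anomeric carbon, and the leaving-group oxygen lie close to a straight line, so the angle at the anomeric carbon approaches 180°. The Python sketch below computes that angle for three points. The coordinates are hypothetical placeholders, not atoms from the GumK model, and no particular cutoff for "in-line" is implied by this work.

```python
import math


def angle_deg(a, b, c):
    """Angle a-b-c in degrees, with b at the vertex."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm1 = math.sqrt(sum(x * x for x in v1))
    norm2 = math.sqrt(sum(x * x for x in v2))
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
    cosine = max(-1.0, min(1.0, dot / (norm1 * norm2)))
    return math.degrees(math.acos(cosine))


# Hypothetical positions in angstroms (illustration only):
# acceptor O2 of mannose, anomeric C1 of GlcA, and the UDP phosphate oxygen.
nucleophile_o = (-2.0, 0.0, 0.0)
anomeric_c = (0.0, 0.0, 0.0)
leaving_o = (1.5, 0.0, 0.0)
print(angle_deg(nucleophile_o, anomeric_c, leaving_o))
```

Applied to a docked acceptor model, a value near 180° would be compatible with the proposed inverting mechanism, whereas a strongly bent arrangement would argue against direct in-line attack.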