Structure of UDP Complex of UDP-galactose:β-Galactoside-α-1,3-galactosyltransferase at 1.53-Å Resolution Reveals a Conformational Change in the Catalytically Important C Terminus*

UDP-galactose:β-galactosyl α-1,3-galactosyltransferase (α3GT) catalyzes the transfer of galactose from UDP-α-d-galactose into an α-1,3 linkage with β-galactosyl groups in glycoconjugates. The enzyme is expressed in many mammalian species but is absent from humans, apes, and old world monkeys as a result of the mutational inactivation of the gene; in humans, a large fraction of natural antibodies are directed against its product, the α-galactose epitope. α3GT is a member of a family of metal-dependent retaining glycosyltransferases including the histo-blood group A and B synthases. A crystal structure of the catalytic domain of α3GT was recently reported (Gastinel, L. N., Bignon, C., Misra, A. K., Hindsgaul, O., Shaper, J. H., and Joziasse, D. H. (2001) EMBO J. 20, 638–649). However, because of the limited resolution (2.3 Å) and high mobility of the atoms (as indicated by high B-factors) this structure (form I) does not provide a clear depiction of the catalytic site of the enzyme. Here we report a new, highly ordered structure for the catalytic domain of α3GT at 1.53-Å resolution (form II). This provides a more accurate picture of the details of the catalytic site that includes a bound UDP molecule and a Mn2+ cofactor. Significantly, in the new structure, the C-terminal segment (residues 358–368) adopts a very different, highly structured conformation and appears to form part of the active site. The properties of an Arg-365 to Lys mutant indicate that this region is important for catalysis, possibly reflecting its role in a donor substrate-induced conformational change.

Specific hetero-oligosaccharides on glycoproteins and glycolipids play important roles in cell-cell and cell-matrix interactions, affect the stability and structure of proteins, and modulate cellular interactions with viruses, toxins, and other proteins; they are also epitopes that are recognized by the immune system (1). The range and types of carbohydrate structures present on a cell vary in different tissues and species as a reflection of the specificity of glycosyltransferases, enzymes that catalyze the transfer of a specific monosaccharide from an activated derivative (such as UDP-galactose) into a defined linkage with a specific acceptor (2,3). The carbohydrate chains of glycoconjugates have an enormous potential for variation because of the range of different sugars and the large number of alternative glycosidic linkages. Consequently, they are carriers of large amounts of biological information that originates from the specificity of glycosyltransferases. In addition to their biosynthetic roles, some glycosyltransferases also function directly in cellular interactions and regulation (4,5). The limited information currently available regarding the structural basis of molecular recognition and catalysis by these important enzymes precludes a clear understanding of the molecular basis of their various biological functions. High resolution structural data are also needed to inform the design of inhibitors and to facilitate the engineering of new catalysts for the enzymatic synthesis of natural and novel glycoconjugates for therapeutic uses (4,6,7).
The majority of glycosyltransferases are type-II membrane proteins with short N-terminal cytosolic domains, a membrane-spanning region, a stem, and a C-terminal catalytic region (3). Classification schemes have been proposed for them based on sequence similarity and specificity; however, localized similarities between the sequences of different galactosyltransferases (8), mannosyl, and other glycosyltransferases (9,10) suggest that such schemes may eventually be simplified into larger groups with similar folds and conserved binding sites (see also Ref. 11). A priori mechanistic considerations divide glycosyltransferases into two large groups, those that catalyze a reaction in which the anomeric configuration of the transferred sugar is inverted and those that catalyze a retaining reaction (12). While the former are likely to act through a displacement (SN2) mechanism that necessarily produces an inversion at C-1, a retaining reaction is expected to involve a double displacement mechanism with formation of an intermediate such as a β-glycosyl-enzyme covalent complex.
UDP-galactose β-galactoside α-1,3 galactosyltransferase (α3GT; EC 2.4.1.151) is an enzyme found in many mammalian species but not in humans and their closest relatives because of the mutational inactivation of its gene (13); in species lacking the enzyme, about 1% of circulating endogenous antibodies are directed against the product of its action, the α-galactose epitope (14). These protect against pathogens but are a barrier to the xenotransplantation of organs from species with active α3GT to humans. α3GT is a model for several paralogous retaining glycosyltransferases of varying substrate specificity.
FIG. 1. a, structure of α3GT with bound UDP and Mn2+ ion. The bound ligand and ion identify the location of the active site. The Mn2+ ion is shown as a magenta sphere, UDP is brown, and helices are pink, while the strands are green. This image was created using the program MOLSCRIPT (38). b, the amino acid sequence of the catalytic domain of α3GT with all secondary structure elements highlighted. UDP binding residues are marked in yellow, while the Mn2+ binding residues are shown by closed magenta spheres. This image was created using the program ALSCRIPT (39). c, stereoview comparison of the Cα atoms of form II α3GT (present structure, in red) with the previously determined form I α3GT structure (Ref. 18; in black). The C-terminal residues 358-368 in form II show a large difference in conformation and form a lid for the active site tunnel. This image was created using the program BOBSCRIPT (40).
A crystal structure of the catalytic domain of α3GT (in tetragonal form, P4₁2₁2 space group, one molecule/asymmetric unit, form I) was recently reported at 2.3- and 2.5-Å resolution in UMP- and Hg-UDP-galactose-bound forms, respectively (18). These structures identified binding sites for UDP and a Mn2+ cofactor. A region of electron density in the Hg-UDP-galactose-bound structure was interpreted as a β-galactosyl moiety covalently attached to Glu-317, suggesting a covalent catalytic mechanism. However, no direct evidence was obtained for a glycosyl-enzyme covalent bond, and the limited resolution of the structure, reflected also in the high B-factors for all atoms and the disordered C-terminal region of the polypeptide chain, raises questions about this interpretation. To clarify unresolved issues regarding the structure of α3GT and the specific roles of different amino acid residues in the reaction mechanism, we sought to obtain crystals with improved resolution using different crystallization conditions. Here we report the structure of a new crystal form of α3GT obtained with polyethylene glycol as the precipitating agent instead of a high salt concentration buffer (pH 6.0, monoclinic form, P2₁ space group, dimer/asymmetric unit, form II) at 1.53-Å resolution. The entire structure is highly ordered, with the C terminus (residues 358-368) adopting what appears to be an "active" conformation. A possibly analogous conformational change, which may have an important role in the catalytic mechanism (12), has been noted in a region of other glycosyltransferases that are unrelated in amino acid sequence to α3GT (12), including β-4-galactosyltransferase-I (19). Mutagenesis of α3GT indicates that this region is important for catalysis. We discuss the implications of our results for the mechanism of α3GT and its homologues.

EXPERIMENTAL PROCEDURES
Protein Purification, Crystallization, and Data Collection-The catalytic domain of bovine α3GT (residues 80-368) was expressed in Escherichia coli, purified as previously described (20), and stored at −20°C in 20 mM MES-NaOH buffer (pH 6) in 50% glycerol. Crystals were grown at 16°C by the vapor diffusion hanging drop method by mixing 2 μl of the protein at 5 mg/ml in 20 mM MES-NaOH buffer, pH 6.0, 10% glycerol, containing 10 mM UDP and 0.1 mM MnCl2, with an equal volume of a reservoir solution containing 5% polyethylene glycol 6000 and 0.1 M Tris-HCl, pH 8.0. Single crystals appeared after 2-3 days. Before data collection, crystals were flash-cooled at 100 K in a cryoprotectant containing 10% polyethylene glycol 6000, 0.1 M Tris-HCl, pH 8.0, and 25% glycerol. A high resolution data set to 1.53 Å was collected using a 30-cm MAR research image plate at DESY, EMBL outstation (Hamburg, Germany). The crystals belong to the P2₁ space group, with two molecules (a noncrystallographic dimer) in the asymmetric unit and some 58% of the crystal volume occupied by the solvent. Raw data images were indexed and scaled using the DENZO and SCALEPACK modules of the HKL Suite (21) (Table I).
Structure Determination and Refinement-The structure of the α3GT dimer was determined by the molecular replacement method using the 2.5-Å tetragonal (form I, monomer) structure (18) with the program AMoRe (22). The structure was initially refined using the program CNS (23), with temperature factors for all atoms kept isotropic. The behavior of the cross-validation R-factor (Rfree) was monitored throughout the refinement (24). Several rounds of energy minimization, individual B-factor refinement, simulated annealing using CNS, and model building using the program O (25) were performed until convergence of the Rfree value. The water molecules were picked using the program ARP/wARP (26). These water molecules were carefully inspected manually with the aid of Fo − Fc and 2Fo − Fc difference electron density maps and accepted only if peaks existed in both maps at the 3σ and 1σ level, respectively, and were at hydrogen bonding distance from the appropriate atoms. The model during CNS refinement converged to a crystallographic R-factor (Rcryst) of 18.4% and Rfree of 20.0%. Further refinement was carried out using SHELXL-97 (27). CGLS refinement in SHELXL-97 was carried out restraining all of the 1,2 and 1,3 distances with the Engh and Huber (28) restraints. Initially, all of the atomic displacement parameters were kept isotropic. The data to parameter ratio of >2 enabled us to carry out anisotropic refinement on atomic displacement parameters, which was subsequently justified by a 1.3% drop in Rfree and an improved Fourier map. All of the alternate conformations were modeled after the initial anisotropic refinement. Any new atoms added to the molecule were refined isotropically for at least two cycles before they were refined anisotropically. The multiple conformation site occupation factors were refined constraining their sum to be unity. The model converged to an Rcryst/Rfree of 14.82/20.09%.
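As a reminder of what the Rcryst and Rfree values quoted above measure, the following is a minimal sketch of the conventional R-factor, the summed absolute difference between observed and calculated structure factor amplitudes divided by the summed observed amplitudes. The function name and the toy amplitudes are hypothetical illustrations and are not taken from CNS or SHELXL-97; the sketch assumes that the calculated amplitudes have already been placed on the same scale as the observed ones. Rfree is the same quantity evaluated only over reflections that were excluded from refinement.

```python
def r_factor(f_obs, f_calc):
    """Conventional R-factor: sum(| |Fo| - |Fc| |) / sum(|Fo|).

    Assumes f_calc is already on the same scale as f_obs.
    """
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty amplitude lists of equal length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


# Toy amplitudes (made up for illustration, not data from this structure).
toy_obs = [100.0, 50.0, 25.0]
toy_calc = [90.0, 55.0, 25.0]
print(f"R = {r_factor(toy_obs, toy_calc):.3f}")
```

Applying the same function to the working reflections and to the held-out test reflections separately gives the Rcryst/Rfree pair; a drop in Rfree, as reported for the anisotropic refinement, indicates that the additional parameters improve the fit to data the model was not refined against.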
The final refinement was carried out with hydrogens included in the calculated positions (for protein atoms alone, except for multiple conformations). The addition of hydrogen atoms as a riding model was justified by a drop in Rfree of 1.03%, leading to a final model with Rcryst/Rfree of 14.05/19.06%. All nonhydrogen atoms were refined anisotropically including water molecules (excluding protein atoms with multiple conformations). The final model of α3GT comprises residues 82-368 for both molecules and contains one UDP molecule and one Mn2+ ion per monomer at the active site. The structure contains 757 water molecules and two glycerol molecules (from the crystallization medium or cryoprotectant). All of the residues of the dimer lie in allowed regions of the Ramachandran (φ-ψ) map.
FIG. 2. a, schematic figure showing the main hydrogen bond interactions between UDP and α3GT residues at the catalytic site of the enzyme. The Mn2+ ion and water molecules are also shown. This image was created using the program MOLSCRIPT (38) and rendered using Raster3D (41). b, the location of the UDP molecule in the active site tunnel. This image was created using the program DINO (A. Philippsen; available on the World Wide Web at www.dino3d.org).
Construction, Expression, and Characterization of the Arg-365 to Lys Mutant-The expression vector for wild type α3GT, pET15b-α3GT, was constructed as described previously (20). The mutant coding sequence was generated by amplification of pET15b-α3GT using a T7 promoter and the mutagenic primer,

SEQUENCE 1
The amplification product was cleaved with XbaI and BamHI and cloned into a pET42b vector that had been previously treated with the same enzymes. The mutant was characterized by automated DNA sequence analysis of the entire coding sequence. The enzyme was expressed as described for wild type α3GT (20). Steady state kinetic studies were carried out as described previously (20) using a radiochemical assay; enzyme activity was measured at varying concentrations of lactose (acceptor substrate) and a series of fixed concentrations of UDP-galactose, and the data were analyzed by fitting to the equation, using the Curvefitter program of SigmaPlot™.
[A] and [B] represent the concentrations of UDP-galactose and lactose, respectively. The Arg-365 to Lys mutant was much less active than wild type enzyme and was assayed at a concentration of 92 μg/ml, as compared with 4.6 μg/ml for α3GT.
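The fitted equation itself is missing from this copy of the text. For orientation only, the general steady state rate equation for a sequential two-substrate mechanism is given below; it uses the same substrate symbols and the constants Kia and Kb that appear in the catalytic efficiency discussed under "Results and Discussion." That this is the exact form fitted here is an assumption, and the symbols V (maximum velocity) and Ka (Michaelis constant for UDP-galactose) are introduced for this illustration.

```latex
v = \frac{V\,[A][B]}{K_{ia}K_{b} + K_{b}[A] + K_{a}[B] + [A][B]}
```

In this form Kia is the dissociation constant for the first substrate (A, UDP-galactose) and Kb is the Michaelis constant for the second (B, lactose), so that at low concentrations of both substrates the rate approaches V[A][B]/(KiaKb).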
Near and far UV CD spectra of the mutant enzyme were determined with a JASCO J-710/720 spectropolarimeter as described (20) using protein (0.5 mg/ml) dissolved in 20 mM Tris-HCl buffer, pH 7.4, containing 50% glycerol.

RESULTS AND DISCUSSION
Overall Structure-The structures of the two α3GT monomers are virtually identical to one another with a root mean square deviation of 0.14 Å over all Cα atoms. Alternate conformations are found for Glu-145, Phe-184, Val-229, Glu-241, and Trp-249 in molecule A and Glu-145, Phe-184, Glu-241, and Glu-360 in molecule B. The overall structure of α3GT form II (dimer) is similar to the previously reported structure of α3GT (form I, monomer) except at the catalytic site (Fig. 1, a and c). Briefly, the molecule exhibits an α/β fold and encompasses a central region with a "Rossmann fold" similar to those found in nucleotide binding domains (29). The active site is identified as a deep tunnel inside the molecule based on the presence of a bound UDP molecule and a Mn2+ ion (Fig. 2, a and b). It contains several regions of sequence that are conserved between different homologues of α3GT. If the C-terminal 10 residues are excluded, the form II structure (residues 82-358) superimposes closely with that of form I, exhibiting a root mean square deviation of 0.31 Å for backbone atoms and 0.92 Å for all atoms. However, the C-terminal region comprising residues 358-368, which is highly disordered in form I (18), has undergone a large positional change in form II. The mean root mean square deviation for the Cα atoms for the two structures in this segment is 15.2 Å with a maximum of 21 Å for Arg-365 and Asn-366. This stretch of the molecule is also highly ordered in form II as evidenced by the B-factors; in general, both molecules in form II are well ordered, reflecting the high resolution and much lower mobility for all residues (Table I). In form II, the C terminus forms a lid to the active site of the molecule (Fig. 1c) so that the large change in structure between the two forms is associated with reduced active site accessibility.
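The root mean square deviations quoted above compare equivalent atoms once two structures have been superimposed. A minimal sketch of that final step follows; it assumes that the two coordinate lists are already superimposed and list equivalent atoms in the same order (the fitting step itself is not shown), and the function name and toy coordinates are hypothetical rather than taken from either structure.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation (in the units of the input, e.g. Å)
    between two equally ordered, pre-superimposed lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy coordinates (made up): the second atom is displaced by 2 Å along z.
model_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_2 = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(f"RMSD = {rmsd(model_1, model_2):.2f} Å")
```

Because the deviation is averaged over the chosen atom set, restricting the comparison to residues 82-358 or to the C-terminal segment alone gives very different values, as in the comparison of forms I and II above.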
The Catalytic Site-The UDP binding domain is located between a twisted central β-sheet (β2-β5), two long α-helices (α3 and α4), and the C-terminal region of the molecule (Figs. 1 (a and b) and 2 (a and b)). The new structure shows highly ordered, clear binding of the UDP molecule through the conserved DVD motif (9) and a Mn2+ ion. Some interactions of the UDP moiety with α3GT are similar to those in form I (Fig. 2, Table II) and involve residues that are located at the end of strands β2 and β5, helix α4, and the D225VD227 motif (Fig. 1b). In addition, in the present structure (form II), the C-terminal region (β10 and α7) also constitutes part of the active site and interacts directly with the UDP phosphates (Fig. 2, Table II) (see below). The C terminus (residues 358-368) that is shown by the present structure to be important for UDP binding is also highly conserved in all currently known α3GT amino acid sequences (30). Mn2+ is required for catalysis by many UDP-sugar-utilizing glycosyltransferases, and activity has been shown to be dependent on two metal ions in both β-4-galactosyltransferase-I (31) and α3GT (20). In the α3GT (both forms I and II) structure, one Mn2+ ion was identified. This Mn2+ forms an octahedral coordination through interactions with two oxygen atoms from the α- and β-phosphates of the UDP molecule, a single interaction with the OD2 atom of Asp-225, bidentate coordination with OD1 and OD2 atoms of Asp-227, and an interaction with a water molecule (Table II). The Mn2+ ion appears to stabilize the DVD sequence motif and binds the diphosphate moiety of the UDP molecule. Previous studies have shown that mutants of α3GT in which either aspartate of the motif is changed to asparagine have undetectable catalytic activity, in keeping with its key role in Mn2+ and donor substrate binding and catalysis (20). The location of the binding site for the second Mn2+ cofactor in our structure was not identified.
The C Terminus-The last 11 residues at the C terminus (residues 358-368) adopt distinct conformations in the form I and form II structures (Fig. 1c). In the form I structure, this loop is disordered, with high B-factors of about 90 Å2, suggesting more than one conformation. In the present form II structure, residues 354-356 form a short β-strand, and residues 361-364 adopt a definite α-helical structure and are well defined in the electron density map. The conformational change observed in the present structure is associated with the formation of hydrogen bonds and van der Waals contacts between the α- and β-phosphates of UDP and Lys-359, Tyr-361, and Arg-365 and increased rigidity of all residues in this region (Fig. 2, Table II). The average B-factors for the above contact residues are 15.0, 14.9, and 16.2 Å2, respectively, less than the average B-factor for the overall structure of 17.05 Å2.
Properties of α3GT R365K-The conformational change in the C terminus between the two structures, the sequence conservation among other α3GT, and analogous transconformations in other glycosyltransferases (12) all suggest that this region could have a key role in the catalytic action. To test this, we constructed and characterized a mutant enzyme with a structurally conservative substitution, Lys for Arg-365, a residue in this region that is conserved in all homologues of α3GT. R365K was constructed as described and expressed as soluble protein in good yield (11 mg/liter of cell culture). Both far and near UV CD spectra of this mutant are closely similar to those of the wild type enzyme (data not shown), indicating that the mutation does not introduce any global conformational change. The kinetic parameters of this mutant determined at 10 mM Mn2+ are summarized in Table III. Compared with wild type, kcat of the mutant is reduced 38-fold, while parameters associated with the binding of donor and acceptor substrate showed insignificant or minor changes. The catalytic efficiency (kcat/(Kia × Kb)) is also reduced 44-fold, reflecting the reduction in kcat; thus, this highly conservative substitution specifically reduces the stabilization of the transition state by the enzyme with essentially no effect on substrate binding in the α3GT·Mn2+·UDP-galactose·lactose complex.
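To give a sense of scale for the loss of transition state stabilization, a fold reduction in kcat can be converted to an apparent free energy difference using the standard relation ΔΔG‡ = RT ln(kcat,wt/kcat,mut). The sketch below applies it to the 38-fold reduction reported above. The assay temperature is not stated in this section, so the 37°C default is an assumption, and the function name is hypothetical; the result is an illustrative estimate, not a value reported by the authors.

```python
import math

R_GAS = 8.314462618  # molar gas constant, J mol^-1 K^-1


def ddg_from_fold_change(fold_reduction, temperature_k=310.15):
    """Apparent change in activation free energy (kJ/mol) for a given
    fold reduction in kcat, from ddG = R * T * ln(fold).

    temperature_k defaults to 37 °C, which is an assumed assay temperature.
    """
    if fold_reduction <= 0:
        raise ValueError("fold reduction must be positive")
    return R_GAS * temperature_k * math.log(fold_reduction) / 1000.0


ddg_kj = ddg_from_fold_change(38.0)  # 38-fold reduction in kcat (Table III)
print(f"ddG ~ {ddg_kj:.1f} kJ/mol ({ddg_kj / 4.184:.1f} kcal/mol)")
```

Under these assumptions the 38-fold effect corresponds to roughly 9 kJ/mol (a little over 2 kcal/mol), an amount comparable to the loss of one or two hydrogen bonds and therefore consistent with a local effect of the Arg to Lys substitution rather than a global structural change.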
In the present structure, Arg-365 directly interacts with the α-phosphate of UDP and the OH atom of Tyr-139, which in turn interacts with the α-phosphate of UDP and makes stacking interactions against the uracil ring (Table II, Fig. 2). Furthermore, Arg-365 is involved in van der Waals interactions with UDP, Tyr-361, and Trp-195. Based on modeling studies, we predict that the Arg-365 to Lys substitution would not provide all of the interactions observed in the present structure.
Similarities have been noted in the mode of interaction with UDP (or UDP-sugar), particularly the interaction of the uridine moiety with the β-sheet and the phosphates with Mn2+ and the aspartate cluster (12). The presence of a contiguous flexible loop that undergoes a conformational change to a more rigid structure on UDP or donor substrate binding also appears to be a common feature (12), although the structure and location of this region differ in the various transferases. For example, in β-4-galactosyltransferase-I, a flexible loop interacts with UDP in an analogous fashion to that of α3GT upon binding to α-lactalbumin, the modulator protein of β-4-galactosyltransferase-I, in the presence of glucose (19). It is interesting to note that in α3GT and the only other structurally characterized retaining glycosyltransferase, LgtC, the UDP (UDP component of UDP-2-F-galactose in LgtC) is buried in the enzyme complex. It has been suggested that in LgtC, the C-terminal loop would adopt an alternate conformation in the absence of UDP (34). This may reflect a need to protect a reactive intermediate in catalysis from solvent in retaining transferases. However, it has also been suggested that the flexible region in both retaining and inverting glycosyltransferases has a role in product release (12).
FIG. 3. Structural comparison of the UDP binding domain of α3GT with other representative glycosyltransferases. The structural alignment was performed using the combinatorial extension method (42). Superimposed secondary structure elements forming part of the UDP binding domain are labeled according to α3GT assignment as shown in Fig. 1. Manganese ions are shown as magenta spheres. A, α3GT (form II, present structure). B, β-4-galactosyltransferase-I (19). C, retaining galactosyltransferase, LgtC from N. meningitidis, Protein Data Bank code 1GA8 (34). D, nucleotide-diphosphosugar transferase, SpsA, from B. subtilis, Protein Data Bank code 1QGQ (35). E, human glucuronyltransferase, Protein Data Bank code 1FGG (33). F, rabbit N-acetylglucosamine transferase, Protein Data Bank code 1FOA (11). In human glucuronyltransferase, the brown sphere represents an unknown metal ion, while in SpsA, the brown sphere represents a Mg2+ ion. All figures were generated with the program MOLSCRIPT (38). Residues interacting with the diphosphate group of UDP and/or the Mn2+ ion are labeled in all of the structures.
Conclusions-The new crystal form of α3GT described here provides the structure of the UDP complex at far higher resolution than those reported previously for a UMP complex and a Hg-UDP-galactose complex (18); the more ordered structure is also highlighted by the much lower B-factors for all atoms in the structure. A major conformational change relative to the previous structures results in greater order in the C-terminal 10 residues and new enzyme-ligand contacts in this region. This may reflect, in part, the different uridine nucleotides present in the complex. Previous work suggests that the reaction mechanism of α3GT may be ordered with donor substrate binding preceding acceptor binding, but conclusive proof of this is lacking (see Zhang et al. (20) for a discussion). In this context, it is possible that the observed conformational change could be linked to the formation of the binding site for the acceptor substrate. Previously, Henion et al. (36) found that deletion of as few as 3 amino acid residues from the C terminus of a primate α3GT results in complete loss of catalytic activity. This supports the view that the C terminus is crucial for catalysis, although the physical properties of the truncated enzyme were not characterized. Here we show that the substitution of lysine for the highly conserved Arg-365 does not affect substrate binding but specifically reduces the stability of the transition state in the reaction. This is distinct from the effects of a previously described mutation, Val-226 to Ala, within the D225VD227 motif, which perturbs metal cofactor and UDP-galactose binding as well as catalytic efficiency (20), in keeping with the role of this region in interacting with phosphates of UDP and metal ion. Additional work is needed to fully understand the role of the flexible C terminus in the reaction mechanism. The catalytic mechanism of α3GT is currently unknown.
Although the formation of a glycosyl-enzyme intermediate is a plausible mechanism for a retaining glycosyltransferase, this type of mechanism is difficult to reconcile with steady state kinetic studies that indicate a sequential mechanism in which all substrates bind prior to catalysis (20). Interestingly, structural and mutational studies of LgtC also do not support the formation of a covalent β-galactosyl-enzyme or even a β-galactosyl-substrate intermediate (34). The structural evidence for a covalent intermediate in α3GT is not compelling, and an SNi mechanism, as suggested for LgtC, is possible in which nucleophilic attack by the acceptor substrate occurs simultaneously with UDP release and on the same side of the galactose ring (34). The present high resolution crystal form and structure provide a platform for further direct structural studies to establish the binding modes of donor and acceptor substrates that may help to unravel the details of the mechanism of action of α3GT.