Heparan/Chondroitin Sulfate Biosynthesis

Human β1,3-glucuronyltransferase I (GlcAT-I) is a central enzyme in the initial steps of proteoglycan synthesis. GlcAT-I transfers a glucuronic acid moiety from uridine diphosphate-glucuronic acid (UDP-GlcUA) to the common linkage region trisaccharide Galβ1–3Galβ1–4Xyl covalently bound to a Ser residue at the glycosaminoglycan attachment site of proteoglycans. We have now determined the crystal structure of GlcAT-I at 2.3 Å in the presence of the donor substrate product UDP, the catalytic Mn2+ ion, and the acceptor substrate analog Galβ1–3Galβ1–4Xyl. The enzyme is an α/β protein with two subdomains that constitute the donor and acceptor substrate binding site. The active site residues lie in a cleft extending across both subdomains in which the trisaccharide molecule is oriented perpendicular to the UDP. Residues Glu227, Asp252, and Glu281 dictate the binding orientation of the terminal Gal-2 moiety. Residue Glu281 is in position to function as a catalytic base by deprotonating the incoming 3-hydroxyl group of the acceptor. The conserved DXD motif (Asp194, Asp195, Asp196) interacts directly with the ribose of the UDP molecule as well as with the Mn2+ ion. The key residues involved in substrate binding and catalysis are conserved in the glucuronyltransferase family as well as in other glycosyltransferases.

Proteoglycans with side chains such as heparan sulfate and chondroitin sulfate are distributed on the cell surface and in the extracellular matrix and are implicated in various biological processes including cell growth and differentiation, blood coagulation, and viral and bacterial infections. Mutations in the Drosophila homolog of UDP-glucose dehydrogenase, which produces the UDP-GlcUA required for glycosaminoglycan (GAG) synthesis, result in impaired signaling of Wingless, fibroblast growth factor, and Hedgehog (1, 2). More specifically, defects in heparan sulfate synthesis often result in severe biological consequences. Mutations in N-deacetylase/N-sulfotransferase (NDST) also cause severe impairment of these signaling pathways (3). NDST-1-null or NDST-2-null mice exhibit phenotypes with pulmonary hypoplasia or abnormal mast cells (4-6). Human EXT1 and EXT2 genes encoding heparan polymerases are linked to hereditary multiple exostoses (7), whereas heparan sulfate 2-O-sulfotransferase-null mice die neonatally from renal agenesis (8).

MATERIALS AND METHODS
Protein Expression, Purification, and Enzyme Assay-The coding region of the catalytic domain of GlcAT-I was amplified from human liver cDNA by polymerase chain reaction. The expressed protein contained the following sequence: MGSSHHHHHHSSGLVPRGSHM-T76-V335 (an N-terminal hexahistidine tag followed by GlcAT-I residues Thr76 to Val335). Sequence analysis revealed that the liver enzyme differs from the placenta enzyme by one substitution: Phe at position 204 instead of Ser. The amplified DNA was inserted into the bacterial expression plasmid pET-28a, which was then transformed into BL21(DE3) cells. To express selenomethionyl GlcAT-I, B834(DE3) cells were used. Expressed protein was purified using Ni2+-agarose and eluted with a gradient of imidazole. The eluted fractions containing GlcAT-I were pooled, dialyzed against 25 mM HEPES, pH 7.5, and 50 mM NaCl, and concentrated to 30.8 mg of protein/ml. For the enzyme assay, UDP-[14C]GlcUA (320 mCi/mmol) was obtained from American Radiolabeled Chemicals, Inc. Unlabeled UDP-GlcUA and ATP were obtained from Sigma. The glucuronyltransferase activity was determined as described previously using Galβ1-3Galβ1-4Xyl or asialoorosomucoid (Galβ1-4GlcNAc-R, R representing the remainder of the N-linked oligosaccharide chain) as substrate (14). The substrate Galβ1-3Galβ1-4Xyl was from Dr. N. B. Schwartz (University of Chicago).
Crystallization and Data Collection-Crystals of GlcAT-I were obtained by the vapor diffusion hanging drop method. 4 μl of 15 mg/ml [...] 6.0, 10 mM UDP, and ~20 mM Galβ1-3Galβ1-4Xyl. Data for the selenomethionine GlcAT-I crystal were collected on Beamline X9B at Brookhaven National Laboratories. Phases for the electron density map were obtained from data collected at a single wavelength of 0.97892 Å, representing the peak of the anomalous dispersion. All data were processed using DENZO and SCALEPACK (15).
* The work at Kobe Pharmaceutical University was supported by grants from the Ministry of Education, Science, Sports, and Culture of Japan.
Rfree is calculated from 5% of the data randomly chosen not to be included in refinement.
Positions for the selenium atoms were defined using SHELX (16). Phases were calculated in MLPHARE (17) and then improved using DM (17). WARP (18) was employed to fit approximately half the backbone and several side chains to the electron density. Following multiple rounds of model building in O (19) and refinement in CNS version 0.5 (20), a final R factor of 21.7% and Rfree of 24.6% were obtained at 1.6 Å resolution (Table I). To determine the structure of the ternary complex, one molecule from the acceptor-unbound structure was used as a search model in molecular replacement using the program AMoRe (17).
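The text does not spell out how the R factor and Rfree are computed. For orientation, the conventional crystallographic residual is given below. This is the standard textbook definition and is assumed here, not quoted from the source; the symbols F_obs and F_calc (observed and model-calculated structure factor amplitudes) are likewise introduced only for this illustration.

```latex
R \;=\; \frac{\displaystyle\sum_{hkl} \Bigl|\, \lvert F_{\mathrm{obs}}(hkl) \rvert - \lvert F_{\mathrm{calc}}(hkl) \rvert \,\Bigr|}
             {\displaystyle\sum_{hkl} \lvert F_{\mathrm{obs}}(hkl) \rvert}
```

Under this definition, the R factor sums over the reflections used in refinement, whereas Rfree applies the same expression to the reflections withheld from refinement (5% of the data in this work).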

RESULTS AND DISCUSSION
To crystallize GlcAT-I, the N-terminal 75 residues including the proposed cytoplasmic, transmembrane, and stem regions were removed. The truncated GlcAT-I-(Thr76-Val335) was expressed in E. coli. The GlcAT-I-(Thr76-Val335) protein and activity co-eluted as a large single peak with an apparent molecular mass of 43 kDa from the gel filtration column (Fig. 1). GlcAT-I-(Thr76-Val335) did not exhibit activity toward asialoorosomucoid, the specific acceptor substrate of GlcAT-P (data not shown). Thus, the recombinant enzyme lacking the N-terminal region appeared to be monomeric in solution and retained significant specific GlcAT-I activity.
The present crystal structure has revealed that GlcAT-I-(Thr76-Val335) is approximately 40 × 40 × 50 Å3 in size with an extended C terminus (Fig. 2A). The enzyme contains a seven-stranded mixed β-sheet, which can be divided into two subdomains. The active site is found in a cleft of the molecule that extends across both subdomains. The N-terminal subdomain is an α/β motif with alternating β-strands and α-helices. The β-strands form a parallel β-sheet with strand order 3, 2, 1, and 4. This subdomain contains the majority of the residues associated with donor substrate binding. The C-terminal subdomain, which contains the acceptor substrate binding site, includes β-strands 9, 6, and 10 that form a continuous β-sheet with strands 3, 2, 1, and 4 from the N-terminal subdomain. The C-terminal subdomain is largely a mixed β-sheet with strand order 5, 11, 6, 7, and 8, followed by the C terminus of the molecule, which extends away from the core along another molecule in the crystal lattice.
Crystals grown from both phosphate and MME-PEG contain the same dimer in the asymmetric unit (Fig. 2, B and C). The overall buried surface area for the dimer is 4000 Å2, represent- [...] Type II Golgi membrane enzymes such as β1,4-galactosyltransferase have been reported to be homodimers (21, 22). Thus, GlcAT-I, also a type II membrane-associated protein, is likely to be a homodimer in the Golgi membrane.
The UDP molecule binds across a long cleft on the surface of the molecule with the uridine and ribose rings mainly centered in the N-terminal subdomain of the molecule. The OD2 oxygen of Asp113 from strand 3 (Fig. 3) is within hydrogen bonding distance of atom N-3 (2.9 Å) of the uridine ring. The side chain of Tyr84 in β-strand 1 is found in a parallel ring stacking arrangement with the uridine base about 3.5 Å away. Atoms O3* and O2* of the ribose ring are also involved in protein interactions. O3* is 3.0 Å from atom N of Asp195 and 2.5 Å from O of Pro82. O2* is 2.8 Å from OD1 of Asp195. Residues Arg156 and Arg310 form direct interactions with the phosphate groups: atom NH1 of Arg156 is found 3.2 Å from O2B of the α phosphate, whereas atom NH1 of Arg310 is located 2.6 Å from O1B of the β phosphate.
The Mn2+ atom is in an approximately octahedral coordination state in which two of the coordination atoms, O1B (2.1 Å) and O2A (2.1 Å), are from the β- and α-phosphates of the UDP molecule. Asp196 forms a bidentate interaction through OD1 (2.2 Å) and OD2 (2.2 Å). The remaining two ligands are water molecules. One water molecule is found 2.3 Å from the Mn2+ and is also hydrogen bonded to OD1 of Asn197 (2.8 Å) and O of Thr309 (2.8 Å). The other water molecule is found 2.1 Å from the Mn2+ and 2.6 Å from OD1 of Asp194. Thus, residues Asp194, Asp195, and Asp196 of the conserved DXD motif (23) are involved in ribose interactions as well as direct and indirect interactions with the Mn2+ ion. A second unidentified metal ion is found in a tetrahedral geometry with ligands O3B of UDP (1.9 Å), NE2 of His308 (2.0 Å), and two possible water molecules (2.0 Å and 2.2 Å). However, it is uncertain whether this metal ion is physiological. It is not present in either the unbound or the UDP/Mn2+/no acceptor substrate structures (data not shown), suggesting that the metal may have come from the trisaccharide solution.
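The octahedral description of the Mn2+ site can be checked with a few lines of arithmetic. The sketch below simply tabulates the six ligand distances quoted in the text and reports the coordination number and mean metal-ligand distance. The 2.6 Å cutoff for counting a ligand as coordinated and the function name `coordination_summary` are assumptions made for illustration and do not come from the source.

```python
from statistics import mean

# Mn2+ ligand distances in Å, transcribed from the text above.
MN_LIGANDS = {
    "UDP O1B (beta-phosphate)": 2.1,
    "UDP O2A (alpha-phosphate)": 2.1,
    "Asp196 OD1": 2.2,
    "Asp196 OD2": 2.2,
    "water 1": 2.3,
    "water 2": 2.1,
}


def coordination_summary(ligands, cutoff=2.6):
    """Return (coordination number, mean distance in Å) for ligands within cutoff.

    The default cutoff is an assumed, illustrative threshold, not a value
    taken from the paper.
    """
    bound = [dist for dist in ligands.values() if dist <= cutoff]
    return len(bound), round(mean(bound), 2)


if __name__ == "__main__":
    number, average = coordination_summary(MN_LIGANDS)
    print(f"coordination number: {number}, mean Mn-O distance: {average} Å")
```

With the distances listed in the text, all six ligands fall within the assumed cutoff, consistent with the six-coordinate geometry described.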
Residues involved in acceptor substrate binding are isolated to the C-terminal subdomain of the GlcAT-I molecule. The trisaccharide Galβ1-3Galβ1-4Xyl binds with the O3 of the terminal galactose (Gal-2) 5.1 Å from the O3B atom of the UDP molecule (Fig. 3). Electron density exists only for the Galβ1-3Gal portion of the trisaccharide Galβ1-3Galβ1-4Xyl. Atom OE1 of Glu227 is within hydrogen bonding distance of O-6 of Gal-2 (2.5 Å). NH1 of Arg247 is also found 2.8 Å from O-6. The O-4 oxygen of Gal-2 is found 2.6 Å from OD2 of Asp252, and O-3 of Gal-2 is 2.7 Å from OE2 of Glu281. The Gal-1 moiety is positioned near the surface of the molecule. Residue Gln318 of the second monomer is in position to form a hydrogen bond with O-6 of Gal-1. Atoms OE1 and NE2 are positioned 2.6 Å and 3.0 Å, respectively, from O-6 of Gal-1. Although Trp243 does not appear to form hydrogen bonds with the acceptor substrate, the plane of the side chain is parallel to the ring of the Gal-1 molecule.
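The distances listed above can be tallied to see how the polar contacts are distributed between the two galactose units. The following is a minimal sketch that uses only the distances quoted in the text. The 3.2 Å hydrogen-bond cutoff and the helper name `hbonds_per_sugar` are assumptions for illustration; a proper analysis would also consider bond angles and would read coordinates from the deposited model.

```python
from collections import Counter

# (partner atom, sugar unit, sugar atom, distance in Å), transcribed from the text.
CONTACTS = [
    ("Glu227 OE1", "Gal-2", "O-6", 2.5),
    ("Arg247 NH1", "Gal-2", "O-6", 2.8),
    ("Asp252 OD2", "Gal-2", "O-4", 2.6),
    ("Glu281 OE2", "Gal-2", "O-3", 2.7),
    ("Gln318 OE1", "Gal-1", "O-6", 2.6),
    ("Gln318 NE2", "Gal-1", "O-6", 3.0),
    ("UDP O3B", "Gal-2", "O-3", 5.1),  # donor-acceptor separation, not a hydrogen bond
]

HBOND_CUTOFF = 3.2  # Å; assumed distance-only criterion, not taken from the paper


def hbonds_per_sugar(contacts, cutoff=HBOND_CUTOFF):
    """Count contacts at or below the distance cutoff for each sugar unit."""
    return Counter(sugar for _, sugar, _, dist in contacts if dist <= cutoff)


if __name__ == "__main__":
    for sugar, count in sorted(hbonds_per_sugar(CONTACTS).items()):
        print(f"{sugar}: {count} contacts within {HBOND_CUTOFF} Å")
```

Under this distance-only criterion, four of the six short contacts involve Gal-2 and two involve Gal-1, while the 5.1 Å separation between O-3 of Gal-2 and O3B of UDP is excluded.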
Based on the present binding orientation of the acceptor, the Xyl moiety would be expected to reside outside the substrate-binding cavity. However, no density was observed for the Xyl moiety of the trisaccharide. Xyl may simply act as a spacer between the Galβ1-3Gal disaccharide in the substrate cavity and the protein to which the linker region is attached, and consequently would have no specific interactions with the enzyme. The majority of the hydrogen bond interactions between GlcAT-I and the acceptor substrate are through Gal-2. The three-dimensional orientation of residues Glu227, Asp252, and Glu281 in the active site may dictate the specificity for the acceptor substrate through interactions with the O-6, O-4, and O-3 hydroxyl groups of Gal-2. These three residues are highly conserved in all glucuronyltransferases.
The reaction catalyzed by GlcAT-I is the transfer of glucuronic acid from UDP-GlcUA to the O-3 oxygen of the terminal Gal-2 of the growing linkage region Galβ1-3Galβ1-4Xylβ1-O-Ser. A reasonable mechanism involves attack by a deprotonated form of the incoming 3-hydroxyl group of the acceptor oligosaccharide on the C-1 position of the glucuronic acid (Fig. 4). Formation of a GlcUAβ1-3Gal linkage and subsequent dissociation of the UDP molecule follow. This reaction would require a catalytic base to deprotonate the hydroxyl group at the C-3 position of Gal-2. Atom OE2 of Glu281 is located 2.7 Å from the nucleophilic O-3 of the 3-hydroxyl group. This suggests that Glu281 could play a key role as the catalytic base in the transfer reaction. The histidine at position 308 is located near the β-phosphate of UDP and the O-3 hydroxyl of Gal-2. His308 may interact with the glucuronic acid portion of the UDP-GlcUA molecule either by orienting the glucuronic acid for catalysis or by stabilizing the transition state. Both Glu281 and His308 are completely conserved in all glucuronyltransferases. The present GlcAT-I structure supports the proposed catalytic mechanism for NDP-sugar-dependent glycosyltransferases (24).
To date, three crystal structures of NDP-sugar-dependent glycosyltransferases have been reported: T4 phage β-glucosyltransferase (25), bovine β-4-galactosyltransferase (26), and SpsA from Bacillus subtilis (17). However, none of these structures has an acceptor substrate bound. The structure of GlcAT-I shows high similarity to SpsA in the tertiary fold and NDP binding, despite only 7.3% sequence identity (Fig. 5A). Notably, the catalytic base Glu281 of GlcAT-I superimposes with Asp191 of the SpsA structure, which has been proposed as the catalytic base in SpsA (27) (Fig. 5B). In addition, Asp252 of GlcAT-I, which interacts with O-4 of Gal-2, superimposes well with Asp158 in the SpsA structure. These data suggest that the reaction mechanism and the catalytic site structure may also be applicable to other families of glycosyltransferases.
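The sequence identity quoted above depends on how the two sequences are aligned and on which positions are counted. As a hedged illustration only, the sketch below computes percent identity over the columns of a pairwise alignment in which both sequences have a residue. This convention, the function name `percent_identity`, and the toy aligned strings are all assumptions for the example; they are not the GlcAT-I or SpsA sequences, and the source does not state which convention was used.

```python
def percent_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    This is one of several conventions in use; others divide by the full
    alignment length or by the length of the shorter sequence.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences carry a residue.
    pairs = [(a, b) for a, b in zip(aln_a, aln_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


if __name__ == "__main__":
    # Hypothetical toy alignment, not taken from the paper.
    toy_a = "MKT-AYIAKQR"
    toy_b = "MRTGAY-SKQL"
    print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

A low identity such as the value reported here is compatible with a shared fold, which is why the comparison in the text rests on structural superposition of the catalytic residues.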
Critical roles of GAGs, especially heparan sulfate, in developmental processes and specific signaling pathways have recently been demonstrated by the identification of mutations in biosynthetic enzymes for heparan sulfate synthesis in Drosophila, mouse, and human (28). The structural biology of GAGs and of the enzymes involved in GAG biosynthesis is evolving rapidly (29). The ternary structure of GlcAT-I is the first, not only for glucuronyltransferases but also for glycosyltransferases that are involved in heparan/heparin biosynthesis. The structure provides a basis for understanding glucuronic acid transfer to the linkage region at a branching point common to various GAGs. This structure, along with the crystal structure of the sulfotransferase domain of heparan sulfate N-deacetylase/N-sulfotransferase (30), opens a new era in the structural biology of heparan/chondroitin sulfate biosynthesis and GAG biology.