Crystal structure of unsaturated glucuronyl hydrolase, responsible for the degradation of glycosaminoglycan, from Bacillus sp. GL1 at 1.8 Å resolution.

Unsaturated glucuronyl hydrolase (UGL) is a novel glycosaminoglycan hydrolase that releases unsaturated D-glucuronic acid from oligosaccharides produced by polysaccharide lyases. The x-ray crystal structure of UGL from Bacillus sp. GL1 was determined for the first time by multiple isomorphous replacement (MIR) and refined at 1.8 Å resolution to a final R-factor of 16.8% for data from 25 to 1.8 Å. The refined UGL structure consists of 377 amino acid residues, 478 water molecules, four glycine molecules, two dithiothreitol (DTT) molecules, and one 2-methyl-2,4-pentanediol (MPD) molecule. UGL contains an α6/α6-barrel, a structure found in the six-hairpin enzyme superfamily of the α/α-toroidal fold. On one side of the barrel, long loops containing three short β-sheets form a deep pocket. One glycine molecule and two DTT molecules, surrounded by residues highly conserved among UGLs, were found in this pocket, suggesting that the catalytic and substrate-binding sites are located there. Apart from some loops, the overall UGL structure closely resembles that of the Bacillus subtilis hypothetical protein Yter, whose function is unknown and which shares little amino acid sequence identity with UGL. In the active pocket, residues likely involved in substrate recognition and catalysis are conserved in both UGLs and Yter. The most likely catalytic residues for glycosyl hydrolysis are Asp88 and Asp149, a conclusion supported by site-directed mutagenesis of these two residues.

Polysaccharides are ubiquitous in nature as components of the extracellular matrix on the cell surfaces of organisms ranging from bacteria to mammals (1). They serve a variety of biological functions and fall into three groups: storage (e.g. starch), structural (e.g. cellulose), and functional (e.g. glycosaminoglycans).
Glycosaminoglycans such as hyaluronan, chondroitin, and heparin are linear, negatively charged polysaccharides with a repeating disaccharide unit consisting of a uronic acid residue (glucuronic or iduronic acid) and an amino sugar residue (glucosamine or galactosamine) (2). Hyaluronan consists of D-glucuronic acid (GlcA) and N-acetyl-D-glucosamine (GlcNAc) (Fig. 1a) and plays an important role in cell-to-cell association in mammals and as a capsule in streptococcal bacteria (3). This polysaccharide is widely present in human tissues such as the eye, brain, liver, skin, and blood (4). Chondroitin, also a member of the glycosaminoglycan family, consists of GlcA and N-acetyl-D-galactosamine (GalNAc) with a sulfate group(s) at position 4, 6, or both (5) (Fig. 1b). Mammalian chondroitin covalently bound to proteins plays an important role in cellular architecture and permeability (5). These glycosaminoglycans in the extracellular matrix may be targets for pathogens that invade host cells, and many pathogens have been reported to interact specifically with these polysaccharides (6).
Certain streptococci, such as Streptococcus pyogenes and Streptococcus pneumoniae, cause severe infectious diseases (e.g. pneumonia, bacteremia, sinusitis, and meningitis) and produce polysaccharide lyases that act as virulence factors by degrading hyaluronan and chondroitin (7,8). Hyaluronan and chondroitin lyases recognize GlcA residues in the polysaccharides and, through a β-elimination reaction, produce unsaturated disaccharides carrying a GlcA residue with a C=C double bond at the nonreducing terminus. Unsaturated glucuronyl hydrolase (UGL) hydrolyzes these unsaturated disaccharides to an amino sugar and an unsaturated GlcA (ΔGlcA), which is immediately and nonenzymatically converted to an α-keto acid (Fig. 1) (9). The enzyme is thus thought to be another virulence factor responsible for the complete degradation of glycosaminoglycans. A gene coding for UGL was first cloned from Bacillus sp. GL1, a bacterium that degrades bacterial biofilms such as xanthan, produced by pathogenic xanthomonads, and gellan, produced by pathogenic sphingomonads (Fig. 1, c and d) (10,11); highly homologous genes are also distributed in pathogenic streptococci that produce polysaccharide lyases (12). We therefore defined UGL as a novel hydrolase for the degradation of glycosaminoglycans and bacterial biofilms; the enzyme belongs to a new glycoside hydrolase family, GH-88, in the CAZy database. Inhibitors of polysaccharide lyases and UGL are expected to become potent pharmaceuticals for treating streptococcal infections.
Structural analysis of polysaccharide lyases and UGL is indispensable for clarifying the molecular mechanisms of catalysis and substrate recognition, and the sequential reactions involved in polysaccharide depolymerization by bacteria. The crystal structures of polysaccharide lyases for pectate (13)(14)(15), alginate (16), hyaluronate (17), chondroitin (18,19), and xanthan (20) have been determined, and their structure-function relationships have been studied. No three-dimensional structure of any UGL has been reported, however. This article describes the crystal structure of UGL from Bacillus sp. GL1, a novel hydrolase for the degradation of glycosaminoglycans and bacterial biofilms, determined by x-ray crystallography at 1.8 Å resolution, together with the identification of its active site. The structure provides useful information on the catalytic mechanism and for the molecular design of drugs to treat streptococcal, xanthomonad, and sphingomonad infections.

MATERIALS AND METHODS
Crystallization and X-ray Diffraction-UGL of Bacillus sp. GL1 was overexpressed in Escherichia coli, purified, and crystallized by sitting-drop vapor diffusion as described elsewhere (21). For heavy-atom derivatives, UGL crystals were soaked for 15-50 min at 20°C in a solution containing 1 mM K2PtCl4, 1 mM Hg(CH3COO)2, or 0.5 mM NaAuCl4. These heavy-atom solutions were prepared in 52% (v/v) 2-methyl-2,4-pentanediol (MPD), 0.12 M sodium chloride, 0.1 M glycine, and 0.1 M Tris-HCl buffer (pH 7.6). Crystals were picked up from the droplet with a mounted nylon loop (Hampton Research, Laguna Niguel, CA) and placed in a cold nitrogen gas stream at 100 K. X-ray diffraction images of one UGL crystal (Native 1) were collected with a Quantum 4R CCD area detector (ADSC) using synchrotron radiation at a wavelength of 0.72 Å at the BL-38B1 station of SPring-8, and were processed to 1.8 Å resolution with DENZO and SCALEPACK (22) (Table I). Diffraction images of another crystal (Native 2) and of the derivative crystals used for phasing were collected with a Bruker Hi-Star multiwire area detector using Cu Kα radiation from a MAC Science M18XHF rotating anode generator, and were processed with SADIE and SAINT (Bruker, Karlsruhe, Germany).
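As a worked check on the synchrotron data-collection geometry, the photon energy of the 0.72 Å beam and the Bragg scattering angle needed to record 1.8 Å spacings can be computed directly. This is a minimal sketch, not part of the authors' processing; the constant hc ≈ 12.398 keV·Å is standard, and the function names are illustrative.

```python
import math

# hc in keV * angstrom, so photon energy E = hc / wavelength.
HC_KEV_ANGSTROM = 12.398


def photon_energy_kev(wavelength_a):
    """Photon energy (keV) for an X-ray wavelength given in angstroms."""
    return HC_KEV_ANGSTROM / wavelength_a


def two_theta_deg(wavelength_a, d_spacing_a):
    """Scattering angle 2-theta (degrees) from Bragg's law, lambda = 2 d sin(theta)."""
    sin_theta = wavelength_a / (2.0 * d_spacing_a)
    if sin_theta > 1.0:
        raise ValueError("this d-spacing cannot be reached at this wavelength")
    return 2.0 * math.degrees(math.asin(sin_theta))


# Native 1: lambda = 0.72 A (BL-38B1, SPring-8), data to 1.8 A.
print(f"E = {photon_energy_kev(0.72):.2f} keV")
print(f"2-theta at 1.8 A = {two_theta_deg(0.72, 1.8):.1f} deg")
```

The same functions apply to the rotating-anode data by substituting the Cu Kα wavelength.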
Structure Determination and Refinement-The UGL crystal structure was determined by multiple isomorphous replacement (MIR). Phase calculation and heavy-atom refinement were done with the Native 2 and derivative data sets using the PHASES program (23). Major heavy-atom sites were determined by interpreting difference Patterson maps calculated at 6.0-3.5 Å resolution, and additional sites were located in difference Fourier maps. Phasing results are listed in Table II. The mean figure of merit was 0.426; after solvent flattening (24) with PHASES, the phases were greatly improved and the mean figure of merit increased to 0.754. An initial model was built into the Native 2 map phased at 3.5 Å with TURBO-FRODO (AFMB-CNRS, Marseille, France) on a Silicon Graphics Octane computer. Simulated annealing refinement of this model against 25-2.5 Å data from the Native 1 data set was done with CNS version 1.1 (25): the model was heated to 3,000 K and slowly cooled to 300 K (time step, 0.5 fs; temperature decrement, 25 K; 50 steps at each temperature), followed by 200 cycles of Powell minimization. Fo − Fc and 2Fo − Fc maps were used to correct the model. Several rounds of positional and B-factor refinement, followed by manual model building, improved the model as the resolution was extended to 1.8 Å. Water molecules were added where the Fo − Fc difference density exceeded 3.0σ and the 2Fo − Fc density exceeded 1.0σ. Seven fragments of non-protein, non-water density were modeled as four glycine molecules, two dithiothreitol (DTT) molecules, and one MPD molecule from the crystallization medium, and the density was excellent for each entire molecule. The final R-factor was 16.8% for 63,316 reflections with F > 2.0σ(F) at 25.0-1.8 Å resolution (96.7% completeness). The R-free value, calculated with a randomly selected 10% of the data, was 18.9%.
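The slow-cooling protocol above can be written out explicitly. The sketch below enumerates the temperature stages and the total molecular-dynamics time implied by the stated parameters, assuming that both 3,000 K and 300 K are run as stages; CNS's own bookkeeping may count the endpoints differently, and the function name is hypothetical.

```python
def annealing_schedule(t_start=3000, t_end=300, t_drop=25,
                       steps_per_temp=50, timestep_fs=0.5):
    """Enumerate the stages of a linear slow-cooling protocol.

    Returns (temperatures in K, total MD steps, total simulated time in ps).
    Assumes both t_start and t_end are included as stages (an assumption,
    not a documented CNS convention).
    """
    if t_drop <= 0 or t_end > t_start:
        raise ValueError("need t_drop > 0 and t_end <= t_start")
    temps = list(range(t_start, t_end - 1, -t_drop))
    total_steps = len(temps) * steps_per_temp
    total_ps = total_steps * timestep_fs / 1000.0
    return temps, total_steps, total_ps


temps, n_steps, t_ps = annealing_schedule()
print(len(temps), "stages from", temps[0], "K to", temps[-1], "K")
print(n_steps, "MD steps,", t_ps, "ps of dynamics before Powell minimization")
```

Under this counting the protocol amounts to a few picoseconds of dynamics, with the 200 Powell minimization cycles run afterward.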
The stereochemical quality of the model was assessed with PROCHECK (26) and WHAT_CHECK (27). Structurally similar proteins were searched for in the RCSB Protein Data Bank (28) with DALI (29). Coordinates of the hypothetical protein Yter (1NC5) were taken from the RCSB Protein Data Bank, and the UGL and Yter models were superimposed with the fitting program in TURBO-FRODO. Ribbon and surface figures were prepared with MOLSCRIPT (30), RASTER3D (31), and GRASP (32).
Enzyme Assay-Reactions of the wild-type and mutant UGLs were conducted at 30°C in a 500-μl reaction mixture containing 50 mM sodium phosphate buffer (pH 6.5), 20-500 μM substrate, and enzyme. Activity was measured by monitoring the decrease in absorbance at 235 nm, which reflects loss of the substrate's C=C double bond: the pyranose ring of the released ΔGlcA opens readily, and the sugar is converted nonenzymatically to an α-keto acid with loss of the double bond (Fig. 1d) (7,9). Enzyme concentration was determined spectrophotometrically using the theoretical molar extinction coefficient ε280 = 99,570 M−1 cm−1, and purity was assessed by SDS-PAGE followed by Coomassie Brilliant Blue staining. The gellan lyase product (ΔGlcA-Glc-Rha-Glc) used as UGL substrate was prepared as described elsewhere (10). Hyaluronate lyase products (ΔGlcA-GlcNAc) and chondroitin lyase products (ΔGlcA-GalNAc) were obtained from Seikagaku Corporation (Tokyo, Japan). kcat and Km were determined by nonlinear fitting to the Michaelis-Menten equation.
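The two quantitative steps in this assay, converting A280 into an enzyme concentration and extracting kcat and Km from initial rates, can be sketched in plain Python. The sketch assumes the initial rates have already been converted from ΔA235 to concentration units (which needs the substrate's molar absorptivity at 235 nm, not given in the text). It uses a Gauss-Newton least-squares fit seeded by a Lineweaver-Burk estimate, not necessarily the fitting software the authors used, and runs on synthetic noise-free data generated from the wild-type constants reported for gellan tetrasaccharide. Names such as fit_michaelis_menten are illustrative.

```python
def molar_concentration(a280, epsilon=99_570.0, path_cm=1.0):
    """Beer-Lambert law, c = A / (epsilon * l); returns mol/L."""
    return a280 / (epsilon * path_cm)


def fit_michaelis_menten(s, v, max_iter=100, tol=1e-12):
    """Least-squares fit of v = Vmax * S / (Km + S); returns (Vmax, Km).

    Starting values come from a double-reciprocal (Lineweaver-Burk) regression,
    then Gauss-Newton iterations refine them on the untransformed rates.
    """
    x = [1.0 / si for si in s]
    y = [1.0 / vi for vi in v]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    intercept = my - slope * mx                 # equals 1/Vmax
    vmax, km = 1.0 / intercept, slope / intercept  # slope equals Km/Vmax

    for _ in range(max_iter):
        # Accumulate J^T J and J^T r for the two parameters.
        a11 = a12 = a22 = b1 = b2 = 0.0
        for si, vi in zip(s, v):
            d = km + si
            r = vi - vmax * si / d              # residual
            j1 = si / d                         # dv/dVmax
            j2 = -vmax * si / d ** 2            # dv/dKm
            a11 += j1 * j1
            a12 += j1 * j2
            a22 += j2 * j2
            b1 += j1 * r
            b2 += j2 * r
        det = a11 * a22 - a12 * a12
        if det == 0.0:
            break
        dv = (a22 * b1 - a12 * b2) / det        # solve the 2x2 normal equations
        dk = (a11 * b2 - a12 * b1) / det
        vmax, km = vmax + dv, km + dk
        if abs(dv) < tol * abs(vmax) and abs(dk) < tol * abs(km):
            break
    return vmax, km


# Synthetic example (not measured data): wild-type kcat 7.3 s^-1, Km 90 uM,
# and a hypothetical A280 reading for the enzyme stock.
enzyme_um = molar_concentration(0.0498) * 1e6
subs_um = [20.0, 50.0, 100.0, 200.0, 500.0]
rates = [7.3 * enzyme_um * si / (90.0 + si) for si in subs_um]  # uM/s
vmax_fit, km_fit = fit_michaelis_menten(subs_um, rates)
print(f"Km = {km_fit:.1f} uM, kcat = {vmax_fit / enzyme_um:.2f} s^-1")
```

The double-reciprocal step only supplies starting values; the parameters themselves come from the nonlinear fit, which weights the untransformed rates evenly.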

RESULTS AND DISCUSSION
Crystallization and Structure Determination-UGL of Bacillus sp. GL1 is a monomeric enzyme with a molecular mass of about 43 kDa (377 amino acid residues) (9). A UGL crystal (0.1 × 0.1 × 0.5 mm) was obtained by sitting-drop vapor diffusion as described elsewhere (21). The space group was determined to be P6522 with unit cell dimensions a = b = 103.01 Å and c = 223.04 Å, and the solvent content was 69%, assuming one molecule per asymmetric unit. Results of native data collection using synchrotron radiation at the BL-38B1 station of SPring-8 are summarized in Table I. The structure was solved by MIR; Table II shows phasing statistics at 3.5 Å resolution. The protein model was built after solvent flattening with PHASES (23) and refined by simulated annealing and restrained least-squares refinement in CNS (25) (Table I).
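The 69% solvent content follows from the cell dimensions through the Matthews coefficient VM. The sketch below reproduces it, assuming the common approximation solvent fraction = 1 − 1.23/VM (VM in Å³/Da), the molecular mass of about 43,000 Da given in the text, and 12 asymmetric units per cell for space group P6522; the function names are illustrative.

```python
import math


def hexagonal_cell_volume(a, c):
    """Unit-cell volume (cubic angstroms) for a hexagonal cell (a = b, gamma = 120 deg)."""
    return a * a * c * math.sin(math.radians(120.0))


def matthews(cell_volume, n_asym_units, mass_da, mols_per_asu=1):
    """Return (VM in A^3/Da, solvent fraction) using solvent = 1 - 1.23 / VM."""
    vm = cell_volume / (n_asym_units * mols_per_asu * mass_da)
    return vm, 1.0 - 1.23 / vm


volume = hexagonal_cell_volume(103.01, 223.04)
vm, solvent = matthews(volume, n_asym_units=12, mass_da=43_000)
print(f"V = {volume:.3e} A^3, VM = {vm:.2f} A^3/Da, solvent = {solvent:.0%}")
```

For comparison, two molecules per asymmetric unit would give VM ≈ 1.99 Å³/Da and roughly 38% solvent.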
Quality of Refined Model-The refined model of UGL consists of 377 amino acid residues, 478 water molecules, four glycine molecules, two DTT molecules, and one MPD molecule. The entire polypeptide chain was well traced, and the electron densities of the main chain and side chains were generally very well defined in the 2Fo − Fc map, except for the N-terminal residue Met1 and the C-terminal residue Arg377, whose electron density was too weak for them to be identified completely. The ligand molecules were also well fitted. The final overall R-factor of the refined model was 16.8% for 63,316 unique reflections at 25.0-1.8 Å resolution, and the final free R-factor was 18.9%. Final root mean square (r.m.s.) deviations from standard geometry were 0.005 Å for bond lengths and 1.22° for bond angles. From theoretical curves in a Luzzati plot (33), the absolute positional error was estimated to be about 0.17 Å for data at 5.0-1.8 Å. In the Ramachandran plot as defined in PROCHECK (26), most nonglycine residues (88.3%) lie within the most favored regions and the others (11.3%) within additionally allowed regions. Ser345 (φ = 38°, ψ = 65°), however, falls in a generously allowed region; it has well-defined density in the 2Fo − Fc map and is located in a sharp bend of a loop adjacent to a helix. No residues lie in disallowed regions.
FIG. 2. Overall structure of UGL with bound molecules. a and b, ribbon stereodiagrams prepared with MOLSCRIPT (30) and RASTER3D (31) (a, front view; b, side view). Colors denote secondary structure elements (pink, α-helices; cyan, β-strands; yellow, loops and coils) and the bound glycine (blue), DTT (red), and MPD (green) molecules. c, white molecular surface model with aromatic residues in pink and bound molecules shown as yellow bond models, drawn with GRASP (32).
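The R-factor and R-free values follow the standard definition R = Σ||Fo| − |Fc||/Σ|Fo|, computed separately for the working reflections and for a randomly chosen ~10% test set that is excluded from refinement. The sketch below illustrates that bookkeeping, including the F > 2.0σ(F) cutoff; it assumes Fc has already been scaled to Fo, and the helper names and toy numbers are hypothetical.

```python
import random


def r_factor(f_obs, f_calc):
    """R = sum(| |Fo| - |Fc| |) / sum(|Fo|); Fc assumed already scaled to Fo."""
    num = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    den = sum(abs(fo) for fo in f_obs)
    return num / den


def sigma_cutoff(f_obs, sig_f, cutoff=2.0):
    """Indices of reflections with F > cutoff * sigma(F)."""
    return [i for i, (f, s) in enumerate(zip(f_obs, sig_f)) if f > cutoff * s]


def pick_free_set(indices, fraction=0.10, seed=0):
    """Randomly flag about `fraction` of the reflections as the R-free test set."""
    rng = random.Random(seed)
    return {i for i in indices if rng.random() < fraction}


def r_work_and_free(f_obs, f_calc, indices, free):
    """R over the working reflections and over the free (test) reflections."""
    work = [i for i in indices if i not in free]
    test = [i for i in indices if i in free]
    r_work = r_factor([f_obs[i] for i in work], [f_calc[i] for i in work])
    r_free = r_factor([f_obs[i] for i in test], [f_calc[i] for i in test])
    return r_work, r_free


# Toy example with four reflections; reflection 3 is placed in the free set.
fo = [100.0, 80.0, 5.0, 60.0]
fc = [90.0, 84.0, 9.0, 48.0]
sig = [3.0, 2.0, 4.0, 2.5]
kept = sigma_cutoff(fo, sig)          # drops reflection 2 (5 < 2 * 4)
print(r_work_and_free(fo, fc, kept, free={3}))
```

R-free is informative only because its reflections never enter refinement, so a gap of a few percent above R, as here (18.9% versus 16.8%), is the expected pattern.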
Overall UGL Structure-The overall structure of UGL is shown as ribbon models (Fig. 2, a and b) and a molecular surface model (Fig. 2c). The enzyme measures ~45 × 45 × 40 Å and consists of an α6/α6-barrel with a deep pocket that is likely to be the active site (described below). 123 and 148-153) adjoining the long α-helices, and some loops. H4 contains one additional residue, Leu95, whose main-chain oxygen atom forms no hydrogen bond with the nitrogen atom of a paired residue. Helix H4 was therefore divided into two segments, H4a and H4b, which meet at a 30° bend at this point. The α6/α6-barrel is formed by six outer helices (H1, H3, H5, H7, H9, and H11) running in roughly the same direction and six inner helices (H2, H4, H6, H8, H10, and H12) oriented in the opposite direction. Adjacent helices are connected in a nearest-neighbor, up-and-down pattern by short and long loops. The loop between helices H1 and H2 is referred to as L-H1:H2, and likewise for the others. One side of the barrel has short loops (L-H2:H3, L-H4:H5, L-H6:H7, and L-H8:H9, 2 residues each; L-H10:H11, 6 residues). The other side, which contains the pocket, consists of long loops (L-H1:H2, 23 residues; L-H3:H4, 9 residues; L-H5:H6, 33 residues; L-H7:H8, 36 residues; L-H9:H10, 24 residues; and L-H11:H12, 30 residues). These long loops pack together to form the wall of a deep, funnel-shaped pocket roughly 15-20 Å in diameter at the lip and about 15 Å deep (Fig. 2c). The pocket is lined by many aromatic residues.
Structural Comparison-UGL has an α/α-toroidal fold. This basic fold is common to glycosyl hydrolases, polysaccharide lyases, and terpenoid cyclases/protein prenyltransferases in the SCOP database (scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.b.bcj.html) (34). UGL has the α6/α6-barrel found in the six-hairpin enzyme superfamily of the SCOP database, which includes glucoamylases (35)(36)(37), cellulase catalytic domains (38-40), N-acyl-D-glucosamine 2-epimerase (AGE) (41), the maltose phosphorylase central domain (42), and the hypothetical protein Yter. A DALI (29) search of the RCSB Protein Data Bank (28) showed that three proteins of this SCOP superfamily, hypothetical protein Yter from B. subtilis, glucoamylase from Thermoanaerobacterium thermosaccharolyticum (36), and AGE from porcine kidney (41), are the most similar to UGL, with Z-scores of 32.7, 21.1, and 19.4, respectively. The r.m.s. deviation was 2.80 Å for the superimposition of 323 Cα atoms of UGL on those of Yter, 3.50 Å for 287 Cα atoms on glucoamylase, and 3.30 Å for 278 Cα atoms on AGE, although these proteins show no significant amino acid sequence similarity to UGL and catalyze different types of reactions. UGL is an exo-hydrolase acting on unsaturated oligosaccharides produced by polysaccharide lyases, whereas Yter is a hypothetical protein of unknown function. The crystal structure of Yter, determined by the Midwest Center for Structural Genomics, has not, to the best of our knowledge, been published, although its coordinates are available in the RCSB Protein Data Bank. Glucoamylase is a polysaccharide exo-hydrolase found in some prokaryotic and many eukaryotic microorganisms. AGE also has an α6/α6-barrel and catalyzes the epimerization between N-acetyl-D-glucosamine and N-acetyl-D-mannosamine (41).
Fig. 4 shows the superimposition of UGL on Yter, the best-fitting pair overall. The 12 helices of the α6/α6-barrel are very similar in UGL and Yter in position, direction, and angle, but the loop structures differ significantly. The L-H1:H2 loop of UGL is longer (23 residues) than that of Yter (13 residues). The L-H7:H8 loop of UGL protrudes outward from the wall, whereas that of Yter points into the pocket. L-H11:H12 of UGL leaves the pocket wide open, whereas that of Yter closes it off.
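The r.m.s. deviations above compare matched Cα pairs after a least-squares superimposition (done here with the fitting program in TURBO-FRODO). Once the models are superimposed and residues are paired, the r.m.s. deviation is a simple average; the sketch below assumes that superimposition and pairing have already been done, and it uses hypothetical toy coordinates rather than the UGL or Yter models.

```python
import math


def rmsd(coords_a, coords_b):
    """r.m.s. deviation between paired, already-superimposed atoms (input units)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two equal-length, non-empty lists of paired atoms")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(coords_a, coords_b):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical paired C-alpha coordinates in angstroms.
model_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model_b = [(0.0, 1.0, 0.0), (3.8, -1.0, 0.0), (7.6, 0.0, 1.0)]
print(f"{rmsd(model_a, model_b):.2f} A over {len(model_a)} C-alpha pairs")
```

Because the number of paired atoms differs (323 for Yter versus 287 and 278 for the other two proteins), each r.m.s. value is best read together with its number of aligned atoms and the DALI Z-score rather than on its own.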
atom of its amino group to the Asp149 Oδ1 atom (2.9 Å). For the DTT molecules, in addition to hydrophobic interactions, the O2 hydroxyl group of DTT501 is hydrogen-bonded to the His210 Nε2 atom (3.0 Å) (Table III). The natural substrates of UGL, i.e. the products of lyases (Fig. 1), contain carboxyl and hydroxyl groups, and their pyranose rings appear to stack against aromatic residues through hydrophobic interactions, as is often seen in complexes of polysaccharide lyases with substrates (20, 48-51). Glycine has a carboxyl group, and DTT contains two hydroxyl groups and a hydrophobic moiety; these structural features of glycine and DTT resemble those of the substrates. The distance between Gly401 and DTT502 is about 14 Å, which corresponds roughly to the length of a trisaccharide. Since UGL can act on unsaturated di-, tri-, and tetrasaccharides (Fig. 1), these molecules (Gly401, DTT501, and DTT502) are thought to interact with UGL in the active site, and to be … (Fig. 6), were observed. Other residues whose side chains also face the solvent at the active site are His210 of Bacillus sp. GL1 UGL (Arg in other homologs), His339 (Ser), Glu141 (Ser), and Asn142 (Asp, His). These amino acid differences may underlie differences in substrate specificity.
Structural Basis for Catalysis-UGL releases the nonreducing terminal ΔGlcA from unsaturated oligosaccharides by cleaving the bond between the anomeric carbon of ΔGlcA and the glycosidic oxygen (C1-O; β-1,3, β-1,2, or β-1,4). The reaction mechanism of unsaturated glucuronyl hydrolysis and the anomeric configuration of the released ΔGlcA have yet to be clarified. The pyranose ring of the released ΔGlcA readily opens with loss of the C4=C5 double bond, and the sugar is nonenzymatically converted to an α-keto acid (Fig. 1d) (7,9), which makes it difficult to determine the anomeric configuration of the UGL product. To verify the above inferences about the active site and to determine the anomeric configuration of the intermediate monosaccharide, we tried, but failed, to prepare enzyme crystals bound with sugars, substrates, or products by soaking or cocrystallization; the ligand molecules (one glycine and two DTT) in the active pocket interfered with sugar binding. We therefore propose a catalytic mechanism based on the UGL structure, the conserved amino acid residues, and known glycoside hydrolase mechanisms, as detailed below. Asp88, Asp149, His86, His87, His193, and Arg221, the ionizable residues nearest to the glycine molecule that marks the ΔGlcA binding site, are completely conserved in UGL and its homologs (Figs. 5 and 6) and may play important roles in glycosyl hydrolysis. The carboxyl group of Gly401 is thought to occupy the position of the ΔGlcA carboxyl group and forms an ion pair with the side chain of Arg221, as described above; Arg221 is therefore likely to be important for ΔGlcA recognition. Two basic mechanisms, inversion and retention, have been proposed for glycoside hydrolases, classified by net inversion or retention of the anomeric configuration of the product (52). Both mechanisms involve two catalytic carboxylic acid residues, one acting in the protonated carboxyl form and the other as a carboxylate.
Asp88 and Asp149 are thus the most likely catalytic residues of UGL. In the first step of either mechanism, one is thought to act as the carboxylate, serving as nucleophile/base toward the anomeric carbon, and the other as the carboxyl, serving as proton donor/acid toward the glycosidic oxygen. Asp88 and Asp149 are 7.3 Å apart, a distance compatible with glycoside hydrolases that act by inversion (average 10.0 Å) (53). An inverting mechanism is generally accepted for almost all glycoside hydrolases with an α6/α6-barrel, such as glucoamylases (35)(36)(37), endoglucanases CelD (38) and CelA (39), and endo/exocellulase E4 (40). Besides glycoside hydrolases, another α6/α6-barrel enzyme, Lactobacillus maltose phosphorylase, is also an inverting enzyme (42).
To test the two candidate catalytic residues, Asp88 and Asp149, two mutants in which each Asp was replaced by Asn (D88N and D149N) were prepared and assayed (Table IV). CD spectra of the wild-type and mutant enzymes were almost identical (data not shown), indicating that neither mutation causes a significant conformational change. However, the catalytic efficiency (kcat/Km) toward gellan tetrasaccharide (ΔGlcA-Glc-Rha-Glc) of D88N (2.9 × 10−6 μM−1 s−1) and D149N (9.8 × 10−5 μM−1 s−1) was far lower than that of the wild-type enzyme (8.1 × 10−2 μM−1 s−1), and both mutants are almost inactive. The kcat values of D88N (0.00057 s−1) and D149N (0.0059 s−1) were reduced ~1,000- to 10,000-fold relative to the wild-type enzyme (7.3 s−1), whereas Km values varied little among the three enzymes (wild type, 90 μM; D88N, 200 μM; D149N, 60 μM). These kinetics indicate that Asp88 and Asp149 are essential for catalysis and support the hypothesis, derived from the crystal structure, that one acts as the nucleophile/base and the other as the proton donor/acid. Intriguingly, the active site of UGL resembles that of hypothetical protein Yter (Fig. 7) more than that of any other structurally similar six-hairpin enzyme, although the amino acid sequence identity between UGL and Yter is very low (less than 10%) and their long loops are arranged differently, as discussed above (Fig. 4). Asp88 in UGL corresponds to Asp88 in Yter, and likewise Asp149 to Asp143, His193 to His189, Arg221 to Arg213, Trp134 to Trp141, Trp219 to Trp211, and Trp225 to Trp217. Trp42 in UGL is replaced by Tyr41 in Yter. The side chains of the Yter residues corresponding to His86 and His87 in UGL point in the opposite direction; instead, the space occupied by His86 and His87 in UGL is filled by His132 and Lys133 in Yter.
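The catalytic efficiencies and fold reductions can be recomputed from the kcat and Km values given for gellan tetrasaccharide; with Km expressed in μM, kcat/Km comes out in μM−1 s−1 and matches the reported values. This is plain arithmetic on the reported numbers, with illustrative variable names.

```python
# kcat (s^-1) and Km (uM) for gellan tetrasaccharide, as reported in the text.
KINETICS = {
    "wild type": (7.3, 90.0),
    "D88N": (0.00057, 200.0),
    "D149N": (0.0059, 60.0),
}


def summarize(kinetics, reference="wild type"):
    """Return {name: (kcat/Km in uM^-1 s^-1, fold reduction in kcat vs reference)}."""
    ref_kcat = kinetics[reference][0]
    return {
        name: (kcat / km, ref_kcat / kcat)
        for name, (kcat, km) in kinetics.items()
    }


for name, (eff, fold) in summarize(KINETICS).items():
    print(f"{name:9s} kcat/Km = {eff:.2e} uM^-1 s^-1, kcat lowered {fold:,.0f}-fold")
```

The kcat reductions work out to about 12,800-fold for D88N and about 1,240-fold for D149N, while Km stays within roughly a factor of two of the wild-type value.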
The space corresponding to Phe91 in UGL is likewise occupied by Met147 in Yter. The space corresponding to the side chains of four residues, His210, Gln211, Tyr338, and His339, is occupied in Yter by five consecutive residues (331-335) of the long loop L-H11:H12. Because of this protruding loop, the putative active site of Yter is smaller than that of UGL. The high structural similarity of the putative active sites suggests that Yter binds a substrate (or ligand) similar to that of UGL, or has a similar activity.
Protein Data Bank Accession Number-The coordinates of