Structural Analysis of a Family 101 Glycoside Hydrolase in Complex with Carbohydrates Reveals Insights into Its Mechanism*

Background: The endo-α-D-N-acetylgalactosaminidase SpGH101 from Streptococcus pneumoniae hydrolyzes the O-linked T-antigen from proteins.
Results: SpGH101 displays an unusual conformational change on substrate binding and a distinctive arrangement of its catalytic machinery.
Conclusion: Substrate hydrolysis proceeds through a retaining mechanism with a proton shuttle.
Significance: This is the first evidence of a proton shuttle in a retaining glycoside hydrolase.

O-Linked glycosylation is one of the most abundant post-translational modifications of proteins. Within the secretory pathway of higher eukaryotes, the core of these glycans is frequently an N-acetylgalactosamine residue that is α-linked to serine or threonine residues. Glycoside hydrolases in family 101 are presently the only enzymes known to hydrolyze this glycosidic linkage. Here we determine the high-resolution structures of the catalytic domain comprising a fragment of GH101 from Streptococcus pneumoniae TIGR4, SpGH101, in the absence of carbohydrate, and in complex with reaction products, inhibitor, and substrate analogues. Upon substrate binding, a tryptophan lid (residues 724-WNW-726) closes on the substrate. The closing of this lid fully engages the substrate in the active site with Asp-764 positioned directly beneath C1 of the sugar residue bound within the −1 subsite, consistent with its proposed role as the catalytic nucleophile. In all of the bound forms of the enzyme, however, the proposed catalytic acid/base residue was found to be too distant from the glycosidic oxygen (>4.3 Å) to serve directly as a general catalytic acid/base residue and thereby facilitate cleavage of the glycosidic bond. These same complexes, however, revealed a structurally conserved water molecule positioned between the catalytic acid/base and the glycosidic oxygen. On the basis of these structural observations we propose a new variation of the retaining glycoside hydrolase mechanism wherein the intervening water molecule enables a Grotthuss proton shuttle between Glu-796 and the glycosidic oxygen, permitting this residue to serve as the general acid/base catalytic residue.

EngBF was the founding member of glycoside hydrolase family 101 (3) and uses a retaining catalytic mechanism involving the transient formation of a glycosyl enzyme intermediate. Structural studies of EngBF and SpGH101 from S. pneumoniae strain R6, here referred to as SpGH101R6, revealed the similarity of the (β/α)₈-barrel catalytic module to α-amylases in GH family 13 (3,8). The identities of the catalytic residues were proposed for the GH101 enzymes based on the similar, but not exact, spatial positioning of the GH13 catalytic residues relative to an aspartate and a glutamate in GH101. Mutagenesis and chemical rescue studies provide strong evidence in favor of residues Asp-764 and Glu-796 in SpGH101R6 acting as the catalytic nucleophile and catalytic general acid/base residues, respectively (9).
The almost strict occurrence of GH101 enzymes in host-adapted bacteria suggests they play a role in either commensalism or pathogenesis. For example, deletion of the gene encoding SpGH101 reduced the ability of S. pneumoniae strain 1121 to colonize the upper airways in a mouse model of nasopharyngeal colonization (10). These observations make the molecular bases by which SpGH101 binds and processes its substrates of interest. Such information could prove useful in exploiting these enzymes as tools for biotechnology as well as in designing inhibitors that could have value as antimicrobials. Presently, however, experimental studies defining the molecular interactions that enable substrate recognition and turnover are lacking because of the absence of GH101 structures in complex with ligands.
To gain a better understanding of these molecular details, we focused on structural studies of SpGH101 from the S. pneumoniae TIGR4 strain, which has 99% amino acid sequence identity with SpGH101R6, in complex with a series of carbohydrate ligands including substrates, products, and inhibitor. The results collectively suggest an unusual reaction mechanism involving conformational change during substrate binding coupled with catalysis involving the general acid/base catalytic residue functioning from a distance by way of a short Grotthuss proton shuttle mediated by a single intervening water molecule.

Experimental Procedures
Cloning, Protein Production, Purification, Crystallization, and Structure Determination-A truncated form of SpGH101 comprising residues 317-1425 of the full-length protein (S. pneumoniae TIGR4 strain) and with an N-terminal His₆ tag, referred to as SpGH101(N), was cloned, expressed, purified, and crystallized as previously described (2). Selenomethionine-labeled SpGH101(N) was produced using Escherichia coli B834 (DE3) as the expression strain (Novagen). The defined medium containing selenomethionine was prepared according to the instructions of the manufacturer (Athena Enzyme Systems). Cells were harvested by centrifugation at 6000 × g for 10 min and chemically lysed, the supernatant was cleared by centrifugation at 27,000 × g for 45 min, and the polypeptides were purified from the cell-free extract using immobilized metal affinity chromatography following the methods for the unlabeled protein (2). The purity of fractions was assessed using SDS-PAGE and those deemed to be greater than 95% pure were pooled, concentrated, and buffer exchanged into 20 mM Tris-HCl, pH 8.0, in a stirred ultrafiltration unit (Amicon) using a 10-kDa molecular mass cut-off membrane (Filtron). Selenomethionine-substituted SpGH101(N) protein was further purified by size-exclusion chromatography using Sephacryl S-200 (GE Biosciences) in 20 mM Tris-HCl, pH 8.0. Prior to crystallization, the selenomethionine-substituted SpGH101(N) was concentrated to 15 mg/ml in 20 mM Tris-HCl, pH 8.0.
Selenomethionine-substituted SpGH101(N) crystals were grown by the hanging drop vapor diffusion method in conditions identical to those for the unlabeled protein: 2-µl drops with a 1:1 ratio of protein to 25% (w/v) polyethylene glycol (PEG) 1500 (Hampton Research) at 292 K. Crystals of native and selenomethionine-substituted SpGH101(N) were cryoprotected in 1 µl of 33% (w/v) PEG 1500 supplemented with 6% (v/v) MPD (Hampton Research), and flash-cooled directly in a nitrogen gas stream at 113 K. Diffraction data for native crystals were collected on beamline 9-2 of the Stanford Synchrotron Light Source, and the data were processed with XDS and scaled with SCALA (11,12). A single-wavelength anomalous dispersion dataset optimized for selenium was collected for crystals of the selenomethionine-substituted SpGH101(N) at the Canadian Light Source on beamline 08ID-1, and the data were processed with MOSFLM and scaled with SCALA (13). Using these data, the heavy atom substructure determination, phasing, and density modification were performed with AutoSHARP (14). Twenty-two selenium positions present in the single SpGH101(N) monomer in the asymmetric unit were used for phasing with the full 2.45-Å resolution dataset (acentric/centric figures of merit 0.35/0.14; phasing power, 0.93). The phases resulting from density improvement were of sufficient quality for BUCCANEER (15) to build a virtually complete model. This initial model was used as a search template for molecular replacement using PHASER and the higher resolution native data set (16). The native structure was iteratively improved with cycles of manual building with COOT and refinement with REFMAC (17,18).
Crystals of recombinant SpGH101(N) were suitable to determine an initial structure of the protein; however, crystals were difficult to reproduce consistently and were not sufficiently robust to obtain structures in complex with ligands. Therefore, an alternate construct of SpGH101, called SpGH101(C), was cloned into pET22. This gene fusion encoded a methionine immediately preceding the gene fragment encoding amino acids 317-1425, which was followed by a sequence that added a C-terminal His₆ tag and stop codon. This pET22 construct was used as a template to generate D764N and E796Q mutants by standard PCR site-directed mutagenesis procedures (see Table 1 for primer sequences). The DNA sequence fidelity of all constructs was verified using bidirectional sequencing.
Recombinant SpGH101(C) and mutants were produced in E. coli as for SpGH101(N) and purified to homogeneity by nickel-affinity chromatography and anion exchange chromatography. SpGH101(C), SpGH101(C)D764N, and SpGH101(C)E796Q in 20 mM Tris-HCl, pH 8.0, were crystallized by mixing equal volumes of 15 mg/ml of protein with a solution consisting of 12% (w/v) PEG 3350 and 0.15 M ammonium citrate, pH 7.0, using the sitting-drop vapor-diffusion method at 292 K. Plate-shaped crystals developed after ~3 days and were used for microseeding in subsequent hanging-drop setups.
In all cases, waters were added using the FINDWATERS option in COOT and manually inspected prior to the final refinement. Refinement procedures were monitored by flagging 5% of all observations as "free" (19). Model validation was performed with MOLPROBITY (20,21). All data collection and model statistics are shown in Table 2.
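The cross-validation step described above can be illustrated with a minimal Python sketch. It is not the procedure implemented in REFMAC or any other program cited here; the function names `flag_free_set` and `r_factor`, the random selection scheme, and the fixed seed are all assumptions made for illustration. The sketch holds out a fraction of reflection indices and evaluates the conventional residual R = Σ|Fobs − Fcalc| / Σ Fobs over any chosen subset of amplitudes.

```python
import random


def flag_free_set(n_reflections, fraction=0.05, seed=0):
    """Pick a random subset of reflection indices to exclude from refinement.

    Hypothetical helper: real programs store the flag with each reflection.
    """
    rng = random.Random(seed)
    n_free = round(n_reflections * fraction)
    return sorted(rng.sample(range(n_reflections), n_free))


def r_factor(f_obs, f_calc, indices):
    """Conventional crystallographic residual over the given reflections.

    f_obs and f_calc are sequences of non-negative structure factor amplitudes.
    """
    numerator = sum(abs(f_obs[i] - f_calc[i]) for i in indices)
    denominator = sum(f_obs[i] for i in indices)
    return numerator / denominator


if __name__ == "__main__":
    free = flag_free_set(1000)
    print(len(free), "of 1000 reflections flagged as free")
```

Evaluating `r_factor` over the flagged indices gives a free R value, whereas evaluating it over the remaining indices gives the working R value; a large gap between the two is the usual sign of overfitting.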
General Synthetic Procedures-PUGT was synthesized according to the steps shown in Scheme 1. All solvents were dried prior to use. Synthetic reactions were monitored by TLC using Merck Kieselgel 60 F₂₅₄ aluminum-backed sheets. Compounds were detected by charring with a 10% concentrated sulfuric acid in ethanol solution and heating. Flash chromatography under a positive pressure was performed with Merck Kieselgel 60 (230-400 mesh) using specified eluants. ¹H and ¹³C NMR spectra were recorded at 600 MHz (150 MHz for ¹³C) (chemical shifts quoted relative to CDCl₃ or CD₃OD where appropriate).
Benzyl 2-Acetamido-4,6-benzylidene-2-deoxy-α-D-galactopyranoside (1)-2-Acetamido-2-deoxy-D-galactose (22) (8.0 g, 36.2 mmol) was dissolved in BnOH (100 ml) at 60°C and acetyl chloride (2.0 ml) was added. The resulting reaction mixture was stirred overnight at 60°C and then was cooled to room temperature. To the mixture was added excess Et₂O, and the resulting white solid precipitate was filtered and washed with Et₂O, then dried under vacuum to yield benzyl 2-acetamido-2-deoxy-α-D-galactopyranoside as a white solid (9.0 g, 80%). This intermediate glycoside was used directly without further purification (9.0 g, 28.9 mmol) and suspended in benzaldehyde (30 ml) with ZnCl₂ (4.5 g). The reaction mixture was stirred at room temperature overnight to yield a clear solution to which excess cold water and hexanes were added. The resulting yellowish precipitate was collected by suction filtration, washed thoroughly with water and hexanes, and then dissolved in dichloromethane (300 ml). This organic solution was washed with brine and then dried (MgSO₄). After filtration and removal of the solvent the residue was dried under vacuum to yield the desired product 1 as a solid foam (22) (8.7 g, 75%). 1  1 mmol) and Hg(CN)₂ (6.50 g, 25.6 mmol) were suspended in a solution of benzene/MeNO₂ (1:1, 240 ml). The mixture was distilled until 120 ml of solvent was removed. The temperature was then adjusted to 40-45°C, after which donor 2,3,4,6-tetra-O-acetyl-α-D-galactopyranosyl bromide 2 (24) (10.0 g, 24.3 mmol) in a solution of benzene/MeNO₂ (1:1, 48 ml) was added. The resulting mixture was stirred at 40-45°C overnight, then cooled to room temperature, diluted with benzene (300 ml), and filtered. The organic layer was washed with saturated NaHCO₃, 10% KI, and brine, after which it was dried (MgSO₄). After filtration and removal of the solvent, the residue was dried under vacuum to yield product 3 as a clear syrup (23) that was used in the next step without further purification.
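The amounts and yields quoted for the first two steps of this procedure can be checked with a short calculation. The Python sketch below is only a consistency check; the molecular weights are assumptions computed here from the molecular formulas C8H15NO6, C15H21NO6, and C22H25NO6 using standard atomic masses, and the function name `percent_yield` is hypothetical.

```python
# Molecular weights (g/mol) computed from molecular formulas; these values
# are assumptions of this sketch and are not quoted in the text.
MW_GALNAC = 221.21             # 2-acetamido-2-deoxy-D-galactose, C8H15NO6
MW_BENZYL_GLYCOSIDE = 311.33   # benzyl glycoside intermediate, C15H21NO6
MW_BENZYLIDENE_ACETAL = 399.44 # product 1, C22H25NO6


def percent_yield(product_mass_g, product_mw, start_mmol):
    """Return (mmol of product, percent yield) for a 1:1 transformation."""
    product_mmol = product_mass_g / product_mw * 1000.0
    return product_mmol, 100.0 * product_mmol / start_mmol


if __name__ == "__main__":
    start_mmol = 8.0 / MW_GALNAC * 1000.0
    print(f"starting sugar: {start_mmol:.1f} mmol")

    glycoside_mmol, y1 = percent_yield(9.0, MW_BENZYL_GLYCOSIDE, 36.2)
    print(f"benzyl glycoside: {glycoside_mmol:.1f} mmol, {y1:.0f}% yield")

    acetal_mmol, y2 = percent_yield(8.7, MW_BENZYLIDENE_ACETAL, 28.9)
    print(f"product 1: {acetal_mmol:.1f} mmol, {y2:.0f}% yield")
```

Under these assumed molecular weights the calculation reproduces the 36.2 mmol, 28.9 mmol, 80%, and 75% figures given in the procedure to within rounding.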

TABLE 2
Data collection and structure refinement statistics

evaporated with toluene (4 × 30 ml). The resulting residue was then dried under vacuum to yield product 4 as a solid foam (23) that was used in the next step without further purification.

Results
SpGH101 Structure-The apo form of a truncated version of GH101 from S. pneumoniae R6 (SpGH101R6) was previously determined to 2.9-Å resolution; the resulting structure included seven of its eight protein domains (PDB code 3ECQ; Fig. 1, A and B) (8). In an effort to generate a construct of SpGH101 that would enable determination of structures at higher resolutions, and in complex with ligands, we created a truncation of SpGH101 from S. pneumoniae TIGR4 (2). Overall, the GH101 enzymes from the two S. pneumoniae strains have 99% amino acid sequence identity. Our construct of the protein from the TIGR4 strain included five of the central domains (Fig. 1A). The construct with an N-terminal His₆ tag, SpGH101(N), crystallized in space group P2₁ with a single molecule in the asymmetric unit and diffracted to 1.85-Å resolution (2). Selenomethionine-labeled SpGH101(N) allowed a preliminary structure to be determined to 2.35-Å resolution by the single-wavelength anomalous dispersion method (these experiments were performed prior to the release of the SpGH101R6 coordinates). An initial model comprising the single monomer in the asymmetric unit was generated by autobuilding with minimal manual intervention. This preliminary model was used to solve the structure of the unlabeled protein at the higher 1.85-Å resolution. The final model comprised 1104 residues, including residues 317-1418 of SpGH101. With the exception of the 3 amino acids immediately preceding the SpGH101 sequence, the N-terminal His₆ tag and thrombin cleavage sequence could not be modeled. The structure of SpGH101(N) reveals the distorted (β/α)₈-barrel core catalytic domain (domain 3; residues ~600-890) flanked by four additional domains that are mainly of β-sheet character (domains 2, 4, 5, and 6; Fig. 1, A and B). The structure of SpGH101(N) is, as expected, highly similar to SpGH101R6, aligning with a root mean square deviation of 0.53 Å over 1045 matched Cα positions.
A detailed description of the domain architecture of pneumococcal GH101 was previously given by Caines et al. (8) for SpGH101R6.
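For reference, the root mean square deviation quoted for the structural alignment is conventionally defined as below. The notation is introduced here for illustration and is not taken from the original text; it assumes the two structures have already been optimally superimposed, with r_i^A and r_i^B denoting the coordinates of the i-th matched Cα atom in each structure and N the number of matched positions (1045 for the comparison with SpGH101R6).

```latex
\mathrm{RMSD} \;=\; \sqrt{\frac{1}{N}\sum_{i=1}^{N}
  \left\lVert \mathbf{r}_i^{A} - \mathbf{r}_i^{B} \right\rVert^{2}}
```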
There are four cation coordination centers distributed throughout the structure of SpGH101(N) (Fig. 1B). Based on refined atom occupancies, temperature factors, and coordination geometry, three of these positions were modeled as Ca²⁺ ions and one as a Mn²⁺ ion. One Ca²⁺ (Ca1) is coordinated in the turn that precedes the C-terminal α-helix of domain 2. Additional Ca²⁺ ions, Ca2 and Ca3, are coordinated between two strands of domain 5 and two coil regions of domain 6, respectively (Fig. 1B). The Mn²⁺ ion (Mn1) is coordinated between the catalytic domain (domain 3) and domain 6 (Fig. 1B). In the structure of SpGH101R6 an equivalent ion to Ca1 was not modeled, whereas the position occupied by Mn1 was modeled as a Na⁺ ion. SpGH101(N) aligns with EngBF from Bifidobacterium longum (determined to 2.0-Å resolution) (3) with a root mean square deviation of 1.00 Å over 929 matched Cα positions. The structure of EngBF contains 4 modeled Mn²⁺ ions that overlap with the 4 ions modeled in SpGH101(N). Given the presence of these ions at domain interfaces it is likely they play a structural role. Indeed, treatment of EngBF with EDTA has a significant destabilizing effect on the enzyme (3).
At the center of the (β/α)₈-barrel core of the catalytic domain is a distinctive pocket that houses the catalytic machinery: Asp-764 as the catalytic nucleophile and Glu-796 as the catalytic general acid/base (Fig. 1C). The proposed catalytic residues are also conserved in EngBF, which possesses the same substrate specificity as the GH101 enzyme from S. pneumoniae. Notable is the presence of two tryptophan residues, Trp-724 and Trp-726, which form an open lid over the active site. These residues are also conserved in EngBF (Fig. 1C), where, on the basis of their position relative to the active site and the low activity of site-directed mutants, they have been proposed to play a role in substrate recognition (3).
Although the structural analyses of GH101 enzymes performed to date have been important in establishing the general identity of the active site, the molecular details regarding how these enzymes recognize substrates remain unknown because no structures have been determined in complex with any carbohydrate ligands. Thus, we pursued structures of GH101 from S. pneumoniae TIGR4 in complex with an array of carbohydrate ligands. However, the crystal form used to obtain the structure of SpGH101(N) proved to be difficult to reproduce consistently and was not sufficiently robust to withstand physical manipulations (e.g. soaking in solutions with substrates or ligands). By generating a construct encoding the same fragment of GH101 but with a C-terminal His₆ tag, referred to as SpGH101(C), we were able to reproducibly obtain robust crystals in space group P22₁2₁ with superior diffraction qualities and having a single molecule in the asymmetric unit. Using this construct and versions wherein the proposed residues acting as catalytic nucleophile (Asp-764) and acid/base (Glu-796) were conservatively mutated to generate SpGH101(C)D764N and SpGH101(C)E796Q, we were able to obtain structures of GH101 from S. pneumoniae TIGR4 in complex with various carbohydrate ligands including substrates, products, and inhibitor.
The Structures of SpGH101(C) Mutants in Complex with T-antigen-In SpGH101R6, E796Q and D764N substitutions reduce the activity of the enzyme by 30- and 700-fold, respectively (9). We therefore incorporated these mutations into SpGH101(C) in an effort to obtain non-hydrolyzed substrate complexes. Crystals of SpGH101(C)E796Q were soaked with PNP-T-antigen or with a glycopeptide bearing a T-antigen, and data sets to 1.75- and 2.5-Å resolution, respectively, were collected. In both cases, clear electron density was found for the T-antigen disaccharide but no electron density was observed for either the nitrophenol group or the peptide (Fig. 2, A and B). These observations suggested that, despite the introduction of the mutation, the substrates had been hydrolyzed in the active site during the time the crystals were soaked, generating the observed product complexes. Unlike the disaccharide in the crystal soaked with glycopeptide, which could only be modeled as the α-anomer, the disaccharide in the PNP-T-antigen soaked crystal was modeled as a 1:1 mixture of α- and β-anomers (Fig. 2A). Otherwise, within the positional error of these structures (maximum likelihood estimated standard uncertainties of <0.2 Å) the placement of the monosaccharide units was indistinguishable, as was the arrangement of the protein side chains within the active site. Binding of this disaccharide product, however, resulted in a change in the conformation of the active site. Relative to the position of the loop containing Trp-724 and Trp-726 in the 1.85-Å ligand-free structure we obtained, this loop in the product complexes was pulled toward the active site while the tryptophan side chains themselves closed over the top of the disaccharide through a movement of ~3-5 Å and a rotation of ~50° (Fig. 2B). The resulting active site is a partially closed pocket that very closely complements the shape of the disaccharide yet also accommodates several water molecules (Fig. 2C). This pocket has two subsites, −1 and −2, which accommodate the GalNAc and Gal residues of the T-antigen, respectively.
Engagement of the T-antigen by the active site results in an array of hydrogen bonds between the sugar and amino acid side chains in the enzyme, providing a distinct pattern of recognition for the T-antigen (Fig. 2D). In particular, direct and water-mediated hydrogen bonds with the axial C4 hydroxyl groups likely provide specificity for the two galacto-configured monosaccharides. Although the acetamido group of the GalNAc residue makes only a single water-mediated hydrogen bond, it fits into a well-tailored pocket in the active site where van der Waals interactions likely provide selectivity for this chemical group.
In both of the complexes, Asp-764 is located ~3.5 Å immediately below the anomeric carbon and is therefore well positioned to serve as a catalytic nucleophile. Notably, however, O1 in the α-anomer, which would approximate the position of the glycosidic oxygen in the substrate, is over 4.3 Å away from the proposed general acid/base catalytic residue Glu-796 (Fig. 2, A and B), too far to enable this residue to serve directly as a proton donor. In the two product complexes, however, a well ordered water molecule, which is also present in the ligand-free structure of the non-mutated enzyme, occupies this gap, forming a hydrogen bonding chain between the proposed general acid/base residue and O1.
The Structure of SpGH101(C) in Complex with Substrate Analogues-Although SpGH101(C)E796Q appeared to retain enough activity to hydrolyze substrates during the crystal soaking procedure, by using the SpGH101(C)D764N nucleophile mutant we were able to obtain a structure of the intact serinyl T-antigen bound within the active site to 1.8-Å resolution. The electron density of the sugar was clear and continuous with the density for the O-linked serine residue (Fig. 2E). The protein displayed the same structural change as seen for the product complexes, with an identical pattern of hydrogen bonding to the sugar portion (not shown). Again, the glycosidic oxygen linking the serine and GalNAc is distant (4.9 Å) from the proposed catalytic general acid/base residue, Glu-796. Furthermore, the serine residue is positioned such that its carboxylate oxygen interacts with Oε1 of Glu-796, effectively taking the position of the conserved water molecule observed in the product complexes. As with the product complexes, Asn-764 is positioned 3.9 Å below C1, where it could perform its proposed role as nucleophile when present as an aspartate in the wild-type enzyme.
To obtain an alternative substrate complex but with wild-type SpGH101(C), we soaked crystals of SpGH101(C) with T-antigen methyl glycoside, reasoning that the methoxy group would be a poor leaving group because it would lack any aglycon binding interactions and that the compound may therefore be a poor substrate for the enzyme. The electron density in the area of the active site indicated two bound forms of the enzyme, with each form binding uncleaved T-antigen methyl glycoside (Fig. 3A). The minor species, modeled with 30% occupancy, matched the closed conformation of the enzyme (Fig. 3B). The T-antigen methyl glycoside in this bound form was not well occupied and could not be modeled with complete confidence; however, the sugar refined in this conformation appeared to be intact and was observed to reside in the active site in a position very close to that observed for the product complexes and the serinyl T-antigen complex. As in the T-antigen product complexes, electron density for a well ordered water molecule was observed, and the water was modeled as bridging Glu-796 and O1 of the ligand (Fig. 3, A and B).
The major bound species, modeled with 70% occupancy, matched the open form of the enzyme active site with Trp-724 and Trp-726 rotated away from the active site (Fig. 3C). Remarkably, clear electron density was observed for the intact T-antigen methyl glycoside bound to the underside of the aromatic lid formed by the side chains of the two tryptophans (Fig. 3A). Furthermore, the non-reducing end C3 and C4 hydroxyl groups of the modeled T-antigen methyl glycoside make three direct hydrogen bonds with active site residues (Fig. 3C). Notably, all of these residues are involved in coordinating the deeply bound disaccharide, although in the deeply bound saccharide, two of the direct hydrogen bonds are replaced by a single water-mediated hydrogen bond (Fig. 3C). In addition, the C6 hydroxyl group of the Gal hydrogen bonds with Nε of Trp-724, an interaction that is maintained when the aromatic lid closes on the active site (Fig. 2D).

FIGURE 3. Complexes of unmutated SpGH101(C) with T-antigen analogues. A, the structure of SpGH101(C) in complex with the T-antigen methyl glycoside. The gray mesh shows the maximum likelihood/σA-weighted Fo − Fc electron density map (contoured at 3σ). The map was generated by refinement of the structure with Trp-724, Trp-726, and Arg-1256 modeled as alanines and the carbohydrate ligand and putative catalytic water absent. The sticks show the final modeled structures, which represent two major conformations for the carbohydrate and Trp-724, Trp-726, and Arg-1256 residues (see text for details). B, the minor "B" conformation of the T-antigen methyl glycoside complex (purple and green with a water as a red sphere) overlaid with the T-antigen glycopeptide product complex (transparent gray and yellow with a water as a blue sphere). Dashed lines show putative hydrogen bonds; arrows show relevant distances. Distances in Å are given. C, the major "A" conformation of the T-antigen methyl glycoside complex (purple and green) overlaid with the T-antigen glycopeptide product complex (transparent gray and yellow with a water as a blue sphere). Potential hydrogen bonds for the T-antigen methyl glycoside complex are shown as black dashed lines, whereas those for the T-antigen glycopeptide product complex are shown as gray dashed lines. E, the structure of SpGH101(C) in complex with PUGT (purple and green) overlaid with the T-antigen glycopeptide product complex (transparent gray and yellow). The gray mesh shows the maximum likelihood/σA-weighted Fo − Fc electron density map (contoured at 3σ) for PUGT. Dashed lines show putative hydrogen bonds; arrows show relevant distances. Distances in Å are given. The curved arrow highlights the altered positions of the acetamido group in the two ligands. The inset shows the tryptophan residues that interact with the phenylcarbamate aglycon.
In an effort to better visualize the nature of potential aglycon interactions, we also synthesized a novel endo-α-N-acetylglucosaminidase inhibitor (PUGT) comprising galactose β-1,3-linked to an O-(2-N-acetyl-2-deoxy-D-galactopyranosylidene)amino N-phenylcarbamate residue at the "reducing" terminus. The transition state structure stabilized by glycoside hydrolases is thought to be mimicked by the trigonal C1 of such compounds and, as such, these inhibitors typically bind tightly with the trigonal C1 in the −1 subsite and the oxime group spanning the position normally occupied by the scissile glycosidic bond (28). Although PUGT proved to be a surprisingly poor inhibitor (Ki > 10 mM; data not shown), we were able to obtain a complex of SpGH101(C) with this compound at 1.46-Å resolution. The electron density was clear for all but the most distal atoms of the phenyl ring of the compound (Fig. 3D). Again, the active site is found in its closed form. In contrast to all of the other complexes we obtained, where the Gal and GalNAc rings were in the relaxed ⁴C₁ conformation, the pyranose ring of PUGT in the −1 subsite refined in the ⁴H₃ conformation, whereas the Gal residue in the −2 subsite was in the ⁴C₁ conformation. Despite the planar conformation of the −1 residue at C2-C1-O5-C5, which pulls this portion of PUGT deeper into the −1 subsite relative to the product complexes, the carbohydrate portion of the inhibitor makes the same set of direct hydrogen bonds observed in all of the other T-antigen complexes (Fig. 3D). In this complex, however, the acetamido group is flipped up relative to its position in all of the other complexes (Fig. 3D). The significance of this position of the acetamido group is unclear, as it does not appear to be induced by a lack of space for the acetamido group. The planarity of C1 in PUGT positions the oxime O and N atoms at distances of 3.0 and 2.8 Å, respectively, from Glu-796 and with appropriate geometry to hydrogen bond. Asp-764 is positioned 3.1 Å immediately below C1. The plane of the PUGT phenyl ring is positioned at roughly right angles to the plane of the carbohydrate portion of the compound and thus extends out of the active site and into solution.
Limited interactions are made between the phenylcarbamate group of PUGT and the protein. The indole ring of Trp-810 makes an unusual aromatic ring edge-edge interaction with the aryl group of PUGT (~4 Å distance), and the edge of the Trp-797 ring interacts at right angles with the carbamate portion of the compound (Fig. 3D, inset). Despite these fortuitous interactions PUGT displayed poor inhibition of the enzyme, which cannot be readily reconciled with the observed structure.

Discussion
Substrate Recognition-Conformational changes in glycoside hydrolases upon substrate recognition are relatively uncommon and typically involve engagement of the active site machinery, such as movement of loops containing the catalytic residues in GH3, GH29, and GH115 enzymes (29-31). Thus, the conformational change in SpGH101 involving the movement of a two-residue tryptophan "lid" over the active site, which is solely involved in substrate recognition rather than positioning of the catalytic machinery, is rare among glycoside hydrolases. In the absence of substrate, the enzyme is present in an "open" form, with a fully accessible active site (Fig. 4A). In this form, the planar guanidine group of Arg-1256 lies above the tryptophan lid, packing against the indole ring of Trp-726 and likely forming a cation-π interaction. In the fully engaged substrate "bound" form of the enzyme, the tryptophan lid is closed by an ~50° rotation through an ~5 Å arc (Fig. 4B) from the position observed in the open form of the enzyme. In the bound form, Arg-1256 follows the closing of the lid by flipping down to maintain its interaction with Trp-726. In the bound form the active site is partially occluded by the tryptophan lid (Fig. 2C), suggesting that the lid must be open to accept substrate and closes only after the substrate has bound.
In addition to observing open unbound and closed bound forms of the enzyme, we also fortuitously obtained an unusual, partially occupied form using the T-antigen methyl glycoside as a substrate analogue. In this form, the planes made by carbons C4-C5-C6 of the GalNAc β-face and the same carbons of the β-face of the Gal residue pack against the platform made by tryptophans 724 and 726 in the open form of the enzyme. This binding appears to displace a water molecule coordinated by Glu-1253 and Asp-1254, whereas a set of hydrogen bonds involving the C3, C4, and C6 hydroxyl groups of the terminal Gal residue provide specific interactions with the protein (Fig. 4C). We acknowledge that this bound form of the enzyme may be a fortuitously obtained artifact of the procedure used to obtain the crystal structure of this complex. This interaction, however, has the hallmarks of a specific protein-carbohydrate interaction and thus we cannot completely discount its potential relevance. Indeed, from this potential intermediate state, closing of the tryptophan lid with an ~1.2 Å slide of the carbohydrate toward the catalytic machinery would both maintain how the sugar packs against the tryptophan lid and result in full engagement of the active site. This rearrangement would also require reinsertion of the displaced water that interacts with the terminal Gal residue to fill the gap formed by movement of the substrate toward the catalytic residues (Fig. 4C). Thus, if this open and bound species is one that is able to form in solution, it may represent an intermediate formed on the way to full engagement of the active site. Alternatively, it is also possible that this intermediate form is a non-productively bound form that does not lead to catalysis.
The conformational changes involved in substrate recognition conclude with a tight fit of the disaccharide into the closed active site. The highly complementary contouring of the active site to the disaccharide sterically prevents accommodation of any glycan structures larger than the T-antigen (Fig. 2C), such as a β-1,6-GlcNAc modification of the core GalNAc (to create a core 2 O-glycan), α-2,6-sialylation of the core GalNAc, or α-2,3-sialylation of the terminal Gal. As noted by Suzuki et al. (3), structural variation of GH101 enzymes in the region where O6 of the core GalNAc binds likely enables the activity of other GH101 family members toward core 3 O-glycans. In SpGH101, whereas the −1 and −2 subsites impart specificity for the T-antigen, the enzyme lacks any definable features that accommodate the protein/peptide aglycon. A surface representation of the protein in the region of the active site shows a wide, shallow trough that is consistent with accommodation of a protein or peptide aglycon (Fig. 5).
Catalytic Mechanism-The evidence to date supports a catalytic mechanism for GH101 enzymes that retains the stereochemistry at the anomeric carbon and, in SpGH101, uses Asp-764 as the catalytic nucleophile and Glu-796 as the general acid/base residue (Fig. 6A). An overlap of SpGH101(C) with the Thermoactinomyces vulgaris α-amylase (TVAI) in complex with its substrate, as previously noted, shows the structural conservation of the catalytic residues in TVAI (Asp-356 and Glu-396) with Asp-764 and Glu-796 in SpGH101(C) (32). Asp-764 and Glu-796 are also located in proximity to C1 and O1, respectively, of the carbohydrate, and thus the arrangement is largely consistent with the functional assignment of these residues (Fig. 6B). Indeed, in all five of our complexes of SpGH101(C) with the T-antigen and T-antigen analogues, the side chain of Asp-764 is positioned <3.9 Å below C1, poised for an in-line attack and, therefore, consistent with a role as the catalytic nucleophile (Figs. 2, A, B, and E, and 3, B and D). A more detailed examination of our structures, however, suggests that the role played by Glu-796 is less clear. A catalytic general acid/base should be positioned less than ~3.3 Å away from the glycosidic oxygen, as it is in TVAI (Fig. 6B), to act as an effective proton donor to aid in departure of the leaving group. In all of our complexes the O1 of the GalNAc residue is positioned more than 4.3 Å away from the side chain of Glu-796 (Figs. 2, A, B, and E, and 3B), with no other residues suitably placed to act as a general acid/base. Although mutagenesis and chemical rescue data provide strong support for the identity of Glu-796 as the general acid/base residue, our structural data suggest that Glu-796 is unable to donate a proton directly to the glycosidic oxygen of the substrate, which raises the question of how this residue functions in catalysis.
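The distance argument in this paragraph can be illustrated with a minimal sketch. It assumes only the approximate ~3.3 Å cutoff for direct proton donation quoted in the text. The function name and all coordinates are hypothetical placeholders and are not values taken from the deposited structures.

```python
import math

# Approximate cutoff quoted in the text (angstroms): a general acid/base
# should sit closer than this to the glycosidic oxygen to donate directly.
DIRECT_DONATION_CUTOFF = 3.3

def can_donate_directly(acid_oxygen, glycosidic_oxygen,
                        cutoff=DIRECT_DONATION_CUTOFF):
    """Return True if the two atoms are closer than the cutoff.

    Both arguments are (x, y, z) coordinates in angstroms.
    """
    return math.dist(acid_oxygen, glycosidic_oxygen) < cutoff

# Hypothetical coordinates chosen only to illustrate the two cases;
# they are NOT taken from any crystal structure.
close_pair = ((0.0, 0.0, 0.0), (3.0, 0.0, 0.0))    # 3.0 A apart
distant_pair = ((0.0, 0.0, 0.0), (4.4, 0.0, 0.0))  # 4.4 A apart

print(can_donate_directly(*close_pair))    # within the cutoff
print(can_donate_directly(*distant_pair))  # beyond the cutoff
```

In this sketch, a separation like the >4.3 Å reported for Glu-796 and O1 falls outside the direct-donation cutoff, which is the observation that motivates looking for an intervening group.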
In both T-antigen product complexes and the T-antigen methyl glycoside complex a well ordered water molecule bridges O1 of the GalNAc and the Glu-796 side chain (Figs. 2, A and B, and 3B). Moreover, the placement of this water molecule is well conserved in these complexes as well as in the absence of substrate (Fig. 6C). The only exception to this observation is our structure of the serinyl T-antigen bound to SpGH101(C), which lacks the conserved water. In this structure, a carboxylate oxygen of the serine occupies the position where the conserved water would otherwise be (Fig. 2E). We argue, however, that this conformation is an artifact promoted by the presence of only a single amino acid attached to the sugar moiety. Consistent with this interpretation, the carboxylate of the serine in the observed conformation is buried within the active site, an arrangement that would permit recognition of T-antigen motifs positioned only on the C-terminal residue of a protein or peptide. This scenario is incompatible with the ability of SpGH101 to remove the T-antigen from mucins (33,34) and with our observation of its activity on a glycopeptide with a T-antigen modification positioned in the middle of the peptide. On this basis, we suggest that the conformation of the serinyl T-antigen in the crystal structure is a non-productively bound conformation in which the carboxylate oxygen takes the position of the catalytic bridging water molecule. From the position of the conserved water molecule that bridges Glu-796 and the O1 of the GalNAc in the bound disaccharide substrates, we propose that Glu-796 acts indirectly as the general acid/base through a short Grotthuss chain (Fig. 6D). Catalytic mechanisms involving Grotthuss chains are seen in glycoside hydrolases such as the GH6 and GH124 β-1,4-endoglucanases (35,36).
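The idea of a bridging water can be sketched as a simple geometric search: a water oxygen qualifies if it lies within hydrogen-bonding distance of both the acid/base oxygen and the glycosidic oxygen. The 3.5 Å hydrogen-bond cutoff is an assumed, commonly used heuristic rather than a value from this work, and the function name, water labels, and coordinates are all hypothetical.

```python
import math

# Assumed heuristic cutoff for a donor-acceptor hydrogen bond (angstroms).
# This value is an illustrative assumption, not taken from the paper.
HBOND_CUTOFF = 3.5

def bridging_waters(waters, atom_a, atom_b, cutoff=HBOND_CUTOFF):
    """Return labels of waters within `cutoff` of both atom_a and atom_b.

    `waters` maps a label to the (x, y, z) position of the water oxygen.
    """
    return [
        label
        for label, position in waters.items()
        if math.dist(position, atom_a) <= cutoff
        and math.dist(position, atom_b) <= cutoff
    ]

# Hypothetical coordinates for illustration only (not from any structure).
acid_oxygen = (0.0, 0.0, 0.0)        # stands in for a Glu carboxylate oxygen
glycosidic_oxygen = (4.5, 0.0, 0.0)  # stands in for O1, too far for direct transfer
waters = {
    "W1": (2.25, 1.5, 0.0),  # roughly 2.7 A from both atoms
    "W2": (8.0, 0.0, 0.0),   # close to O1 only
}

print(bridging_waters(waters, acid_oxygen, glycosidic_oxygen))
```

A purely distance-based test like this ignores hydrogen-bond angles and occupancy, so it identifies candidates for a proton relay rather than demonstrating one.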
All of the enzymes previously proposed to utilize such a mechanism, however, are inverting enzymes where the catalytic base residue acts indirectly through a short water chain with the final deprotonated water in the chain acting as a nucleophile that attacks C1 of the sugar. In the proposed mechanism for SpGH101, an enzyme that retains the stereochemistry at the anomeric carbon (retaining mechanism), proton transfer from the acid/base Glu-796 to the glycosidic oxygen is coordinated through the intervening water molecule, thereby aiding departure of the protein leaving group. Two scenarios are plausible for the deglycosylation step. In one, the side chain of Glu-796 acts as a general base, which, through the intervening catalytic water, facilitates attack of an incoming water molecule on the anomeric center of the glycosyl-enzyme intermediate. Alternatively, the intervening catalytic water may shift to directly attack the anomeric center of the glycosyl-enzyme intermediate, ultimately to be replaced by a new water molecule following departure of the product. Given the distance the catalytic water would have to migrate (>~2 Å), however, we find this alternative less likely.
Overall, substrate recognition by SpGH101 is accompanied by a structural change that comprises the closing of a tryptophan lid on the active site, pinning the substrate in position and causing partial occlusion of the entrance to the active site. Once SpGH101 engages the substrate, the glycosidic oxygen is too far from the acid/base residue to directly accept a proton. Because a water molecule bridges this distance in SpGH101, we propose a mechanism whereby the water molecule acts as a proton shuttle to deliver a proton from the general acid/base residue to the glycosidic oxygen during catalysis. Why SpGH101 may have adopted this mechanistic variation of the classical retaining glycoside hydrolase mechanism is unclear. However, such a Grotthuss-like mechanism may enable SpGH101, and other GH101 enzymes, to better accept large protein and peptide substrates having highly variable structures, including some that adopt non-optimal conformations. This reasoning implies that SpGH101 may represent a unique example of how a GH catalytic mechanism has adapted to accommodate a specific class of substrate.