A Second β-Hexosaminidase Encoded in the Streptococcus pneumoniae Genome Provides an Expanded Biochemical Ability to Degrade Host Glycans*

Background: The genome of Streptococcus pneumoniae encodes a second uncharacterized family 20 glycoside hydrolase. Results: GH20C displays activity on both terminal β-linked N-acetylglucosamine and N-acetylgalactosamine. Conclusion: GH20C is an enzyme able to cleave a wide variety of N-acetylhexosamine-terminating sugars. Significance: S. pneumoniae has the biochemical ability to act on a wide variety of sugars that it would encounter in the human body. An important facet of the interaction between the pathogen Streptococcus pneumoniae (pneumococcus) and its human host is the ability of this bacterium to process host glycans. To achieve cleavage of the glycosidic bonds in host glycans, S. pneumoniae deploys a wide array of glycoside hydrolases. Here, we identify and characterize a new family 20 glycoside hydrolase, GH20C, from S. pneumoniae. Recombinant GH20C possessed the ability to hydrolyze the β-linkages joining either N-acetylglucosamine or N-acetylgalactosamine to a wide variety of aglycon residues, thus revealing this enzyme to be a generalist N-acetylhexosaminidase in vitro. X-ray crystal structures were determined for GH20C in a ligand-free form, in complex with the N-acetylglucosamine and N-acetylgalactosamine products of catalysis and in complex with both gluco- and galacto-configured inhibitors O-(2-acetamido-2-deoxy-d-glucopyranosylidene)amino N-phenyl carbamate (PUGNAc), O-(2-acetamido-2-deoxy-d-galactopyranosylidene)amino N-phenyl carbamate (GalPUGNAc), N-acetyl-d-glucosamine-thiazoline (NGT), and N-acetyl-d-galactosamine-thiazoline (GalNGT) at resolutions from 1.84 to 2.7 Å. These structures showed N-acetylglucosamine and N-acetylgalactosamine to be recognized via identical sets of molecular interactions. 
Although the same sets of interactions were maintained with the gluco- and galacto-configured inhibitors, the inhibition constants suggested preferred recognition of the axial O4 when an aglycon moiety was present (Ki for PUGNAc > GalPUGNAc) but preferred recognition of an equatorial O4 when the aglycon was absent (Ki for GalNGT > NGT). Overall, this study reveals GH20C to be another unique tool in the arsenal of S. pneumoniae, one that the bacterium may use in its effort to utilize and/or destroy the wide array of host glycans that it may encounter.

[N-acetylneuraminic acid α(2-3)-]-Galβ(1-4)-Glc-ceramide) within the lysosome (10,11). Mutations in the genes encoding either of these enzymes can lead to lysosomal storage disorders with sometimes severe pathologies (12,13). GH20 enzymes are common in microbes where their known or predicted specificities suggest a wide variety of biological roles. A notable example whose discovery dates back over 50 years is StrH, which is an exo-β-D-N-acetylglucosaminidase able to release terminal N-acetylglucosamine (GlcNAc) residues from complex carbohydrates. It was first purified from Streptococcus pneumoniae culture supernatants by Hughes and Jeanloz (14), later characterized in detail by Yamashita et al. (15), and the gene encoding StrH was ultimately identified by Clarke et al. (16). This enzyme contains two tandem catalytic GH20 modules; together, these modules and their structural features give StrH specificity for the mannose arms of N-linked glycans (i.e. the aglycon), making the enzyme very specific for terminal GlcNAc residues in complex N-linked glycans (9). In addition to its defining specificity, this cell wall-bound surface protein is implicated in virulence through a potential role in host complement modulation and nutrient acquisition (9).
The role of StrH in the host-S. pneumoniae interaction is an example of an expanding paradigm of how this bacterium attacks and utilizes host glycans to inhabit and invade human tissues (9,17). S. pneumoniae is a commensal bacterium inhabiting the human upper respiratory tract of healthy individuals, but in some circumstances it can progress beyond the nasopharynx to other normally sterile sites and cause diseases ranging from comparatively minor infections, like otitis media, to lethal infections such as pneumonia, bacteremia, and meningitis (18). The occurrence of S. pneumoniae infections remains high in both developed and developing countries with high-risk groups, including children, the elderly, and immuno-compromised individuals. Approximately 15 million cases of invasive S. pneumoniae infections are reported annually with ~1.6 million of these resulting in death (19). Given that the primary route of entry and/or colonization niche is the human nasopharynx, S. pneumoniae encounters an environment that is rich in the complex carbohydrates displayed by the host tissues. In keeping with this, large scale screening has suggested that a significant number of genes encoding putative and known glycoside hydrolases, including StrH, may participate in virulence (20-22). These are increasingly being validated as important components of the host-bacterium interaction, such as nutritional support, enhanced adherence, and immune system modulation (9,17,23-28).
It was from the perspective of the importance of carbohydrate-processing enzymes to S. pneumoniae virulence and the role of StrH in this that we noted the presence in the S. pneumoniae genome of a second gene encoding a family 20 glycoside hydrolase. This gene appears to be present in all sequenced strains of the bacterium. The putative product of this gene, referred to as GH20C, displays ~70% amino acid sequence identity with the structurally characterized Streptococcus gordonii GcnA, which displays both N-acetylgalactosaminidase and N-acetylglucosaminidase activity on synthetic aryl glycosides (29,30). Although the relatively high amino acid sequence identity between GcnA and GH20C lends itself to inferring similar abilities to cleave N-acetylhexosaminides, GH20C remains uncharacterized. Likewise, the aglycon specificities of both GcnA and GH20C are unknown. Given the presence of an N-glycan-specific N-acetylglucosaminidase, StrH, in S. pneumoniae, we hypothesized that GH20C is present to expand the ability of the bacterium to process N-acetylhexosamine-containing host glycans, rather than having an overlapping specificity. Through structural and biochemical analyses of recombinant GH20C, we show that this enzyme is able to cleave terminal GalNAc and GlcNAc residues and that it has only limited selectivity for aglycon sugars. These results suggest that rather than being redundant with StrH, GH20C is likely responsible for the cleavage of terminal N-acetylhexosamine residues that StrH cannot hydrolyze and, in particular, provides N-acetyl-β-D-galactosaminidase activity that does not appear to be present in any other known or predicted S. pneumoniae enzymes.

Experimental Procedures
Materials-Oligosaccharides were obtained from V-labs (Covington, LA). PUGNAc and NGT synthesized in-house were kind gifts from Dr. David Vocadlo. GalPUGNAc and GalNGT synthesized in-house were kind gifts from Dr. Keith Stubbs. All other reagents, including GlcNAc and GalNAc, were purchased from Sigma unless otherwise specified.
Protein Production and Purification-All constructs were transformed into the expression strain Escherichia coli BL21 Star (DE3). Six liters of yeast/tryptone broth, containing 50 µg/ml kanamycin, was inoculated with the transformed cells and incubated at 37°C. Once an optical density of ~1 at 595 nm was reached, protein production was induced by the addition of isopropyl 1-thio-β-D-galactopyranoside to a final concentration of 0.5 mM. Incubation of the cultures was continued overnight with shaking at 16°C. Cells were harvested by centrifugation at 5000 × g for 10 min and ruptured by chemical lysis. The cleared supernatant of the cell lysate was loaded onto a Ni²⁺-nitrilotriacetic acid-immobilized metal affinity chromatography column. Polypeptide was eluted with binding buffer (20 mM Tris, pH 8.0) containing increasing concentrations of imidazole (0-500 mM). GH20C was concentrated and buffer-exchanged into 20 mM Tris, pH 8.0, in a stirred ultrafiltration unit (Amicon). The protein was further purified by size exclusion chromatography using a Sephacryl S-200 column (GE Healthcare) and concentrated using a stirred cell ultrafiltration device (Amicon) with a 10,000-Da molecular mass cutoff membrane (Millipore).
Determination of Protein Concentration-Protein concentration was determined by UV absorbance at 280 nm using the calculated extinction coefficient (31).
Enzyme Kinetics and Inhibition-All steady state kinetic studies were performed at 37°C in a Cary/Varian 300 Bio UV-visible spectrophotometer. The pH optimum was determined using an end-point assay (stopped with 100 mM NaOH) in McIlvaine buffer at pH values ranging from 2.25 to 9.2; the optimum activity was found at pH 6.5 (data not shown). Determination of kinetic constants was done in triplicate 1-ml assays containing 50 mM citrate buffer, pH 6.5, 0.1% BSA, 2 nM GH20C, and 0-0.9 mM 4-nitrophenyl-N-acetylglucosaminide (pNP-GlcNAc) or 0-0.45 mM 4-nitrophenyl-N-acetylgalactosaminide (pNP-GalNAc). Nitrophenolate production was monitored at 400 nm; rates were calculated using an extinction coefficient, ε400 nm, of 6090 M⁻¹ cm⁻¹. Michaelis-Menten parameters were determined using GraphPad by nonlinear curve fitting. The Ki values for the inhibition by PUGNAc, GalPUGNAc, NGT, and GalNGT were determined by Dixon plot analysis using pNP-GlcNAc as a substrate.
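As a hedged illustration of the rate calculation described above, the following Python sketch converts an absorbance slope at 400 nm into a reaction rate using the stated extinction coefficient of 6090 M⁻¹ cm⁻¹ and evaluates the Michaelis-Menten equation. The 1-cm path length, the function names, and the example slope are assumptions made for illustration; the fitting in this study was performed in GraphPad, not with this code.

```python
# Sketch only: path length, function names, and the example slope are assumptions.
EPSILON_400 = 6090.0   # M^-1 cm^-1, extinction coefficient given in the text
PATH_CM = 1.0          # assumed cuvette path length (not stated in the text)


def rate_from_slope(abs_per_min, epsilon=EPSILON_400, path_cm=PATH_CM):
    """Convert an absorbance slope (A400 per min) to a rate in M per min."""
    return abs_per_min / (epsilon * path_cm)


def michaelis_menten(s, vmax, km):
    """Initial rate at substrate concentration s (same units as km)."""
    return vmax * s / (km + s)


if __name__ == "__main__":
    # Hypothetical slope of 0.0609 A400/min corresponds to 1.0e-5 M/min.
    rate = rate_from_slope(0.0609)
    print(f"{rate * 1e6:.1f} uM/min")
```

By construction, the Michaelis-Menten rate equals half of Vmax when the substrate concentration equals Km, which is a convenient sanity check on any fitted parameters.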
Activity Screen-Qualitative analysis of GH20C activity on di- and trisaccharides was done by thin layer chromatography using a variety of oligosaccharides containing non-reducing terminal GlcNAc or GalNAc. Reactions were done in a 20-µl final volume containing 50 mM Tris, pH 8, 30 nM GH20C, and 3 mM sugar. Reactions were incubated at 37°C for 1 h and then spotted on a silica plate. TLC was run for 2 h in n-propyl alcohol/H2O/ethanol (7:2:1) solvent, and oligosaccharides were revealed by using 5% sulfuric acid and a 30-min incubation at 110°C.
General Crystallography Procedures-Crystals were obtained at 18°C using sitting-drop vapor diffusion for screening and hanging drop vapor diffusion for optimization. For data collection, single crystals were flash-cooled with liquid nitrogen in crystallization solution previously dehydrated to increase cryoprotectant content for each crystal form as given below. Diffraction data were collected either on a "home-beam" comprising a Rigaku R-AXIS 4++ area detector coupled to an MM-002 x-ray generator with Osmic "blue" optics and an Oxford Cryostream 700, Beamline 9-2 of the Stanford Linear Accelerator Center (SLAC, Stanford Synchrotron Radiation Lightsource (SSRL), Stanford, CA), or 08ID-1 at the Canadian Light Source (Saskatoon, Saskatchewan) as indicated in Table 1. Most diffraction data were processed using MOSFLM and SCALA (32). However, the NGT and PUGNAc complexes were processed with HKL2000 and XDS, respectively. All data collection and processing statistics are shown in Table 1. For all structures, manual model building was performed with COOT (33), and refinement of atomic coordinates was performed with REFMAC (34). The addition of water molecules was performed in COOT with FINDWATERS and manually checked after refinement. In all data sets, refinement procedures were monitored by flagging 5% of all observations as "free" (35). Model validation was performed with MOLPROBITY (36).
GH20C Structural Determination-All crystals of GH20C (20 mg/ml) were grown in 12% (w/v) PEG 3350, 0.2 M magnesium chloride, 0.05 M CAPS, pH 10.2, and 1% glycerol. Crystals were soaked for 5 min to several hours in the crystallization solution containing an excess of reaction products and inhibitors. Data sets were collected as above on a single crystal that was cryoprotected by exposure of the crystallization drop to air to dehydrate it. The structure of GcnA (Protein Data Bank code 2EPN) was used to solve the native structure of GH20C by molecular replacement using PHASER. For all the complex structures, GH20C native structure was used as a search model for molecular replacement using PHASER.
Generation of Δgh20C Strain-A PCR ligation technique was used to replace gh20C with a chloramphenicol-resistance cassette (37). Regions flanking the gh20C gene and the chloramphenicol-resistance cassette were amplified by PCR with primers containing specific restriction sites. The flanking region 1 was amplified using the forward primer 5′-AATACTAAGCCTACACAGACTGATTTCAA-3′ and the reverse primer 5′-CATATGCATATGCAGCGTCGACGATTTATACG-3′ containing the NdeI restriction site. The second flanking region was amplified using the forward primer 5′-CATATGCTCGAGCCTCAATAGCTTGCGTTTGTT-3′ containing the XhoI restriction site and the reverse primer 5′-TGGCTAGATCAACGCTTGTCAGAACAGGA-3′. The chloramphenicol-resistance cassette was amplified with the CAM-Fw primer (5′-TTTGGACATATGGATGAAAATTTGTTTGAT-3′) containing an NdeI restriction site and the CAM-Rv primer (5′-CTTTGTCTCGAGGTACAGTCCGGCATTATC-3′) containing an XhoI restriction site. Each amplicon was digested with the appropriate restriction enzyme and ligated together. This ligation reaction was subsequently used to transform S. pneumoniae TIGR4, creating the Δgh20C strain where the gh20C gene was replaced by the chloramphenicol-resistance gene. The S. pneumoniae transformation protocol was adapted from Bricker and Camilli (38). Bacteria were grown for about 9 h in Todd Hewitt broth to an approximate A600 of 0.8. A 1:50 dilution was then made into the pre-induction growth medium of Todd Hewitt broth, 0.5% (w/v) glycine, 11 mM HCl and grown at 37°C in a candle jar until an A600 of 0.03 was reached. The media were then supplemented with 10 mM NaOH, 0.2% (w/v) BSA, 1 mM CaCl2, and 100 ng/ml competence-stimulating peptide 2 (CSP-2; synthesized by GenScript) in their respective order and incubated in a candle jar at 37°C for 14 min. The entire ligation reaction was then added to the media and subsequently incubated for 1 h at 37°C in a candle jar. The transformation reaction was then diluted 1:4 with Todd Hewitt broth and incubated further for 2 h. The transformations were plated on sheep blood agar supplemented with chloramphenicol and incubated overnight in a candle jar at 37°C. Colonies were re-streaked onto new blood agar plates, and the genomic position of the chloramphenicol cassette was confirmed by several PCR analyses.
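The primer design described above can be checked with a short Python sketch that searches each primer for the standard NdeI (CATATG) and XhoI (CTCGAG) recognition sequences. The primer sequences are taken from the text with line-break hyphens removed; the dictionary keys and the function name are hypothetical labels introduced only for this illustration.

```python
# Sketch only: dictionary keys and the function name are hypothetical labels.
SITES = {"NdeI": "CATATG", "XhoI": "CTCGAG"}  # standard recognition sequences

# Primer sequences as given in the text (5' to 3'), hyphens removed.
PRIMERS = {
    "flank1_fwd": "AATACTAAGCCTACACAGACTGATTTCAA",
    "flank1_rev": "CATATGCATATGCAGCGTCGACGATTTATACG",
    "flank2_fwd": "CATATGCTCGAGCCTCAATAGCTTGCGTTTGTT",
    "flank2_rev": "TGGCTAGATCAACGCTTGTCAGAACAGGA",
    "CAM-Fw": "TTTGGACATATGGATGAAAATTTGTTTGAT",
    "CAM-Rv": "CTTTGTCTCGAGGTACAGTCCGGCATTATC",
}


def sites_in(primer):
    """Return the sorted names of the restriction sites found in a primer."""
    seq = primer.upper()
    return sorted(name for name, site in SITES.items() if site in seq)


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, sites_in(seq))
```

Note that the forward primer for the second flanking region carries a CATATG sequence immediately upstream of its XhoI site, so the sketch reports both sites for that primer.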
DECEMBER 25, 2015 • VOLUME 290 • NUMBER 52
RT-PCR and Western Blotting-S. pneumoniae GH20C mRNA levels when exposed to different carbohydrates were determined by quantitative reverse transcription PCR. S. pneumoniae TIGR4 was grown in a casein hydrolysate-based medium with 0.3% sucrose and 0.25% yeast extract (AGCHY) (39,40) supplemented with 0.5% glucose at 37°C with 5% CO2 to an A600 of 0.4. Bacteria were pelleted and washed with AGCHY, then resuspended in AGCHY or AGCHY with 0.5% sugar, and incubated at 37°C with 5% CO2 for 30 min. Bacteria were lysed using lysozyme and mutanolysin, and RNA was extracted using the Qiagen RNeasy Plus mini kit. Removal of genomic DNA contamination was carried out by DNase I treatment (Roche Applied Science) with subsequent quantitative RT-PCR performed using SuperScript III one-step RT-PCR kit (Invitrogen) on an LC480 real time cycler (Roche Applied Science). GH20C transcription levels were analyzed using primers CTGGTTGGCTGTAGTTGCTG and CGAGGTTTATCTGGCTGGTC and data normalized to expression levels for 16S rRNA using primers CATGCAAGTAGAACGCTGAA and TGTCATGCAACATCCACTCT. Each condition was performed in biological triplicates, and the fold change relative to expression levels when the bacterium was not exposed to sugar was calculated.
For Western blotting, S. pneumoniae TIGR4 was streaked out on trypticase soy agar plates supplemented with defibrinated sheep blood and incubated overnight at 37°C in a candle jar. Culture was started by inoculating 5 ml of AGCHY media supplemented with 1% D-galactose and incubated for 6 h in a candle jar at 37°C. The S. pneumoniae TIGR4 bacterial pellet was resuspended in phosphate-buffered saline, pH 7.4 (PBS), with protease inhibitor (Roche Applied Science) and lysed by sonication to yield the lysate fraction. Supernatant from the S. pneumoniae TIGR4 culture was filter-sterilized to remove any residual pneumococcal cells. Trichloroacetic acid precipitation was performed to concentrate secreted proteins. For that, 1 part 100% trichloroacetic acid (w/v) was added to 4 parts supernatant, left on ice for ~30 min, and centrifuged for 5 min at 14,000 rpm. The precipitated pellet was washed twice with cold acetone, dried at room temperature, and resuspended in SDS-PAGE loading buffer yielding the extracellular fraction. Laemmli buffer was added to both extracellular and lysate samples, and samples were incubated at 100°C for 10 min. Samples were run on a 10% SDS-PAGE, transferred to a nitrocellulose membrane (Whatman), and analyzed by Western blot using a 1:50 dilution of rabbit anti-serum raised against purified recombinant GH20C from TIGR4 strain (antibodies were raised by ImmunoPrecise Antibodies Ltd., Victoria, British Columbia, Canada). Horseradish peroxidase-conjugated anti-rabbit (GE NA934) secondary antibody was used at 1:5000. Proteins were detected by chemiluminescence (Millipore WBLUF0500) and exposed to film.
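The fold-change calculation for the quantitative RT-PCR data (gh20C transcript normalized to 16S rRNA, relative to the culture not exposed to sugar) can be sketched as follows. The text does not state which quantification model was used; this example assumes the common comparative threshold-cycle (2^-ΔΔCt) approach with 100% amplification efficiency, and the function name and the threshold-cycle values are hypothetical.

```python
# Sketch only: the 2^-ddCt model is an assumption, not the method stated in the text.
def fold_change(ct_target_treated, ct_ref_treated,
                ct_target_control, ct_ref_control):
    """Relative expression of a target gene by the comparative Ct method.

    Each target Ct is first normalized to the reference gene (here 16S rRNA),
    then the treated sample is compared with the untreated control.
    """
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** (-(delta_treated - delta_control))


if __name__ == "__main__":
    # Hypothetical Ct values: the target appears two cycles later after
    # sugar exposure while the reference is unchanged, giving 0.25-fold.
    print(fold_change(22.0, 10.0, 20.0, 10.0))
```

Under this model a value below 1 indicates reduced transcript relative to the unexposed culture, and a value of 1 indicates no change.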

gh20C Gene Is Transcribed and a Protein Product Produced-Using quantitative PCR, we detected the presence of gh20C transcripts in response to exposure to different monosaccharides as the only source of carbon and determined the change in expression of the transcript relative to expression in the absence of a carbohydrate source (Fig. 1A). We observed a statistically significant reduction in the detection of transcript in the presence of glucose and mannose, and the largest reduction was in the presence of GlcNAc. No change was observed in the presence of galactose, fucose, or GalNAc. This result indicates that, under the conditions used, the gh20C gene is generally expressed, including in the absence of added carbohydrate, but is repressed in the presence of glucose, mannose, and GlcNAc. To confirm that the production of gh20C transcript also results in the production of a protein product, we performed Western blotting for GH20C using a polyclonal antibody raised against recombinant GH20C. A protein with similar electrophoretic migration to recombinant GH20C protein was detected by Western blotting in the lysate and supernatant fractions of wild-type S. pneumoniae TIGR4 cultures (Fig. 1B). Although antibody-reactive bands were also detected in the lysate and supernatant fractions of a Δgh20C mutant strain created from the S. pneumoniae TIGR4 parent, the band with the same migration properties as the recombinant GH20C was absent in both the lysate and supernatant fractions of this mutant (Fig. 1B). Thus, the gh20C gene is transcribed and translated in this bacterium with the regulation being, at least in part, consistent with catabolite repression. Our crude fractionation indicates that GH20C is found associated with the cells and in the culture supernatant, albeit to a lesser degree in the latter fraction. The detection of GH20C in the supernatant fraction is not consistent with the absence of known motifs that target the protein for secretion; however, the ability of S. pneumoniae to non-classically secrete proteins that do not possess a signal peptide or a known anchoring motif is well documented (41).
FIGURE 1. A, fold changes of GH20C transcript were measured by quantitative RT-PCR after exposure of TIGR4 to different carbohydrates. The fold change relative to expression levels when the bacterium was not exposed to sugar was calculated, and standard deviations of three replicates are indicated on each bar. B, production of GH20C protein by S. pneumoniae was detected by Western blotting using a polyclonal antibody raised against recombinant GH20C. Wild-type and Δgh20C mutant of S. pneumoniae TIGR4 were grown for 6 h at 37°C in AGCHY media and harvested by centrifugation. The cells were then lysed by sonication, and the culture supernatant and lysed cellular fractions were analyzed by Western blotting.
Activity of GH20C-The complete open reading frame of the gh20C gene (1878 bp) was cloned in-frame with an N-terminal hexa-histidine tag. The product of this gene fusion, referred to as GH20C, was produced in E. coli and purified to apparent homogeneity, as judged by SDS-PAGE, by immobilized metal affinity chromatography and size exclusion chromatography. In qualitative analyses, recombinant GH20C displayed activity on both 4-nitrophenyl N-acetyl-β-D-glucosaminide (pNP-GlcNAc) and 4-nitrophenyl N-acetyl-β-D-galactosaminide (pNP-GalNAc). Using pNP-GlcNAc, the pH optimum of the enzyme was determined to be 6.5, and subsequent kinetic characterization at this pH revealed a Km of 560 (±60) µM, a kcat of 3650 (±330) min⁻¹, and a kcat/Km of 6.5 (±0.6) min⁻¹·µM⁻¹. Kinetic analysis using pNP-GalNAc gave a Km of 262 (±14) µM, a kcat of 1100 (±59) min⁻¹, and a kcat/Km of 4.5 (±0.2) min⁻¹·µM⁻¹. Thus, these two synthetic substrates displayed compensatory changes in Km and kcat to result in similar overall efficiency of cleavage, as represented by kcat/Km, for the two substrates.
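To make the comparison of catalytic efficiencies concrete, the short Python sketch below recomputes kcat/Km from the reported kcat and Km values, taking Km in micromolar so that the result is in min⁻¹·µM⁻¹. The function name is hypothetical, and the calculation uses only the rounded values quoted in the text, so it is an approximation rather than a reproduction of the original fits.

```python
# Sketch only: uses the rounded kinetic parameters quoted in the text.
def specificity_constant(kcat_per_min, km_micromolar):
    """Return kcat/Km in min^-1 uM^-1."""
    return kcat_per_min / km_micromolar


glcnac = specificity_constant(3650.0, 560.0)   # pNP-GlcNAc
galnac = specificity_constant(1100.0, 262.0)   # pNP-GalNAc

if __name__ == "__main__":
    print(f"pNP-GlcNAc: {glcnac:.1f} min^-1 uM^-1")
    print(f"pNP-GalNAc: {galnac:.1f} min^-1 uM^-1")
```

The ratio of the rounded values gives about 6.5 for pNP-GlcNAc and about 4.2 for pNP-GalNAc; the latter is slightly below the reported 4.5 (±0.2), a difference that presumably reflects how the reported value was derived from the underlying fits. Either way, the two efficiencies differ by less than 2-fold.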
Given the apparent lack of preference of GH20C for GlcNAc or GalNAc as the glycon, we probed potential aglycon selectivity using thin layer chromatography to qualitatively examine the activity of GH20C on a variety of biologically relevant GlcNAc- and GalNAc-terminating sugars. Consistent with its activity on pNP-GlcNAc and pNP-GalNAc, GH20C displayed the ability to cleave a variety of sugars with terminal β-linked GlcNAc and GalNAc (Table 2). The tested substrates encompassed galactose (Gal), mannose (Man), and GalNAc as aglycon residues linked β1-2, β1-3, β1-4, or β1-6 to the glycon, indicating broad substrate cleavage capabilities. Notably, GH20C was able to cleave the trisaccharide β-D-GlcNAc-(1→6)-[β-D-Gal-(1→3)]-α-D-GalNAc, suggesting the ability to accommodate more complex aglycons. The only substrate that was not observed to be cleaved was β-D-GalNAc-(1→4)-β-D-Gal. Because of the β-1,4-linkage and axial configuration of C4 in the Gal residue of this disaccharide, the sugar would adopt a conformation with a significant angle between the planes of the pyranose rings (i.e. a kinked or pleated conformation). This feature separates it from the other tested sugars where the composite pyranose rings would be roughly coplanar resulting in an extended substrate conformation.
Structure of GH20C-In an effort to provide molecular level insight into substrate recognition and turnover by GH20C, we pursued structural studies by x-ray crystallography. An unliganded x-ray crystal structure of the enzyme crystallized in the space group P2₁2₁2₁ was determined to 1.9 Å resolution by molecular replacement using the S. gordonii GcnA structure (Protein Data Bank code 2EPN) as a search model. The final refined model of GH20C contained four molecules of the protein in the asymmetric unit. The structure of the GH20C monomer comprises three domains, referred to as domains I-III ordered from the N terminus (Fig. 2A). Domain I is a β-sheet of parallel β-strands, packed against which is a pair of almost parallel α-helices. In turn, the α-helices sit against the outer rim of domain II. Domain II is a (β/α)₈ barrel, or TIM barrel, which is not only the core fold of GH family 20 but is one of the most common folds among glycoside hydrolases in general (42). Domain III is all α-helical with four anti-parallel helices organized in a near planar arrangement. Domain III is unusual among structurally characterized GH20 enzymes and has only been observed in GcnA (30,43). Indeed, GH20C and GcnA share the same domain organization and an overall root mean square deviation of 0.79 Å over 619 matched Cα atoms. Being the core catalytic domain of the GH20s, domain II is well conserved among all of the structurally characterized members of the family, whereas domain I is present in all but Aggregatibacter actinomycetemcomitans DspB and the two GH20s found in StrH (43,44). In these latter enzymes domain I is replaced by a FIVAR-like (found in various architectural regions) domain (9).
The four molecules of GH20C present in the asymmetric unit were modeled as two identical homodimers. The primary interface of the dimer is at domain III where the C-terminal α-helices of each domain in the dimer pack against each other, although the distal end of the domain reaches over to interact with the edge of the (β/α)₈ barrel in the adjacent monomer (Fig. 2B). The dimer interface, which exceeds 3000 Å² of buried surface area, is completed by loops extending from the (β/α)₈ barrel to interact with the same loops on the adjacent monomer, although this interaction contributes the minority of the dimer interface. In this dimer, there are predicted to be 45 hydrogen bonds and 19 ion pairs. Notably, this same dimer is present in the four different crystal forms obtained for GH20C in this study. Furthermore, with a root mean square deviation of 0.85 Å over 1228 matched Cα atoms, it is extremely similar to the dimer formed by GcnA. The active site of GcnA, identified by the acquisition of a structure in complex with NGT, resides in the center of the (β/α)₈ barrel on the side exposed to the solvent channel bisecting the dimer. This active site is conserved in GH20C with residues Asp-222 and Glu-223 as the putative catalytic residues (Fig. 2C). These results support the idea that dimerization may be a conserved feature of GH20 enzymes possessing domain III; however, there is presently no evidence supporting dimerization as being necessary for catalysis.

Structure of GH20C in Complex With Reaction Products-Toward understanding the interaction of GH20C with its substrates, we sought to determine its structure in complex with relevant carbohydrates and carbohydrate analogs of both gluco- and galacto-configured chemistries. Initially, we soaked crystals of GH20C with GlcNAc to obtain a product complex. This crystal form contained only a single dimer of GH20C in the asymmetric unit, and we found electron density in Fo − Fc maps consistent with GlcNAc in each of the predicted active sites, although the electron density indicated that the complexes were not fully occupied or were partially disordered (Fig. 3A). We could not model GlcNAc in the expected ⁴C₁ conformation, but rather we were able to model the electron density with a GlcNAc conformation resembling the "4-sofa" conformation expected for the Michaelis complex, as observed in other GH20 enzymes (3,9,45). In this conformation the acetamido group is bent underneath the pyranose ring with the acetamido oxygen positioned on its potential trajectory for nucleophilic attack on C1 (Fig. 3B). In the active site of one GH20C monomer, Asp-222 and Glu-223 engage the substrate with Glu-223 appropriately positioned 2.7 Å from O1 of the GlcNAc to act as the acid/base and Asp-222 interacting with the acetamido nitrogen of the monosaccharide (Fig. 3B). In the other active site, however, the loop comprising residues 219-225, and thus containing the catalytic residues, is retracted from the GlcNAc residue, preventing engagement of the catalytic residues by the sugar and creating what is presumably a catalytically incompetent conformation of the enzyme (Fig. 3C). Notably, the retracted conformation of this loop is present in all four of the monomers in the unliganded structure. In GcnA, the retracted conformation was observed in the ligand-free form of the enzyme, whereas the engaged conformation was found for an NGT-bound complex (30).
In the active site where the catalytic residues are fully engaged, the GlcNAc sits in a cradle of aromatic residues formed primarily by Trp-266, Trp-306, Tyr-308, and Trp-373, although its polar groups are coordinated by a series of direct and water-mediated hydrogen bonds between the sugar and enzyme (Fig. 3B). All of these interactions, except those mediated by Asp-222 and Glu-223, are maintained in the active site even when the catalytic residues are not engaged (Fig. 3C).
In an effort to better understand both the recognition of a GalNAc glycon and other potential aglycon components of substrates, we sought to obtain non-hydrolyzed substrate complexes by inactivating GH20C through mutation of Glu-223, the acid/base catalyst, to a glutamine residue. The E223Q mutant retained only ~6% of the wild-type enzyme activity on pNP-GlcNAc; however, soaking crystals of GH20C E223Q with β-D-GalNAc-(1→3)-β-D-Gal resulted in electron density in both active sites of the dimer that could only be modeled as monomeric GalNAc. The electron density in Fo − Fc maps terminated at O1 of the GalNAc, suggesting that either the aglycon was entirely disordered or the substrate was hydrolyzed (Fig. 3A). Given that the electron density terminated clearly and cleanly, we favor the possibility that the substrate was cleaved over the 30-s time scale of the crystal soaking experiment, despite the reduced activity of the mutant. Unlike the GlcNAc product complex, this GalNAc complex showed the active sites of both monomers to be in the conformation that fully engages the catalytic residues. Both GalNAc residues were in the relaxed ⁴C₁ chair conformation and made the same suite of interactions with the enzyme as the distorted GlcNAc (Fig. 3D).
Inhibition of GH20C-The Km and kcat values for pNP-GlcNAc and pNP-GalNAc appeared to display compensating changes resulting in a similar overall efficiency for the hydrolysis of both these substrates by GH20C. To more effectively probe the role of the stereochemistry at C4, independent of substrate turnover, we examined the ability of two pairs of compounds to inhibit GH20C by a kinetic approach using pNP-GlcNAc as a substrate. One pair of related compounds, O-(2-acetamido-2-deoxy-D-glucopyranosylidene)amino N-phenyl carbamate (PUGNAc) and O-(2-acetamido-2-deoxy-D-galactopyranosylidene)amino N-phenyl carbamate (GalPUGNAc), which differ only by being C4 epimers, have an sp²-hybridized center at C1 that resembles the planar oxocarbenium ion of the predicted transition state. This feature, along with fortuitous interactions between the enzyme and the phenylcarbamate group of these compounds, often makes them very effective N-acetylhexosaminidase inhibitors (46). N-Acetyl-D-glucosamine-thiazoline (NGT) and GalNGT, which also only differ by being C4 epimers, were chosen for their ability to mimic the oxazoline intermediate of the substrate-assisted catalytic mechanism utilized by GH20 enzymes (47). A Dixon plot analysis showed that all four compounds were competitive inhibitors of GH20C (Fig. 4). The PUGNAc-based compounds were the most potent with Ki values of 7.9 (±0.7) nM for PUGNAc and an ~8-fold better Ki value for GalPUGNAc of 1.1 (±0.1) nM (Fig. 4, A and B). NGT and GalNGT yielded Ki values of 60.6 (±3.0) and 1130 (±110) nM, respectively (Fig. 4, C and D). The obtained pattern for general inhibitor efficacy (i.e. GalPUGNAc > PUGNAc > NGT > GalNGT) is consistent with previous inhibition data obtained for the same set of inhibitors against human and bacterial GH20 enzymes (4). The observation that the Ki of GalPUGNAc was approximately eight times better than that for PUGNAc, but the Ki for GalNGT was approximately 20 times weaker than for NGT, indicates the enzyme does not consistently favor an axial or equatorial O4 in these pairs of inhibitors. This does not resolve the ambiguity in the role of O4 in substrate recognition by GH20C and suggests that the O4 is not the only feature that discriminates recognition of these inhibitor pairs.
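The logic of the Dixon analysis for a competitive inhibitor can be sketched in Python as follows. For competitive inhibition, 1/v plotted against inhibitor concentration gives a straight line at each substrate concentration, and lines obtained at different substrate concentrations intersect at an inhibitor concentration of -Ki. The function names, the substrate concentrations, and the Vmax value are hypothetical choices for illustration; only the Km and Ki values are taken from the text.

```python
# Sketch only: names, substrate concentrations, and vmax are assumptions.
def dixon_line(s, vmax, km, ki):
    """Slope and intercept of 1/v versus [I] for competitive inhibition.

    From v = vmax*s / (km*(1 + i/ki) + s):
        1/v = (km + s)/(vmax*s) + km/(vmax*s*ki) * i
    """
    slope = km / (vmax * s * ki)
    intercept = (km + s) / (vmax * s)
    return slope, intercept


def dixon_intersection(s1, s2, vmax, km, ki):
    """Inhibitor concentration where the Dixon lines for s1 and s2 cross."""
    m1, b1 = dixon_line(s1, vmax, km, ki)
    m2, b2 = dixon_line(s2, vmax, km, ki)
    return (b2 - b1) / (m1 - m2)


if __name__ == "__main__":
    # Km for pNP-GlcNAc (560 uM) and Ki for PUGNAc (7.9 nM = 0.0079 uM).
    x = dixon_intersection(100.0, 400.0, vmax=1.0, km=560.0, ki=0.0079)
    print(f"lines intersect at [I] = {x:.4f} uM (expected -Ki)")
    # Fold differences between the epimeric inhibitor pairs.
    print(f"PUGNAc/GalPUGNAc: {7.9 / 1.1:.1f}-fold")
    print(f"GalNGT/NGT: {1130 / 60.6:.1f}-fold")
```

With the rounded Ki values from the text, the ratios come to roughly 7-fold for the PUGNAc pair and roughly 19-fold for the thiazoline pair, in line with the approximate 8-fold and 20-fold differences described above.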
We determined the structures of GH20C in complex with all four compounds to examine the molecular basis underlying the relative selectivity of these inhibitors. For both GalNGT and NGT, single dimers were modeled in the asymmetric unit with each monomer containing clear electron density for the inhibitor, which made modeling of the compounds straightforward (Fig. 5, A and B). Similar to the GlcNAc product complex, which was obtained from an orthorhombic crystal with the same unit cell dimensions as the NGT complex, one of the GH20C monomers in the NGT complex had an active site conformation with the catalytic residues retracted; the other monomer was found in the expected "active" conformation. As anticipated based on other studies of GH20 enzymes, both GalNGT and NGT adopted the ⁴C₁ conformation with the catalytic intermediate-mimicking thiazoline ring accommodated in a pocket formed primarily by Trp-266, Trp-306, Tyr-308, and Trp-373. Overall, when comparing the engaged conformations of the GalNGT and NGT complexes, the active site residues were in the same conformations, within positional error of the structures (Fig. 5C). The only outlier to this observation is the side chain of Arg-94, which is positioned differently in the NGT complex to accommodate hydrogen bonding with the equatorial O4 of this ligand. Similarly, the GalNGT and NGT compounds were in very similar positions with the primary difference being the positions of the O4 hydroxyls in these C4 epimers.
The single molecule of GH20C in the GalPUGNAc complex was found with the active site in the fully engaged conformation and with clear electron density for the inhibitor (Fig. 5D). The GalNAc portion of GalPUGNAc made the same set of interactions with the protein as observed for GalNAc and GalNGT (Fig. 5E). Along with mimicry of the putative transition state conformation, the additional hydrogen bonding interactions between the phenyl carbamate moiety and Tyr-309 and Glu-223 likely contribute to the high affinity of this inhibitor. Of the four molecules of GH20C in the PUGNAc-bound complex, three were found with the active site occupied but in the retracted conformation. These three inhibitors were found with the phenyl carbamate group in the same orientation as that of GalPUGNAc, but due to the retraction of the loop carrying the catalytic residues, only Tyr-309 hydrogen bonds with this chemical group (Fig. 5F). The fourth molecule in the asymmetric unit was found in the bound and fully engaged state. This was modeled with the phenyl carbamate group in a reversed conformation relative to the other bound inhibitors (Fig. 5G), yet with a similar complement of hydrogen bonds as for the recognition of GalPUGNAc (Fig. 5E). However, the electron density for the phenyl moiety of this molecule was not clear, suggesting disorder in this region and the potential for the inhibitor to adopt more than a single conformation (Fig. 5G).

Discussion
Many β-N-acetyl-hexosaminidases have some activity on both terminal GlcNAc and GalNAc residues (e.g. human HexA and HexB); however, usually such enzymes display specificity for one configuration of sugar or the other (9,44,48). GH20C from S. pneumoniae is among only a small number of characterized enzymes that have similar activities on both types of terminal sugar. In our hands, recombinant GH20C showed different Km and kcat values for the cleavage of synthetic GlcNAc and GalNAc β-aryl glycosides. However, the differences in these two kinetic parameters for the two substrates, which differ only by being C4 epimers, were compensating, resulting in quite similar kcat/Km values and thus representing a similar overall specificity for the two substrates. GH20C shows selectivity for GlcNAc and GalNAc through a set of interactions in its glycon-binding site that is common to the GH20 enzymes, most conspicuously through a generally conserved aromatic nest in the active site. Notably, interactions between Asp-222 and the acetamido nitrogen of the sugar and between Tyr-308 and the acetamido oxygen of the sugar provide specificity for the 2-acetamido group. Despite the different stereochemistries of GlcNAc and GalNAc at C4, the Asp-375 side chain is positioned such that it is able to maintain potential hydrogen bonding whether the O4 of the sugar is axial or equatorial. Thus, the ability of the enzyme to accommodate these two sugars with very similar sets of molecular interactions is consistent with the roughly equal ability of GlcNAc or GalNAc substrates to be hydrolyzed by GH20C.
In keeping with its ability to hydrolyze both tested synthetic substrates, GH20C was able to cleave di- and trisaccharides that terminated in either GlcNAc or GalNAc and did so with little apparent selectivity for the aglycon moiety, at least from a qualitative perspective. This suggests that unlike StrH, which is highly specific for terminal β(1-2)-linked GlcNAc residues in N-glycans, GH20C is able to cleave a relatively wide range of N-acetylhexosamine-terminating sugars. Indeed, GH20C appears able to cleave motifs found in O-glycans (compounds 1, 2, 4, 5, and 8 in Table 2; compound 8 is not normally found in glycans but is an approximation of β-D-GlcNAc-(1→3)-β-D-GalNAc found in O-glycans) and N-glycans (compound 7 in Table 2). GH20C was also able to cleave chitobiose (β-D-GlcNAc-(1→4)-β-D-GlcNAc; compound 9), which is found in the core structure of N-glycans and in chitin. It was unable to cleave compound 3, a motif found in glycosphingolipids (Table 2). The broad aglycon tolerance of the enzyme is consistent with the architecture of its active site, which is very open at the potential aglycon-binding site and would likely afford only limited, if any, additional interactions with a specific aglycon residue(s) (Fig. 6A). In contrast, the GH20A domain of StrH, which has a very constrained specificity, has a narrow active site in the aglycon-binding region, providing specificity for the terminal β-D-GlcNAc-(1→2)-β-D-Man motif in complex N-linked glycans (Fig. 6B) (9). In the case of β-D-GalNAc-(1→4)-β-D-Gal, the only sugar tested on which GH20C had no activity, the axial O4 of the Gal residue results in the sugar having a bent conformation, which we predict would result in steric clash with Glu-223, the amino acid acting as the catalytic acid/base. In vitro, GH20C displayed little selectivity for the cleavage of di- and trisaccharides, suggesting the potential for broad specificity in a biological setting. 
This raises the question of how such small soluble sugars could be provided to the bacterium in an environment where they are found only attached to host tissues. The release of soluble GH20C substrates would require endo-acting glycoside hydrolases to release them from those glycans. At present, however, the repertoire of such known or predicted enzymes in S. pneumoniae is extremely limited, as the bacterium appears to favor an approach whereby host glycans are trimmed from the non-reducing ends by exo-acting glycoside hydrolases (27). Related to this, should GH20C be an intracellular enzyme, as predictions suggest, its substrate(s) would have to be imported by specific carbohydrate transporters. This in turn implies a need for a diversity of transporters to import the potentially wide variety of carbohydrate substrates for GH20C. However, contradicting this possibility, the carbohydrate transporters of S. pneumoniae have primarily been associated with transport of monosaccharides, consistent with the predominance of exo-acting glycoside hydrolases produced by this bacterium (49). Based on this, we presently favor the proposal that the most logical location for GH20C is outside of the bacterial cell, where the enzyme would have easy access to the non-reducing terminal GlcNAc or GalNAc residues of a wide variety of substrates, independent of the presence of particular endo-acting enzymes or specific carbohydrate transporters. Indeed, our detection of GH20C in culture supernatants provides some support for the proposed extracellular location of the enzyme, despite its lack of known motifs directing secretion. Furthermore, there is precedent for such non-classically secreted glycoside hydrolases in S. pneumoniae, of which the most relevant to this scenario is the exo-β-1,3-galactosidase BgaC, which was found localized to the cell surface of this bacterium (50).
DECEMBER 25, 2015 • VOLUME 290 • NUMBER 52
The capability of GH20C to tolerate a broad range of aglycon residues is likely related to its very open active site, which results in a potential lack of specific interactions with an aglycon. This interpretation, however, does not take into account the quaternary structure of the enzyme. By virtue of dimer formation, the entrance to the active site becomes a tunnel that extends ~25 Å along the dimer interface until the glycon-binding site is reached (Fig. 6C). If GH20C does operate in some capacity on tissue/cell-tethered glycans as an extracellular exo-glycosidase, rather than on soluble glycans released by another enzyme, the architecture of the active site entrance may influence the biological specificity of the enzyme. First, the depth of the entrance suggests that the glycan must reach ~25 Å into the active site, which suggests that a glycan of at least four or five sugar residues adopting an extended conformation would be required to access the catalytic site. Second, the shape of the active site entrance may place additional constraints on the substrate by selecting for substrates with an overall tertiary conformation that can be accommodated by the opening to the active site. Precedent for this is seen in the two catalytic modules of StrH (9). These catalytic modules have N-acetylglucosaminidase activity but limited N-acetylgalactosaminidase activity, and they are able to release GlcNAc from a variety of disaccharides. However, when terminal GlcNAc or GalNAc residues are considered in the context of their display on biological glycans, not small soluble sugars, the catalytic modules of StrH become extremely specific for the terminal GlcNAc residues on complex N-glycans, as the distal aglycon-binding sites and overall active site architecture select for these large glycans and militate against accommodation of GlcNAc or GalNAc residues displayed on different glycans. 
Therefore, although GH20C appears to have limited substrate selectivity in vitro, we acknowledge the possibility that its substrate in vivo may indeed be more specific, but it is very likely different from the substrate of StrH.

The catalytic mechanism utilized by members of glycoside hydrolase family 20 is well established, and our structural studies of GH20C provided results that are entirely consistent with the features expected for the substrate-assisted catalytic mechanism. Our structures, however, also provided unexpected insight into some of the nuances of substrate recognition and cleavage by this enzyme. Structural changes in glycoside hydrolase active sites upon substrate binding, which can range from small and subtle to quite large, are not the rule, but neither are they uncommon. Indeed, the S. gordonii homolog of GH20C, GcnA, undergoes a change in the active site upon inhibitor binding whereby the catalytic residues go from a retracted state to engaging the inhibitor in a manner consistent with being positioned for catalysis (30). A nearly identical structural change was observed for GH20C upon binding both products of catalysis and inhibitors, indicating a trajectory where the loop containing the catalytic residues (the catalytic loop) goes from retracted and disordered to engaged and ordered. Unexpectedly, however, the retracted form of the catalytic loop was also observed in product- and inhibitor-bound forms of the enzyme, indicating that engagement of the catalytic loop is not necessarily a prerequisite for the enzyme to bind its ligands.
In the catalytic landscape, the substrate-stabilized and engaged form of the enzyme would typically lead to formation of the Michaelis complex, which in the case of β-glycoside hydrolases involves distortion of the glycon residue to generate a sugar conformation poised to promote hydrolysis of the glycosidic bond. In the case of GH20C, the GlcNAc in the product complex with this sugar was observed to be in a distorted ⁴E half-envelope conformation, despite the absence of an aglycon, whereas the GalNAc in its product complex was observed to be in the lower energy chair conformation. Surprisingly, the GlcNAc residue present in the retracted conformation of the active site was also in the distorted conformation, suggesting that in this enzyme engagement of the catalytic residues is not requisite for substrate distortion in preparation for formation of the Michaelis complex. Thus, although we acknowledge that the somewhat locked forms of the protein in the crystal lattice(s) may bias the states of the trapped complexes, the results nevertheless suggest a complex trajectory of substrate recognition whereby the enzyme may be able to form multiple pre-Michaelis complexes. At present it is unclear whether these may represent intermediates along a pathway to formation of the fully engaged and catalytically competent conformation or whether they are non-productive complexes.
GH20C was able to catalyze the cleavage of pNP-GlcNAc and pNP-GalNAc with similar efficiency, as judged by kcat/Km, but with somewhat different yet compensating values for the kinetic parameters kcat and Km. To provide insight into this, we examined the inhibition of the enzyme by pairs of galacto-configured and gluco-configured inhibitors. In doing this, we revealed a pattern of Ki values that was previously observed for GH20 enzymes as follows: GalPUGNAc < PUGNAc < NGT < GalNGT (4). The significance of this is that there is no consistent preference for the axial or equatorial C4 hydroxyl. However, what the relationship does suggest is that when an aglycon (e.g. the phenylcarbamate of GalPUGNAc and PUGNAc) is present (i.e. as in the Michaelis complex), the axial O4 is preferred for binding. Conversely, when the aglycon is absent (i.e. after departure of the aglycon leaving group), as represented by the thiazoline inhibitors, the equatorial O4 is favored. Thus, on this basis, we tentatively suggest that although GH20C converts GlcNAc and GalNAc substrates to products with generally similar overall efficiency, this may be achieved through slightly different energy landscapes leading to the catalyzed release of the two different terminal sugar epimers.
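To give a sense of the energetic scale behind these opposing preferences, the Ki ratios for each epimer pair can be converted into approximate differences in binding free energy using ΔΔG = RT ln(Ki,a/Ki,b). The sketch below is illustrative only: it treats each Ki as a dissociation constant for a competitive inhibitor and assumes a temperature of 298.15 K, which is not stated in this section. The Ki values are those reported above; the function and variable names are ours.

```python
import math

R_KCAL = 1.987204e-3   # gas constant in kcal mol^-1 K^-1
TEMP_K = 298.15        # assumed temperature; not stated in the text

# Ki values in nM as reported in the inhibition results.
KI_NM = {"PUGNAc": 7.9, "GalPUGNAc": 1.1, "NGT": 60.6, "GalNGT": 1130.0}


def fold_difference(ki_weaker, ki_tighter):
    """Ratio of two inhibition constants (dimensionless)."""
    return ki_weaker / ki_tighter


def ddg_kcal(ki_a, ki_b, temp_k=TEMP_K):
    """Binding free energy of a relative to b, in kcal/mol.

    Positive values mean inhibitor a binds more weakly than inhibitor b.
    """
    return R_KCAL * temp_k * math.log(ki_a / ki_b)


# Gluco-configured relative to galacto-configured inhibitor in each pair.
ddg_with_aglycon = ddg_kcal(KI_NM["PUGNAc"], KI_NM["GalPUGNAc"])
ddg_without_aglycon = ddg_kcal(KI_NM["NGT"], KI_NM["GalNGT"])

fold_pugnac = fold_difference(KI_NM["PUGNAc"], KI_NM["GalPUGNAc"])
fold_thiazoline = fold_difference(KI_NM["GalNGT"], KI_NM["NGT"])
```

Under these assumptions the two pairs differ by roughly 1.2 and 1.7 kcal/mol in magnitude and in opposite directions, which is the sign change described above: the galacto configuration is favored when the phenylcarbamate is present and the gluco configuration is favored in the thiazolines. From the rounded Ki values the PUGNAc pair works out to a ratio slightly above 7, in line with the approximately 8-fold difference quoted in the text.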
The potential impact of GH20C on pneumococcal virulence is presently unknown. Although other carbohydrate-degrading enzymes from S. pneumoniae have demonstrated roles in a variety of aspects of the host-bacterium interaction, such as adherence and innate immune response evasion (17,24,51), a common observation is their participation in nutrient acquisition (23,25), which is thought to help support microbial persistence in the host. Based on our biochemical results, we may speculate that this enzyme complements the highly specific StrH, potentially providing an overall greater ability to process the hexosaminide sugars from host glycans and thus affording an expanded nutrient reservoir in the host. This, however, and other possible contributions to the host-bacterium interaction remain to be determined.