Structural Insights into the Substrate Specificity of Streptococcus pneumoniae β(1,3)-Galactosidase BgaC

Background: Streptococcus pneumoniae BgaC is a GH-35 β-galactosidase of specific activity toward β(1,3)-linked galactose and N-acetylglucosamine.
Results: Three aromatic residues, Trp-240, Trp-243, and Tyr-455, determine the substrate specificity of BgaC.
Conclusion: BgaC and other GH-35 β-galactosidases adopt a similar domain organization and catalytic mechanism.
Significance: Provided is the first structural insight into the substrate specificity of β-galactosidase toward the β(1,3)-linked galactosyl bond.

The surface-exposed β-galactosidase BgaC from Streptococcus pneumoniae was reported to be a virulence factor because of its specific hydrolysis activity toward the β(1,3)-linked galactose and N-acetylglucosamine (Galβ(1,3)NAG) moiety of oligosaccharides on the host molecules. Here we report the crystal structure of BgaC at 1.8 Å and its complex with galactose at 1.95 Å. At pH 5.5–8.0, BgaC exists as a stable homodimer, each subunit of which consists of three distinct domains: a catalytic domain of a classic (β/α)8 TIM barrel, followed by two all-β domains (ABDs) of unknown function. The side walls of the TIM β-barrel and a loop extended from the first ABD constitute the active site. Superposition of the galactose-complexed structure to the apo-form revealed significant conformational changes of residues Trp-243 and Tyr-455. Simulation of a putative substrate entrance tunnel and modeling of a complex structure with Galβ(1,3)NAG enabled us to assign three key residues to the specific catalysis. Site-directed mutagenesis in combination with activity assays further proved that residues Trp-240 and Tyr-455 contribute to stabilizing the N-acetylglucosamine moiety, whereas Trp-243 is critical for fixing the galactose ring. Moreover, we propose that BgaC and other galactosidases in the GH-35 family share a common domain organization and a conserved substrate-determinant aromatic residue protruding from the second domain.

The Gram-positive human pathogen Streptococcus pneumoniae is the major causative agent of acute pneumonia, otitis media, meningitis, and septicemia, which lead annually to millions of deaths worldwide (1). In the human host, S. pneumoniae encounters a variety of glycoconjugates, including mucin, host defense molecules, and glycans exposed on the epithelial surface. Like other pathogenic microbes, S. pneumoniae produces a series of secreted or surface-associated glycosidases to modify the host glycoconjugates (2–4). Genome sequencing, in combination with exploration of new virulence factors, suggests that a large number of glycosidases are necessary for the full virulence of S. pneumoniae (5,6). For instance, three surface-exposed glycosidases, neuraminidase NanA, β-galactosidase BgaA, and N-acetyl-hexosaminidase StrH, have been identified to sequentially hydrolyze the glycoconjugates necessary for the colonization and pathogenesis of S. pneumoniae (7,8). The action of deglycosylation is believed not only to expose the binding sites for further invasion but also to supply an abundant carbon source from the hydrolysis products (9).
Here we present the crystal structure of BgaC at 1.8 Å and its complex with galactose at 1.95 Å. BgaC is composed of 595 amino acid residues, sharing a significant sequence homology with other GH-35 members. It contains three domains: a typical TIM barrel catalytic domain, similar to other β-galactosidases, followed by two all-β domains of unknown function. This pattern of domain organization is quite similar to that of B. thetaiotaomicron β-gal (PDB code 3D3A) and H. sapiens β-gal (30). The galactose is stabilized at the center of the catalytic domain by residues conserved in all GH-35 members. A couple of residues at the center of the TIM barrel and a loop extended from the second domain make up the active site. In addition, a putative substrate entrance tunnel identified two important loops surrounding the manually constructed Galβ(1,3)NAG at the active site. Subsequent enzymatic activity assays enabled us to characterize three key residues, Trp-240, Trp-243, and Tyr-455, that contribute to the substrate specificity toward Galβ(1,3)NAG.

EXPERIMENTAL PROCEDURES
Overexpression and Purification of BgaC and Mutants-The coding sequence of BgaC/Sp_0060 was amplified from the genomic DNA of S. pneumoniae TIGR4 and cloned into a pET28a-derived expression vector with an N-terminal His6 tag. The construct was transformed into E. coli strain BL21-RIL (DE3) (Novagen), and the cells were grown at 37°C in 2× YT culture medium (5 g of NaCl, 16 g of Bacto-tryptone, and 10 g of yeast extract/liter) containing 30 μg/ml kanamycin and 34 μg/ml chloramphenicol. When the A600 nm reached about 1.0, expression of the recombinant proteins was induced with 0.2 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) for another 20 h at 16°C before harvesting. Cells were collected and resuspended in 40 ml of lysis buffer (20 mM Tris-Cl, pH 7.5, 100 mM NaCl). After 20 min of sonication and centrifugation at 12,000 × g for 30 min, the supernatant containing the soluble target protein was collected and loaded onto a nickel-NTA column (GE Healthcare) equilibrated with the binding buffer (20 mM Tris-Cl, pH 7.5, 100 mM NaCl). The target protein was eluted with 500 mM imidazole and further loaded onto a Superdex 200 column (GE Healthcare) pre-equilibrated with 20 mM Tris-Cl, pH 7.5, 100 mM NaCl. Fractions containing the target protein were combined and concentrated to 10 mg/ml for crystallization. Samples for enzymatic activity assays were collected at low concentrations (1 mg/ml). The purity of the protein was assessed by SDS-PAGE, and the protein sample was stored at −80°C. The mutants were expressed, purified, and stored in the same manner as the wild-type protein.
Dynamic Light Scattering-Dynamic light scattering was carried out on a DYNAPRO-99 (Wyatt Technology Corp.) with a 532-nm green laser. Each sample was measured in single-use UV-plastic cuvettes (Wyatt Technology Corp.), first equilibrated for 2 min at 25°C, after which a time scale of the scattered light intensity fluctuations was measured. The molecular weight was analyzed with the use of the software Dynamic V6 (Wyatt Technology Corp.).
Enzymatic Activity Assays-The enzymatic activity assays of recombinant BgaC and its mutants were conducted using 4-nitrophenyl-β-D-galactopyranoside (PNPG) (Sangon) as substrate and following the previous procedures (36) with minor modifications. The reactions were performed at 37°C in buffer containing 50 mM Na2HPO4/NaH2PO4, pH 6.5, and initiated by the addition of BgaC. Using a DU800 spectrophotometer (Beckman Coulter), the changes in absorption at 420 nm were monitored continuously, and the increase of chromogenic product 4-nitrophenol was calculated subsequently according to a standard curve of 4-nitrophenol. The final Michaelis-Menten parameters (Vmax and Km) were extracted from these data by nonlinear fitting to the Michaelis-Menten equation with the program Origin 7.5.
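The nonlinear fit described above can be illustrated with a minimal Python sketch. It is not the Origin 7.5 procedure used in this work: the function names, the Hanes-Woolf starting estimate, and the Gauss-Newton refinement are assumptions chosen for illustration, and the substrate concentrations and rates in the example are synthetic values, not measured data.

```python
def mm_rate(s, vmax, km):
    """Michaelis-Menten rate v = Vmax * [S] / (Km + [S])."""
    return vmax * s / (km + s)


def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_michaelis_menten(s_vals, v_vals, max_iter=50, tol=1e-12):
    """Estimate (Vmax, Km) by nonlinear least squares.

    Starting values come from the Hanes-Woolf linearization
    ([S]/v = [S]/Vmax + Km/Vmax); they are then refined by
    Gauss-Newton iterations on the untransformed rates.
    Requires positive substrate concentrations and rates.
    """
    slope, intercept = linear_fit(
        s_vals, [s / v for s, v in zip(s_vals, v_vals)]
    )
    vmax, km = 1.0 / slope, intercept / slope
    for _ in range(max_iter):
        a11 = a12 = a22 = b1 = b2 = 0.0
        for s, v in zip(s_vals, v_vals):
            d_vmax = s / (km + s)                # dv/dVmax
            d_km = -vmax * s / (km + s) ** 2     # dv/dKm
            resid = v - mm_rate(s, vmax, km)
            a11 += d_vmax * d_vmax
            a12 += d_vmax * d_km
            a22 += d_km * d_km
            b1 += d_vmax * resid
            b2 += d_km * resid
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-300:
            break
        # Solve the 2x2 normal equations for the parameter update.
        step_vmax = (b1 * a22 - a12 * b2) / det
        step_km = (a11 * b2 - a12 * b1) / det
        vmax += step_vmax
        km += step_km
        if abs(step_vmax) < tol and abs(step_km) < tol:
            break
    return vmax, km


if __name__ == "__main__":
    # Synthetic, noise-free example (hypothetical values, arbitrary units).
    substrate = [0.1, 0.25, 0.5, 1.0, 2.0, 5.0]
    rates = [mm_rate(s, 2.0, 0.5) for s in substrate]
    print(fit_michaelis_menten(substrate, rates))
```

Fitting the untransformed rates, rather than relying on the linearized form alone, avoids the distortion of the error structure that reciprocal plots introduce; the linearization is used here only to obtain starting values.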
Preparation of 1-Phenyl-3-methyl-5-pyrazolone Derivatives of Saccharides-1-Phenyl-3-methyl-5-pyrazolone derivatization of saccharides followed the previously reported procedures (37,38) with minor changes. Briefly, 10 μl of reaction system was mixed with equal volumes of 0.3 M NaOH and a 0.5 M methanol solution of 1-phenyl-3-methyl-5-pyrazolone. The resulting 30 μl of mixture was placed at 70°C to react for 35 min and then cooled to room temperature and neutralized with 10 μl of 0.3 M HCl. The obtained solution was dissolved in 100 μl of chloroform. After vigorous shaking and centrifugation, the supernatant containing derivatives was carefully transferred to another 100 μl of chloroform, and this extraction process was repeated three times; then the aqueous layer was diluted to 80 μl with water before HPLC analysis.
HPLC Analysis-The assays toward specific substrate were performed at 37°C in a 10-μl system containing a buffer of 50 mM Na2HPO4/NaH2PO4, pH 6.5, and the disaccharide Galβ(1,3)NAG (Santa Cruz Biotechnology, Inc., Santa Cruz, CA) at various concentrations. The reactions were initiated by the addition of enzymes and terminated by mixing with an equal volume of 0.3 M NaOH. After 1-phenyl-3-methyl-5-pyrazolone derivatization as described above, the mixture was centrifuged at 12,000 × g for 10 min, and 10 μl of supernatant was applied to the HPLC system (Agilent 1200 Series). Buffer composed of 20% acetonitrile and 80 mM Na2HPO4/NaH2PO4, pH 7.0, was used for equilibration of the column (Eclipse XDS-C18 column, 4.6 × 150 mm; Agilent) and separation of the components at a flow rate of 1 ml/min. The sugar components of the reaction system were determined by comparison with the retention time of standard monosaccharides. The NAG standard curve was made by quantitative analysis of HPLC with a series of concentrations ranging from 0.1 to 5 mM. The final enzymatic kinetic parameters were calculated based on the yield of NAG.
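As an illustration of how a standard curve of this kind converts an HPLC peak area into a NAG concentration, the following Python sketch fits a straight line to standards and inverts it. The linear detector response, the function names, and all peak-area values are assumptions made for the example; they are not taken from the measurements reported here.

```python
def fit_standard_curve(concs_mM, peak_areas):
    """Least-squares line: peak_area = slope * concentration + intercept."""
    n = len(concs_mM)
    mean_c = sum(concs_mM) / n
    mean_a = sum(peak_areas) / n
    scc = sum((c - mean_c) ** 2 for c in concs_mM)
    sca = sum((c - mean_c) * (a - mean_a)
              for c, a in zip(concs_mM, peak_areas))
    slope = sca / scc
    intercept = mean_a - slope * mean_c
    return slope, intercept


def area_to_concentration(peak_area, slope, intercept):
    """Invert the standard curve to obtain a concentration in mM."""
    return (peak_area - intercept) / slope


def initial_rate(conc_mM, reaction_time_min):
    """Average product formation rate in mM/min over the reaction time."""
    return conc_mM / reaction_time_min


if __name__ == "__main__":
    # Hypothetical NAG standards spanning the 0.1-5 mM range.
    standards_mM = [0.1, 0.5, 1.0, 2.0, 5.0]
    areas = [15.0, 63.0, 123.0, 243.0, 603.0]   # made-up peak areas
    slope, intercept = fit_standard_curve(standards_mM, areas)
    nag_mM = area_to_concentration(183.0, slope, intercept)
    print(nag_mM, initial_rate(nag_mM, 10.0))
```

Rates obtained this way at several substrate concentrations can then be fitted to the Michaelis-Menten equation. Peak areas outside the range of the standards should not be converted by extrapolation.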
Crystallization, Data Collection, and Processing-The apo-form BgaC was concentrated to 10 mg/ml by ultrafiltration (Millipore Amicon) for crystallization. Crystals were grown at 289 K using the hanging drop vapor diffusion method, with the initial condition of mixing 1 μl of protein solution with an equal volume of the reservoir solution (30% polyethylene glycol 3350, 0.2 M ammonium acetate, 0.1 M sodium citrate tribasic dihydrate, pH 5.6). The crystals were transferred to cryoprotectant (reservoir solution supplemented with 25% glycerol) and flash-cooled with liquid nitrogen. The crystal of the complex with galactose was obtained by a quick cryo-soaking technique: the crystal was immersed in the solution containing 100 mM galactose for about 300 s and mounted in a rayon loop. The diffraction images of the apo-form and galactose-binding form were recorded at 100 K in a liquid nitrogen stream using the beamline at the Shanghai Synchrotron Radiation Facility. The data sets were integrated and scaled with the program HKL2000.
Structure Solution and Refinement-The structure of BgaC was determined by molecular replacement with MOLREP using the coordinates of 38% sequence-identical B. thetaiotaomicron β-galactosidase (PDB code 3D3A) as the search model. The initial model was further refined by using the maximum likelihood method implemented in REFMAC5 as part of the CCP4 program suite and rebuilt interactively by using the σA-weighted electron density maps with coefficients 2mFo − DFc and mFo − DFc in the program COOT. The galactose complex structure was refined with the Refinement program from PHENIX and rebuilt interactively with COOT. The final model was evaluated with the programs Molprobity and Procheck.
The data collection and structure refinement statistics are listed in Table 2. All structure figures were prepared with the program PyMOL (39).

RESULTS AND DISCUSSION
Overall Structure-Each asymmetric unit contains two molecules of BgaC, which are quite similar to each other with an overall root mean square deviation (RMSD) of 0.2 Å over 540 Cα atoms. These two molecules do not form a dimer because of their small, buried interface area of 566 Å². However, a symmetry operation enabled us to define a homodimer of BgaC in the crystal structure with a total buried interface area of 2874 Å² (Fig. 1A). In fact, BgaC also exists as a dimer in solution, as confirmed by gel filtration chromatography and dynamic light scattering (data not shown). Two subunits in the homodimer are related by a noncrystallographic 2-fold symmetry axis running parallel to the depth direction of each molecule. The dimer interface is composed of two loops and α6 from the catalytic domain in addition to four loops and two β-strands of the first all-β domain (ABD-1). This face-to-face dimer form is quite different from that of H. sapiens β-gal, which forms a back-to-back dimer with an interface made up of the catalytic domain, linker, and ABD-2 (30). Upon increasing the pH to 8.5, the BgaC dimer dissociates, resulting in the inactivation of the enzyme (data not shown), in agreement with a previous report (11). Further investigation showed that BgaC exists as a stable homodimer at a pH range of 5.5-8.0, indicating the physiological condition under which BgaC is active.
Each subunit of BgaC is composed of three distinct domains, a catalytic domain patched on one side by two all-β domains (ABDs) (Fig. 1A). The catalytic domain (Thr-2 to Glu-342) adopts a typical (β/α)8 TIM barrel with some distortions. Following the catalytic domain, a stretch of residues, Ser-337 to Ser-365, containing a short β-strand passes through the second ABD (Leu-489 to Lys-591), prior to joining the first ABD (Ser-366 to Pro-488). Both ABDs, which were previously described as a jellyroll fold (40), share a quite similar overall structure, with an RMSD of 6.4 Å over 56 Cα atoms. However, each ABD adopts a β-sandwich composed of a five-stranded β-sheet against a three-stranded one, which is somewhat different from the regular β-sandwich with two layers of four-stranded β-sheets. In addition, the stretch linking the catalytic domain and the first ABD contributes a β-strand to the second ABD (Fig. 1A).
Table 2 footnotes (fragment): where Fo and Fc are the observed and calculated structure-factor amplitudes, respectively. d, R-free was calculated with 5% of the data excluded from the refinement. e, Root mean square deviation from ideal values. f, Categories were defined by Molprobity.
Structural Comparison with Other GH-35 β-Galactosidases-To date, four crystal structures of β-galactosidase in the GH-35 family have been deposited in the PDB. Two of them, B. thetaiotaomicron β-gal (PDB code 3D3A) and H. sapiens β-gal (30), possess an overall structure of three domains similar to the 595-residue BgaC, with an RMSD of 1.8 Å and 1.7 Å over Cα atoms, respectively. By contrast, the other two structures, the 1011-residue Penicillium sp. β-gal (28) as well as the 1003-residue T. reesei β-gal (29), are composed of five domains, a catalytic domain wrapped by four ABDs. Structural superposition shows that the first and second ABDs of Penicillium sp. β-gal or T. reesei β-gal are missing in BgaC and replaced by a long stretch connecting the catalytic domain and the last two domains (Fig. 1B). Superposition of the catalytic domain of BgaC against that of Penicillium sp. β-gal and T. reesei β-gal yields the same RMSD of 1.9 Å over 312 and 314 Cα atoms, respectively. The two ABDs of BgaC could be aligned to the last two domains of Penicillium sp. β-gal and T. reesei β-gal, yielding an RMSD of 2.9 and 2.8 Å over 220 and 218 Cα atoms, respectively.
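For readers unfamiliar with the RMSD values quoted throughout, the quantity is the square root of the mean squared distance between matched Cα positions. The Python sketch below computes it for two coordinate lists that are assumed to be already superposed and paired one to one; it does not perform the superposition itself, and the function name and coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD (in the input units, e.g. Å) between two matched lists of
    (x, y, z) coordinates that have already been superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


if __name__ == "__main__":
    # Hypothetical Cα positions in Å; the second set is displaced slightly.
    set_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
    set_b = [(0.2, 0.0, 0.0), (4.0, 0.0, 0.0), (4.0, 3.8, 0.0)]
    print(rmsd(set_a, set_b))
```

Because the value depends on which atoms are paired, an RMSD is only meaningful together with the number of Cα atoms over which it was calculated, as reported above.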
The Active Site-In the 1.95 Å galactose-complexed structure of BgaC, a molecule of galactose fit well into the active site within the TIM barrel (Fig. 2A). The galactose adopts a chair conformation with its O1 in the β-anomer configuration. Four aromatic residues, Tyr-52, Trp-240, Tyr-275, and Tyr-305, form a hydrophobic pocket to accommodate the hexose ring of galactose through stacking interactions. In addition, residues Tyr-52, Ile-95, Cys-96, Ala-97, Glu-98, Asn-155, Glu-156, Glu-238, and Tyr-305 form a hydrogen bond network to fix the hydroxyl groups of galactose.
Superposition of the complex structure to the apo-form yields an RMSD of 0.20 Å over 551 Cα atoms, indicating no significant conformational changes of the overall structure upon galactose binding. Despite the fact that the galactose-binding residues do not undergo conformational shifts, two residues, Trp-243 and Tyr-455, close to the active site exhibit different conformations (Fig. 2B). As a result of induced fit, the side chains of Trp-243 and Tyr-455 rotate toward galactose, at an angle of 75° and 66°, respectively. Furthermore, assays of enzymatic activity toward the general substrate PNPG showed that the mutants W243A and Y455A have a much higher Km (200- and 20-fold, respectively) but a similar Vmax value compared with the wild type (WT) (Table 3). We also deleted the two ABDs to check whether the catalytic domain alone could execute the hydrolysis activity. Results showed that deletion of the two ABDs completely abolished the activity toward PNPG, which might result from the breaking of the dimer interface and/or the integrity of the active site pocket.
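A side-chain rotation of the kind reported for Trp-243 and Tyr-455 can be quantified as the angle between a vector describing the side-chain orientation in the apo-form and the same vector in the complex. The Python sketch below shows one possible way to do this, using the vector from the Cβ atom to the ring centroid; this choice of vector, the function names, and the coordinates are assumptions for illustration and do not reproduce the measurement made in this work.

```python
import math


def vector(start, end):
    """Vector pointing from start to end."""
    return tuple(e - s for s, e in zip(start, end))


def angle_between(u, v):
    """Angle in degrees between two 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_theta))


if __name__ == "__main__":
    # Hypothetical coordinates (Å) after superposing the two structures.
    c_beta = (0.0, 0.0, 0.0)
    ring_apo = (3.0, 0.0, 0.0)
    ring_complex = (3.0 * math.cos(math.radians(75.0)),
                    3.0 * math.sin(math.radians(75.0)),
                    0.0)
    rotation = angle_between(vector(c_beta, ring_apo),
                             vector(c_beta, ring_complex))
    print(round(rotation, 1))
```

The two structures must be superposed on their backbone atoms first; otherwise the angle would also include the rigid-body difference between the two crystal forms.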
Most active site residues of BgaC could be superimposed to those in the structure of Penicillium sp. β-gal in complex with galactose (28). The proton donor Glu-200′ and the nucleophile Glu-299′ of Penicillium sp. β-gal could be well superimposed to Glu-156 and Glu-238 of BgaC, respectively (Fig. 2C). In addition, Tyr-52, Ile-95, Glu-98, Asn-155, Tyr-275, and Tyr-305 took the same conformations as their counterparts in Penicillium sp. β-gal. Nevertheless, there are some differences at the active site between the two structures. For instance, Asn-140′ of Penicillium sp. β-gal is substituted in BgaC by Cys-96, which forms a hydrogen bond with the O4 of galactose via the thiol group. In addition, Tyr-261′ in Penicillium sp. β-gal is positioned at the loop following the sixth β-strand of the TIM barrel, and its hydroxyl group hydrogen bonds with the Oε1 of Glu-299′. However, the counterpart Trp-240 in BgaC comes from the end of the seventh β-strand, and the Nε1 of Trp-240 hydrogen bonds with the Oε1 of Glu-238 (Fig. 2A). To verify the role of Trp-240 in the hydrolysis reaction, we constructed three single mutants, W240A, W240F, and W240Y. The activities toward the general substrate PNPG of both W240A and W240F mutants were not detectable, whereas the W240Y mutant retained 5% hydrolysis activity (Table 3). Multiple-sequence alignment showed that Trp-240 in BgaC is substituted by an aromatic residue, Tyr-270 in H. sapiens β-gal or Tyr-303 in Arabidopsis thaliana β-gal (Fig. 3). Moreover, in the structure of H. sapiens β-gal, the hydroxyl group of Tyr-270 also hydrogen-bonds with the Oε1 of the nucleophile Glu-268 (30). We suggest that Trp-240 not only contributes to the hydrophobic pocket but also maintains, via a hydrogen bond, the orientation of the carboxyl group of Glu-238, which is crucial for the hydrolysis activity.
In the best studied β(1,4)-galactosidase, E. coli β-gal, the substrate-binding site was dissected into two subsites, termed subsite −1 and +1, respectively (42). The aromatic residues at the +1 site were proposed to be critical for substrate specificity. Taking E. coli β-gal as an example, Trp-999 at the +1 site was reported to contribute to stabilizing the direction of +1 glucose by stacking interactions, to facilitate the cleavage of the β(1,4)-galactosyl linkage (43). In the other two GH-2 members, C221 β-gal and K. lactis β-gal, Trp-999 of E. coli β-gal is substituted by Cys-999 and Cys-1001, respectively, which were proposed to have an influence on substrate binding and activity (23,24). As for A4 β-gal and B. circulans sp. alkalophilus β-gal from GH-42, residues Trp-320 and Trp-315 were proposed to act as the counterpart of Trp-999 in E. coli β-gal (25,26).
However, the catalytic mechanism of β(1,3)-galactosidases remains unclear, although the structures of H. sapiens β-gal (30) and B. thetaiotaomicron β-gal (PDB code 3D3A) are known. To decipher the structural basis of the substrate specificity toward the β(1,3)-galactosyl bond, we attempted to solve the complex structure of BgaC with the substrate Galβ(1,3)NAG, but we did not succeed. As an alternative, we calculated a putative substrate entrance tunnel with the program CAVER (available on the World Wide Web). As shown in Fig. 4A, a dumbbell-shaped tunnel from the surface to the active site pocket was guarded by two loops, LA (Trp-240 to  and LB (Glu-448 to Ala-461). Loop LA between β10 and α8 came from the catalytic domain, whereas LB connecting β21 and 12 protruded from the first ABD (Fig. 3). A close look enabled us to find three aromatic residues, Trp-240, Tyr-455, and Trp-243, at the gorge of the tunnel (Fig. 4A).
Furthermore, by superimposing the Gal moiety of Galβ(1,3)NAG onto the galactose molecule in the BgaC-galactose complex structure and docking the NAG moiety at subsite +1 in an optimal steric geometry, we manually constructed a model of Galβ(1,3)NAG at the active site (Fig. 4B). The model shows that the hexose ring of the NAG moiety was almost in the same plane as the galactose moiety, sandwiched by residues Trp-240 and Tyr-455 on two sides, respectively. The nitrogen atom of the N-acetyl group formed a hydrogen bond with Oε2 of Glu-156. The distances from +1 NAG to Trp-240 and Tyr-455 were about 3.0 and 4.5 Å, respectively.
Although both residues Trp-240 and Tyr-455 are generally conserved from bacteria to plants and animals (Fig. 3) (31–33). To verify the putative role of Trp-240 and Tyr-455 of BgaC, we determined the enzymatic activities toward Galβ(1,3)NAG of several single mutants. The results indicated that neither mutant W240A nor mutant W240F showed any activity (Table 3). By contrast, the mutant W240Y retained only about 15% activity relative to the WT, with the result that the related kinetic parameters could not be determined (data not shown). The mutant Y455A completely lost its hydrolysis activity (Table 3); however, the activity of the Y455F mutant was comparable with the WT. These results confirmed that both Trp-240 and Tyr-455 play an important role in the substrate binding and activity.
Domain Organization-After a comprehensive comparison of all β(1,3)- and β(1,4)-galactosidases of known structure, we found that all members from GH-35, no matter whether they hydrolyze the β(1,3)- or β(1,4)-galactosyl bond, exhibit a similar domain organization (Fig. 5). Taking BgaC as an example of β(1,3)-galactosidases, the catalytic TIM barrel domain makes up the first domain, where the two key residues, Trp-240 and Trp-243, are located, followed by a long loop that stretches through the second ABD to the first ABD from where the loop LB extends. Loop LB not only contains the substrate specificity determinant residue Tyr-455 but also constitutes a part of the active site pocket (Fig. 1). Similar to BgaC, H. sapiens β-gal and B. thetaiotaomicron β-gal also exhibit an arrangement of three domains. The key residues (Tyr-270 and Trp-273 in H. sapiens β-gal or Trp-261 and Trp-264 in B. thetaiotaomicron β-gal) at the catalytic domain are structurally conserved. Moreover, Tyr- β-gal, corresponding to Tyr-455 in BgaC, also reside at a loop that extends from the second ABD. According to the results of multiple-sequence alignment (Fig. 3), the β-galactosidases from Lactobacillus casei, Ailuropoda melanoleuca, and A. thaliana could also be divided into three domains with the same domain organization as BgaC (Fig. 5).
With regard to the β(1,4)-galactosidases in GH-35, Penicillium sp. β-gal and T. reesei β-gal also adopt a domain organization similar to that of BgaC, although they both have two extra ABDs. As a counterpart to the first ABD of BgaC, the third ABD of these proteins also extends a loop into the active site. The corresponding residues, Trp-809 in Penicillium sp. β-gal and Trp-811 in T. reesei β-gal, were predicted to play the same roles in the substrate hydrolysis specificity as Tyr-455 in BgaC, according to the structural superposition. The residues Tyr-261 and Phe-265 in Penicillium sp. β-gal or Tyr-260 and Phe-264 in T. reesei β-gal could superimpose well onto Trp-240 and Trp-243 in the catalytic domain of BgaC, respectively. The Aspergillus candidus β-galactosidase was predicted to contain five domains, similar to those of Penicillium sp. β-gal, and three conserved residues, Trp-260, Phe-264, and Trp-806, in the catalytic domain and the third ABD.