The first crystal structure of a family 129 glycoside hydrolase from a probiotic bacterium reveals critical residues and metal cofactors

The α-N-acetylgalactosaminidase from the probiotic bacterium Bifidobacterium bifidum (NagBb) belongs to the glycoside hydrolase family 129 and hydrolyzes the glycosidic bond of Tn-antigen (GalNAcα1-Ser/Thr). NagBb is involved in assimilation of O-glycans on mucin glycoproteins by B. bifidum in the human gastrointestinal tract, but its catalytic mechanism has remained elusive because of a lack of sequence homology around putative catalytic residues and of other structural information. Here we report the X-ray crystal structure of NagBb, representing the first GH129 family structure, solved by the single-wavelength anomalous dispersion method based on sulfur atoms of the native protein. We determined ligand-free, GalNAc, and inhibitor complex forms of NagBb and found that Asp-435 and Glu-478 are located in the catalytic domain at appropriate positions for direct nucleophilic attack at the anomeric carbon and proton donation for the glycosidic bond oxygen, respectively. A highly conserved Asp-330 forms a hydrogen bond with the O4 hydroxyl of GalNAc in the −1 subsite, and Trp-398 provides a stacking platform for the GalNAc pyranose ring. Interestingly, a metal ion, presumably Ca2+, is involved in the recognition of the GalNAc N-acetyl group. Mutations at Asp-435, Glu-478, Asp-330, and Trp-398 and residues involved in metal coordination (including an all-Ala quadruple mutant) significantly reduced the activity, indicating that these residues and the metal ion play important roles in substrate recognition and catalysis. Interestingly, NagBb exhibited some structural similarities to the GH101 endo-α-N-acetylgalactosaminidases, but several critical differences in substrate recognition and reaction mechanism account for the different activities of these two enzymes.

Bifidobacterium is a well-known representative genus of probiotics in human gut microbiota (1). Various health-promoting effects of bifidobacteria have been reported, including prevention of infections by pathogens (2) and alleviation of allergy responses (3). These bacteria mainly reside in the lower intestine of healthy humans, especially during the early life stages of breast-fed infants (4). As digestible carbohydrates such as starch are scarce in the lower intestine, bifidobacteria possess various glycosidases, transporters, and metabolizing enzymes for utilizing indigestible oligosaccharides and glycoconjugates. A well-studied example of this is the system that utilizes human milk oligosaccharides (5,6). Interestingly, it has also been revealed that bifidobacteria utilize mucin glycoproteins that exist on human epithelial cell layers of the digestive tract (7). It has been recently shown that mammalian gut microbiota degrades mucus glycoproteins as a nutrient source under dietary fiber-deprived conditions (8). The carbohydrates of mucin glycoproteins are highly complex and branched O-glycans (9). Eight prevalent core structures of mucin O-glycans are defined, all of which are covalently linked via an N-acetyl-D-galactosamine (GalNAc) residue to the hydroxyl group of Ser or Thr through an α-glycosidic bond (10). Previous studies have revealed that Bifidobacterium bifidum JCM 1254 possesses various glycosidases such as β-galactosidase (11), β-N-acetylhexosaminidase (11), exo-α-sialidase (12), 1,2-α-L-fucosidase (13), and 1,3/1,4-α-L-fucosidase (14) to sequentially liberate sugars from the non-reducing ends of various glycan structures.
Specifically, utilization of GalNAcα1-Ser/Thr, also referred to as the Tn-antigen, was implicated by the finding of an intracellular α-N-acetylgalactosaminidase (NagBb; EC 3.2.1.49) from B. bifidum JCM 1254 (15). Peptides containing Tn-antigen are thought to be cleaved from the mucin core protein by extracellular proteases and then imported into the cell by an unknown transporter. NagBb then hydrolyzes the α-linkage between GalNAc and peptide for further metabolism. Of note is that NagBb was found in the genome sequence of B. bifidum JCM 1254 by virtue of a very slight sequence similarity (~15%) to an extracellular endo-α-N-acetylgalactosaminidase (EC 3.2.1.97) from Bifidobacterium longum JCM 1217 (EngBF) (16), which belongs to glycoside hydrolase (GH) family 101 in the CAZy database (http://www.cazy.org) (17). Whereas EngBF is an endo-type enzyme that releases α-linked Galβ1-3GalNAc (galacto-N-biose) disaccharides from Ser or Thr residues (core 1 or T-antigen) in mucin-type glycoproteins, NagBb exhibits an exo-type activity that releases GalNAc. As NagBb showed no significant sequence similarity to any previously identified glycosidase, this enzyme and its homologs formed the basis of a new GH family, 129 (15). Sequence comparison with EngBF suggested that the catalytic nucleophile and a key substrate-recognizing residue (termed "fixer" or "anchor") of NagBb were Asp-435 and Asp-330, respectively. However, the identity of the catalytic acid/base residue has remained elusive because of the lack of sequence homology around these positions and the absence of X-ray crystallographic information.
Here we report the crystal structure of NagBb, which is the first three-dimensional structure of GH129. A complex structure with GalNAc reveals not only its catalytic residues but also a unique substrate recognition motif involving a metal ion. A possible molecular evolutionary route of this bifidobacterial enzyme is also discussed.

Structure determination
C-terminally His6-tagged protein was heterologously expressed in Escherichia coli and purified. The molecular masses of NagBb as deduced from the amino acid sequence, estimated by SDS-PAGE, and estimated by calibrated gel filtration chromatography were 71.3, 70, and 71.4 kDa, respectively, indicating that it is monomeric in solution. Before gel filtration, NagBb was subjected to reductive lysine methylation because untreated protein did not crystallize under any conditions tested. The lysine-methylated protein sample exhibited 95% catalytic activity compared with the native protein (data not shown). The crystals of NagBb belong to space group P212121 and contain two molecules in the asymmetric unit, and the crystal structure was solved by the single-wavelength anomalous dispersion (SAD) method using sulfur atoms contained within the native protein (Table 1). Diffraction data for phasing were collected at beamline BL-1A of the Photon Factory, which is designed for long-wavelength experiments (18-20). Fig. 1A shows anomalous difference Fourier map peaks for some sulfur atoms using the phasing data. A ligand-free structure and a product-complex structure with GalNAc were determined at 2.65 and 2.10 Å resolution, respectively (Table 2). The 2mFo − DFc electron density maps for the protein contoured at 1σ showed continuous density for all main chain atoms, except for the His6 tag and the following residues: 105-
The abbreviations used are: NagBb, α-N-acetylgalactosaminidase from B. bifidum JCM 1254; EngBF, endo-α-N-acetylgalactosaminidase from B. longum JCM 1217; GH, glycoside hydrolase; CAZy, Carbohydrate-active enZyme; SAD, single-wavelength anomalous dispersion; r.m.s.d., root mean square deviation; pNP, p-nitrophenyl; SpGH101, endo-α-N-acetylgalactosaminidase from Streptococcus pneumoniae; PDB, Protein Data Bank; DNJ, 1-deoxynojirimycin.

Overall structure
NagBb is composed of three domains: an N-domain (residues 1-227); a catalytic barrel domain (228-591); and a C-domain (592-634) (Fig. 1B). The N-domain adopts a β-sandwich fold with two antiparallel β-sheets containing 18 β-strands. The barrel domain adopts a partly broken (β/α)8-barrel fold, in which the secondary structures of β6-α8 are not completely formed. A long insertion between β3 and α3 (loop-2, 376-407) is present in this domain. This insertion corresponds to the common "domain B" of GH13 α-amylases and related enzymes (21). Another insertion is present between β1 and α1 (loop-1, 274-296) and is a unique structural feature of NagBb. The C-domain consists of an antiparallel β-sheet with four β-strands. Out of six lysine residues in this protein, electron density for the side chains of Lys-236, Lys-254, Lys-275, and Lys-350 was visible. Although the protein was treated using the lysine methylation protocol, no density peaks for methyl moieties on these lysine residues were observed.

Active-site architecture
In the complex structure, clear electron density was observed for the GalNAc ligand, as its α-anomer, at the center of the barrel domain (Fig. 2A). Two carboxylate residues lie close to the anomeric C1 atom of GalNAc: Asp-435 and Glu-478, which reside between β4 and α4 and between β5 and α5, respectively. The Oδ2 atom of Asp-435, which has been previously designated as the nucleophile residue (15), is located 3.1 Å from the C1 atom. The distance and position of Asp-435 are suitable for a nucleophilic attack on the anomeric carbon. Glu-478 forms a hydrogen bond with the C1 hydroxyl group at a distance of 2.8 Å, indicating that this is most likely the catalytic acid/base residue. An amino acid sequence alignment of GH129 members shows that these catalytic residues are completely conserved in this family (Fig. 3B). The following residues also form hydrogen bonds with GalNAc: Tyr-329 with O3 and O4; Asp-330 (anchor) with O4; Asp-371 with O6; Tyr-433 with the carbonyl oxygen of the N-acetyl group; Asp-435 (nucleophile) with O4; and Asp-561 with O3 and the amide nitrogen of the N-acetyl group (Fig. 3A). In addition, Trp-398 forms a stacking interaction with the pyranose ring of GalNAc.

Metal-binding sites
Interestingly, a metal-binding site was observed near the N-acetyl group of GalNAc (M1-site, Fig. 2B). The M1-site metal is octahedrally hexacoordinated by the side chains of His-271, His-320, His-366, and Asp-322 and by two water molecules. Electron density peaks for the metal and two coordinating waters were clearly observed. The water molecules form hydrogen bonds with the carbonyl oxygen of the N-acetyl group of GalNAc, suggesting that NagBb utilizes the metal ion for the recognition of the N-acetyl group of GalNAc. Apart from the catalytic site, another metal-binding site (M2-site) was observed on the surface of the barrel domain (Fig. 1B). The M2-site metal ion is tetrahedrally coordinated by the side chains of Cys-407 and Cys-445 and the side chain nitrogen and main chain carbonyl group of His-450 (Fig. 2F). These metal sites were also present in the ligand-free structure. To identify the metal peaks, a crystallographic anomalous scattering analysis was carried out (Table 1). The Bijvoet difference density map of diffraction data measured at 1.283 Å exhibits a strong peak at the M2-site (red and pink meshes in Fig. 2F), whereas the difference map of the data collected at 1.290 Å, which lies on the long-wavelength (low-energy) side of the zinc absorption edge (1.2837 Å), showed no significant peaks (<3.0σ). At the M1-site, an anomalous difference Fourier peak was always observed at wavelengths shorter than the absorption edge of calcium (3.0704 Å; see 2.7 Å wavelength data in Fig. 2D), but it was significantly reduced in the difference map of the data collected at 3.15 Å (Table 1). At the sulfur atoms contained within the protein (e.g. Met-439), significant peaks were observed in the anomalous difference Fourier maps of the diffraction data collected at long wavelengths (1.9-3.15 Å, Table 1). In addition, no X-ray fluorescence was observed from crystals of NagBb when they were excited by X-rays at the scan energies for copper, nickel, cobalt, iron, and manganese atoms (data not shown).
Therefore, based on these measurements, we conclude that a Ca2+ ion and a Zn2+ ion are present at the M1- and M2-sites, respectively. B-factor values of the fully occupied calcium atoms were refined to 21.7 and 31.9 Å2 in chains A and B, respectively, after crystallographic refinement (Table 3). The M1-N distances to the three His residues are 2.15-2.33 Å, but from the ionic radius difference between oxygen and nitrogen (~0.1 Å), the expected calcium-nitrogen distance is ~2.45 Å. There are only a few examples of calcium-nitrogen coordination in the database. Although octahedrally hexacoordinated sites are frequently observed for Ca2+, most of the coordinating ligand atoms are oxygens, and thus the present type of calcium-binding site (O3N3) is unprecedented.
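The wavelength comparisons behind this metal assignment can be checked with the standard photon energy-wavelength relation, E (keV) ≈ 12.3984 / λ (Å). The following is a minimal sketch rather than the analysis actually performed; the function names are hypothetical, and only the edge wavelengths and data-collection wavelengths quoted above are taken from the text.

```python
# Sketch: is a data-collection wavelength on the high-energy side of an
# absorption edge, where a strong anomalous signal from that element is expected?

HC_KEV_ANGSTROM = 12.39842  # h*c expressed in keV*Angstrom (physical constant)

# Absorption-edge wavelengths in Angstrom, as quoted in the text.
EDGE_WAVELENGTH = {"Zn": 1.2837, "Ca": 3.0704}


def energy_kev(wavelength):
    """Convert an X-ray wavelength in Angstrom to photon energy in keV."""
    return HC_KEV_ANGSTROM / wavelength


def above_edge(element, wavelength):
    """True when the photon energy exceeds the edge energy of the element,
    i.e. when the wavelength is shorter than the edge wavelength."""
    return wavelength < EDGE_WAVELENGTH[element]


# Data-collection wavelengths mentioned in the text for each metal site.
for element, wavelength in [("Zn", 1.283), ("Zn", 1.290), ("Ca", 2.7), ("Ca", 3.15)]:
    side = "above" if above_edge(element, wavelength) else "below"
    print(f"{element}: {wavelength} A = {energy_kev(wavelength):.3f} keV, {side} the edge")
```

In this sketch the 1.283 Å and 2.7 Å data sets fall above the zinc and calcium edges, respectively, while the 1.290 Å and 3.15 Å data sets fall below them, which matches the pattern of anomalous peaks described for the M2- and M1-sites.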

Activity measurements
In a previous study, it was reported that the optimal pH for the activity and the stable pH range of NagBb were 5.0 and 3.0-11.0, respectively (15). However, protein samples used in this study aggregated in the acidic pH range (4.5-6.0) in all buffers tested. Thus, the activity measurements were conducted at pH 6.5 or 7.0. Kinetic parameters of NagBb toward p-nitrophenyl-α-N-acetyl-D-galactosaminide (pNP-α-GalNAc) at pH 6.5 and 37°C were Km = 2.06 ± 0.23 mM, kcat = 11.0 ± 0.4 s−1, and kcat/Km = 5.34 s−1 mM−1. The kcat/Km value was comparable with that of the previous report (2.35 s−1 mM−1) measured at pH 5.0 and 37°C (15).
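As a quick consistency check on these constants, the catalytic efficiency follows directly from kcat and Km, and the Michaelis-Menten equation gives the turnover expected at the 0.25 mM substrate concentration of the standard assay. This is a minimal sketch that assumes simple Michaelis-Menten behavior; the function names are hypothetical.

```python
# Sketch: catalytic efficiency and per-enzyme turnover from the reported constants.

KCAT_PER_S = 11.0  # kcat at pH 6.5 and 37 C, from the text (s^-1)
KM_MM = 2.06       # Km for pNP-alpha-GalNAc, from the text (mM)


def catalytic_efficiency(kcat, km):
    """kcat/Km in s^-1 mM^-1."""
    return kcat / km


def turnover_rate(kcat, km, substrate_mm):
    """Michaelis-Menten rate per enzyme molecule (s^-1) at a given substrate
    concentration (mM); assumes no inhibition and no cooperativity."""
    return kcat * substrate_mm / (km + substrate_mm)


efficiency = catalytic_efficiency(KCAT_PER_S, KM_MM)
rate_at_assay = turnover_rate(KCAT_PER_S, KM_MM, 0.25)

print(f"kcat/Km = {efficiency:.2f} s^-1 mM^-1")  # prints 5.34
print(f"turnover at 0.25 mM substrate = {rate_at_assay:.2f} s^-1")
```

Because 0.25 mM is well below Km, the assay operates in the near-linear region of the saturation curve, where the observed rate is roughly proportional to kcat/Km.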
We next evaluated the potency of three synthetic compounds that are inhibitors of α- and β-N-acetylhexosaminidases (Fig. 4). The 1-deoxynojirimycin-type compound (1, GalNHAc-DNJ) was a potent inhibitor with a Ki value of 51 nM. The crystal structure of NagBb complexed with 1 was also determined at 2.79 Å resolution (Table 2), and the inhibitor was bound at the active site in a manner similar to GalNAc (Fig. 2C). In contrast, the PUGNAc-like compound (2, Gal-PUGNAc) only slightly inhibited the activity of the enzyme, and this could be attributed to the aglycone-binding site of NagBb not being wide enough to accommodate the large phenyl carbamate moiety. This was consistent with what has been observed for 2 with GH27 α-N-acetylgalactosaminidases (23). The thiazoline (3, Gal-NAG-thiazoline) is a potent inhibitor of β-N-acetylgalactosaminidases and β-N-acetylhexosaminidases that process GalNAc residues with a substrate-assisted mechanism (24-26). As expected, 3 did not substantially inhibit NagBb, which has been demonstrated to use a retaining mechanism that does not utilize the N-acetyl group of the substrate in catalysis.
Specific activities of the wild-type and active-site mutants were also measured (Table 4). All of the mutations at the catalytic residues (Asp-435 and Glu-478) and substrate-recognizing residues (Asp-330 and Trp-398) caused drastic reductions in enzymatic activity. Mutations at the nucleophile (Asp-435) almost completely abolished activity (>1000-fold reduction), whereas those at the acid/base (Glu-478) retained slight activity (~100-fold reduction). These results are consistent with observed reductions for mutants of retaining GHs when using a synthetic substrate with an activated leaving group (e.g. pNP) (27).
Mutants at the M1-site were also constructed (Table 4). As proteins of the two single mutants (H271A and H320A) were not stable at pH 6.5, their activities were measured at pH 7.0 and compared with the activity of wild type at the same pH. Mutations at the M1-site also greatly reduced the activity, with H271A and H320A exhibiting only ~3% activity relative to wild type. The quadruple Ala mutant (H271A/H320A/D322A/H366A, HHDH-A) was stable at pH 6.5, and its activity decreased by 1000-fold. The crystal structure of the HHDH-A mutant was also determined at 1.9 Å resolution (Table 2). The quadruple mutation was confirmed from the clear electron density map of the active site (Fig. 2E), and there was no significant structural difference compared with the wild-type enzyme (r.m.s.d. for Cα atoms = 0.22 Å with the GalNAc complex for all 8096 protein atoms). No density peak was observed for the M1-site metal, and a glycerol molecule was bound at a position corresponding to the C4-C5-C6 atoms of GalNAc.
To investigate the effects of the metal ions, the activity of EDTA-treated enzyme was measured in the absence or presence of various metal ions (Fig. 5). The relative activity of the enzyme sample treated with 10 mM EDTA for 10 min at 4°C was 50% of that of untreated native enzyme. Further experiments were conducted using this EDTA-treated enzyme by adding metals or EDTA during the measurements. The activity further decreased in the presence of 1 mM EDTA. Addition of Fe2+, Mn2+, Co2+, and Ni2+ at low concentrations slightly recovered the activity (~30%), but the presence of these metals at high concentrations resulted in inhibition or aggregation of the protein. Other metals (Cu2+, Zn2+, Ca2+, and Mg2+) showed inhibition regardless of concentration. Considering that NagBb possesses at least two metal-binding sites with different species, the effects of metals on the activity may be complicated. The increase of the activity in the presence of Fe2+, Mn2+, Co2+, and Ni2+ may indicate that these metal ions are also suitable for use in substrate recognition at the M1-site. Overall, the catalytic mechanism and active-site architecture can be shown schematically, with the proposed reaction mechanism of NagBb proceeding via the general mechanism of retaining GHs (Fig. 6) (28).

Structural comparison with other GHs
A structural similarity search of NagBb using the DALI Lite version 3 server (29) revealed that it has structural homology to GH13 α-amylase I from Thermoactinomyces vulgaris (2D0F; Z-score = 17.7; r.m.s.d. for 637 Cα atoms = 5.6 Å) (30), GH36   (33), and several other enzymes in GH13, GH31, and GH36. These structural homologs of NagBb are all retaining GHs acting on α-galactosides or α-glucosides. All of the GH13, GH31, GH36, and GH129 members share the following domain architecture: N-terminal β-sandwich domain; core catalytic (α/β)8-barrel domain; and C-terminal β-sheet domain. The molecular sizes of GH101 enzymes (SpGH101 and EngBF) are more than twice that of NagBb, and they carry peripheral domains at their N and C termini (domains 1 and 5-7), in addition to the central three domains (domains 2-4) similar to those of NagBb (34). The catalytic site architecture of these families can be divided into two groups. GH31 and GH36 have two Asp residues as the catalytic residues, whereas GH13, GH101, and GH129 (NagBb) employ Asp (nucleophile) and Glu (acid/base catalyst). In all of the five GH families, the acid/base residue is located in an anti-position to the ring oxygen of the carbohydrate in the −1 subsite (anti-protonator, Fig. 2A) (35). In addition, when the active site of NagBb was superimposed with these structural homologs, only the GH101 enzymes had their catalytic residues at similar positions (discussed below). Although the ligand-free crystal structures of two GH101 enzymes (SpGH101 and EngBF) have been reported (33,34), the detailed substrate recognition and catalytic mechanism of this family only became clear when the structure of SpGH101 complexed with galacto-N-biose was determined (32). Therefore, we compared the NagBb-GalNAc complex with the SpGH101-galacto-N-biose complex (PDB code 5A59).
Superimposition of these structures revealed that their corresponding domains, N-domain (1-227) of NagBb and domain 2 (307-596) of SpGH101, and core barrel domain (228-591) of NagBb and domain 3 (597-863) of SpGH101, respectively, overlap well (Fig. 7A; r.m.s.d. = 2.9 Å for 202 Cα atoms for the N-terminal β-sandwich domains and r.m.s.d. = 3.2 Å for 284 Cα atoms for the core domains). The N-terminal half of domain 4 of SpGH101 also adopts a 4-stranded β-sheet architecture similar to the C-terminal domain of NagBb (592-630). The active sites (nucleophile, acid/base, and GalNAc in the −1 subsite) of NagBb and SpGH101 also overlap (Fig. 7B). For SpGH101, a mechanism involving a Grotthuss proton shuttle has been proposed (32). This was due to the distance between the O1 atom of GalNAc (in galacto-N-biose) and the acid/base residue (Glu-796) being relatively long (4.3 Å), with a well-ordered water molecule bridging the two moieties. In the case of NagBb, the acid/base and O1 of the GalNAc are directly hydrogen-bonded (2.7 Å) without intervention of a water molecule, indicating that NagBb probably utilizes a common direct proton donation mechanism. Although SpGH101 uses a Grotthuss mechanism, mutations of its catalytic residues gave results similar to those for NagBb; mutations at the nucleophile (D764A) and the acid/base (E796A and E796Q) of SpGH101 decreased the kcat value by 700- and 30-fold, respectively (32).
The highly conserved anchor residue (Asp-330), which holds the carbohydrate in the −1 subsite, and other residues recognizing the GalNAc moiety in NagBb are also similarly positioned in SpGH101 (Fig. 7C). Residues involved in the recognition of the galactose moiety (−2 subsite) in SpGH101 (Gln-868, Lys-1156, Glu-1253, and Asp-1254) are not conserved in NagBb, which highlights the different substrate specificities of the two enzymes. The M1-site is also not present in SpGH101, and only one histidine residue (His-694), which corresponds to His-366 in NagBb, is conserved.
Comparison of the substrate-binding pockets of NagBb and SpGH101 clearly illustrates their different substrate preferences, as the −2 subsite of NagBb is blocked by protein residues (Fig. 7D). In GH101, two conserved Trp residues (Trp-724 and Trp-726 in SpGH101 and Trp-748 and Trp-750 in EngBF) play an important role in substrate recognition by providing aromatic platforms for the −1 and −2 subsites, respectively (33), and they jointly close on substrate binding (32). In NagBb, Trp-398 is involved only in recognition at the −1 subsite. The residue corresponding to the second Trp is Gly-400 in NagBb, and there is no substitute residue found in the −2 subsite. Of note is that no conformational change was observed upon substrate binding of NagBb. The activity of the W398A mutant was less than 0.6 milliunits/mg (Table 4). This decrease was as pronounced as that of the most deleterious nucleophile mutant (D435A) and thus emphasizes the importance of Trp-398 for the catalytic function of NagBb.
Table 4 footnotes: a, activities were measured at 0.25 mM pNP-α-GalNAc in 50 mM MES-NaOH (pH 6.5 or 7.0) at 37°C; b, the quadruple mutant is H271A/H320A/D322A/H366A.

Involvement of metal site in GHs
Even though the involvement of the M1-site in the substrate recognition of NagBb has been demonstrated in this study, interestingly this site is not conserved in distant members of GH129 (Fig. 3B). Structurally related enzymes in GH13, GH31, GH36, and GH101 also do not possess a metal site for substrate recognition. There are many examples of metal-dependent stabilization of GHs, including the case of Ca2+ for GH13 α-amylases (36). However, only a limited number of GH enzymes directly involve a metal for substrate recognition and catalysis. Inverting α-mannosidases in GH38, GH47, and GH92 are prominent examples of the requirement of a divalent cation for enzymatic function (37-39). In these families, a Ca2+ or Zn2+ ion is involved in substrate recognition by bridging the O2 and O3 hydroxyls of Man in the −1 subsite. The ion also aids the catalysis by stabilizing the distorted sugar conformation in these α-mannosidases (39,40). In addition, direct involvement of a divalent cation (Ca2+, Zn2+, or Mn2+) in substrate recognition and catalysis has been demonstrated for GH4 (41), GH43 (42), GH62 (43), GH97 (44), GH106 (45), and GH127 (46). In the substrate-binding sites of GH43 endo-1,5-α-L-arabinanase and GH62 α-L-arabinofuranosidases, a metal ion, which is hexacoordinated or heptacoordinated by side chain atoms of a His, a Gln, and water molecules, was assigned as a Ca2+ (42,43). Treatment of these enzymes with EDTA had no apparent effect on the activities, but mutation of the His residue significantly reduced the activity. To the best of our knowledge, NagBb is the first example of a GH enzyme that utilizes a metal ion for the recognition of the N-acetyl group of the sugar substrate.

Possible molecular evolution of GH129 enzymes
The structural similarity between GH129 and GH101 strongly suggests that they share a common protein ancestor. However, the two families currently retain only a slight sequence homology within a limited region (15), and it is also interesting how they acquired distinct substrate specificities and proton donation mechanisms.
The GH129 family currently consists of ~60 protein sequences exclusively from bacteria, and NagBb is the sole characterized member. Close homologs of NagBb (sequence identity >60%) are only present in the genomes of Bifidobacterium species, and they currently account for more than half of the family member sequences (~30 sequences). It is noteworthy that many infant-associated bifidobacterial species, including B. bifidum, Bifidobacterium breve, B. longum, and B. longum subsp. infantis, possess a NagBb homolog gene. Therefore, these bifidobacterial enzymes probably have the same activity (exo-α-N-acetylgalactosaminidase) and are involved in the utilization of human mucin glycoproteins, along with the endo-type GH101 enzymes. Distant NagBb homologs among GH129 members are mostly found in the genomes of soil bacteria. Interestingly, two human-related bacteria (Bacillus cereus and Zobellia galactanivorans) possess a GH129 gene. B. cereus is an opportunistic pathogen that sometimes causes food intoxication and survives in the human gastrointestinal tract (47). For the marine bacterium Z. galactanivorans, its GH genes, which are active on algal polysaccharides, were thought to be horizontally transferred to the genomes of gut microbes (Bacteroidetes) in Japanese individuals, in conjunction with degradation of seaweeds in a daily diet (48). As the M1-site residues are not conserved in the distant homologs of GH129 (Fig. 3B), they may indeed possess different substrate specificities. This may indicate possible horizontal gene transfer events from food-associated microbes to the bifidobacterial species and the establishment of the metal-binding site through molecular evolution.

Protein production and purification
A vector encoding C-terminally His6-tagged NagBb (pET23b(+)nagbb, residues 1-634) (15) was used for protein expression. The plasmid was introduced into E. coli C43 (DE3)-RIL (Stratagene, La Jolla, CA). The transformants were precultured overnight in Luria-Bertani medium containing 100 mg/liter ampicillin and 35 mg/liter chloramphenicol at 30°C. A 5-ml portion of the culture was inoculated into 1.5 liters of Luria-Bertani medium containing 100 mg/liter ampicillin and 35 mg/liter chloramphenicol at 37°C. When the optical density at 600 nm reached 0.6, isopropyl 1-thio-β-D-galactopyranoside was added to a final concentration of 0.5 mM to induce protein expression. Following an additional incubation at 25°C for 10 h, the cells were harvested by centrifugation and suspended in 50 mM Tris-HCl (pH 7.8). Cell extracts were obtained by sonication followed by centrifugation to remove cell debris. The supernatant was applied to a nickel-nitrilotriacetic acid superflow column (Qiagen, Hilden, Germany) pre-equilibrated with 50 mM Tris-HCl (pH 7.8), and the column was washed with 30 mM imidazole and then eluted with 400 mM imidazole at a flow rate of 4 ml/min (on ice). For crystallization, the peak fraction was treated with a lysine methylation reaction (49), although this step was omitted in the case of protein sample preparation for activity measurements. The proteins were then subjected to gel filtration on a Superdex 200 pg 16/60 column (GE Healthcare) pre-equilibrated with 20 mM Tris-HCl (pH 7.8) containing 150 mM NaCl at a flow rate of 1 ml/min (4°C). Protein concentrations were determined using the BCA protein assay kit (Thermo Fisher Scientific, Waltham, MA) with bovine serum albumin as a standard.

Crystallography
Crystals of NagBb complexed with GalNAc were obtained at 20°C using the sitting drop vapor diffusion method. Specifically, a 0.5-µl protein solution containing 15 mg/ml NagBb and 50 mM GalNAc was mixed with an equal volume of a reservoir solution containing 0.02 M magnesium chloride, 0.1 M HEPES-NaOH (pH 6.5), and 7.5% PEG 3350. Ligand-free crystals were obtained in the same manner but without GalNAc. The quadruple HHDH-A mutant was crystallized with 20 mM GalNAc, but the carbohydrate electron density was not observed in the resultant crystal structure. The GalNAc-DNJ complex was crystallized with 50 mM GalNAc-DNJ. Crystals were cryoprotected in the reservoir solutions supplemented with 20% (w/v) glycerol and were flash-cooled at 100 K in a stream of nitrogen gas.

Data collection, structure determination by sulfur-SAD, and crystallographic refinement
Diffraction data were collected using a photon counting pixel array detector Eiger X4M (Dectris) and charge-coupled device detector Quantum 270 (Area Detector System Corp.) on beamlines at the Photon Factory of the High Energy Accelerator Research Organization (KEK, Tsukuba, Japan). X-ray fluorescence analysis was performed using a multichannel analyzer installed on the beamlines. Data sets collected in the low energy range (at wavelengths of 1.9 Å or longer) were processed with XDS (50), and statistics were calculated with AIMLESS (51). Initial phases were calculated from sulfur-SAD data sets from native NagBb crystals. Thirteen data sets, from two crystals, collected at 1.9 Å were merged and used for substructure search with SHELXD (52). A promising substructure was found with a resolution cutoff at 3.7 Å and 70 sites as input for the parameter FIND. This number for the FIND parameter was chosen because the two molecules of NagBb have 76 sulfur-containing amino acids. Of the resulting heavy atom positions, only higher occupancy sites (~30) were selected for subsequent steps. Autobuilding was performed with PHENIX AutoSol (53) using the following inputs: substructure found by SHELXD; NagBb sequence; and merged reflection file from 12 data sets collected at 2.7 Å. Data sets collected in the high energy range (at wavelengths of 1.29 Å or shorter) were processed using HKL2000 (54). Manual model rebuilding and refinement were achieved using COOT (55) and REFMAC5 (56), respectively. The statistics for data collection and refinement are listed in Tables 1 and 2. Molecular graphic images were prepared using PyMOL (DeLano Scientific, Palo Alto, CA).

Activity measurement
The hydrolysis activities of the enzymes were determined by measuring the amount of pNP liberated from pNP-α-GalNAc (p-nitrophenyl-α-N-acetyl-D-galactosaminide; Santa Cruz Biotechnology). The standard assay mixture (50 µl) contained 50 mM MES-NaOH (pH 6.5 or 7.0), 0.25 mM pNP-α-GalNAc, and NagBb protein (0.25 µg for wild-type and 5.0 µg for mutants). The reaction was started by mixing the NagBb protein solution (10 µl) and the remaining assay mixture (40 µl), both of which had been preincubated at 37°C for 10 min. The reaction was monitored using a Benchmark plus microplate reader (Bio-Rad) with a 96-well flat-bottomed transparent microplate, catalog no. 9018 (Corning Glass, Corning, NY), at 37°C. The amount of 4-nitrophenolate produced was measured by the absorbance increase at 405 nm at 10-s intervals. The reaction was monitored for up to 20 min, and the enzyme concentration was set to observe a linear absorbance increase (initial rate) within this time frame. For the determination of kinetic constants, 0.125 to 8.0 mM pNP-α-GalNAc was used. Non-linear regression curve fitting was performed using the program KaleidaGraph (Synergy Software, Reading, PA).
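For orientation, the enzyme amounts in the standard assay can be converted into molar concentrations. The sketch below is illustrative only: it reads the assay volume as microlitres and the enzyme amounts as micrograms, uses the sequence-derived molecular mass of 71.3 kDa reported for the wild-type protein, and assumes the same mass for the mutants. The function name is hypothetical.

```python
# Sketch: molar enzyme concentration in the standard assay mixture.

MOLECULAR_MASS_KDA = 71.3  # sequence-derived mass of NagBb, from the text
ASSAY_VOLUME_UL = 50.0     # standard assay volume, read as microlitres


def enzyme_concentration_nm(mass_ug, volume_ul, molecular_mass_kda):
    """Return the enzyme concentration in nM.

    ug per ul is numerically equal to g per litre, and kDa * 1000 gives g/mol.
    """
    grams_per_litre = mass_ug / volume_ul
    mol_per_litre = grams_per_litre / (molecular_mass_kda * 1000.0)
    return mol_per_litre * 1e9


wild_type_nm = enzyme_concentration_nm(0.25, ASSAY_VOLUME_UL, MOLECULAR_MASS_KDA)
# Assumption: mutants have approximately the same molecular mass as wild type.
mutant_nm = enzyme_concentration_nm(5.0, ASSAY_VOLUME_UL, MOLECULAR_MASS_KDA)

print(f"wild type: {wild_type_nm:.1f} nM")
print(f"mutants:   {mutant_nm:.1f} nM")
```

Under these assumptions the mutants are assayed at a 20-fold higher enzyme concentration than the wild type, which helps keep strongly impaired variants within the detectable range of the absorbance readout.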

Inhibition
The synthesis of inhibitors 1-3 was conducted using previously described procedures (23,24,57). The Ki value for compound 1 was determined by linear regression of data measured at six concentrations of 1 (10-500 nM) and two concentrations of pNP-α-GalNAc (0.25 and 0.50 mM) at pH 6.5 using a Dixon plot (58).
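The logic of a Dixon plot can be illustrated with synthetic data. In the sketch below, 1/v is regressed against inhibitor concentration at each of the two substrate concentrations, and the two fitted lines intersect at [I] = −Ki. Several things here are assumptions rather than reported details: the competitive inhibition model, the six inhibitor concentrations (only the 10-500 nM range is given), the arbitrary Vmax, and all function names. The rates are simulated from the reported Km and Ki, not measured.

```python
# Sketch: recovering Ki from a Dixon plot using noise-free simulated rates.


def competitive_rate(substrate, inhibitor, vmax, km, ki):
    """Initial rate under competitive inhibition (assumed model)."""
    return vmax * substrate / (km * (1.0 + inhibitor / ki) + substrate)


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def dixon_ki(inhibitor_concs, rates_a, rates_b):
    """Ki from the intersection of two 1/v versus [I] lines measured at
    two different substrate concentrations."""
    slope_a, icpt_a = fit_line(inhibitor_concs, [1.0 / v for v in rates_a])
    slope_b, icpt_b = fit_line(inhibitor_concs, [1.0 / v for v in rates_b])
    x_intersect = (icpt_b - icpt_a) / (slope_a - slope_b)
    return -x_intersect


INHIBITOR_NM = [10, 50, 100, 200, 350, 500]  # hypothetical values in the 10-500 nM range
KM_MM, KI_NM, VMAX = 2.06, 51.0, 1.0         # Km and Ki from the text; Vmax arbitrary

rates_low = [competitive_rate(0.25, i, VMAX, KM_MM, KI_NM) for i in INHIBITOR_NM]
rates_high = [competitive_rate(0.50, i, VMAX, KM_MM, KI_NM) for i in INHIBITOR_NM]

ki_estimate = dixon_ki(INHIBITOR_NM, rates_low, rates_high)
print(f"Ki estimated from the Dixon plot: {ki_estimate:.1f} nM")
```

With real, noisy measurements the two regression lines would carry uncertainty, so the intersection gives an estimate of Ki rather than an exact value.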

Metal dependence
To remove metal ions, a NagBb protein solution was incubated with 10 mM Na-EDTA (pH 8.0) in 20 mM Tris-HCl (pH 7.8) at 4°C for 10 min. After removing excess EDTA by buffer exchange, its activity was measured as an "EDTA-treated" sample. For further analysis of the effect of metal ions (Fe2+, Mn2+, Co2+, Ni2+, Cu2+, Zn2+, Ca2+, and Mg2+ in the form of chloride salts) or Na-EDTA (pH 8.0), reagents were added to the EDTA-treated enzyme to give final concentrations of 1.0 µM to 1.0 mM. Activity measurements were conducted as described above.