Prospecting for microbial α-N-acetylgalactosaminidases yields a new class of GH31 O-glycanase

α-Linked GalNAc (α-GalNAc) is most notably found at the nonreducing terminus of the blood type–determining A-antigen and as the initial point of attachment to the peptide backbone in mucin-type O-glycans. However, despite their ubiquity in saccharolytic microbe-rich environments such as the human gut, relatively few α-N-acetylgalactosaminidases are known. Here, to discover and characterize novel microbial enzymes that hydrolyze α-GalNAc, we screened small-insert libraries containing metagenomic DNA from the human gut microbiome. Using a simple fluorogenic glycoside substrate, we identified and characterized a glycoside hydrolase 109 (GH109) that is active on blood type A-antigen, along with a new subfamily of glycoside hydrolase 31 (GH31) that specifically cleaves the initial α-GalNAc from mucin-type O-glycans. This represents a new activity in this GH family and a potentially useful new enzyme class for analysis or modification of O-glycans on protein or cell surfaces.

GalNAc is found relatively sparingly in nature but plays two particularly important roles. One of these is as the first sugar attached to proteins within mucin-type O-glycans, linked to a serine or threonine residue. Installation of this sugar is achieved by one of the family of polypeptide N-acetylgalactosaminyltransferases (1) with specificity toward an as yet unclear amino acid sequence. Attachment of galactose, GlcNAc, or GalNAc to this initial GalNAc creates the other O-linked glycan core structures commonly found on secreted proteins (Fig. 1) (2). These core structures can then be terminated by attachment of sialic acid or fucose residues or further elaborated on with N-acetyllactosamine repeats before termination with various oligosaccharide structures. The other important role of α-GalNAc, at least from a human health perspective, is as the terminal sugar attached α-1,3 to the galactose of H-antigen ((fucose-α-1,2)-galactose-), thereby forming the A-antigen (GalNAc-α-1,3-(fucose-α-1,2-)galactose-), an important part of the human blood immune system (3). α-GalNAc residues also appear in a number of contexts in other life forms. For example, the outer coat of Mycobacterium tuberculosis contains both α-GalNAc residues and their deacetylated versions (4). Enzymes that cleave these α-GalNAc residues, α-N-acetylgalactosaminidases (α-GalNAcases) (EC 3.2.1.49), are of interest for two particular applications. First, α-GalNAcases that can efficiently cleave the core α-GalNAc from peptides or proteins are of interest, because these could be valuable analytical tools for the analysis of O-glycan structures and their attachment points to peptides. Relatively few glycoside hydrolase (GH) families, as defined by the Carbohydrate-Active enZyme (CAZy) database (http://www.cazy.org/) (5), are known to have the requisite activity.
Currently, the best-characterized enzymes that cleave this bond include the GH101 endo-α-GalNAcases, which primarily cleave the T-antigen (also known as core 1; Galβ1-3GalNAcα1-Ser/Thr) from the peptide (6); fungal and mammalian GH27 exo-α-GalNAcases (such as human α-NAGAL), which cleave the initial GalNAc of mucin-type O-glycans as well as the terminal GalNAc of blood type A-antigens (7,8); and GH129s, which are found primarily in Bifidobacteria and are capable exo/endo-α-GalNAcases (9). An additional class of enzymes capable of cleaving α-GalNAc is the less well-characterized GH135 family, whose members appear to cleave α-GalNAc from fungal galactosaminogalactans, but not from proteins (10). Access to additional enzymes that can cleave α-GalNAc from peptides or proteins would increase the scope and utility of this enzyme class and may open up the possibility of engineering such enzymes into glycosynthases or glycoligases that are capable of attaching α-GalNAc to peptides or proteins (11–13). This would provide an alternative to the use of polypeptide N-acetylgalactosaminyltransferase in the generation of O-linked glycoproteins.
In addition, access to α-GalNAcases that can efficiently cleave α-GalNAc from the A-antigen on red blood cells (RBCs) at pH 7.4 could allow the practical and economic conversion of A-type red blood cells to O-type (14). Whereas several glycosidases that remove A-antigen from RBCs had been discovered (15–18), most worked under nonideal conditions of pH and buffer for this application until the study of Liu et al. (19). This study led to the discovery of GH family 109, a new family of α-GalNAcases (5). GH109 α-GalNAcases use the same NAD-dependent mechanism reported previously for GH4 family members (19,20). More recently, Rahfeld et al. (21) used functional metagenomic screening to identify a pair of enzymes, a deacetylase and a GH36 α-galactosaminidase, which together were capable of even more efficient conversion of A-type to O-type blood. These previous studies show the potential for identification of highly efficient enzymes (and enzymatic pathways) for A-type blood antigen cleavage from microbial genomes.
The mucous membranes of the human gut harbor large glycoproteins known as mucins. These glycoproteins present a wide variety of O-glycan structures on their surface, including A, B, and H blood antigens (22) as well as shorter mucin-type O-glycan chains (23). Gut microbes are known to digest these glycans as a carbon source within the competitive environment of the gastrointestinal tract (2). Due to the importance of host glycans as a carbon source and the relatively few characterized microbial α-GalNAcases, it seems likely that gut microbes produce more enzymes capable of glycan degradation than are currently known. Therefore, screening of the gut microbiome for α-GalNAcase activity should expand the pool of available enzymes. The identification of these activities using classic cultivation-based screening of the human gut microbiome would neglect the majority of the bacteria, because many microbial organisms therein are thought to be uncultivable under normal laboratory conditions (24,25). Over the past decade, metagenomic techniques have been developed as a method for the investigation of genetic material in an environmental sample without the need to culture individual organisms (26). Using a functional metagenomic screening approach, new glycoside hydrolases have already been discovered in microbiomes from decaying wood, soil, and bioremediation systems (27–29) as well as human and other gut microbiomes (21,30–32).

Characterization of a new peptide O-glycanase
In this paper, we describe the generation of two small-insert (<10 kb) metagenomic libraries derived from the gut microbiomes (fecal material) of human volunteers and their screening using fluorogenic α-N-acetylgalactosaminide substrates. This approach has allowed us to identify and characterize a GH109 family member that cleaves A-type blood antigens, as well as, most interestingly, two α-GalNAcases from GH31, a family not previously known to harbor this activity. These GH31 α-GalNAcases efficiently cleave α-GalNAc from glycopeptides and may play a role in the intracellular harvesting of host-derived glycans by gut microorganisms.

Functional metagenomic screening
Two small-insert metagenomic libraries were constructed using extracted genomic DNA from the feces of two healthy human participants with blood type O and A. The resulting libraries consisted of 10,750 and 23,000 clones for O- and A-type, respectively, arrayed in 384-well plates. In the first screening round, an activated aryl monosaccharide glycoside substrate that releases the highly fluorescent 4-methylumbelliferone (MU) reporter upon cleavage was used to screen for α-GalNAcase activity (Fig. 2). Screening the metagenomic libraries against the GalNAc-α-MU substrate identified six hits (three from each library) that produced substantial (>10 S.D.) fluorescent signal following extended incubation (Fig. 3). Of those six clones, one did not retain activity following secondary validation assays, and after sequencing, two turned out to be duplicates. Plasmids from the three unique hits, a single hit from the type O gut metagenomic library (O1) and two hits from the type A gut metagenomic library (A1 and A2), were isolated for validation, sequencing, and further characterization.
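The hit-calling criterion described above (signal more than 10 S.D. above background) can be sketched as follows. This is only an illustration: the text does not state how the mean and S.D. were computed, so the sketch assumes they are taken across all wells of a single 384-well plate, and the function name, the fluorescence values, and the well index are all hypothetical.

```python
from statistics import mean, stdev


def call_hits(signals, n_sd=10.0):
    """Return indices of wells whose signal exceeds mean + n_sd * S.D.

    Assumption (not stated in the text): the mean and sample S.D. are
    computed over every well on the plate, hits included.
    """
    cutoff = mean(signals) + n_sd * stdev(signals)
    return [i for i, s in enumerate(signals) if s > cutoff]


# Hypothetical 384-well plate: low, slightly varying background
# fluorescence with one strongly fluorescent clone at well index 200.
signals = [100 + (i % 3) for i in range(384)]
signals[200] = 5000

print(call_hits(signals))  # prints [200]
```

With the S.D. computed over the whole plate, a single well can lie at most (n − 1)/√n sample S.D. above the plate mean (about 19.5 for n = 384), so a 10 S.D. cutoff is reachable on a 384-well plate but would not be on a much smaller one.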
For each plasmid sequence, only a single ORF containing the proposed glycoside hydrolase domains was identified in silico. The hits O1 and A2 both matched to putative GH31 α-xylosidases, from Bacteroides plebeius (GenBank™ accession no. WP_007559952.1) and Bacteroides caccae (WP_005682123.1), respectively (33) (Table S1). Based on the GH classification and annotated species, O1 will be hereafter named BpGH31 and A2 BcGH31. The genes present in our small-insert library were in fact truncated relative to the full-length genes seen in the genome. BpGH31 and BcGH31 were missing 337 and 329 aa, respectively, at their C termini. These truncated versions are hereafter referred to as tBpGH31 and tBcGH31 and are shown in Fig. 4. Both GH31s included signal peptides, identified by SignalP (34), with a length of 26 aa for BcGH31 and 33 aa for BpGH31. The other hit, A1, matched a putative GH109 α-N-acetylgalactosaminidase from Bacteroides vulgatus (WP_005848863.1), called BvGH109_1. This too possessed a 27-amino acid signal peptide and was missing 56 aa at its C terminus. Full-length genes for all three were thus generated by cloning in the missing C-terminal amino acids based on the top BLAST hit for each and then expressed in Escherichia coli and purified. BvGH109_1 has recently also been identified in a separate functional screen of a blood type AB human gut metagenomic library carried out by our group (21). That publication describes the characterization of BvGH109_1 activity on blood group antigen substrates.

BvGH109_1 characterization
The specificity of BvGH109_1 was tested using a range of substrates previously found to be cleaved by enzymes within the GH31 and GH109 families (Table 1). Only α-GalNAc-containing substrates were cleaved, consistent with expectations of the GH109 family. Kinetic parameters for the cleavage of GalNAc-α-pNP are shown in Table 2 (Table S2 shows GalNAc-α-MU data).

GH31 characterization
The activities of BcGH31, tBcGH31, BpGH31, and tBpGH31 were tested against a range of substrates, and only α-GalNAc glycosides were cleaved (Table 1). This was interesting because no such α-GalNAcase activity has been observed previously with any GH31 enzymes. Michaelis-Menten kinetic parameters were thus determined for cleavage of the p-nitrophenyl α-GalNAc glycosides by both the truncated and the full-length versions of each protein, and similar values of Km, kcat, and kcat/Km (Table 2 and Table S2) were observed in each case. Likewise, very similar pH profiles for hydrolysis of GalNAc-α-pNP with an optimum at pH 6–6.5 were seen for both versions of the two enzymes, consistent with their similar function and origin (Fig. S1).
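The kinetic parameters in this study were obtained with GraFit. As a rough illustration of what such a fit does, the sketch below performs a least-squares fit of initial velocities to the Michaelis-Menten equation, v = Vmax[S]/(Km + [S]), using only the Python standard library. It is not the authors' procedure: the function name, the search range for Km, and the substrate concentrations and velocities are hypothetical, and the data are synthetic and noise-free.

```python
import math


def fit_michaelis_menten(s, v, km_min=1e-3, km_max=1e3):
    """Least-squares fit of v = Vmax*[S]/(Km + [S]); returns (Km, Vmax).

    For any trial Km the best Vmax follows from linear least squares,
    so only Km has to be searched (on a log10 scale).
    """

    def vmax_given(km):
        x = [si / (km + si) for si in s]
        return sum(vi * xi for vi, xi in zip(v, x)) / sum(xi * xi for xi in x)

    def sse(log_km):
        km = 10.0 ** log_km
        vmax = vmax_given(km)
        return sum((vi - vmax * si / (km + si)) ** 2 for si, vi in zip(s, v))

    # Coarse scan over log10(Km) to bracket the minimum.
    lo, hi = math.log10(km_min), math.log10(km_max)
    n = 60
    grid = [lo + (hi - lo) * i / n for i in range(n + 1)]
    best = min(range(n + 1), key=lambda i: sse(grid[i]))
    a, b = grid[max(best - 1, 0)], grid[min(best + 1, n)]

    # Golden-section refinement inside the bracket.
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - phi * (b - a), a + phi * (b - a)
    for _ in range(100):
        if sse(c) < sse(d):
            b, d = d, c
            c = b - phi * (b - a)
        else:
            a, c = c, d
            d = a + phi * (b - a)
    km = 10.0 ** ((a + b) / 2.0)
    return km, vmax_given(km)


# Synthetic example (hypothetical units): true Km = 0.5, true Vmax = 2.0.
substrate = [0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0]
velocity = [2.0 * si / (0.5 + si) for si in substrate]
km, vmax = fit_michaelis_menten(substrate, velocity)
print(round(km, 4), round(vmax, 4))
```

Dividing the fitted Vmax by the total enzyme concentration gives kcat, and kcat/Km then follows directly. With real triplicate data, dedicated fitting software would also report standard errors, which this sketch does not attempt.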

GH31 mechanism and catalytic residues
The initial classification as a member of the GH31 family was based on sequence similarity toward the closest BLAST hit, an α-xylosidase. Because the substrate specificity was distinct from known activities within the GH31 family, a more detailed mechanistic investigation, including stereochemical outcome and catalytic residue identification, was performed to ensure that these enzymes were correctly classified within the GH31 family. The stereochemical outcome of cleavage of GalNAc-α-MU by tBpGH31 and tBcGH31 was monitored by 1H NMR, and the first-formed product was shown to be α-GalNAc (Fig. 5). Thus, hydrolysis indeed occurred with net retention of stereochemistry as seen for other GH31 enzymes (35) and is consistent with a double-displacement mechanism involving a covalent glycosyl-enzyme intermediate.
In most GH31 enzymes, the catalytic nucleophile and the general acid/base are highly conserved aspartic acids (36,37). Alignment of BpGH31 and BcGH31 with other GH31 sequences identified the potential catalytic aspartic acids as Asp-413 and Asp-465 for BpGH31 and as Asp-427 and Asp-479 for BcGH31 (Fig. 6). To confirm these assignments for tBpGH31, the alanine, glycine, and serine mutants at each of these positions were constructed and purified. Far-UV CD of the resulting nucleophile D413A/G/S and general acid/base D465A/G/S mutants (Fig. S2) revealed that the mutants adopted the same or a similar secondary structure as WT tBpGH31 and thus were likely folded correctly. Assay of these mutants using GalNAc-α-pNP as substrate revealed rate reductions consistent with their predicted roles (38) (Table 1). Whereas no activity could be detected for the nucleophile mutants, kcat values for the acid/base mutants were ~450-fold lower than those of the WT enzyme. Km was also reduced by ~3-fold for the D465A and D465S mutants, whereas an increased Km was seen for the D465G mutant.

GH31 specificity for glycopeptide substrates
None of the GH31 enzymes tested were able to cleave A-antigen-based substrates (Table 1). It therefore seemed possible that the natural function of these enzymes was the cleavage of the GalNAc residues present in O-glycans on glycopeptide or glycoprotein substrates. To test this, fetuin, a glycoprotein, was chosen as substrate because it has a well-defined glycosylation pattern, primarily featuring short mucin-type O-glycans (39). None of the enzymes showed any cleavage activity on fetuin, probably due to inaccessibility of the GalNAc residue. To expose the GalNAc residues present within the sialyl-T-antigen glycans, the fetuin substrate was treated with a sialidase (NedA) and a 1,3-β-galactosidase (BgaC) prior to GH31 enzyme exposure. Both BpGH31 and BcGH31, as well as their truncated versions, were able to cleave the exposed α-GalNAc, as shown by Dionex HPLC analysis of reaction mixtures (Fig. 7 and Fig. S3). GalNAc was released only when the fetuin was treated with the GH31s in tandem with NedA and BgaC (data not shown), indicating that these GH31 α-GalNAcases do not release oligosaccharides via an endo-mechanism but rather are exo-α-GalNAcases. Moreover, the enzymes were active on both denatured and native fetuin, provided that the α-GalNAc was accessible (Fig. 7 and Figs. S3 and S4). Reactions with truncated GH31s proceeded more slowly than with the full-length GH31s, both on denatured and native fetuin (Fig. S5). However, after overnight incubation, equal amounts of α-GalNAc were eventually released. The activity of the GH31 enzymes toward a glycopeptide substrate was verified using a fluorescently labeled peptide modified with α-GalNAc (BODIPY-IL29 seq: QPQPT-(α-GalNAc)AGPV) (Fig. 2). Both the full-length enzymes and their truncations readily removed α-GalNAc from the O-GalNAcylated peptide substrate, as evidenced by the formation of free GalNAc following incubation (Fig. 8). These results further indicate that the enzymes are true protein/peptide exo-α-GalNAcases.

Phylogeny
A phylogeny was generated based on all known GH31 sequences present in the CAZy database (5), as of June 2018, to gain a better understanding of the GH31 family and the different activities present within it. Both BpGH31 and BcGH31 are located in a clade containing no previously characterized members (Fig. 9). However, a GH31 from Clostridium perfringens (CpGH31) lay in close proximity to α-GalNAcase GH31 enzymes identified in our screen. Grondin et al. (40) had previously determined that the three carbohydrate-binding module 32 (CBM32) domains in CpGH31 bind to GalNAc, consistent with a possible α-GalNAcase activity. However, they did not test for α-GalNAcase activity within the GH31 catalytic domain. To see if this was the case and whether the principal activity within the clade is that of an α-GalNAcase, the GH domain (aa 33-933) of CpGH31 was cloned from the genomic DNA of C. perfringens DSM 798 (tCpGH31) (Fig. 2), and the protein was expressed and purified. Substrate specificity testing with tCpGH31 showed specific cleavage of GalNAc-α-MU (Table 1). Likewise, GalNAc was cleaved from treated fetuin in the same manner as seen for BpGH31 and BcGH31 (Fig. S4).

Discussion
Our use of a small-insert library construction approach in conjunction with the use of highly sensitive fluorescent glycoside substrates for screening led to the identification of two GH31 enzymes with a novel substrate specificity and one GH109. These enzymes were found to have distinct specificities toward the two known α-GalNAc moieties seen in the gut: mucin-type O-glycans and blood type A-antigens. Whereas GH109s are well-known α-GalNAcases, the GH31 family was not previously known to contain α-GalNAcase activity. Our identified α-GalNAcase GH31s act solely upon the GalNAc present at the linkage of mucin-type O-glycans with no activity toward blood type A-antigens. Characterization of these GH31s suggests that they are indeed bona fide GH31s and that the α-GalNAcase activity is conserved within related GH31s.
The use of small-insert libraries, rather than the larger fosmid libraries used in our previous studies (21,29,30,32), has both advantages and disadvantages. Use of the fosmid approach, in which large fragments of metagenomic DNA (typically 35-45 kb) are ligated into fosmid vectors, packaged in phage, and transduced into E. coli, has the advantage of introducing more genes per host bacterium. Whereas this allows screening of larger numbers of genes in a single clone and allows possible expression of more complex metabolic pathways, the probability of expression per gene is lower than for small-insert libraries with active and inducible promoters in closer proximity to environmental DNA. Genomic analysis of various bacteria suggests that ~40% of environmental genes can be heterologously expressed in E. coli without external promoters, but this can vary widely, depending upon the microbe from which the DNA is isolated (41). Fosmid libraries also come with higher production costs, not only through the DNA extraction process and phage-based delivery method, but also through the larger amounts of sequencing necessary to assemble active clones (42). The use of small-insert libraries largely removes the possibility of cloning of pathways and reduces the number of genes that are screened in a single clone. However, the probability of expression per gene is higher, costs are considerably lower, and the process is operationally simpler.
By screening with a simple, sensitive and reasonably chemically activated (MU pKa ~8) methylumbelliferyl GalNAc derivative, we were able to search for α-GalNAcases of all types, including enzymes with activity toward complex oligosaccharide and glycopeptide substrates. The substrate could be synthesized on a scale that allows screening of a reasonably large library (~20,000 clones), something that would not be feasible with more complex substrates using our current plate-based screen.
Our discovery of a GH109 enzyme in this screen is consistent with previously observed substrate specificities (19,43). Four members of this family have been investigated to date, but only one, that from Elizabethkingia meningosepticum (EmGH109), has been characterized in depth. Indeed, only EmGH109 showed a useful ability to cleave the A-antigen from red blood cells, thereby allowing the production of universal donor blood (21). GH109 from Bacteroides vulgatus also has exo-α-GalNAcase activity, cleaving GalNAc-α-pNP with a kcat/Km value around 6 times lower than that of EmGH109 (11).

Table 2. Kinetic parameters for the hydrolysis of GalNAc-α-pNP by truncated and full-length BcGH31 and BpGH31, truncated BpGH31 mutants, and BvGH109_1. Values indicate means ± S.E. (n = 3). Reactions were carried out in 50 mM HEPES, pH 7.0, at 37°C and observed via absorbance at 405 nm. Kinetic parameters were obtained by fitting initial velocities obtained from three replicate assays to the Michaelis-Menten equation using GraFit. Truncated BpGH31 mutants D413A/G/S were inactive.
The surprising finding was that the other two hits from the screen were identified as GH31 members, because this family was previously only known to harbor α-glucosidase, α-galactosidase, α-mannosidase, α-xylosidase, α-sulfoquinovosidase, and α-glucan lyase activities. Substrates for all of these enzymes have a 2-hydroxyl, whereas GalNAc has a bulky N-acetamide (36,37,44,45). As noted in the Introduction, α-GalNAcase activity has so far only been reported in the GH27, GH36, GH101, GH109, GH129, and GH135 families. Members of GH31 share similar structural characteristics with members of GH27 and GH36 and indeed have been assigned to the same Clan D of glycoside hydrolases on this basis; thus, crossover of substrate specificity is conceivable (46).
More insights are obtained by looking in greater detail at the phylogeny of the GH31 members. As shown in Fig. 9, BpGH31 and BcGH31 are both located in a cluster of the GH31 family that previously contained no characterized members. Both of these Bacteroides sp. GH31s share conserved aspartic acid residues that have been assigned in other family members as the catalytic nucleophile and general acid/base (Fig. 6). Kinetic analysis of mutants of the key catalytic residues showed a complete loss of detectable activity of the nucleophile mutants and large changes in the kinetic parameters of the acid/base mutants. The decreased kcat seen for all acid/base mutants and decreased Km of the tBpGH31 D465A/S mutants is consistent with slowed turnover of the covalent glycosyl-enzyme intermediate due to a lack of general base catalysis (47). The increased Km of the D465G mutant is perhaps due to fine changes in the active site resulting in a loss of affinity for the substrate. The kinetic behavior of these mutants along with the "retaining" stereochemical outcome (Fig. 5) and similar pH profiles all suggest that, despite their different substrate specificity, the enzymes from this clade use the same double-displacement catalytic mechanism as other characterized GH31 members (37,44,45,48,49).
The characterized members of GH31 that are closest in sequence to BcGH31 and BpGH31 are PsGH31A and PhGH31 from Pseudopedobacter saltans and heparinus (45). These GH31s were identified as α-galactosidases, with no activity on α-N-acetylgalactosaminides. Therefore, enzymes within these two clades contain elements of the same specificity determinants that allow binding of galacto-substrates with an axial 4-hydroxyl.
To determine whether the α-GalNAcase activity is conserved, we investigated the activity of another member of this clade. Grondin et al. (40) had carried out investigations on the CBM32 domains of what was noted as a potential α-glucosidase, CpGH31. During our study, we noted that their enzyme is located in the same cluster as BpGH31 and BcGH31 and that their investigations revealed GalNAc as the best CBM ligand. However, the hydrolytic domain of CpGH31 had never been characterized. To determine whether CpGH31 is an α-GalNAcase, and whether this activity is conserved within the clade, we generated a truncated version of CpGH31, containing the GH31 domain (Fig. 4). This protein, tCpGH31, was then assayed and indeed found to be a specific α-GalNAcase (Table 1 and Fig. S4). Thus, BpGH31, BcGH31, and CpGH31 constitute a new α-GalNAcase cluster within the GH31 family. Notably, this cluster does not contain the KiDEvD catalytic nucleophile sequence motif that was noted by Miyazaki et al. (45) as being present in the Gal31 subfamily (Figs. 6 and 8). The absence of this motif is thus consistent with the assignment of these newly discovered enzymes to a novel GH31 subfamily. Sequence alignment of the putative GalNAc31 subfamily suggests a potential motif of kTDVAWV-GaGYSFgL. This motif contains residues that would likely interact with the 2-acetamide, based on comparison with the YicI crystal structure (37). It should be noted that an extended branch within the putative GalNAc31 subfamily (shown in light turquoise in Fig. 9) lacks this motif despite having likely descended from an α-GalNAcase ancestor.

Characterization of a new peptide O-glycanase
Both tBpGH31 and tBcGH31 contained the putative GH domain plus an unknown N-terminal domain (DUF 5110) and a C-terminal fibronectin type III (FN3) domain. However, these genes were missing C-terminal regions of the putative full genes in the screened library plasmids (Fig. 4). These missing portions include an F5/F8 Type C domain, as well as dockerin and cohesin domains. As noted by Briliūtė et al. (50), F5/F8 Type C domains are similar in structure to family 32 CBMs and could play a role in glycan binding. Dockerin and cohesin domains are generally involved in the assembly of multiprotein complexes on the cell surface in the degradation of carbohydrates (51). These missing domains were not necessary for activity, as evidenced by the only minor differences seen in kinetic parameters (Table 2) and pH optima (Fig. S1) for cleavage of GalNAc-α-pNP by full-length and truncated forms. Both truncated and full-length forms are active on fetuin, provided that the initial α-GalNAc is accessible (Fig. 7 and Figs. S3 and S4), but reactions with the truncated GH31s proceed more slowly than those with the full-length enzymes (Fig. S5). Given their structural similarity to CBM32s, it seems possible then that these F5/F8 Type C domains are responsible for interactions with the target glycan and surrounding environment in the targeting of the enzyme to the glycoprotein, as shown previously for other CBMs (2). Further studies could show potential roles of the dockerin and cohesin domains in the interaction with other enzymes for polysaccharide degradation as has been observed in cellulosome assembly (51).
The context in which the enzymes BpGH31 and BcGH31 naturally function is unknown at this stage. Because the epithelial cells that line the entire intestinal tract (52) are coated in heavily O-glycosylated (O-GalNAc) mucins, the gut is a very O-GalNAc-rich environment for microorganisms and a likely target. Based on the inactivity of our GH31 enzymes on A-type blood antigen substrates, yet efficient release of the core α-GalNAc from O-glycoproteins such as fetuin after enzymatic removal of other sugar residues, it is clear that it is the core GalNAc that is cleaved and not the terminal one. This was confirmed through the activity of the GH31s tested toward a synthetic glycopeptide containing a GalNAc-α-1-Thr residue (Fig. 8).

Figure 9. Phylogeny of GH31 family. All available GH31 members from the CAZy database are presented in a phylogeny. Red dots, locations of sequences of characterized GH31 enzymes. Black dot, location of the BcGH31, BpGH31, and CpGH31 within the phylogeny. Colored areas, major enzymatic activities within the different clades of the GH31 family. Lighter-colored turquoise, putative α-GalNAcases that lack the consensus motif present in the GH31s characterized in this study.
The presence of signal peptides on both BpGH31 and BcGH31 suggests that they are secreted into the gut lumen as part of a set of O-glycan-degrading enzymes dedicated to nutritional scavenging. Analysis of the genes surrounding BcGH31 within the sequenced genome of B. caccae ATCC 43185 supports this hypothesis. The close proximity of SusD and SusC/RagA proteins indicates the presence of a potential polysaccharide utilization locus (PUL) (Fig. 10). PULs are sets of physically linked genes organized around a susCD gene pair that are prevalent in the Bacteroidetes phylum and are responsible for the degradation of large carbohydrate chains (53,54).
PULs often encode all of the genes necessary for carbohydrate degradation and uptake by bacteria. Generally, sugars are released from the polysaccharide substrate via the action of CAZymes present within the PUL, with subsequent binding and uptake being carried out by the SusC/D proteins and further carbohydrate degradation occurring after uptake (54). Upstream of the BcGH31 gene, there appear to be GH2 and GH42 β-galactosidases and one GH29 α-fucosidase (Fig. 10). These enzymes could conceivably act in concert to remove other residues present in the O-GalNAc chains to expose α-GalNAc at the glycoprotein backbone for cleavage by BcGH31.
Other genes of interest in the BcGH31 PUL include an M60 peptidase and a GH88. Enzymes from the M60 peptidase family seem to specialize in the degradation of glycoproteins (55). Notably, the characterized M60 peptidase most closely related (28.1% sequence identity) to that within the putative BcGH31 PUL is a glycoprotein-specific peptidase (BT4244), which requires the Tn-antigen to be present for activity (55). It should be noted that on certain substrates at least, the GH31s described in this study are capable of acting on intact glycoproteins and do not require the activity of a glycopeptidase (Fig. 7 and Fig. S4). The putative GH88 neighboring BcGH31 may play a role in degrading glycosaminoglycans present on the intestinal mucosal lining by cleaving at the unsaturated glucuronic acid units formed by polysaccharide lyases (56,57). Analysis of this gene set in the CAZy Polysaccharide Utilization Loci Database (PULDB) (54) reveals similarity to other predicted PULs from Bacteroides sp. 2_2_4, Bacteroides sp. D20, and Bacteroides uniformis ATCC 8492 containing a combination of GH31, GH29, and GH2 enzymes.
The region surrounding BpGH31 in the genome of B. plebeius DSM 17135 was less informative and does not seem to form a PUL. However, an M23 peptidase is located upstream of the BpGH31 gene. The Gly-Gly endopeptidase (58) encoded may well cleave the peptide into smaller fragments after removal of the O-GalNAc chain.
The putative PULs in which the described α-GalNAcase GH31s are located had no similarity to known mucin-degrading PULs, so more work will be required to conclusively determine the specific roles of BpGH31 and BcGH31 in a coordinated glycan-scavenging strategy. Despite the well-known adaptations of certain Bacteroides spp. for degradation of mucosal glycans (59), they lack the GH101 or GH129 family enzymes that are used by other gut bacteria in host glycan degradation (60). Thus, the GH31s discovered in this study, with homologues found in PULs within other Bacteroides, highlight a potential means for more complete degradation of mucosal glycans by Bacteroides spp.

Conclusion
In conclusion, two small-insert (<10-kb) metagenomic libraries derived from the gut microbiomes (fecal material) of human volunteers with blood type O and A were successfully screened for α-GalNAcase activity. Three bacterial α-GalNAcases were identified in the screen, including two novel GH31 enzymes and a GH109. BpGH31 and BcGH31 were isolated, characterized, and shown to be the first GH31 enzymes with α-GalNAcase activity (EC 3.2.1.49), thus forming the new GH31 α-GalNAcase clade. These enzymes were shown to specifically cleave the core α-GalNAc from mucin-type O-glycans. The discovery of this enzymatic activity highlights a new means for host glycan utilization by certain gut microorganisms. However, further experiments will be required to determine the specific role of these enzymes in the concerted degradation of host glycans by gut microbiota.

Materials and reagents
Substrates used in this study were purchased from Sigma-Aldrich, if not otherwise indicated. TLC was performed using Silica Gel 60 F254 TLC plates (EMD Millipore Corp., Billerica, MA).

Sampling and metagenomic DNA extraction
The collection of human fecal samples was approved by the Clinical Research Ethics Board of the University of British Columbia (ID: H15-02967). Human fecal samples were collected in a small sterile plastic receptacle, from which ~3-g aliquots were transferred to individual 50-ml Falcon tubes (×6) and either stored at −70°C or immediately subjected to DNA extraction. DNA extraction was performed using a modified chemical lysis procedure. For each 3-g sample, 1.5 ml of denaturing buffer (10 mM Tris/HCl, pH 7.0, 4 M guanidinium isothiocyanate, 1 mM EDTA, 1% β-mercaptoethanol) and 9 ml of extraction buffer (100 mM sodium phosphate at pH 7.0, 100 mM Tris/HCl at pH 7.0, 100 mM EDTA at pH 8.0, 1.5 M NaCl, 1% CTAB, 2% SDS) were added to the sample. Two small glass beads (3 mm) were added to each sample tube to aid in homogenization, and then each sample was mixed vigorously by vortexing. Samples were incubated at 60°C for 40 min with light rotation. Following incubation, each sample was spun down (1800 rpm, 10 min), and the supernatant was transferred to a new 50-ml Falcon tube. A second extraction step was then performed by adding 5 ml of extraction buffer to the sample, homogenizing, incubating (60°C, 30 min), spinning the sample down, and transferring the additional supernatant to the new tube. An equivalent volume of ice-cold chloroform was added to the supernatant, and the tube was shaken lightly for 10 min. The sample was then centrifuged (1800 rpm, 10 min), and the clear, aqueous layer was transferred to 2-ml ultracentrifuge tubes in aliquots of 1 ml. Isopropyl alcohol (0.6 eq to sample volume) and ammonium acetate (1/10 total volume of 3 M stock) were added to ultracentrifuge tubes to a final concentration of 0.3 M (pH 5.2). Tubes were mixed and then centrifuged (15,000 rpm, 20 min, 4°C) and decanted (carefully so as not to disturb the DNA pellet). Each sample was washed with 1 ml of fresh 70% ethanol and then spun down and decanted. Each pellet was allowed to air-dry for 10 min prior to the addition of 150 µl of resuspension buffer (TE buffer, pH 8.0).
DNA pellets were resuspended overnight at 4°C, and then DNA concentration and yields were determined by absorbance at 260 nm. Approximately 200 µg of DNA was recovered from each 3-g sample.

Library construction
Purified metagenomic DNA (1 µg) was partially digested with the restriction enzyme Sau3AI (0.5 units) at 37°C for 30 min. The resulting DNA was separated on a 1% agarose gel by electrophoresis (100 V, 40 min). DNA of 3-10 kb was excised from the gel and purified using the Qiagen DNA Gel Extraction Kit. This insert DNA was then ligated (T4 DNA ligase), at a vector/insert ratio of ~1:3, with the vector pHSG396, which had been digested previously with BamHI and dephosphorylated with the alkaline phosphatase FastAP (Thermo Fisher Scientific). The resulting ligation product was purified using the Qiagen PCR Purification Kit, concentrated by eluting (EB Buffer) into a smaller volume (20 µl), and stored at −20°C prior to transformation. An aliquot of the ligation product (5 µl) was transformed into ReplicatorFOS cells (80 µl) by electroporation (1-mm cuvette, output voltage = 1.6 kV, capacitance = 25 microfarads, pulse time = 4-5 ms) and immediately recovered in SOC medium (1 ml) at 37°C for 60 min. A small aliquot (2 µl) of the cells was used to plate 1×, 100×, and 10,000× dilutions on LB (12.5 µg/ml chloramphenicol) to determine library titers. DMSO (1/10 volume) was added to the library, and the library was stored at −80°C. Dilutions of the libraries were plated and grown on solid LB (chloramphenicol 25 µg/ml) overnight at 37°C such that each plate contained 150-400 colonies. Colonies were picked using the QPix II (Genetix) colony picker and transferred to 384-well plates containing LB (chloramphenicol 25 µg/ml + 10% glycerol (v/v)) in each well (80 µl). The plates were incubated at 37°C for 20 h and then stored at −70°C.

Plate screening
Library plates were thawed at room temperature and then replicated to 384-well plates containing growth/induction medium (LB + 25 µg/ml chloramphenicol + 0.2 mM isopropyl 1-thio-β-D-galactopyranoside) in each well (40 µl). Plates were incubated at 37°C for 20 h (in a humid chamber to minimize evaporation). Following growth/induction, 40 µl of a lysis/assay mixture (50 mM NaPO4, pH 7.5, 1% Triton X-100, 200 µM GalNAc-α-MU) was added to each well, and the plate was incubated at 37°C. All plates were filled with medium or lysis/assay mixture using a QFill instrument (Genetix). Following 22 h of incubation, the fluorescence (excitation, 365 nm; emission, 440 nm) of each plate was determined by a plate reader (Opticon). All data were compiled and analyzed in Microsoft Excel. Hits were identified as those wells fluorescing at least 10 S.D. values above the mean. Wells in the library storage plates corresponding to hits were picked and retested in duplicate on 384-well plates to validate α-N-acetylgalactosaminidase activity.
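The hit-calling rule stated above (signal at least 10 S.D. above the mean) can be illustrated with a minimal Python sketch. The authors performed this analysis in Microsoft Excel; the function name `call_hits`, the use of the per-plate mean and sample S.D., and the example plate values are assumptions made only for illustration.

```python
from statistics import mean, stdev


def call_hits(fluorescence, n_sd=10.0):
    """Return indices of wells whose signal is >= mean + n_sd * S.D.

    Assumption: the mean and sample S.D. are computed over all wells
    of a single plate, including any hit wells themselves.
    """
    mu = mean(fluorescence)
    sd = stdev(fluorescence)
    threshold = mu + n_sd * sd
    return [i for i, value in enumerate(fluorescence) if value >= threshold]


# Hypothetical 384-well plate: near-constant background plus one bright well.
plate = [100.0, 102.0] * 191 + [100.0] + [5000.0]
hits = call_hits(plate)
print(hits)
```

Because the hit wells contribute to the mean and S.D., a 10-S.D. cut-off is conservative; on a plate with many active clones a robust statistic (for example, the median) might be a reasonable alternative, although the source does not describe one.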

Hit identification, cloning, and gene extension
DNA from each hit was isolated (Qiagen DNA Miniprep Kit) and sequenced (Sanger) using primers flanking each metagenomic insert (M13F and M13R). For inserts over 1 kb in length, primer walking was conducted until the entire insert sequence had been identified. Sequences were analyzed using the default setting of the ORFfinder program (NCBI) to identify protein-coding regions. ORFs were analyzed using BLAST (NCBI) to identify any sequence similarity to known proteins (Table S1). All hits found in the sequenced plasmids were missing parts of the sequence, relative to the BLAST hits on NCBI.
For the first hit, A1, called BvGH109_1, the ORF was extended to include 57 amino acids missing from its C terminus relative to its full putative protein sequence. This was achieved by ordering the remaining DNA sequence (150 bp) as a codon-optimized DNA fragment (Genewiz, Geneblocks). Overlap-extension PCR was performed with the BvGH109_1 ORF fragment along with the additional fragment. The resulting BvGH109_1 sequence was amplified, digested using the XhoI and NdeI restriction enzymes, and then ligated into the expression vector pET16b.
The other hits, O1 and A2, called BcGH31 and BpGH31, were missing 342 and 337 amino acids, respectively; the truncated forms are here called tBcGH31 and tBpGH31. They were cloned into pET28 via Golden Gate cloning (61). The full-length constructs were obtained by ordering the missing C-terminal parts as Gene Strings (Thermo Fisher Scientific) and using PIPE cloning (62) to extend pET28a-tBcGH31 and pET28a-tBpGH31. tCpGH31 was cloned out of the genomic DNA of C. perfringens DSM 798, generating pET28a-tCpGH31.
All proteins contained signal peptides that were identified by SignalP (34) and were not included in the final constructs. All primers used are noted in Table S3.

Site-directed mutagenesis
All single amino acid substitution mutants were generated using the QuikChange protocol for site-directed mutagenesis (63) and the primers noted in Table S3.

Substrate specificity assay
All enzymes were initially assayed against a number of substrates in a 384-well plate (clear) (NUNC). Each well contained 50 µl of an assay mixture consisting of 50 mM NaH2PO4, pH 7.5 (40 µl); 0.1 mM (final concentration) of one of GalNAc-α-MU, GalNAc-α-pNP, Gal-α-MU, Glc-α-MU, Man-α-MU, Xyl-α-pNP, or A-antigen Type2tetra-MU; and 20 µg of purified candidate enzyme. Upon the addition of enzyme to each well, the plate was incubated at 37°C and monitored continuously at 405 nm by a plate reader (Biotek Synergy) for release of oNP or pNP. For GalNAc-α-MU, the fluorescence signal (excitation, 365 nm; emission, 435 nm) resulting from MU release by hydrolysis was monitored using a Synergy H1 plate reader (BioTek) for at least 30 min. For A-antigen Type2tetra-MU, a coupled assay was employed as described previously (21).

Kinetic characterization
Michaelis-Menten kinetic parameters were determined for the substrate GalNAc-α-pNP for all enzymes. Kinetic assays were carried out at 37°C in 50 mM HEPES, pH 7.0 (total volume of 100 µl). These were performed in multiwell plates (96-well plate (NUNC)) monitoring the increased absorbance (405 nm) or fluorescence (excitation, 360 nm; emission, 450 nm) resulting from pNP or MU release, respectively, using a plate reader (Biotek Synergy). Reactions were performed in triplicate, with varying concentrations of substrate and enzymes (36.2-45.8 nM tBpGH31, 49.9 nM BpGH31, 16.3-22.1 nM tBcGH31, 38.5 nM BcGH31, 54.6-109 nM BvGH109, 863 nM tBpGH31 D465A, 400 nM tBpGH31 D465G, and 532 nM tBpGH31 D465S). Initial rates (milli-optical density × s⁻¹ or milli-relative fluorescence units × s⁻¹) were determined within the plate reader Synergy software and converted from absorbance to concentration (mM × s⁻¹) using pNP or MU standard concentration curves determined under identical reaction conditions. All measurements were made within the linear range of the standard curves. Kinetic parameters were then determined via fitting to the Michaelis-Menten equation using nonlinear regression in GraFit version 7.0.
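As an illustration of the final fitting step, the following minimal Python sketch performs a least-squares fit of initial rates to the Michaelis-Menten equation, v = Vmax[S]/(Km + [S]). It is not the GraFit procedure used by the authors: the function name `fit_michaelis_menten`, the golden-section search over Km, the default search bounds, and the synthetic data are all assumptions for illustration. For a fixed Km the best Vmax has a closed form, so only Km needs to be searched.

```python
def fit_michaelis_menten(s, v, km_lo=1e-6, km_hi=None, iters=200):
    """Least-squares fit of v = Vmax*[S]/(Km + [S]); returns (Km, Vmax).

    Assumption: the sum of squared errors has a single minimum in Km
    within [km_lo, km_hi] (km_hi defaults to 10x the highest [S]).
    """
    if km_hi is None:
        km_hi = 10.0 * max(s)

    def best_vmax(km):
        # For fixed Km the model is linear in Vmax: v = Vmax * x, x = S/(Km+S).
        x = [si / (km + si) for si in s]
        return sum(xi * vi for xi, vi in zip(x, v)) / sum(xi * xi for xi in x)

    def sse(km):
        vmax = best_vmax(km)
        return sum((vi - vmax * si / (km + si)) ** 2 for si, vi in zip(s, v))

    # Golden-section search for the Km that minimizes the squared error.
    phi = (5 ** 0.5 - 1) / 2
    a, b = km_lo, km_hi
    for _ in range(iters):
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if sse(c) < sse(d):
            b = d
        else:
            a = c
    km = (a + b) / 2
    return km, best_vmax(km)


# Synthetic, noise-free example (hypothetical values, [S] in mM, v in mM/s).
substrate = [0.05, 0.1, 0.25, 0.5, 1.0, 2.0, 4.0]
rates = [2.0 * si / (0.5 + si) for si in substrate]
km_fit, vmax_fit = fit_michaelis_menten(substrate, rates)
```

Dividing the fitted Vmax by the enzyme concentration used in the assay gives kcat in the corresponding units.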

pH optima of GH31 enzymes
pH optima were determined using the substrate depletion method in a manner similar to that described previously (65). In brief, at low concentrations of substrate, where [substrate] < Km, the kcat/Km value can be approximated upon nonlinear fitting of the reaction time course to a first-order curve. Enzymes were brought to appropriate concentrations in 5 mM NaH2PO4, pH 7.0, before 8-fold dilution into the assay mixture (10 µM GalNAc-α-pNP, 50 mM citrate, 50 mM NaH2PO4, and 50 mM glycine at pH values from 5 to 8) for up to 25 min. Reaction progress was followed by the absorbance at 348 nm (pH 5-6) and 405 nm (pH 6.5-8). Control experiments, in which enzymes were incubated in the appropriate buffers (pH 4-8) before 15-fold dilution and assaying in 200 µM GalNAc-α-pNP, 50 mM citrate, 50 mM NaH2PO4, and 50 mM glycine, pH 7.0, were performed to determine conditions under which the enzymes did not lose substantial activity (>15%) due to denaturation over the course of 25 min. Data on activity versus pH were then analyzed using GraFit version 7.0 to determine pKa values and pH optima.
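The substrate depletion approach rests on the fact that, when [S] is well below Km, the Michaelis-Menten rate reduces to v ≈ (kcat/Km)[E][S], so product release follows a first-order curve with observed rate constant k_obs = (kcat/Km)[E]. The Python sketch below is a simplified illustration of that relationship: it estimates k_obs by log-linear regression rather than the nonlinear fit described above, and the function name `kcat_over_km`, the assumption of a known end-point absorbance, and the example numbers are hypothetical.

```python
import math


def kcat_over_km(times_s, absorbance, a_end, enzyme_molar):
    """Estimate kcat/Km (M^-1 s^-1) from a product time course at [S] << Km.

    Assumptions: A(t) = a_end - amplitude * exp(-k_obs * t), with the
    end-point absorbance a_end known, so ln(a_end - A(t)) is linear in t
    with slope -k_obs, and kcat/Km = k_obs / [E].
    """
    points = [(t, math.log(a_end - a))
              for t, a in zip(times_s, absorbance) if a < a_end]
    n = len(points)
    t_mean = sum(t for t, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in points)
             / sum((t - t_mean) ** 2 for t, _ in points))
    k_obs = -slope
    return k_obs / enzyme_molar


# Synthetic 25-min time course (hypothetical): k_obs = 0.002 s^-1, 50 nM enzyme.
times = [60.0 * i for i in range(26)]
trace = [0.1 * (1.0 - math.exp(-0.002 * t)) for t in times]
estimate = kcat_over_km(times, trace, a_end=0.1, enzyme_molar=50e-9)
```

Repeating such an estimate at each assay pH would give the activity-versus-pH profile from which pKa values and the pH optimum are derived.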

High-performance anion-exchange chromatography coupled with pulsed amperometric detection (HPAEC-PAD)
The ability of the candidate enzymes to digest biological substrates was determined through digestion tests on native and denatured fetuin. In brief, fetuin from fetal calf serum was denatured and reduced over a period of 3 h at 37°C (18.9 mM DTT, 6 M guanidinium hydrochloride, 9.4 mM EDTA, 94 mM Tris/HCl, pH 8.2). Reduced Cys residues were then alkylated through the addition of iodoacetamide (final concentration of 23 mM) for 90 min at 37°C while protected from light. The buffer was then exchanged to 20 mM HEPES, pH 7.0, 150 mM NaCl, 10 mM CaCl2 using the Amicon Ultra-15 centrifugal filter with a 10-kDa molecular weight cut-off (Merck Millipore). The fetuin substrate (3 mg/ml) was then incubated at 37°C with different combinations of sialidase NedA (0.1 mg/ml), β-1,3-galactosidase BgaC (0.2 mg/ml), and purified candidate enzyme (0.1 mg/ml). To stop the reactions and remove proteins prior to HPLC analysis, the samples were applied to a prewashed Amicon Ultra-0.5 centrifugal filter with a 10-kDa molecular weight cut-off (Merck Millipore). The permeates were then analyzed via HPAEC-PAD. Assays using native fetuin proceeded in a similar manner, albeit without the denaturation steps.
The ability of the enzymes tBpGH31, BpGH31, tBcGH31, and BcGH31 to cleave GalNAc-α-O-threonine on a peptide substrate was determined using a glycopeptide substrate containing an N-terminal BODIPY residue (BODIPY-QPQP(GalNAc-α-)TAGPV). Glycopeptide substrate (20 µg) and ~50 ng of purified enzyme were incubated overnight at room temperature in 50 mM HEPES, pH 7.0. A 10-µl aliquot was diluted in H2O (100 µl) and submitted to analysis on a Dionex HPAEC-PAD instrument.
Separation was obtained on a CarboPAC PA200 (150-mm) column with guard column, and detection was achieved using a disposable gold on polytetrafluoroethylene electrode and a four-potential waveform. The separation conditions were as follows: 100 mM sodium hydroxide and a sodium acetate gradient from 70 to 300 mM over the first 10 min of the separation. The eluent was held at the final gradient conditions for 1 min and then returned to the starting conditions over the next minute. The flow rate was 1.0 ml/min, and an injection was made every 27 min. A standard of the free sugars GalNAc, Gal, and Neu5Ac (50 µM) was also applied to HPAEC-PAD to determine the peak elution times for reference.

Far-UV CD spectroscopy for secondary structure prediction
Enzymes were diluted into 5 mM NaH2PO4, pH 7, to a final concentration of 0.075 mg/ml. Data were acquired in a Jasco J-815 CD spectrophotometer using a 1-mm quartz cuvette (VWR) with the following parameters: 180-250-nm measuring range, speed 50 nm/min, pitch 0.5, scans 20, room temperature. Raw data were input into a web server that analyzed the data using the method described by Raussens et al. (66) for secondary structure prediction.

Stereochemistry of GalNAc cleavage
α-N-Acetylgalactosaminidase reactions (500 µl) were set up in NMR tubes (Sigma) prior to analysis. Each 500-µl reaction contained 50 mM HEPES buffer (pH 7.5), GalNAc-α-DNP (5 mM final concentration), and either 100 µg of purified tBpGH31 or 8 µg of purified tBcGH31. All buffer salts and substrates were lyophilized and resuspended in D2O. The enzyme was buffer-exchanged into D2O prior to addition to the reaction mixture. Upon enzyme addition and mixing, the reaction-containing NMR tube was immediately submitted to 1H NMR analysis for monitoring. 1H NMR spectra were recorded following 5 min of reaction (D2O, 298 K, 400-MHz Bruker INV) to determine stereochemistry of the initial product formed (prior to formation of both α- and β-products at equilibrium).

GH31 phylogenetic mapping
Reference sequences of GH31 were downloaded from the CAZy database using SACCHARIS' cazy_extract.pl script (67). A phylogenetic-based protein-profiling software, TreeSAPP (available on GitHub), was used both to build the reference trees and to map the sequences to these trees. Briefly, HMMs from dbCAN were used to extract protein family domains from all full-length sequences downloaded from CAZy (68). These sequences were then clustered at 70% sequence similarity using UCLUST to remove redundant sequence space and decrease the size of the tree (69). RAxML version 8.2.0 was used to build the reference trees with the "--autoMRE" option to decide when to stop bootstrapping before 1000 replicates had been performed and PROTGAMMAAUTO to select the optimal protein model (70,71). TreeSAPP was then used to map the query sequences onto these reference trees. Briefly, protein sequences were aligned to HMMs using hmmsearch, and the aligned regions were extracted (72). hmmalign was used to include the new query sequences in the reference multiple alignment, and then TrimAl removed the unconserved positions from the alignment file (73). RAxML was used to classify the query sequences in the reference tree through insertions. Placements of each query sequence were filtered and concatenated into a single .jplace file before being visualized in iTOL (74,75).

Ethics
The collection of human fecal samples was approved by the Clinical Research Ethics Board of the University of British Columbia (ID: H15-02967).

Data availability
The data sets generated and/or analyzed during this study are available from the corresponding author on reasonable request.