An evolutionarily distinct family of polysaccharide lyases removes rhamnose capping of complex arabinogalactan proteins

The human gut microbiota utilizes complex carbohydrates as major nutrients. The requirement for efficient glycan degrading systems exerts a major selection pressure on this microbial community. Thus, we propose that this microbial ecosystem represents a substantial resource for discovering novel carbohydrate active enzymes. To test this hypothesis we screened the potential enzymatic functions of hypothetical proteins encoded by genes of Bacteroides thetaiotaomicron that were up-regulated by arabinogalactan proteins or AGPs. Although AGPs are ubiquitous in plants, there is a paucity of information on their detailed structure, the function of these glycans in planta, and the mechanisms by which they are depolymerized in microbial ecosystems. Here we have discovered a new polysaccharide lyase family that is specific for the l-rhamnose-α1,4-d-glucuronic acid linkage that caps the side chains of complex AGPs. The reaction product generated by the lyase, Δ4,5-unsaturated uronic acid, is removed from AGP by a glycoside hydrolase located in family GH105, producing the final product 4-deoxy-β-l-threo-hex-4-enepyranosyl-uronic acid. The crystal structure of a member of the novel lyase family revealed a catalytic domain that displays an (α/α)6 barrel-fold. In the center of the barrel is a deep pocket, which, based on mutagenesis data and amino acid conservation, comprises the active site of the lyase. A tyrosine is the proposed catalytic base in the β-elimination reaction. This study illustrates how highly complex glycans can be used as a scaffold to discover new enzyme families within microbial ecosystems where carbohydrate metabolism is a major evolutionary driver.

The human gut microbiota (HGM) makes an important contribution to the health and physiology of its host (1). The major nutrients available to this microbial ecosystem are complex host and dietary glycans (2). To exploit these carbohydrate polymers as nutrients, members of the HGM, particularly bacteria from the Bacteroidetes phylum, have evolved extensive glycan degrading systems comprising enzymes, carbohydrate-binding proteins, transporters, and regulators (3-6). The genes encoding these proteins are physically linked within loci referred to as PULs (polysaccharide utilization loci) and are transcriptionally up-regulated by specific components of the target glycan (2,7,8). Thus, the HGM represents a valuable microbial resource for discovering carbohydrate active enzymes (CAZymes), particularly glycoside hydrolases (GHs) and polysaccharide lyases (PLs), which cleave the glycosidic bonds that link the sugars in polysaccharides and oligosaccharides. CAZymes have been grouped into sequence-based families in the continuously updated CAZy database (9). The protein-fold, catalytic apparatus, and mechanism are generally conserved in each family (9), although substrate specificity can vary (10).
Arabinogalactan proteins (AGPs) are a structurally diverse but poorly characterized group of plant glycans. These molecules are highly glycosylated members of the hydroxyproline-rich glycoprotein superfamily of plant cell wall proteins, with 90% of their total mass being glycans. The core structure of the glycan component of AGPs comprises a backbone of β1,3-galactan that is substituted at O-6 with β1,6-galactooligosaccharides. These side chains can then be further decorated with arabinose, rhamnose, (methyl) D-glucuronic acid (GlcA), and less frequently, with fucose or xylose moieties (11-13). These additional decorations vary with respect to the plant species and cell type. Some AGPs, exemplified by the glycan from Acacia senegal (also known as gum arabic and defined henceforth as GA), are used extensively in the food industry as stabilizers and emulsifiers (e.g. in soft drink syrups, marshmallows, or gummy candies) (14). The common use of AGPs as food additives, and their widespread presence in plant species, explain why these glycans are common components of the human diet. Consistent with their exposure to AGPs, some members of the HGM utilize these complex carbohydrates as growth substrates. Indeed, Bacteroides thetaiotaomicron and Bacteroides ovatus, two prominent members of the HGM, were previously shown to grow on the simple AGP from larchwood, and the PULs activated by this glycan were identified (2). Dissecting the mechanisms by which AGPs are degraded will make an important contribution to our understanding of glycan utilization in the HGM. Such knowledge has the potential to underpin dietary probiotic- and prebiotic-based strategies to maximize the impact of this ecosystem on human health. Furthermore, enzymes that cleave specific linkages in AGPs will be invaluable in determining the structure of these complex but heterogeneous carbohydrates, which has the potential to deliver new products for the food industry.
α-L-Rhamnose (Rha) caps the side chains in several complex AGPs that are components of the human diet. These terminal Rha units create a significant enzymatic challenge for GHs (α-L-rhamnosidases) as unfavorable syn di-axial interactions, caused by the axial O-2, must be overcome during glycosidic bond cleavage (15,16). As some AGPs, exemplified by GA, contain Rha units linked α1,4 to uronic acids (UAs) such as GlcA, it is possible that PLs also contribute to the removal of these 6-deoxy-sugars from the plant glycan. PLs perform glycosidic bond cleavage through a β-elimination reaction by acting on GlcA at the +1 subsite (the scissile glycosidic bond is between sugars bound at the −1 and +1 subsites of CAZymes (17)). In this instance PLs would cleave the Rha-α1,4-GlcA linkage without performing difficult rhamnosidase chemistry (see Ref. 18 for a review of the PL requirement for substrates containing UAs).
Here we have tested the hypothesis that the HGM provides a reservoir of new enzymes capable of cleaving α-L-rhamnosidic linkages. We show that PLs discovered in Bacteroides species within the HGM release Rha from AGPs. We also demonstrate that the reaction product of the lyase, 4-deoxy-β-L-threo-hex-4-enepyranosyl-uronic acid (Δ4,5-GlcA) linked β1,6 to D-galactose (Gal), is cleaved by a GH105 unsaturated glucuronidase. The crystal structures of these two enzymes provide insights into the catalytic apparatus and the structural basis for their substrate specificity.

BT0263 releases L-rhamnose from AGP
B. thetaiotaomicron, a common member of the human gut microbiota, is capable of growing on the simple AGP from larchwood (2). The glycan up-regulates two PULs (BT0262-BT0290 and BT3674-BT3687) defined as AGP-PUL1 and AGP-PUL2, respectively. According to the prototypic PUL paradigm, this suggests that the proteins encoded by these loci contribute to the utilization of AGPs. As stated above, the structures of AGPs are highly variable, which likely explains why these two PULs encode a large number of proteins. To explore the functional significance of AGP-PUL1 and AGP-PUL2, the proteins encoded by these loci were produced in recombinant form and their biochemical properties evaluated. Here we have investigated the role of BT0263 in the depolymerization of AGPs. The protein comprises 470 amino acids and has no sequence similarity with any known CAZyme or members of the CAZy database. The capacity of BT0263 to act on AGPs from larchwood, wheat, and GA was investigated. BT0263 only released Rha from GA, indicating that the protein is an exo-acting enzyme (Fig. 1A, Table 1). BT0263 had a pH optimum of ~5 (Fig. 1B), and experiments performed with BACCELL_00875 demonstrated that the new family is unlikely to be metal dependent (Table 2).

BT0263 is a lyase that cleaves Rha-α1,4-GlcA linkages
Terminal Rha units in AGPs are reported to be linked α1,2, α1,4, or α1,6 to Gal and α1,4 to GlcA (12,19). Recently BT3686 was shown to be an α-L-rhamnosidase specifically targeting Rha-α1,4-GlcA linkages in the side chains of GA (20). BT0263 was not active against GA treated with BT3686, indicating that both enzymes cleave Rha-α1,4-GlcA linkages. To explore the origin of the Rha generated by BT0263, GA was treated with BT0265, which belongs to GH43 subfamily 24, a subfamily populated exclusively with exo-β1,3-galactanases (21). The proposed exo-β1,3-galactanase activity of BT0265 is consistent with its capacity to generate oligosaccharide side chains from GA (Fig. 1C), which are attached to galactose residues released from the AGP β1,3-galactan backbone, a signature product profile for such enzymes (22,23). When these oligosaccharide side chains were incubated with BT0263, only Rha was produced, confirming that its target substrates are the glycan decorations in AGPs and not Rha linked directly to the backbone (Fig. 1C). Significantly, the products generated by BT0263 from the BT0265-derived oligosaccharides showed absorbance in the near UV range at 235 nm, indicative of the formation of a carbon-carbon double bond conjugated to a carboxylate (Fig. 1D). Finally, mass spectrometry showed that the release of Rha from a GA-derived heptasaccharide, again generated through the action of BT0265, resulted in a decrease in mass of 164, consistent with the cleavage of the O1-C4 bond of the Rha-GlcA linkage and thus supporting the PL function of BT0263 (Fig. 2, A and B). The cleavage of glycosidic bonds by PLs is through a β-elimination mechanism. Briefly, this mechanism operates by a general base abstracting a proton from C5 of UA sugars, as this hydrogen is particularly acidic due to the presence of the carboxylic acid group also on C5.
This results in an enolate transition state that collapses, forming a conjugated double bond between C4 and C5 and eliminating the glycosidic oxygen from C4, which is concomitantly protonated (24,25). The products are unsaturated UAs (Δ4,5-UA), which have the characteristic UV signal at A235 nm observed here (26).
To further confirm the activity of BT0263 as a new PL, we recombinantly expressed BT3687, encoded by AGP-PUL2, as the enzyme is a member of family GH105 in the CAZy database (9). All members of GH105 characterized to date act exclusively on Δ4,5-UA-O-R (R is generally a sugar) generated by PLs that act on rhamnogalacturonan-I and ulvan. Briefly, this family operates through a mechanism where a general acid/base protonates the double bond, resulting in a hemiketal intermediate, which proceeds through either a glycosyl-enzyme, or an epoxide, intermediate causing glycosidic bond cleavage and loss of the C4-C5 double bond (27,28). The activity of these enzymes can therefore be monitored by loss of the UV signal at 235 nm. When the oligosaccharides generated by BT0265 and then treated with the lyase BT0263 were incubated with BT3687, the UV signal was lost (Fig. 1D). Furthermore, the reduction in mass of the BT0263-treated oligosaccharide after incubation with BT3687 was 158, which is entirely consistent with the loss of Δ4,5-UA with bond cleavage occurring between C1 and O4 of the scissile linkage (Fig. 2, B and C). These data confirm that BT0263 is a new PL cleaving the O4-C4 bond of the Rha-α1,4-GlcA linkage, generating Rha and Δ4,5-UA linked to an oligosaccharide. Thus, BT0263 is the founding member of a new PL family defined here as PL27.
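The two mass changes reported above (164 and 158) can be checked with simple arithmetic. The sketch below is an illustration only, not part of the original analysis: it assumes average atomic masses and the molecular formulas of free rhamnose (C6H12O5) and a free Δ4,5-unsaturated hexuronic acid (C6H8O6). All names in it are hypothetical.

```python
# Average atomic masses in g/mol (standard values, rounded).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}


def mass(formula: dict) -> float:
    """Return the average molecular mass for an element-count dictionary."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


RHAMNOSE = {"C": 6, "H": 12, "O": 5}        # free L-rhamnose
UNSATURATED_UA = {"C": 6, "H": 8, "O": 6}   # free delta-4,5 unsaturated uronic acid
WATER = {"H": 2, "O": 1}

# Lyase step: beta-elimination releases intact rhamnose without adding water,
# so the oligosaccharide loses the full mass of the free sugar.
lyase_loss = mass(RHAMNOSE)

# For comparison, a hydrolytic rhamnosidase adds water across the bond,
# so the oligosaccharide would lose only the anhydro-rhamnose residue.
hydrolase_loss = mass(RHAMNOSE) - mass(WATER)

# GH105 step: hydrolytic removal of the unsaturated uronic acid residue.
gh105_loss = mass(UNSATURATED_UA) - mass(WATER)

print(round(lyase_loss), round(hydrolase_loss), round(gh105_loss))
```

Under these assumptions a lyase-mediated release of Rha lowers the oligosaccharide mass by about 164, whereas hydrolysis of the same linkage would lower it by about 146, which is why the observed decrease supports an elimination rather than a hydrolytic mechanism. The GH105 step gives about 158, matching the value reported for BT3687.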
Figure 1 (partial legend). GA treated with BT0265 and then BT0263 (green); GA treated sequentially with BT0265, BT0263, and the GH105 enzyme BT3687 (black). C is measured by pulsed amperometric detection, whereas D is measured via UV A235 nm. X indicates specific products, in addition to Rha, generated by BT0263 that are susceptible to BT3687, which specifically targets Δ4,5-UA generated by polysaccharide lyases. The experimental conditions were as in A.
Table footnote a. The oligosaccharide substrates were derived from GA through cleavage of the galactan backbone by BT0265, a β1,3-galactanase. Details on how these oligosaccharides were generated and purified, and how their DP was determined, are provided under "Experimental procedures."
Table 2. Metal-dependent activity of BACCELL_00875 against GA. The experimental conditions were 20 mM sodium phosphate buffer, pH 7.0, BACCELL_00875 = 0.5 μM, GA = 1 mM. The enzyme was assayed with no addition or after incubation with 50 mM EDTA, which was then removed by size exclusion chromatography. The lyase was then assayed in the absence (EDTA treated) or presence of the metals (1 mM) indicated.
Compared with the vast majority of PL27 members, BT0263 appears to be truncated, lacking the N-terminal domain of ~200 residues (see the phylogeny section). This may indicate that the 600-residue proteins (termed full-length proteins) in this family display a different specificity or catalytic function to BT0263.
To evaluate this possibility the biochemical properties of two PL27 full-length proteins, BACCELL_00875 and BACFIN_07013, derived from Bacteroides cellulosilyticus and Bacteroides finegoldii, respectively, were determined. Both proteins displayed the same activity as BT0263 (Table 1), showing that rhamno-glucurono lyase activity is a common feature of PL27. These data also indicate that the catalytic domain in PL27 enzymes comprises the 400-residue C-terminal domain. The potential role of the N-terminal domain is discussed below.

Crystal structure of members of PL27
To explore the structure-function relationships of enzymes in PL27, crystallization of BT0263 and the two homologs BACCELL_00875 and BACFIN_07013 was attempted. Only crystals of BACCELL_00875 could be generated. The structure of the enzyme was initially solved by single-wavelength anomalous dispersion using a selenomethionine (SeMet)-derivatized protein to a resolution of 2.2 Å. A higher resolution structure of the native enzyme was then solved to 1.7 Å by molecular replacement using the SeMet-BACCELL_00875 structure as the search model. The SeMet and native proteins were crystallized in space groups P4₁2₁2 and P3₂21, respectively, with both having two molecules in the asymmetric unit. The structure of BACCELL_00875 comprises three domains, two β-sandwich domains and a C-terminal (α/α)6 barrel (Fig. 3A). The N-terminal domain, extending from Phe27 to His243, displays a twisted β-sandwich-fold. This domain contains 13 antiparallel strands arranged in two β-sheets. The order of the strands in β-sheets 1 and 2 is β1-β2-β7-β12-β11 and β3-β4-β5-β6-β13-β10-β9-β8, respectively. The second domain, extending from Ser257 to Ser336, is a classical β-sandwich domain in which the seven antiparallel strands are arranged in the order β1-β2-β5-β4 in β-sheet 1 and β7-β6-β3 in β-sheet 2. Some of the β-strands are interrupted and four small β-strands are not components of the two β-sandwich domains. A small α-helix stretching from Asn245 to Gly256 separates the β-sandwich domains and makes numerous apolar and polar contacts with β-sheet 2 of the N-terminal domain and a hydrophobic contact with Leu270 of the second domain. The loop connecting β8 and β9 of the N-terminal domain makes several apolar interactions with β7 of the second sandwich domain. The C-terminal domain comprises residues Asn337 to Leu694.
This domain displays an (α/α)6 barrel-fold in which a central barrel of six α-helices (helices 2, 4, 6, 8, 10, and 12) is surrounded by a further six α-helices (helices 1, 3, 5, 7, 9, and 11). The central barrel forms the hydrophobic core of the protein. This domain shows the highest levels of conservation within the PL27 family and is where the active site is housed (see below). Residues in the extended loop between helices 9 and 10 (Pro561, Ser562, Tyr563, and Leu624) make apolar contacts with Tyr243, Val238, Val244, and Leu311 in the central domain and abut the top and base of helices 1 and 12, respectively. The (α/α)6 barrel also makes extensive aromatic-mediated hydrophobic interactions with the N-terminal β-sandwich domain. Specifically, Phe578, Val590, Ile592, Phe610, Phe581, and Tyr661, which are on elongated loops that connect helices 9, 10, 11, and 12, make apolar interactions with His164, His166, Trp168, Tyr175, Ile179, and Tyr200 in the N-terminal domain. A single salt bridge, between Arg215 and Asp608, also stabilizes the inter-domain association. The tight association between the catalytic domain and the two β-sandwich domains indicates that the N-terminal region of the protein contributes to stabilization of the C-terminal (α/α)6 barrel.
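The domain boundaries quoted above can be collected into a small lookup to see which domain a given residue belongs to. This is a minimal sketch that uses only the residue ranges stated in the text. The domain labels and function names are illustrative choices, not identifiers from the original work.

```python
from typing import Optional

# (label, first residue, last residue) as stated for BACCELL_00875.
DOMAINS = [
    ("N-terminal beta-sandwich", 27, 243),
    ("linker alpha-helix", 245, 256),
    ("second beta-sandwich", 257, 336),
    ("C-terminal (alpha/alpha)6 barrel", 337, 694),
]


def domain_of(residue: int) -> Optional[str]:
    """Return the label of the domain containing the residue, or None."""
    for label, first, last in DOMAINS:
        if first <= residue <= last:
            return label
    return None


def domain_length(label: str) -> int:
    """Return the number of residues spanned by the named domain."""
    for name, first, last in DOMAINS:
        if name == label:
            return last - first + 1
    raise KeyError(label)


# The inter-domain salt bridge pairs a residue from the N-terminal
# beta-sandwich (Arg215) with one from the catalytic barrel (Asp608).
print(domain_of(215), "|", domain_of(608))
print(domain_length("C-terminal (alpha/alpha)6 barrel"))
```

With these ranges the barrel spans 358 residues, broadly in line with the ~400-residue C-terminal catalytic domain inferred from the truncated BT0263.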

Active site of BACCELL_00875
Based on sequence conservation, the active site is likely located in the (α/α)6-barrel domain (Fig. 3, B and C). This hypothesis is supported by the observation that although BT0263 lacks the N-terminal domains, the enzyme had catalytic activity similar to BACCELL_00875 (Table 1). Furthermore, the (α/α)6-barrel-fold is often associated with CAZymes, both GHs and PLs (9,29,30). A multiple sequence alignment of the members of PL27 shows a significant number of invariant and highly conserved residues (Fig. 3D and supplemental Fig. S1). When mapped onto the crystal structure of BACCELL_00875, many of these amino acids are localized at the center of the (α/α)6 barrel, forming part of a deep pocket (Fig. 3C). It is interesting to note that the top of this deep pocket is formed by the closure of the loop between α-helices 7 and 8. The conformation of the loop is locked by a salt bridge between Arg447 and Glu537 and through aromatic amino acid interactions, with His536 sitting in a hydrophobic pocket formed by Phe489, Tyr490, and Pro534. This loop differs between the two molecules in the asymmetric unit of both the SeMet structure and the higher resolution native structure, undergoing a ~9 Å shift that results in His536 and Glu537 pointing into solution (Fig. 3C). These different conformations reflect the presence of MES and a single glycerol in the locked conformation versus no MES and two glycerol molecules in the open conformation (supplemental Fig. S2). Based on the sequence alignment and structural data, a targeted set of mutations was designed to elucidate residues that make a significant contribution to catalytic activity. These mutants were E537Q, R593A, D596A, D596N, W599A, H612A, and Y613F.
The effect of all of these mutations was significant, with the E537Q, R593A, D596A, D596N, and H612A substitutions causing a ~4,000- to ~10,000-fold reduction in kcat/Km, whereas the loss of Trp599 at the top of the pocket resulted in a more dramatic 50,000-fold reduction (Table 3). The only mutation that completely inactivated the enzyme was Y613F, suggesting that this residue, sitting deep in the pocket (Fig. 3, B and C), is likely a significant component of the catalytic apparatus.
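To put the fold reductions in kcat/Km on an energy scale, they can be converted into an apparent loss of transition-state stabilization using the standard relation ΔΔG = RT ln(fold reduction). The sketch below is an illustrative calculation that is not taken from the original analysis. The temperature of 310.15 K (37 °C) is an assumption, and the function name is hypothetical.

```python
import math

R = 8.314462618  # gas constant in J/(mol*K)


def ddg_kj_per_mol(fold_reduction: float, temp_k: float = 310.15) -> float:
    """Apparent change in activation free energy (kJ/mol) for a fold drop in kcat/Km.

    temp_k defaults to 37 degrees C, which is an assumed assay temperature.
    """
    return R * temp_k * math.log(fold_reduction) / 1000.0


# Fold reductions quoted in the text for the active-site mutants.
for fold in (4_000, 10_000, 50_000):
    print(f"{fold:>6}-fold reduction: {ddg_kj_per_mol(fold):.1f} kJ/mol")
```

Under this assumption the 4,000- to 50,000-fold reductions correspond to roughly 21 to 28 kJ/mol. The relation cannot be applied to Y613F, for which no activity was detected.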
A DaliLite search (31) was performed to find relatives of BACCELL_00875 that display a similar fold. This search returned an N-acyl-D-glucosamine 2-epimerase (7% identity; PDB ID 1FP3), a cellobiose 2-epimerase (10% identity; PDB ID 3WKF), and an α-1,6-D-mannanase belonging to the GH76 family (10% identity; PDB ID 4V1R). These homologs, however, had high root mean square deviations (r.m.s. deviations) >3.2 Å and Z-scores <26. The location of the active site in these enzymes is similar to the position of the proposed catalytic center of BACCELL_00875, but the catalytic residues do not overlay with Tyr613, and the ligands for these enzymes could not be satisfactorily accommodated in the deep pocket of the PL27 family. The closest characterized lyase homolog, a member of family PL15, displays 5% identity (PDB ID 3AFL (29)) and has r.m.s. deviations >4.0 Å and a Z-score <11. Again, the location of the active site of the PL15 lyases and the deep pocket (the proposed catalytic center) of BACCELL_00875 are conserved. However, the catalytic residues of the PL15 enzyme, a histidine and tyrosine acting as the general base and acid, respectively, do not overlay with Tyr613 in BACCELL_00875. These results suggest that the structurally related proteins may not be functionally or evolutionarily related to BACCELL_00875.

Crystal structure of BT3687
The unsaturated glucuronidase BT3687 was also crystallized and its structure solved by molecular replacement using PDB code 4CE7 as the search model (32). The enzyme also has an (α/α)6-barrel-fold, as has been shown previously for other members of the GH105 family (33). The active site location within GH105 enzymes is conserved and allows identification of the putative catalytic residues of BT3687 as Asp116 and Asp160 (Fig. 4A), which overlay well with the catalytic apparatus of other structurally characterized members (YteR, YteR2, and Nu_GH105 (28,32)) of family GH105 (Fig. 4B). Mutation of Asp116 and Asp160 to Ala completely ablated the catalytic activity of BT3687 (Table 3). The −1 active site pocket of BT3687 is lined with the hydrophobic amino acids Trp56, Trp158, Met164, Trp225, and Trp231; this appears to be a general feature of GH105 enzymes, with YteR, YteR2, and Nu_GH105 all containing the equivalent residues (Fig. 4B). This may ensure protonation of the catalytic acid/base (Asp160) at physiological pH. Arg227, which coordinates the carboxylate of the substrate, is also conserved. This active site conservation also largely extends to GH88 enzymes, the only other family to perform catalysis by protonation of the double bond of PL products.

Phylogeny of the new PL family
The spread of the new PL family PL27 was investigated through iterative BLASTP and HMM searches. The limited sequence diversity, even between prokaryote and eukaryote sequences, and the absence of any distantly related family in the CAZy database, facilitated the delineation of this family and the construction of a phylogenetic tree (Fig. 5). Members are present in several bacterial phyla, particularly the Bacteroidetes (~50 sequences), where ~80% are in the Bacteroides genus. Interestingly, within a single species that contains the lyase, this family is not widely distributed in the different strains, a feature that is not shared with other CAZy families. Also noteworthy is the split of the homologs into two distinct proteins in B. thetaiotaomicron. In ~50% of the strains in this species the lyase lacks the N-terminal 200 residues comprising the β-sandwich domains (these lyases are labeled as FUSED in Fig. 5). For example, in B. thetaiotaomicron strain VPI-5482 the lyase is BT0263 and the β-sandwich protein is BT0262. The biological rationale for the presence or absence of the N-terminal domain in the lyases is unclear, although this region of the enzyme is not required for catalytic activity (Table 1) and likely contributes to the stabilization of the enzyme. Members of PL27 are also commonly found in Actinobacteria, where at least four different bacterial orders contain the lyase (~10 sequences in Fig. 5), in lower numbers in Firmicutes (mainly in the Lachnospiraceae family of the Clostridia class, ~10 sequences), and as a single sequence in the Spirochaetes phylum. In the eukaryotes ~60 fungal sequences have been identified, all belonging to the Pezizomycotina subphylum of filamentous ascomycete fungi. The absence of any known distantly related families suggests that the new family may have evolved from a progenitor sequence that displayed no evolutionary link to current PL and GH families.
Of the 126 sequences that currently compose this new family the proposed catalytic Tyr is conserved in 117 sequences, with six having a His and one a Phe. His is a common base used by PLs (29,34) and these variants could well be active, whereas the Phe variant probably represents a loss of function, exemplified by the BACCELL_00875 mutant Y613F. It is interesting to note that in most species (~120) a single copy of the PL gene is observed, whereas eight species have two copies. In most of these eight organisms there was very little sequence divergence between the two copies, suggesting that the generation of these paralogs was a fairly recent event. In the fungi Clonostachys rosea and Colletotrichum fioriniae PJ7, however, high divergence of the paralog occurred, and in one version the catalytic Tyr was replaced with a His. Together, these observations support the hypothesis that only a single copy of the lyase has been necessary during the evolution of these organisms.
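The conservation figures quoted above can be tallied directly. The sketch below uses only the counts stated in the text (126 sequences in total; Tyr in 117, His in 6, Phe in 1). The variable names are illustrative.

```python
TOTAL_SEQUENCES = 126

# Residue aligned to the proposed catalytic Tyr, as counted in the text.
residue_counts = {"Tyr": 117, "His": 6, "Phe": 1}

# Percentage of the family carrying each residue at the catalytic position.
percentages = {
    residue: 100.0 * count / TOTAL_SEQUENCES
    for residue, count in residue_counts.items()
}

# Sequences whose residue at this position is not specified in the text.
unaccounted = TOTAL_SEQUENCES - sum(residue_counts.values())

for residue, pct in percentages.items():
    print(f"{residue}: {pct:.1f}%")
print("not specified:", unaccounted)
```

The stated counts sum to 124, so the residue at this position is not given for two of the 126 sequences. The tally leaves those two unassigned rather than guessing at their identity.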

Discussion
This study provides the characterization of a new PL family that cleaves Rha-α1,4-GlcA glycosidic bonds. The family contains a limited number of sequences, is present in fungal and bacterial organisms, and is well conserved across taxa. This may suggest that the family has specifically evolved to target AGPs containing the Rha-α1,4-GlcA linkage and may be a relatively recent evolutionary event. Intriguingly, in a proportion of the Bacteroides species the N-terminal domain contains a stop codon at its C terminus, resulting in two proteins in these organisms, a functional lyase and a protein containing two β-sandwich domains. This has the consequence, in these strains, of presumably localizing the lyases to the cytoplasm as they now lack a signal peptide, which is retained by the β-sandwich proteins. The loss of the N-terminal domain does not affect catalytic efficiency of the enzyme, and this domain shows much more sequence diversity within the PL27 family than the C-terminal catalytic domain. The rationale for the loss of this domain in the lyase expressed by the B. thetaiotaomicron strains is unclear. Polysaccharide degradation occurs in extra-cytoplasmic locations with only monosaccharides being transported into the cytoplasm, so localization of BT0263 to the cytoplasm is counter-intuitive. It is formally possible, however, that BT0263 is secreted to the periplasm by an unidentified mechanism.
The catalytic mechanism of this new PL family likely proceeds through a classical lyase β-elimination mechanism, with the leaving group and the abstracted proton both being in the syn chemical space, and generating the signature Δ4,5-UA and Rha as its products (Fig. 6). Tyr613 was identified as being crucial to its function (the mutant Y613F is completely inactive, Table 3) and is highly conserved in the family. Tyr613 is therefore the candidate catalytic base abstracting the C5 proton, generating the enolate anion intermediate that leads to elimination of the glycosidic oxygen and the production of Δ4,5-UA. After cleavage the glycosidic oxygen, now being the O1 of Rha, requires protonation as the pKa values of secondary alcohols are >16. A catalytic acid is therefore required to protonate the glycosidic oxygen to facilitate leaving group departure. Apart from Tyr613, however, there are no candidate polar residues that could fulfill this function in the proximal region of the active site pocket. It would appear, therefore, that Tyr613 likely functions as the catalytic acid-base. Indeed, there are a number of PL families, such as PL1, PL8, and PL9, where a single catalytic residue has been proposed to function as the acid/base (25,34-36). Usually, there is also an amino acid or metal ion that increases the negative charge of the carboxylic acid of the substrate. This helps to both further acidify the C5 proton, enhancing its ability to accept electrons from the base, and to stabilize the enolate anion at the transition state. Without a ligand complex it is hard to unambiguously assign a residue in BACCELL_00875 that interacts with the carboxylate of GlcA; however, Arg447 and Arg593 are potential candidates. These residues are invariant in the family and are ~7 Å from the catalytic tyrosine, a distance consistent with this proposed function.
An additional feature of the PL is the importance of the loop between helices, which closes to form the active site pocket. The mutant E537Q shows a marked decrease in catalytic competence, presumably due to the loss of the ionic lock that is needed to secure the loop in place and form the active site. The residual activity of the mutant without the "ionic lock" likely reflects the stochastic conformation adopted by the loop, which may infrequently allow productive binding of substrate or facilitate departure of the cleaved products. Butyrivibrio

New polysaccharide lyase active on AGPs
The structure of BT3687 further demonstrated that the −1 subsite of this class of enzyme contains a residue that is a tyrosine or tryptophan in α- and β-glycosidases, respectively. A tyrosine at this critical position creates the required space for α-configured sugars at the +1 subsite. The phenol also makes a hydrogen bond with the +1 sugar, which likely contributes to substrate binding. It is unusual for a GH family to contain enzymes that act on α- and β-linked substrates, although this is also observed in the GH4 family (37), which also departs significantly from classical acid-base catalysis.

Conclusions
This study describes the discovery of a new PL family that targets Rha-α1,4-GlcA and its GH105 enzyme partner, which cleaves the product of the lyase. The new PL family, of which BT0263, BACCELL_00875, and BACFIN_07013 are founding members, is highly conserved. The structure of BACCELL_00875 revealed that the closure of a dynamic loop forms the active site pocket and is secured in place by a salt bridge forming an ionic lock. The enzyme family displays a canonical β-elimination mechanism utilizing a single catalytic tyrosine. The active site of the GH105 enzyme, BT3687, displayed the expected characteristics of GH105 members, with a highly conserved −1 subsite and no obvious conservation of the +1 subsite architecture.
AGPs are important polysaccharides in plant biology and in the food industry. Their structures, however, are very diverse. The novel activities and structural details described here make an important contribution to the toolbox of biocatalysts available to dissect the structure of these complex glycans and to generate bespoke, industrially relevant AGP-derived oligosaccharides, particularly for the food industry.

Cloning, expression, and purification
BT0263 and BT3687 were amplified from B. thetaiotaomicron genomic DNA and cloned into pET28a with an N-terminal His6 tag using NheI and XhoI restriction sites. To generate DNA encoding BACCELL_00875 and BACFIN_07013, the protein sequences were used as templates for gene synthesis with codon optimization for Escherichia coli heterologous production (Biomatik, Cambridge, Canada), and the synthetic genes were subsequently cloned into the pET28a vector. The genes were then expressed in E. coli BL21, or Tuner cells, transformed with the appropriate recombinant plasmids. The recombinant E. coli strains were cultured in Luria broth (LB) supplemented with 50 μg/ml of kanamycin. Cultured cells were grown at 37°C to mid-log phase and induced with 1 mM isopropyl β-D-1-thiogalactopyranoside at 16°C overnight. Cells were pelleted by centrifugation at 5,000 rpm for 10 min and resuspended in 20 mM Tris-HCl buffer, pH 8.0, containing 300 mM NaCl. For selenomethionine-derivatized protein the above procedure was used but adjusted as follows: E. coli B834 cells were transformed with the appropriate recombinant plasmid. Overnight 5-ml cultures, in LB, were then used to inoculate 100 ml of LB culture in a 250-ml flask, which was then grown to an O.D. of 0.4. A methionine-deficient medium was prepared using the Molecular Dimensions SelenoMet™ Medium Base (MD12-501) and SelenoMet™ Nutrient mixtures (MD12-502) and was used to wash the cultured B834 cells. The cells were then inoculated into 1 liter of methionine-deficient medium to which selenomethionine was added to a final concentration of 5 mg/ml. Cells were collected and disrupted by sonication, and the cell-free extract was recovered by centrifugation at 15,000 rpm for 30 min. Recombinant proteins were purified from the cell-free extract by immobilized metal affinity chromatography using Talon™, a cobalt-based matrix. Proteins were eluted from the column in Buffer A containing 100 mM imidazole. For crystallographic studies, BT0263 and BT3687 were further purified by size exclusion chromatography using a Superdex S200 16/600 column equilibrated with Buffer A on a fast protein liquid chromatography system (ÄKTA FPLC; GE Healthcare). All proteins were purified to electrophoretic homogeneity as judged by SDS-PAGE.
Figure 5. Phylogenetic tree of the PL27 family. Four subgroups exhibit large evolutionary distances: Bacteroidetes (green subtree); Firmicutes and one Spirochaetes (orange subtree); Actinobacteria and a cohort of Ascomycota (purple subtree); and Ascomycota-only (blue subtree). Labels on leaves indicate for each protein its species, its phylum, and the type of residue aligned to the catalytic Y. Split gene models that were manually fused for this analysis are indicated with the tag FUSED.
Figure 6. Proposed catalytic mechanism for PL27. The proposed catalytic mechanism, proceeding through an enolate transition state, with tyrosine acting as the catalytic acid/base and an arginine serving to stabilize the negative charge that is developed.

Mutagenesis
Site-directed mutagenesis was conducted using the PCR-based QuikChange site-directed mutagenesis kit (Stratagene) according to the manufacturer's instructions, using the plasmids encoding BACCELL_00875 and BT3687 as templates and appropriate primer pairs.

Purification of oligosaccharides
GA (10 mg/ml) was treated with an exo-β1,3-D-galactanase from B. thetaiotaomicron (BT0265), which cleaves the β1,3-galactan backbone of the polysaccharide. The active site of the enzyme can bind galactose residues in the β1,3-galactan backbone that are decorated at O6 with oligosaccharide side chains. Thus, upon glycosidic bond cleavage of the backbone the galactose residues released carry their oligosaccharide side chains. The different oligosaccharide side chains generated by BT0265 were purified by size exclusion chromatography using a P-2 Gel filtration column (Bio-Rad). The column was equilibrated with 50 mM acetic acid and the same buffer was used in the chromatographic separation. To identify the targets, the samples were digested with BT0263 (1 µM, overnight at 37°C) and the reactions were run on high-pressure anion exchange chromatography with pulsed amperometric and UV-visible detectors. Each sample was dried and stored at −20°C until use.

Enzyme assays
All enzyme assays, unless otherwise stated, were carried out in 20 mM sodium phosphate buffer, pH 7.0, containing 150 mM NaCl and performed in triplicate. Assays were carried out with 1 µM BT0263, BACCELL_00875, BACFIN_07013, and BT3687 against 1-10 mg/ml of substrate at 37°C. Aliquots were taken over a 16-h time course, and samples and products were assessed by TLC and high-pressure anion exchange chromatography with pulsed amperometric detection. Sugars were separated on a Carbopac PA1 guard and analytical column in an isocratic program of 100 mM sodium hydroxide and then with a 40% linear gradient of sodium acetate over 60 min. Sugars were detected using the carbohydrate standard quad waveform for electrochemical detection at a gold working electrode with an Ag/AgCl pH reference electrode. Kinetic parameters were determined using the L-rhamnose detection kit from Megazyme International, measuring the release of rhamnose by absorbance at 340 nm. To determine kinetic parameters, 1 µM of the appropriate enzyme was assayed against varying concentrations of polysaccharide or oligosaccharides between 0.1 and 3 mM. L-Rhamnose release was measured, and the values were plotted using linear regression, giving kcat/Km as the slope of the line. Mutants were assessed for activity against GA at 1 mg/ml with varying enzyme concentrations between 1 and 10 µM, with assays running for minutes (wild type) up to days (mutants displaying very little activity).
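The regression step can be illustrated with a minimal Python sketch. It assumes that substrate concentrations are well below Km, so that the Michaelis-Menten equation reduces to v0/[E] ≈ (kcat/Km)[S] and the slope of v0/[E] against [S] estimates kcat/Km. The function names and the data points are hypothetical and are not taken from this study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("substrate concentrations must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def kcat_over_km(substrate_molar, rates_molar_per_s, enzyme_molar):
    """Estimate kcat/Km (M^-1 s^-1) as the slope of v0/[E] versus [S].

    Assumption: [S] << Km, so the initial rate is linear in [S].
    """
    normalized = [v / enzyme_molar for v in rates_molar_per_s]
    slope, _intercept = fit_line(substrate_molar, normalized)
    return slope


# Hypothetical example: 1 uM enzyme, substrate from 0.1 to 3 mM,
# initial rates of rhamnose release in M/s (illustrative values only).
substrate = [0.1e-3, 0.5e-3, 1.0e-3, 2.0e-3, 3.0e-3]
rates = [1.0e-7, 5.0e-7, 1.0e-6, 2.0e-6, 3.0e-6]
estimate = kcat_over_km(substrate, rates, enzyme_molar=1.0e-6)
print(f"kcat/Km ~ {estimate:.3g} M^-1 s^-1")
```

A clearly non-zero intercept or visible curvature in such a plot would suggest that the low-substrate assumption does not hold for that data set.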

Mass spectrometry
BT0263 was incubated with the GA-derived side chain oligosaccharides overnight at 37°C. To identify the size of each oligosaccharide, the enzymatic reaction products and the original untreated glycans were per-O-methylated to improve the mass spectrometric response (38). Excess protein was removed by filtering the reaction mixture over a 0.5-ml Dowex anion-cation exchange resin mixture (Sigma). Permethylated glycans were redissolved in 10 µl of methanol. One microliter of the permethylated glycan solution was mixed with 1 µl of a solution of 10 mg/ml of 2,5-dihydroxybenzoic acid in acetonitrile; 0.7 µl of this mixture was spotted onto a stainless steel MALDI plate to air dry. Positive-ion MALDI mass spectra were obtained using an Ultraflex III mass spectrometer (Bruker) in reflectron mode, equipped with a Nd:yttrium/aluminum garnet Smartbeam™ laser. Mass spectra were acquired over the m/z range 250-3,000 with ion suppression below 700. The laser power setting varied around 50% of maximum, with each spectrum acquired using between 1,500 and 4,000 laser shots in total, using Bruker flexControl software. Mass spectra were externally calibrated against an adjacent spot containing six peptides (des-Arg1-Bradykinin, 904.681; Angiotensin I, 1,296.685; Glu1-Fibrinopeptide B, 1,750.677; adrenocorticotropic hormone fragment (ACTH) (1-17 clip), 2,093.086; ACTH (18-39 clip), 2,465.198; ACTH (7-38 clip), 3,657.929 (Sigma)). Bruker flexAnalysis software (version 3.3) was used to perform the spectral processing.

Crystallization, data collection, structure solution, and refinement
For crystallization trials, immobilized metal affinity chromatography-purified protein was concentrated and further purified by gel filtration chromatography using a Superdex S200 16/600 column equilibrated in Buffer A. BT3687, at 15 mg/ml, was crystallized using the sitting drop vapor diffusion method with 200 mM sodium malonate and 20% polyethylene glycol (PEG) 3350. SeMet-BACCELL_00875 and native BACCELL_00875, at 10 mg/ml, were crystallized from 25% PEG1500, 0.1 M MMT buffer (DL-malic acid, MES, and TRIS (1:2:2 molar ratio)), pH 4.0. For data collection, the samples were transferred to a cryo-protecting solution consisting of mother liquor supplemented with 15-20% (v/v) PEG 400. Alternatively, Paratone-N oil was used to replace the mother liquor before cooling the sample in liquid nitrogen. Diffraction data were collected using synchrotron radiation at the Diamond Light Source on beamlines I02 and I04. The data were integrated using XDS (39) or iMosflm (40) and scaled and merged with Aimless (41). The phase problem for BACCELL_00875 was solved by SeMet single-wavelength anomalous dispersion using Hkl2map (42) and the Shelx suite (43). The native BACCELL_00875 structure was solved with Phaser (44) using the SeMet-solved structure as the search model. BT3687 was solved by molecular replacement using MrBump (45). The best solution from MrBump was obtained with search model 4CE7, prepared with Chainsaw (46) and solved with Phaser (44). Buccaneer (47) and/or Arp-warp (48) were used for automated model building where needed. Recursive cycles of manual model building in COOT (47) and automatic refinement in Refmac5 (49) were performed to produce the final model; 5% of the observations were randomly selected for the R free set. All models were validated using Molprobity (50). The data statistics and refinement details are reported in Table 4.

Family delineation and phylogeny
BT0263 was used as an initial query against all available genomes in GenBank™. Proteins with >60% identity were aligned to identify domain boundaries based on sequence conservation, and then used to build an HMM with HMMER (55). This model, which captures the position-specific signal of conserved/unconstrained residues, was used to detect remote family members and was iteratively rebuilt until family convergence, which occurred immediately. The sequences of family members were then aligned with Multalin (51), and the resulting alignment was used to construct a neighbor joining tree using BLOSUM62 substitution parameters (52) and BIO-NJ (53). The tree was visualized using Dendroscope (54).
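The iterate-until-convergence logic of the family delineation can be sketched in Python as follows. This is only an outline under stated assumptions: `build_model` and `search` are hypothetical callables standing in for whatever profile-building and database-search steps are used (for instance wrappers around HMMER programs), and the toy data are invented purely to exercise the loop.

```python
def iterate_family(seed_members, build_model, search, max_rounds=20):
    """Rebuild a profile and search again until no new members appear.

    build_model: hypothetical callable, list of member IDs -> model object
    search:      hypothetical callable, model object -> iterable of hit IDs
    Returns the converged member set and the number of rounds performed.
    """
    members = set(seed_members)
    for round_number in range(1, max_rounds + 1):
        model = build_model(sorted(members))
        updated = members | set(search(model))
        if updated == members:
            # Convergence: the search recovered no sequences beyond the
            # current family membership.
            return members, round_number
        members = updated
    raise RuntimeError("family did not converge within max_rounds")


# Toy stand-ins (hypothetical): each member "detects" fixed neighbours.
toy_neighbours = {"A": {"B"}, "B": {"C"}, "C": set()}


def toy_build(member_ids):
    return frozenset(member_ids)


def toy_search(model):
    hits = set()
    for member in model:
        hits |= toy_neighbours.get(member, set())
    return hits


family, rounds = iterate_family({"A"}, toy_build, toy_search)
print(sorted(family), rounds)
```

In this formulation, convergence in the first round corresponds to the case where the initial model already recovers every detectable family member.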