Discovery and characterization of family 39 glycoside hydrolases from rumen anaerobic fungi with polyspecific activity on rare arabinosyl substrates

Enzyme activities that improve the digestion of recalcitrant plant cell wall polysaccharides may offer solutions for sustainable industries. To this end, anaerobic fungi in the rumen have been identified as a promising source of novel carbohydrate active enzymes (CAZymes) that modify plant cell wall polysaccharides and other complex glycans. Many of these CAZymes share too little sequence identity with characterized proteins from other microbial ecosystems for their function to be inferred, which complicates their identification. In this study, four rumen fungal genes (nf2152, nf2215, nf2523, and pr2455) were identified that encode family 39 glycoside hydrolases (GH39s) sharing conserved structural features with GH51s. Two recombinant proteins, NF2152 and NF2523, were characterized using a variety of biochemical and structural techniques and were found to have distinct catalytic activities. NF2152 releases a single product, β-1,2-arabinobiose (β-1,2-Ara2), from sugar beet arabinan (SBA), and releases β-1,2-Ara2 and α-1,2-galactoarabinose (α-1,2-Gal-Ara) from rye arabinoxylan (RAX). NF2523 exclusively releases α-1,2-Gal-Ara from RAX, which represents the first description of a galacto-(α-1,2)-arabinosidase. Neither β-1,2-Ara2 nor α-1,2-Gal-Ara has previously been described within SBA or RAX. The enzymes studied here may therefore represent valuable new biocatalytic tools for investigating the structures of rare arabinosyl-containing glycans, and potentially for facilitating their modification in industrial applications.

The rumen microbiome is recognized as one of the most efficient microbial ecosystems for the degradation of plant biomass (1,2). It contains a diverse microbial community with large numbers of bacteria, anaerobic fungi, ciliate protozoa, and bacteriophages. Of these, rumen bacteria and fungi are considered indispensable for plant fiber digestion. It has been estimated that at least 85% of the microbes inhabiting the rumen have not been cultured using traditional approaches (1). In recent years, improvements in sequencing technologies have greatly expanded our knowledge of the structure and function of the rumen microbial community (3-5).
The plant cell wall comprises cellulose, hemicellulose, and pectin (6). These polysaccharides are often interconnected, contain many diverse sugar chemistries, and display a variety of decorations and branching. Cellulose is the simplest of these three classes of polysaccharides and is composed of linear polymers of β-1,4-linked D-glucose (7). Hemicellulose is a group of branched polysaccharides that are classified according to the primary sugar within the backbone of the polymer (e.g. xylan is composed of β-1,4-linked D-xylose). Hemicelluloses contain extensive variations in their repeating structure, as seen in arabinoxylans such as rye arabinoxylan (RAX). RAX consists of a polymer chain of β-1,4-linked D-xylose units, many of which are 2- or 3-monosubstituted or 2,3-disubstituted by α-L-arabinose (Ara) (8). Pectin is found within the primary cell wall and the middle lamella, which lies at the junctions between the primary walls of neighboring cells and participates in intercellular connections (9). It is a structurally complex polysaccharide that is enriched in D-galacturonic acid and divided into three distinct classes of pectic polysaccharides: homogalacturonan, rhamnogalacturonan-I (RG-I), and rhamnogalacturonan-II, which display substantial variability in their structure (10,11). For example, the side chains of RG-I can be heavily decorated with arabinans (12), such as sugar beet arabinan (SBA), which consists of an α-1,5-L-arabinofuranosyl backbone decorated with α-1,2- and α-1,3-L-arabinofuranosyl side chains.
Also present in the plant cell wall are structurally diverse cell-surface glycoproteins that are collectively referred to as arabinogalactan proteins (AGPs). This protein family is enriched in hydroxy-Pro/Pro, Ala, and Ser/Thr and is heavily glycosylated (90-98% w/w). AGPs are thought to play important roles in various aspects of plant growth and development, including reproduction, cell signaling, and microbial interactions, and may serve to anchor the pectic and hemicellulosic polysaccharide networks (13).
Enzymes that modify carbohydrates are referred to as carbohydrate active enzymes (CAZymes). CAZymes are categorized into "classes" based upon their catalytic function and into "families" based upon sequence relatedness (14). Family relatedness does not necessarily equate to functional relatedness, however, as many CAZyme families have now been shown to contain members with different enzyme specificities (see Fig. 1) (15,16), a property referred to as "polyspecificity." Glycoside hydrolases (GHs) catalyze the hydrolysis of a glycosidic linkage between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety, generating a new reducing end (17). Generally, hydrolysis is performed by two primary amino acid residues within the enzyme active site. Inverting GHs, which generate products with an inverted anomeric configuration at the nascent reducing end, possess a general acid (i.e. proton donor) and a catalytic base (i.e. proton acceptor). Alternatively, retaining GHs, which generate products that retain the anomeric configuration at the nascent reducing end, possess an acid/base and a nucleophile and catalyze a double displacement reaction. The activity of GHs is often potentiated by carbohydrate-binding modules (CBMs) through "targeting" or "concentrating" effects (18). Polyspecificity is also a common feature of the binding specificities of CBM families, such as CBM13 (19). Using CBMs to identify associations with uncharacterized genes may therefore assist in the discovery of new enzyme activities.
Comparative analysis of fungal genomes and metatranscriptomes has revealed that rumen fungi exhibit tremendous diversity in the number and types of their CAZymes (20). Harnessing this genetic reservoir holds vast potential for biotechnological applications. For example, in vitro studies have demonstrated that supplementing rumen fluid with select combinations of CAZymes noticeably boosts the release of cellobiose, glucose, and xylose from plant cell wall structural polysaccharides (21). Anaerobic fungi of the phylum Neocallimastigomycota, isolated from the rumen or herbivore dung, are known to participate in the deconstruction of plant cell wall substrates through invasive rhizoidal growth, which physically disrupts recalcitrant tissues, and by secreting a diverse arsenal of CAZymes (22). Many of these CAZymes share little sequence identity with characterized proteins from other microbial ecosystems (20,23).
In this study, we identified four genes from anaerobic fungi that are predicted to encode proteins with an N-terminal GH39 module followed by a CBM13 (19): nf2152, nf2523, and nf2215 from Neocallimastix frontalis, and pr2455 from Piromyces rhizinflatus. These "fungal GH39s" comprise a new subfamily within GH39. To characterize their functions, we synthesized the genes, expressed them in Escherichia coli, and purified their gene products for biochemical characterization. The enzymes were screened against a library of plant cell wall substrates, and activity was observed on rare substrates present within SBA and RAX. Analyses of the product profiles of NF2152 and NF2523 revealed that two distinct products were released, indicating that this subfamily is polyspecific, with activities that differ from those reported for a related sequence, bgxg1, from Orpinomyces sp. strain C1A (24). Structural analysis of NF2152 provides insights into the molecular basis of substrate recognition by this enzyme family.

Chemical structure of the SBA product
To generate sufficient product for chemical characterization, a large-scale digestion of SBA was performed. Following ethanol precipitation, the soluble products were fractionated by size exclusion chromatography. Eluted fractions were screened by TLC to identify peak boundaries, and samples containing similar sized products were pooled and analyzed for purity by TLC (Fig. 2A).
To determine whether the NF2152 product had a degree of polymerization (DP) > 1, acid hydrolysis was performed. The hydrolyzed product migrated as a single band with a mobility similar to that of Ara on TLC (Fig. 2B) and eluted with the same retention time as Ara when analyzed by high performance anion-exchange chromatography with pulsed amperometric detection (HPAEC-PAD) (Fig. 2C). These results suggested that NF2152 generates an oligosaccharide that is composed solely of Ara.
Comparison with a commercially available α-1,5-Ara2 standard revealed that these arabinooligosaccharides displayed different mobilities on TLC plates (Fig. 2B). Additionally, a selection of GH51 and GH43 α-L-arabinofuranosidases (ABFs) were unable to hydrolyze the SBA product (results not shown). This indicated that the product has a different chemistry, such as a modification, an alternate stereochemistry (i.e. β), or an alternate positional linkage (e.g. 1→2). Due to the limited availability of commercial standards, the GH127 β-1,2-arabinofuranosidase from Bifidobacterium longum (BlGH127) (25) was synthesized, purified, and used for enzymatic glycosequencing. This enzyme was previously used to elucidate the structure of a β-1,2-arabinosyl oligosaccharide derived from an AGP glycan (25). Treatment with BlGH127 cleaved the NF2152 product completely into Ara (Fig. 2D, inset), establishing that the product is a pure arabinooligosaccharide with a β-1,2-linkage. However, this digestion does not determine the DP of the product, as GH127 may act processively on a substrate with DP ≥ 2. Therefore, to determine the size of the arabinooligosaccharide, electrospray ionization mass spectrometry was performed (Fig. 2D). The product had an m/z ([M + Na]+) of 305.1, which corresponds to a neutral mass of 282.1, the mass of a pentose disaccharide, and confirms that the product is a pure β-1,2-Ara2. Additionally, the product released from an arabinan-derived tetrasaccharide was characterized (Fig. 2E). Enzymatic fingerprinting showed that this arabinotetraose contains a single β-L-arabinofuranosyl linkage and three α-L-arabinofuranosyl linkages. Digestion with NF2152 released a disaccharide product with a mobility similar to that of the product released from SBA, and, likewise, this product was cleaved to arabinose by the β-1,2-L-arabinosidase BlGH127 (Fig. 2E).
These results indicate that NF2152 must hydrolyze an α-L-arabinofuranosyl linkage and is thus an α-L-(β-1,2)-arabinobiosidase.

Characterization of a chemically distinct product from RAX
Digestion of RAX with NF2152 generated a second band with a slower mobility, which appeared to be the primary product of a second GH39 enzyme, NF2523 (Fig. 3A). The release of two structurally distinct products by NF2152, together with the unique product profile of NF2523, which released only one of these two products from RAX, indicated that different substrates are present in RAX and that this subfamily of fungal GH39 enzymes is polyspecific, with specialized variants. Notably, very little product was detected when SBA was digested with NF2523. To determine whether the second product was an arabinooligosaccharide with a DP > 2, mass spectrometry was performed. Somewhat surprisingly, the product had an m/z ([M + Na]+) of 335.1, corresponding to a neutral mass of 312.1, which is equivalent to the molecular weight of a hexose-pentose disaccharide. Therefore, acid hydrolysis was performed and the products were analyzed by TLC and HPAEC-PAD (Fig. 3B). Consistent with the MS result, two distinct monosaccharides were detected: Ara and galactose (Gal). Ara is a main compositional sugar of SBA (88%) and RAX (38%), whereas Gal is less common (SBA, 3%; RAX, none detected). The presence of an Ara/Gal disaccharide therefore suggests that it derives from a rare structure within both polysaccharides. Alternatively, Ara and Gal are commonly linked sugars in the arabinogalactan side chains of RG-I (26). However, digests of diverse pectic substrates with the GH39 enzymes did not produce any detectable products. This suggests that NF2152 and NF2523 may be targeting rare substructures within SBA, or AGP glycans that co-purify with plant cell wall polysaccharides (25).
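The mass arithmetic behind both MS assignments can be checked with a short calculation. The Python sketch below is an illustration, not the authors' analysis pipeline: it assumes monoisotopic masses, singly charged sodium adducts ([M + Na]+), free reducing glycans built only from pentoses and hexoses (one H2O lost per glycosidic bond), and an arbitrary ±0.05 tolerance. The function names are hypothetical.

```python
# Monoisotopic atomic masses (Da); sodium and electron masses for the [M + Na]+ adduct.
MONO = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}
NA = 22.9897692809
ELECTRON = 0.00054857990946


def neutral_mass(n_pentose, n_hexose):
    """Monoisotopic mass of a free (reducing) glycan of the given composition.

    Monomers are counted as free sugars (pentose C5H10O5, hexose C6H12O6);
    one H2O is removed for each glycosidic bond.
    """
    n = n_pentose + n_hexose
    carbons = 5 * n_pentose + 6 * n_hexose
    hydrogens = 10 * n_pentose + 12 * n_hexose - 2 * (n - 1)
    oxygens = 5 * n_pentose + 6 * n_hexose - (n - 1)
    return carbons * MONO["C"] + hydrogens * MONO["H"] + oxygens * MONO["O"]


def sodiated_mz(mass):
    """m/z of the singly charged sodium adduct [M + Na]+."""
    return mass + NA - ELECTRON


def match_compositions(observed_mz, max_dp=4, tol=0.05):
    """List (pentoses, hexoses, theoretical m/z) whose [M + Na]+ lies within tol."""
    hits = []
    for dp in range(1, max_dp + 1):
        for n_hex in range(dp + 1):
            mz = sodiated_mz(neutral_mass(dp - n_hex, n_hex))
            if abs(mz - observed_mz) <= tol:
                hits.append((dp - n_hex, n_hex, round(mz, 4)))
    return hits


print(match_compositions(305.1))  # SBA product
print(match_compositions(335.1))  # second RAX product
```

Under these assumptions, 305.1 matches only a pentose disaccharide and 335.1 only a hexose-pentose disaccharide among compositions up to DP 4. Composition alone cannot distinguish isomers (for example, Gal versus another hexose), so hydrolysis and sequencing are still needed.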

Sequencing of the Gal/Ara product
To determine whether Ara or Gal was positioned at the reducing end of the disaccharide, two complementary techniques were used: differential gas chromatography-mass spectrometry (GC-MS) and fluorophore-assisted carbohydrate electrophoresis (FACE). First, alditol acetates were generated from the intact and hydrolyzed forms of the Gal/Ara disaccharide and then analyzed by GC-MS (Fig. 4A). In the intact disaccharide, only the free reducing-end residue can be converted to an alditol acetate; the residue at the non-reducing end is protected until it is released by acid hydrolysis. In this manner, only the reducing sugar is visible in the GC-MS chromatogram when the intact disaccharide is derivatized. In Fig. 4A, Ara was present before and after acid hydrolysis, whereas Gal was visible only when derivatized after hydrolysis, indicating that it was protected in the intact disaccharide. This analysis identified the product as Gal-Ara (i.e. Gal at the non-reducing end and Ara at the reducing end), a result that was confirmed by labeling the reducing end of the disaccharide with the fluorophore 8-aminonaphthalene-1,3,6-trisulfonic acid (ANTS). Following derivatization, the disaccharide was hydrolyzed into monosaccharides and analyzed by FACE (Fig. 4B). Only Ara was labeled, and therefore only Ara was exposed at the reducing end of the intact disaccharide.
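The inference used in both differential labeling experiments can be written as a small decision rule. This Python sketch is hypothetical (the function name and set-based inputs are not from the source). It assumes that derivatizing the intact disaccharide labels only the free reducing-end residue and that the two residues are different monosaccharides.

```python
def reducing_end_sequence(labelled_intact, detected_after_hydrolysis):
    """Infer (non-reducing, reducing) residues of a heterodisaccharide.

    labelled_intact: monosaccharides detected when the intact disaccharide is
        derivatized before hydrolysis (only the free reducing end can react).
    detected_after_hydrolysis: all monosaccharides detected after hydrolysis.
    """
    reducing = set(labelled_intact)
    everything = set(detected_after_hydrolysis)
    if len(reducing) != 1 or not reducing <= everything:
        raise ValueError("expected exactly one labelled residue that is also present after hydrolysis")
    non_reducing = everything - reducing
    if len(non_reducing) != 1:
        # A homodisaccharide (e.g. Ara-Ara) cannot be ordered by this rule.
        raise ValueError("rule only orders two different monosaccharides")
    return non_reducing.pop(), reducing.pop()
```

Applied to the observations above, `reducing_end_sequence({"Ara"}, {"Ara", "Gal"})` returns `("Gal", "Ara")`, i.e. Gal-Ara. The rule cannot order a homodisaccharide such as Ara2.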

NMR analysis of the reducing end absolute configuration and linkage
To provide further insight into the structure of the two products, both were analyzed by 1H and 13C nuclear magnetic resonance (NMR) spectroscopy (Fig. 5, A-D, Table 1). Both samples generated background peaks resulting from ring dynamics that obscured their interpretation; therefore, they were reduced to alditols with sodium borohydride. The product released from SBA by NF2152 was determined to be β-arabinofuranose-(1-2)-arabitol (Fig. 5E). The chemical shifts and coupling constants of its non-reducing monosaccharide suggested a β-arabinofuranose (27). The arabitol portion, the linkage, and thus the full structure were confirmed by two-dimensional NMR spectra, including correlation spectroscopy (COSY), heteronuclear single quantum correlation (HSQC), and heteronuclear multiple bond correlation (HMBC).
The reduced RAX product generated 1H and 13C chemical shifts, and an anomeric proton coupling constant (4.0 Hz) for its non-reducing monosaccharide, that matched those of α-galactopyranose (28,29). The NMR data for the arabitol portion were identical to those observed for the SBA product. The reduced RAX product generated by NF2523 was therefore identified as α-galactopyranose-(1-2)-arabitol (Fig. 5F). The pyranose ring of this Gal residue differs substantially from the furanose ring of the non-reducing Ara in the SBA product; β-L-arabinofuranose is instead structurally similar to galactofuranose (Fig. 5G).
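The anomeric assignment rests on the size of the H1-H2 coupling constant. A rough Python sketch of that rule of thumb follows. The 5.0 and 6.5 Hz cut-offs are illustrative assumptions, and the rule applies only to D-gluco/galacto-configured pyranoses in the 4C1 chair. It does not apply to furanoses such as the β-Araf residue, or to manno-configured sugars.

```python
def pyranose_anomer_from_j12(j12_hz, alpha_max=5.0, beta_min=6.5):
    """Rough anomeric call from the H1-H2 coupling constant (Hz).

    Valid only for D-gluco/galacto-configured pyranoses in the 4C1 chair:
    alpha: H1 equatorial, H2 axial -> small gauche coupling (about 3-4 Hz)
    beta:  H1 and H2 both axial    -> large trans-diaxial coupling (about 7-8 Hz)
    The cut-offs are illustrative assumptions, not calibrated values.
    """
    if j12_hz <= alpha_max:
        return "alpha"
    if j12_hz >= beta_min:
        return "beta"
    return "ambiguous"
```

For the 4.0 Hz value reported for the non-reducing Gal, the function returns `"alpha"`.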
Inspection of the putative active site of NF2152 revealed electron density consistent with the presence of a molecule of bis-tris methane (Fig. 6A, inset). This molecule was a component of the crystallization solution and proved to be essential for the reproducible generation of high-quality crystals. The bis-tris methane is tightly sandwiched between Trp118 and Trp289 and forms several direct and water-mediated H-bonds. Despite multiple attempts to soak and co-crystallize NF2152 with β-1,2-Ara2, we were unable to dislodge the buffer molecule from the active site, and other conditions failed to generate diffraction-quality crystals. Therefore, to investigate the potential interactions between NF2152 and Ara in the −1 subsite, we performed a structural superimposition with α-L-Ara bound in the active site pocket of the GH51 ABF from Thermotoga maritima (TmGH51; PDB code 3UG4; Fig. 6B) (34). GH51s are retaining enzymes with experimentally determined catalytic residues. In TmGH51, the nucleophile is Glu281 and the acid/base is Glu172 (34), which overlay with Glu254 and Glu155, respectively, in NF2152. Both of these residues are in reasonable proximity to the C1-OH of the Ara in the overlaid TmGH51 structure to facilitate hydrolysis. To confirm their catalytic function, Glu254 and Glu155 were mutated to glutamine and the variants were tested for activity on SBA. These mutations resulted in the complete loss of detectable product (Fig. 6C, top panel). This suggested that Glu155 and Glu254 are catalytic residues, and their spatial arrangement indicated that NF2152 uses a retaining mechanism, consistent with other GH39s and GH51s. Previously, an unrelated retaining β-1,2-arabinobiosidase, BlGH121, was shown to transglycosylate β-1,2-Ara2 in the presence of primary alcohols (27), which is indicative of a retaining mechanism.
When incubated with purified β-1,2-Ara2 and MeOH or EtOH, NF2152 also generated products with shifted mobilities, which is consistent with transfer of Ara2 to the alcohol acceptor through a double displacement mechanism (Fig. 6C).
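The source does not state which software was used for the superimposition. As an illustration of the underlying least-squares fit, the NumPy sketch below implements the Kabsch algorithm. Given matched atom coordinates (for example, hypothetically, Cα atoms of equivalent residues in TmGH51 and NF2152), it finds the rotation and translation that minimize the RMSD, which can then be applied to ligand atoms such as the bound Ara. The function names are hypothetical.

```python
import numpy as np


def kabsch_superpose(mobile, target):
    """Least-squares rigid fit of mobile onto target (both N x 3, rows matched).

    Returns (R, t, rmsd) such that mobile @ R.T + t approximates target.
    """
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    mobile_centroid = mobile.mean(axis=0)
    target_centroid = target.mean(axis=0)
    p = mobile - mobile_centroid
    q = target - target_centroid
    # SVD of the 3 x 3 cross-covariance matrix.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so that R is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = target_centroid - mobile_centroid @ r.T
    fitted = mobile @ r.T + t
    rmsd = float(np.sqrt(((fitted - target) ** 2).sum(axis=1).mean()))
    return r, t, rmsd


def apply_transform(coords, r, t):
    """Move other atoms (e.g. a ligand from the mobile structure) into the target frame."""
    return np.asarray(coords, dtype=float) @ r.T + t
```

In practice, `R` and `t` fitted on protein atoms would be passed to `apply_transform` together with the ligand coordinates from the mobile structure, placing the ligand in the frame of the target structure.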
Although the core catalytic residues are spatially conserved between NF2152 and TmGH51, closer inspection of the C2-OH of the Ara in the −1 subsite suggested that in this orientation the Ara is poised to interact with Asn154 (Asn171 in TmGH51). This interaction provides a recognition determinant for exo-acting ABFs, such as TmGH51, which only remove single Ara decorations (34). In NF2152, this interaction with Asn154 would preclude a conjugated β-1,2-Ara or α-1,2-Gal from extending deeper into the pocket. Therefore, in this subfamily of anaerobic fungal GH39 enzymes, it is probable that the Ara in the −1 subsite adopts a different orientation. In support of this, the two tryptophan residues positioned at the mouth of the active site cleft adopt strikingly different conformations in NF2152 and TmGH51 (Fig. 6B). BLAST searches with NF2152 identified other related entries in the database, including bacterial entries from Clostridia spp. and Fibrobacter spp., suggesting that NF2152 and NF2523 may be representative members of a larger subfamily. To investigate the functional diversity of other fungal GH39s, RAX was digested with NF2215 and PR2455. These digests showed that each enzyme has subtle differences in its product profile compared with NF2152 and NF2523 (Fig. 6D). These signature patterns occur despite strict conservation of the residues lining the active site in the −1 and −2 subsites, which are predicted to be involved in substrate binding and catalysis (Fig. 6D). Four residue positions that displayed variations in primary structure were identified: His52, Gln87, Glu110, and Val291. The conservation of these residues within the larger group of anaerobic fungal GH39 homologs was assessed by aligning 12 sequences with Clustal Omega (36) and mapping the result onto the surface of NF2152 using ConSurf (37) (Fig. 6E).
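ConSurf estimates conservation with a phylogeny-aware evolutionary rate model, which is beyond a short example. As a simpler stand-in (a swapped-in technique, not ConSurf's method), the Python sketch below scores each alignment column by Shannon entropy and reports variable columns in the numbering of a reference sequence. The toy alignment and the 0.5-bit threshold are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column, ignoring gaps ('-')."""
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    return -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())


def variable_positions(alignment, ref_index=0, threshold=0.5):
    """Return (residue number, residue, entropy) for variable columns.

    Residue numbers are 1-based positions in the ungapped reference sequence.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    ref = alignment[ref_index]
    hits, resnum = [], 0
    for col in range(length):
        if ref[col] == "-":
            continue  # no residue in the reference numbering at this column
        resnum += 1
        h = column_entropy([seq[col] for seq in alignment])
        if h > threshold:
            hits.append((resnum, ref[col], round(h, 3)))
    return hits
```

For example, `variable_positions(["HQEV", "HQEN", "HQDT"])` flags reference residues 3 (E) and 4 (V), while the fully conserved first two columns are not reported.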
Analysis of the active site confirmed that His52, Asn87, and Val291 (white) and Glu110 (cyan) represent potential hot spots for variation within the −2 subsite (Fig. 6E). In particular, the substitution of Val291 in NF2152 by an Asn in NF2523 and a Thr in PR2455 may illuminate key molecular determinants for recognition of the Gal at the non-reducing end of the α-1,2-Gal-Ara product. A β-1,2-Ara2 with an α-L-Ara at the reducing end could be reasonably docked into the active site of NF2152, indicating that there is sufficient room to accommodate the product. This interaction would require a ~20° clockwise rotation of the Ara in the −1 subsite. Importantly, His52 (3.7 Å from O4), Gln87 (3.2 and 3.0 Å from the C2-OH and C3-OH, respectively), and Glu110 (3.8 Å from the C5-OH) are positioned close enough to interact with the hydroxyl groups of the Ara in the −2 subsite (Fig. 6F). In addition, Val291 (3.8 Å from the C2-OH) may contribute van der Waals interactions.
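Proximity statements such as "3.8 Å from the C5-OH" are interatomic Euclidean distances measured in a common coordinate frame. A minimal Python sketch for listing ligand-protein atom pairs within a cut-off follows. The 4.0 Å default cut-off, the atom labels, and the coordinates in the usage example are hypothetical.

```python
import math


def contacts(ligand_atoms, protein_atoms, cutoff=4.0):
    """List (ligand atom, protein atom, distance in Å) pairs within cutoff, closest first.

    Both inputs map an atom label to (x, y, z) coordinates in Å, in the same frame.
    """
    pairs = []
    for lig_name, lig_xyz in ligand_atoms.items():
        for prot_name, prot_xyz in protein_atoms.items():
            d = math.dist(lig_xyz, prot_xyz)
            if d <= cutoff:
                pairs.append((lig_name, prot_name, round(d, 2)))
    return sorted(pairs, key=lambda pair: pair[2])
```

With toy coordinates, `contacts({"O2": (0.0, 0.0, 0.0)}, {"ResA:ND2": (3.0, 0.0, 0.0), "ResB:CB": (0.0, 5.0, 0.0)})` returns `[("O2", "ResA:ND2", 3.0)]`; the atom 5 Å away is excluded.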

Discussion
The discovery of new enzyme activities that improve the digestibility of recalcitrant plant cell wall polysaccharides offers promising solutions for clean, sustainable industries. Additionally, biocatalysts with novel activities that operate as surgical tools hold promise for dissecting the structures of complex glycans or generating rare or commercially valuable carbohydrates. In this regard, the rumen anaerobic fungi represent an underexplored source of potentially unique enzymes. Here we have investigated the activities of GH39 enzymes from rumen fungi that were identified by their association with CBM13s.
Most commonly, CBM binding specificity parallels the catalytic activity of the appended enzyme module; therefore, investigating uncharacterized enzymes associated with CBMs of diverse specificities should facilitate the discovery of new CAZyme families and, potentially, new activities. Using this rationale, hypothetical proteins appended to predicted CBM13s (19) were identified within fungal transcriptomes from N. frontalis and P. rhizinflatus. These sequences were trimmed to an N-terminal fragment of sufficient length to encode a functional enzyme and embedded into extracted datasets of characterized GH39 and GH51 sequences to generate "informed" phylogenetic trees. GH39 and GH51 are well-characterized polyspecific enzyme families with activity on a wide range of substrates (14). In both trees the fungal sequences partition as a single clade, distantly related to bacterial β-xylosidases in GH39 and to ABFs in GH51 (Fig. 1). These phylogenetic relationships suggest that the fungal GH39 enzymes may have diverged following horizontal gene transfer from a bacterial ancestor and, based upon sequence, may be active in the hydrolysis of either α-L-arabinofuranosyl or β-xylosyl substrates. However, with the contemporary perspective that many CAZyme families are polyspecific (15,16), including GH39 and GH51, it has become apparent that accurate assignment of enzyme activity cannot rely solely on phylogenetic patterns and requires evidence-based biochemical characterization.
To investigate the function of these fungal GH39s, a synthetic form of NF2152 was produced and screened for activity on a variety of plant cell wall carbohydrates. NF2152 was shown to release an Ara2 product from SBA (Fig. 2) and RAX (Fig. 3A). The release of similar products from two substrates that are isolated from different plant sources and display different structures suggests that both polysaccharide preparations contain a common rare substrate.
SBA is an Ara-rich glycan found within the side chains of RG-I. It is composed of an α-1,5-L-arabinofuranosyl backbone decorated with α-1,2- and α-1,3-L-arabinofuranosyl side chains. Ara is an abundant pentose in nature; therefore, novel biocatalysts active on arabinans are promising tools for bioconversion industries. There are many enzymes known to be active on SBA. Collectively, these enzymes are referred to as arabinanases (ABNs), which are found in the GH43 and GH93 families (38), and ABFs, which are found in the GH3, GH43, GH51, GH54, GH62, GH93, and GH127 families (25,27,39). ABNs catalyze the hydrolysis of the α-1,5-L-arabinofuranosyl backbone of plant cell wall arabinans, releasing arabinooligosaccharides and Ara. ABFs cleave arabinosyl decorations in an exo-fashion. Currently, several α-1,5-arabinobiosidases have been reported that processively hydrolyze the backbone of arabinan (40). Recent studies have shown that supplementing fungal enzyme mixtures with ABNs and ABFs synergistically improves the hydrolysis rate of plant biomass (41). Arabinoxylans are abundant polysaccharides within the plant cell wall (42). RAX possesses a D-xylosyl backbone decorated with α-1,2- and α-1,3-linked L-arabinosyl residues (42,43). Digestion of RAX requires a combination of enzyme activities, including ABFs and xylanases; the latter are found mainly in GH10 and GH11, and also in GH5, GH7, GH8, and GH43 (44). Multiple commercial enzyme products that target the xylan backbone of arabinoxylan are currently available, indicating that there is a market for biocatalysts that improve the digestion of RAX.
Using acid hydrolysis (Fig. 2, B and C) and diagnostic enzyme digests and HR-MS (Fig. 2, D and E), including digestion of a purified arabinotetraose (Fig. 2E), the product released by NF2152 was determined to be β-1,2-L-Araf2. This indicated that the target substrate is a rare, and potentially trace, contaminant within SBA and RAX. Additionally, it reveals that NF2152 is not a conventional ABF or xylosidase, as was suggested by phylogenetic analysis (Fig. 1). Recently, BlGH121 was reported to be the first described β-1,2-arabinobiosidase, which is active on AGP glycans that co-purify with SBA (27). Importantly, BlGH121 products are also hydrolyzed by BlGH127 (25). This suggests that the GH39 enzymes found in anaerobic fungi and BlGH121, a member of a bacterial enzyme family, may have undergone functional convergence. To the best of our knowledge, this small collection of enzymes represents some of the only known enzymes specific for the degradation of β-1,2-arabinofuranoside-containing glycans.
A second product with lower mobility was detected when RAX was digested with NF2152 and NF2523 (Figs. 2 and 3). The composition of this heterogeneous disaccharide was determined to be Gal and Ara (Fig. 3B). To provide more insight into the product chemistry, the disaccharide was sequenced using differential alditol acetate derivatization and ANTS labeling, analyzed by GC-MS (Fig. 4A) and FACE (Fig. 4B), respectively. Both techniques confirmed that Ara was positioned at the reducing end, and therefore that the product is a Gal-Ara disaccharide. Although NMR is the gold standard for these types of analyses, differential labeling techniques can be a valuable approach for rapidly sequencing heterogeneous disaccharides when access to NMR infrastructure is lacking or product yields are limiting.
Preliminary NMR spectra of both products suffered from high backgrounds of acetate (from the reaction buffer) and multiple signals for the anomeric carbon, suggesting that the carbohydrates were adopting several conformations. Therefore, large-scale purifications of both disaccharides were performed and the products were reduced to arabitol-containing alditols to linearize the reducing-end residue. 1H NMR (Fig. 5, A and C) and 13C NMR (Fig. 5, B and D) analyses confirmed the sequence of both Ara2 and Gal-Ara and showed that both products contain 1,2-linkages (β in Ara2 and α in Gal-Ara) (Fig. 5, E and F). Additionally, at the non-reducing end, Ara is in the furanose configuration (consistent with the BlGH127 digest, Fig. 2D), and Gal is in the pyranose configuration. This latter observation was somewhat surprising, as we had anticipated that the Gal residue might be in the α-D-galactofuranose configuration, which has a ring structure analogous to that of β-L-arabinofuranose and differs only by the additional hydroxymethyl group (C6) on C5 (Fig. 5G). This supports the conclusion that the NF2523 substrates are of plant origin rather than fungal origin, as galactofuranoses are common components of fungal cell walls (45). Furthermore, the finding that Ara is positioned at the reducing end of both products indicates that these enzymes have a strict requirement for Ara in their −1 subsites and display plasticity in their −2 subsites (Fig. 5, E and F).
The structure of NF2152 confirmed that it adopts the canonical TIM-barrel fold of GH39s and shares significant structural similarity with GH51s (Fig. 6A). The polyspecificity reported for these families, including ABFs, xylosidases, endoglucanases, and many others (Fig. 1), underscores the functional plasticity of this core fold (14,46). The position of the active site was highlighted by the presence of a bound bis-tris buffer molecule (Fig. 6A) and by the superimposition of the Ara product from TmGH51 (Fig. 6B). The Ara is oriented with its C1 exiting the pocket between two sequence-conserved aromatic residues (i.e. Trp118 and Trp289) and within suitable distances of the structurally conserved catalytic residues, Glu155 and Glu254. The involvement of Glu155 and Glu254 in catalysis was confirmed by site-directed mutagenesis, and their potential roles in a retaining mechanism were supported by transglycosylation with MeOH and EtOH as acceptors (Fig. 6C). As these features are conserved in other enzymes that function as ABFs and xylosidases, the specificity of the fungal GH39 enzymes for 1,2-linked, Araf-containing disaccharides likely results primarily from interactions within the −2 subsite (Fig. 6D). This contrasts with Bgxg1, which was reported to release monosaccharides from a variety of synthetic substrates and purified disaccharides (24). Mapping the distribution of conserved surface-exposed active site residues in fungal GH39 sequences onto the surface of NF2152 revealed four key residues, His52, Asn87, Glu110, and Val291, that are variable within the −2 subsite (Fig. 6E). Importantly, they are also in suitable proximity for potential interactions with the Araf at the reducing end of β-1,2-Araf2 (Fig. 6F). These residues, therefore, may represent genetic markers for directing the discovery of other GH39 enzymes with unique specificities.

Conclusion
Although many microbial ecosystems (e.g. the human distal gut) are currently being mined for beneficial enzyme activities, the rumen microbiota remains one of the most promising repositories of microbial enzymes active on plant cell wall polysaccharides (20). In this study, four rumen fungal genes classified as GH39s were selected for characterization based upon their expression and their association with CBM13, a family with reported diversity in its binding specificities (19).
Very little is known about the role of AGPs in plant biology and biomass conversion, mainly due to their extensive structural complexity. Recently, it was reported that an AGP may interact directly with pectin and hemicellulose, which challenges the traditional view that these polysaccharides form independent networks (47). In this regard, enzymes that target discrete linkages within the arabinan or AGP glycan network, such as the fungal GH39 enzymes studied here, represent tools for helping to characterize complex arabinosyl-containing glycans and, potentially, may assist in their turnover during bioconversion processes.

Phylogenetic trees
Characterized sequences from families GH51 and GH39 were extracted and then trimmed to functional modules (i.e. GH catalytic fragments) by dbCAN (48). Trimmed fragments were then merged with the catalytic fragments of the fungal sequences and aligned using multiple sequence comparison by log-expectation (MUSCLE) (49). Phylogenetic trees were inferred with FastTree 2 (50), using the best-fit model selected with ProtTest3 (51). Finally, FigTree was used to render the trees (tree.bio.ed.ac.uk/software/figtree).
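The trimming and merging steps that precede alignment can be sketched in Python. This is an assumption-laden illustration: domain boundaries are supplied as 1-based inclusive residue ranges (the source does not specify the format of the dbCAN output that was used), and the function names are hypothetical. The merged FASTA would then be passed to the alignment and tree-building tools named above.

```python
def read_fasta(text):
    """Parse FASTA text into an ordered {id: sequence} dict (id = first word of header)."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def trim_to_domains(records, domains):
    """Keep only the catalytic module of each sequence.

    domains maps id -> (start, end), 1-based inclusive residue boundaries.
    Sequences without an entry are dropped.
    """
    return {sid: records[sid][start - 1:end]
            for sid, (start, end) in domains.items() if sid in records}


def merge_fasta(*record_sets, width=60):
    """Merge record dicts into one FASTA string, rejecting conflicting duplicate ids."""
    merged = {}
    for records in record_sets:
        for sid, seq in records.items():
            if sid in merged and merged[sid] != seq:
                raise ValueError(f"conflicting sequences for {sid}")
            merged[sid] = seq
    lines = []
    for sid, seq in merged.items():
        lines.append(f">{sid}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

Rejecting conflicting duplicate identifiers, rather than silently overwriting them, keeps reference and fungal fragments from being confused when the two sets are combined.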

Gene synthesis and cloning
Sequences corresponding to the mature GH39 gene products (lacking signal peptides) of NF2152, NF2215, NF2523, and PR2455 (GenBank accession numbers MF326649, MF326650, MF326651, and MF326652) were commercially synthesized (BioBasic Inc., Markham, Canada), codon-optimized for expression in E. coli, and cloned into the pET28a vector. For crystallization experiments, the DNA sequence corresponding to residues 29-431 of NF2152 was PCR-amplified and cloned into the NdeI and XhoI sites of pET28a to create pET28_nf2152_29-431. All constructs were confirmed by DNA sequencing (Eurofins Genomics, Louisville, KY).
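Two bookkeeping steps in this cloning design are easy to get wrong: converting a 1-based inclusive residue range into a nucleotide slice, and confirming that the insert contains no internal copies of the sites used for cloning. The Python sketch below assumes an in-frame coding sequence whose first codon encodes residue 1. NdeI (CATATG) and XhoI (CTCGAG) recognition sequences are palindromic, so scanning one strand suffices. The function names are hypothetical.

```python
# Recognition sequences of the two cloning enzymes (both palindromic).
CLONING_SITES = {"NdeI": "CATATG", "XhoI": "CTCGAG"}


def residue_range_to_cds(cds, first_residue, last_residue):
    """Nucleotides encoding residues first..last (1-based, inclusive) of an in-frame CDS."""
    if first_residue < 1 or first_residue > last_residue or last_residue * 3 > len(cds):
        raise ValueError("residue range outside the coding sequence")
    return cds[(first_residue - 1) * 3 : last_residue * 3]


def internal_sites(seq, sites=CLONING_SITES):
    """Return {enzyme: [0-based start positions]} for every site occurrence in seq."""
    seq = seq.upper()
    hits = {}
    for enzyme, motif in sites.items():
        starts = [i for i in range(len(seq) - len(motif) + 1) if seq[i:i + len(motif)] == motif]
        if starts:
            hits[enzyme] = starts
    return hits
```

Residues 29-431 span 403 codons, i.e. a 1,209-nt slice (nucleotides 85-1293 of the coding sequence). An empty result from `internal_sites` on that slice would indicate no internal NdeI or XhoI sites.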

Expression of synthetic genes and purification of recombinant proteins
The pET28a constructs were transformed into E. coli BL21 Star (DE3) for recombinant protein production. Cells were grown at 37°C in LB broth containing kanamycin (50 µg ml⁻¹) to an OD600 of 0.8-1.0. Cultures were cooled to 16°C, and gene expression was induced with 0.2 mM isopropyl 1-thio-β-D-galactopyranoside overnight at 200 rpm. Overnight cultures were centrifuged at 6,500 × g for 10 min. Cells were resuspended in Binding Buffer (BB: 0.5 M NaCl, 20 mM Tris, pH 8.0) and lysed by sonication for 2 min in 1-s pulses of medium intensity at a power setting of 4.5 (Heat Systems Ultrasonics Model W-225 with probe). The cell lysate was clarified by centrifugation at 17,500 × g for 45 min and passed through a 0.45-µm filter. The filtrate was loaded onto a nickel-nitrilotriacetic acid column and purified by immobilized metal affinity chromatography. Recombinant protein was eluted with a stepwise gradient of imidazole (5, 10, 30, 100, 200, and 500 mM) in BB. Fractions containing significant amounts of protein were pooled and buffer-exchanged into storage buffer (NF2152: 20 mM Tris-HCl, pH 8.0, 500 mM NaCl; NF2152_29-431: 20 mM NaPO4, pH 7.2, 100 mM NaCl; NF2215, NF2523, and PR2455: 20 mM Tris-HCl, pH 8.0, 150 mM NaCl). Following buffer exchange, samples were concentrated in a nitrogen-pressurized stirred ultrafiltration cell (Amicon) with a 5-kDa molecular mass cut-off. Concentrated protein was filtered and passed through a HiPrep 16/60 Sephacryl S-200 HR size exclusion column (GE Healthcare) at a flow rate of 1.0 ml min⁻¹ in storage buffer. Pure fractions were pooled and concentrated. NF2152 was further buffer-exchanged into 20 mM Tris-HCl (pH 8.0), 100 mM NaCl. Protein concentration was determined using the Beer-Lambert law and extinction coefficients calculated with ProtParam (52).
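The Beer-Lambert calculation mentioned at the end of the purification protocol is c = A / (ε · l). The sketch below uses hypothetical input values; ProtParam supplies the molar extinction coefficient and the molecular mass. It converts an A280 reading into both molar and mass concentration, assuming a blank-subtracted absorbance and a 1-cm path.

```python
def protein_concentration(a280, ext_coeff, mw_da, path_cm=1.0):
    """Beer-Lambert law, c = A / (epsilon * l).

    a280      blank-subtracted absorbance at 280 nm
    ext_coeff molar extinction coefficient in M^-1 cm^-1 (e.g. from ProtParam)
    mw_da     molecular mass in Da (numerically g/mol)
    path_cm   cuvette path length in cm
    Returns (concentration in µM, concentration in mg/ml).
    """
    molar = a280 / (ext_coeff * path_cm)
    micromolar = molar * 1e6
    mg_per_ml = molar * mw_da  # mol/L * g/mol = g/L, which equals mg/ml
    return micromolar, mg_per_ml
```

With illustrative values (A280 = 0.5, ε = 50,000 M⁻¹ cm⁻¹, 45 kDa), this gives about 10 µM, or 0.45 mg ml⁻¹. ProtParam reports two coefficients, one assuming all Cys form cystines and one assuming all Cys are reduced, so the value chosen should match the protein's state.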

Thin layer chromatography
Digested samples (9 µl in total, applied as three 3-µl aliquots) were spotted onto TLC plates (TLC Silica Gel 60; EMD Millipore Corp.), with drying between applications. Appropriate standards (6 µl at 1 mM) were included in each run. Samples were resolved using a mobile phase of 2:1:1 butanol:water:acetic acid, dried, and visualized by staining with an orcinol solution (1% orcinol in 70:3 acetic acid:sulfuric acid) followed by heating at 100°C for 3-5 min.
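On TLC, products are compared with the co-run standards by their migration. This is usually expressed as a retardation factor, Rf: the distance a spot travels divided by the distance the solvent front travels. The sketch below is hypothetical; the tolerance and the standard Rf values in the usage note are invented. It assigns a spot to the nearest standard. Co-migration is only a tentative identification, which is why products were confirmed by MS, GC, FACE and NMR.

```python
def rf(spot_mm, front_mm):
    """Retardation factor: spot distance / solvent-front distance, both from the origin."""
    if not 0 <= spot_mm <= front_mm:
        raise ValueError("spot must lie between the origin and the solvent front")
    return spot_mm / front_mm


def closest_standard(spot_mm, front_mm, standards, tol=0.05):
    """Name of the standard with the nearest Rf, or None if none lies within tol.

    standards maps a name to the Rf measured for that standard on the same plate.
    """
    r = rf(spot_mm, front_mm)
    name, r_std = min(standards.items(), key=lambda kv: abs(kv[1] - r))
    return name if abs(r_std - r) <= tol else None
```

For example, with hypothetical standards `{"Ara": 0.50, "Ara2": 0.35, "Ara3": 0.25}`, a spot 36 mm from the origin under a 100-mm front is assigned to `"Ara2"`. Returning `None` instead of forcing a match flags products with no corresponding standard.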

HPAEC-PAD of monosaccharide and oligosaccharide reaction products
HPAEC-PAD was performed with a Dionex ICS-3000 chromatography system (Thermo Scientific) equipped with an autosampler and a pulsed amperometric detector (PAD). Aqueous samples (10 µl) were injected onto an analytical (3 × 150 mm) CarboPac PA20 column (Thermo Scientific) and eluted at a flow rate of 0.4 ml min⁻¹ with a sodium acetate gradient (0-1 min, 0 mM; 1-18 min, 250-850 mM; 18-20 min, 850 mM; 20-30 min, 850-0 mM) in a constant background of 100 mM NaOH. Elution was monitored by PAD using the standard quadruple-potential waveform for carbohydrates.
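The sodium acetate program above is piecewise linear. The sketch below encodes the segments exactly as the text lists them, including the step from 0 to 250 mM at 1 min, which it assumes is intentional. It returns the programmed acetate concentration at any time point. The function name is hypothetical, and it ignores the system dwell volume, so the concentration actually reaching the column will lag the program slightly.

```python
# (start_min, end_min, start_mM, end_mM) segments copied from the text;
# NaOH stays at 100 mM throughout and is not modeled.
SEGMENTS = [
    (0.0, 1.0, 0.0, 0.0),
    (1.0, 18.0, 250.0, 850.0),
    (18.0, 20.0, 850.0, 850.0),
    (20.0, 30.0, 850.0, 0.0),
]


def acetate_mM(t_min):
    """Programmed sodium acetate concentration (mM) at time t_min."""
    for t0, t1, c0, c1 in SEGMENTS:
        # Half-open intervals, so the step at 1 min resolves to the new segment.
        if t0 <= t_min < t1:
            return c0 + (c1 - c0) * (t_min - t0) / (t1 - t0)
    if t_min == SEGMENTS[-1][1]:
        return SEGMENTS[-1][3]
    raise ValueError("time outside the 30-min program")
```

Half-open segment boundaries mean `acetate_mM(1.0)` reports the start of the 250-850 mM ramp, not the end of the 0 mM hold. This matches a step change at that time.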

Ethanol precipitation
Ethanol precipitation was performed on digested products to increase their purity, separating small products (e.g. monosaccharides and disaccharides) from larger oligosaccharides and polysaccharides by their differential solubility. The digested products were dried by SpeedVac at ambient temperature or by lyophilization, and then suspended in 95% ethanol (v/v, EtOH:H2O). After incubation on ice for 10 min, the suspension was vortexed vigorously to dissolve the small oligosaccharides and monosaccharides and then clarified by centrifugation at 14,000 × g for 10 min. The supernatant was removed and dried by SpeedVac. Purified carbohydrates were subsequently used directly or suspended in water at the desired concentrations for further analyses.

Acid hydrolysis
Purified dried products (30 µg) were incubated with 200 µl of 2.0 M trifluoroacetic acid (TFA) for 4 h at 100°C. The reaction mixture was then dried to completion in a SpeedVac, followed by three washes with 100 µl of isopropyl alcohol. Released monosaccharides were analyzed by TLC and HPAEC-PAD.
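It helps to know how much monosaccharide 30 µg of a disaccharide can release when judging the TLC and HPAEC-PAD signals. The sketch below is a hypothetical calculation using average molar masses. It assumes complete hydrolysis with no acid degradation of the released sugars, so it gives an upper bound.

```python
# Average molar masses (g/mol) of the free monosaccharides and of water.
MW = {"Ara": 150.13, "Gal": 180.16}
WATER = 18.02


def disaccharide_mw(a, b):
    """Average molar mass of a disaccharide formed from a and b (one H2O lost)."""
    return MW[a] + MW[b] - WATER


def hydrolysis_yield_nmol(mass_ug, a, b):
    """Maximum nmol of each monosaccharide released from mass_ug of disaccharide a-b."""
    n = mass_ug / disaccharide_mw(a, b) * 1000.0  # µg / (g/mol) = µmol -> nmol
    released = {}
    for sugar in (a, b):
        released[sugar] = released.get(sugar, 0.0) + n
    return released
```

By this estimate, 30 µg of Ara2 (about 282 g/mol) can yield at most about 213 nmol of arabinose. The same mass of a Gal-Ara disaccharide (about 312 g/mol) yields about 96 nmol each of galactose and arabinose.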

Molecular weight determination by mass spectrometry
A large-scale digest (total reaction volume 200 ml) of SBA with NF2152, or of RAX with NF2523, was performed to generate milligram amounts of product. The products were further purified by ethanol precipitation (95% v/v EtOH:H2O) and by size exclusion chromatography on Bio-Gel P-2 (Bio-Rad Laboratories) at a flow rate of 0.17 ml min⁻¹, with distilled water as the eluent. Extra fine Bio-Gel P-2 (particle size <45 µm) has a fractionation range of 100-1,800 Da. The elution peaks were screened for purity by TLC, pooled, and lyophilized. Mass spectrometry was carried out on the pooled fractions to determine the m/z and molecular weight of the product (Alberta Glycomics Centre, Department of Chemistry, University of Alberta, Edmonton, Canada). Electrospray ionization mass spectra were recorded on an Agilent Technologies 6220 TOF instrument. The sample was dissolved in methanol or 1:1 methanol:water and injected directly into the instrument (5 µl). Spectra were recorded in positive-ion mode.
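Positive-mode ESI spectra of neutral sugars are usually read against protonated ([M+H]⁺) or sodiated ([M+Na]⁺) ions. The sketch below computes theoretical monoisotopic m/z values for the two disaccharides named in this work (Ara2 and Gal-Ara) from their elemental composition. These are calculated values, not results from this study, and which adduct dominates in a given spectrum is not assumed.

```python
# Monoisotopic atomic masses (u) and the electron mass.
MONO = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956, "Na": 22.9897692809}
ELECTRON = 0.00054857990946
HEXOSE = {"C": 6, "H": 12, "O": 6}   # e.g. Gal
PENTOSE = {"C": 5, "H": 10, "O": 5}  # e.g. Ara


def condense(*residues):
    """Composition of an oligosaccharide: sum of monomers minus one H2O per glycosidic bond."""
    comp = {}
    for residue in residues:
        for element, count in residue.items():
            comp[element] = comp.get(element, 0) + count
    bonds = len(residues) - 1
    comp["H"] -= 2 * bonds
    comp["O"] -= bonds
    return comp


def monoisotopic(comp):
    return sum(MONO[element] * count for element, count in comp.items())


def mz(comp, adduct):
    """m/z of a singly charged cation [M+H]+ or [M+Na]+."""
    if adduct not in ("H", "Na"):
        raise ValueError(f"unsupported adduct {adduct!r}")
    return monoisotopic(comp) + MONO[adduct] - ELECTRON
```

`condense(PENTOSE, PENTOSE)` gives C10H18O9 for Ara2, whose [M+Na]⁺ ion falls near m/z 305.084. `condense(HEXOSE, PENTOSE)` gives C11H20O10 for Gal-Ara, near m/z 335.095. Mass alone cannot tell β1,2-Ara2 from other pentobioses, which is why linkage was assigned by NMR.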

Carbohydrate sequencing by GC-MS
The purified product was analyzed by gas chromatography to identify its constituent sugars and its reducing end. Gas chromatographic analysis of mono- and disaccharides requires conversion of the sugars into volatile derivatives (53). The purified product was therefore converted into alditol acetates, which involves reduction with sodium borohydride (NaBH4) followed by conversion of the polyols to polyacetate esters. Fifty micrograms of product were reduced with 200 µl of NaBH4 (10 mg ml⁻¹ in 1 M NH4OH). The reduced sugar was then hydrolyzed with 200 µl of 2 M TFA at 100°C for 4 h. The reaction mixture was dried to completion in a SpeedVac, followed by three washes with 100 µl of isopropyl alcohol. Released monosaccharides were acetylated by the addition of 250 µl of acetic anhydride, and the mixture was concentrated in a SpeedVac to a volume of 200 µl. The resulting solution was transferred to a GC autosampler vial containing a 250-300 µl micro-insert and injected into a gas chromatograph (Hewlett Packard 5890) equipped with a polar capillary GC column (SP2330) and a flame ionization detector.
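In this protocol the product is reduced before acid hydrolysis, and that order is what reveals the reducing end. Only the residue with a free anomeric carbon becomes an alditol; the other residue is released afterwards as a free sugar. The hypothetical lookup below sketches that reasoning for the sugars relevant here. For a homodimer such as Ara2 both residues give the same alditol, so this method alone cannot tell the ends apart.

```python
# Alditol formed by NaBH4 reduction of each aldose (illustrative subset).
ALDITOL = {"Ara": "arabinitol", "Gal": "galactitol", "Xyl": "xylitol"}


def expected_gc_species(non_reducing, reducing):
    """Species expected when a disaccharide is reduced *before* TFA hydrolysis.

    Only the reducing-end residue is open to reduction, so only it becomes
    an alditol; the non-reducing residue is released later as a free sugar.
    """
    return {"alditol": ALDITOL[reducing], "free_sugar": non_reducing}


def infer_reducing_end(residues, observed_alditols):
    """Residue type(s) whose alditol was detected; ambiguous for homodimers."""
    return sorted({r for r in residues if ALDITOL[r] in observed_alditols})
```

For a Gal-Ara product, detecting arabinitol but not galactitol would place arabinose at the reducing end. That is consistent with a galacto-(α-1,2)-arabinose structure.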

Carbohydrate sequencing by FACE
FACE was performed to identify the reducing end of the digested products. Carbohydrates were fluorescently labeled at their reducing end with ANTS, and the labeled sugars were separated in a high-percentage (40%) polyacrylamide gel (54). Each sample (30 µg) was dried and resuspended by vortexing in 5 µl of freshly prepared 0.15 M ANTS (dissolved in 15% acetic acid) and 5 µl of freshly prepared 1 M 2-picoline borane (dissolved in 1 ml of DMSO). Samples were incubated overnight at 37°C in tubes wrapped in foil. The labeled samples were then dried in a SpeedVac for 2-4 h, or until completely dry. The labeled sugar was hydrolyzed by the addition of 200 µl of 2 M TFA followed by incubation at 100°C for 4 h. The reaction mixture was dried to completion in a SpeedVac, followed by three washes with 100 µl of isopropyl alcohol. The dried pellet was suspended in 25 µl of FACE loading dye and run on a gel immediately or stored at −20°C wrapped in foil.
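Each oligosaccharide has one reducing end, so complete labeling requires ANTS and the reductant to be in molar excess over the sample. The arithmetic below is a hypothetical check that uses the volumes and concentrations given above. It assumes, for illustration only, that the 30-µg sample is entirely a Gal-Ara disaccharide.

```python
# Assumed average molar mass (g/mol) of a hexose-pentose disaccharide such as Gal-Ara.
MW_GAL_ARA = 180.16 + 150.13 - 18.02


def nmol_from_mass(mass_ug, mw_g_per_mol):
    """µg divided by g/mol gives µmol; multiply by 1000 for nmol."""
    return mass_ug / mw_g_per_mol * 1000.0


def nmol_from_solution(volume_ul, conc_molar):
    """µl times mol/L gives µmol; multiply by 1000 for nmol."""
    return volume_ul * conc_molar * 1000.0


def labeling_excess(sample_ug, mw, reagent_ul, reagent_molar):
    """Molar excess of a reagent over reducing ends (one per oligosaccharide)."""
    return nmol_from_solution(reagent_ul, reagent_molar) / nmol_from_mass(sample_ug, mw)
```

Under this assumption, 5 µl of 0.15 M ANTS (750 nmol) is roughly an 8-fold excess over about 96 nmol of reducing ends. 5 µl of 1 M 2-picoline borane is roughly a 50-fold excess.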

Linkage and ring configuration by NMR
Spectra were recorded on a Varian VNMRS 500-MHz spectrometer in D2O at 25°C (Department of Chemistry and Biochemistry, Concordia University, Montreal, Canada). Chemical shifts and coupling constants were interpreted based on one-dimensional (1H and 13C) NMR, as well as two-dimensional homonuclear correlation spectroscopy (COSY) and heteronuclear single quantum coherence (HSQC) spectroscopy. A coupling constant of 5.0 Hz for the anomeric proton (H-1′) was taken as evidence for the β-configuration of the Ara residue (55).
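The anomeric assignment above rests on the size of the H-1/H-2 coupling constant. The sketch below assumes an arabinofuranosyl residue, for which α-L-Araf typically shows a small J1,2 (about 1-2 Hz) and β-L-Araf a larger one (about 4-5 Hz). The 3.0-Hz cutoff is an illustrative assumption, and pyranose rings follow different ranges, so this rule should not be applied without knowing the ring form.

```python
def araf_anomer(j12_hz, cutoff_hz=3.0):
    """Tentative anomeric configuration of an L-arabinofuranosyl residue from 3J(H1,H2).

    Assumes a furanose ring: alpha-L-Araf ~1-2 Hz, beta-L-Araf ~4-5 Hz.
    The cutoff is illustrative, not a published threshold.
    """
    if j12_hz < 0:
        raise ValueError("coupling constants are given here as magnitudes")
    return "beta" if j12_hz >= cutoff_hz else "alpha"
```

Under these assumptions, the 5.0-Hz coupling reported here falls on the β side of the cutoff.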

Protein crystallization of NF2152 and structure solution
Purified NF2152_29-431 was buffer-exchanged into 20 mM bis-tris (pH 6.0) and concentrated to 20 mg ml⁻¹. Crystals of NF2152 were grown by hanging-drop vapor diffusion at 18°C in 20% (w/v) PEG 3350, 0.1 M bis-tris (pH 6.0), and 0.15 M NaI, with a drop ratio of 1:2 protein solution to mother liquor. Crystals were cryoprotected with 25% (w/v) ethylene glycol and flash-cooled directly in the cryostream at 100 K. Diffraction data were collected on an instrument comprising a MicroMax MM-007HF X-ray generator coupled to a Dectris Pilatus 200K detector with VariMax HF Arc Sec confocal optics and an Oxford Cryosystems Cryostream 800 cryocooler. Data were processed and scaled with HKL-3000R (56). An initial model was determined by single-wavelength anomalous dispersion (SAD) using SHELXC/D/E (57,58). Phases were improved with Phaser (59), followed by successive rounds of density modification with Parrot (60) and model building with Buccaneer (61), using the CCP4 Phaser SAD pipeline (62). With an estimated solvent fraction of 35%, 13 iodide sites were identified. The resulting phases were sufficient for ARP/wARP to build a virtually complete model (63). This model was then subjected to iterative rounds of manual correction in COOT (64) and refinement with PHENIX (65). Water molecules were added with COOT FINDWATERS and checked manually after refinement. Throughout, refinement was monitored by flagging 5% of all observations as "free" (66). Model validation was performed with MolProbity (67). Data collection and processing statistics are shown in Table 2.
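Flagging 5% of observations as "free" means those reflections are excluded from refinement and used only to compute R-free. R-free is an unbiased counterpart of the working R factor, R = Σ| |Fobs| − |Fcalc| | / Σ|Fobs|. The sketch below illustrates that bookkeeping with hypothetical function names and a seeded random test set. It assumes Fcalc is already scaled to Fobs. Refinement programs such as PHENIX compute these statistics themselves.

```python
import random


def flag_free(n_reflections, fraction=0.05, seed=0):
    """Randomly mark about `fraction` of reflections as the free (test) set."""
    rng = random.Random(seed)
    return [rng.random() < fraction for _ in range(n_reflections)]


def r_factor(f_obs, f_calc):
    """R = sum| |Fobs| - |Fcalc| | / sum|Fobs| (Fcalc assumed already scaled)."""
    num = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    den = sum(abs(fo) for fo in f_obs)
    return num / den


def r_work_free(f_obs, f_calc, free_flags):
    """R-work over unflagged reflections and R-free over flagged ones.

    Refinement only ever sees the work set, so a growing gap between the
    two values signals overfitting. Both sets must be non-empty.
    """
    work = [(o, c) for o, c, f in zip(f_obs, f_calc, free_flags) if not f]
    free = [(o, c) for o, c, f in zip(f_obs, f_calc, free_flags) if f]
    return r_factor(*zip(*work)), r_factor(*zip(*free))
```

Seeding the random generator keeps the same reflections in the free set across refinement rounds. This matters because moving reflections between the sets would bias R-free.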

Site-directed mutagenesis
Glu155 and Glu254 in NF2152 were predicted to be the acid/base and nucleophile, respectively, based upon superimposition with the GH51 from T. maritima (PDB 3UG4) (34). Each residue was substituted with glutamine by site-directed mutagenesis, using pET28_nf2152_29-431 as the template, as previously described (68). Mutations were confirmed by DNA sequencing (Eurofins Genomics). Once their sequences were confirmed, the mutant enzymes were tested for activity with appropriate substrates.
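A Glu-to-Gln substitution needs only one base change: both Glu codons (GAA, GAG) become the matching Gln codons (CAA, CAG) when the first G is changed to C. The hypothetical helper below sketches this codon edit. Its `offset` parameter reconciles full-length residue numbering (e.g. Glu155) with a construct that starts at residue 29. It assumes the supplied sequence begins exactly at residue `offset + 1` and ignores any vector-derived codons.

```python
# Single first-position G->C change converts either Glu codon to Gln.
GLU_TO_GLN = {"GAA": "CAA", "GAG": "CAG"}


def glu_to_gln(cds, residue, offset=0):
    """Return cds with the Glu codon at `residue` replaced by the matching Gln codon.

    residue  1-based residue number in the full-length protein
    offset   number of full-length residues upstream of the first codon in cds
             (hypothetical parameter; e.g. 28 for an insert starting at residue 29)
    """
    i = 3 * (residue - offset - 1)
    codon = cds[i:i + 3].upper()
    if codon not in GLU_TO_GLN:
        raise ValueError(f"codon {codon!r} at residue {residue} does not encode Glu")
    return cds[:i] + GLU_TO_GLN[codon] + cds[i + 3:]
```

Refusing to mutate a non-Glu codon makes a numbering mismatch between the full-length protein and the construct fail loudly instead of silently editing the wrong position.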

Transglycosylation assay
Size exclusion-purified β1,2-Ara2 (1 mM) was incubated with 1 µM NF2152 in 20 mM sodium acetate (pH 5.0) containing 20% (v/v) methanol or ethanol. After overnight incubation at 37°C, the samples were heated at 100°C for 10 min to terminate the reaction. Reaction products were visualized by TLC.
Author contributions: D. R. J. performed enzyme characterization and product analysis, solved the structure of NF2152, and assisted with figure generation and manuscript writing. M. S. U. performed enzyme screening, mutagenesis, and analysis of product structure, and assisted with writing and figure generation. R. J. G. generated fungal transcriptomes and gene sequences and assisted with protein crystallization. M. P. T. T. performed NMR experiments and determined the structures of the reduced SBA and RAX products. D. T. generated phylogenetic trees. A. B. B. and B. P. assisted with X-ray data collection and structure solution. J. B. generated the arabinotetraose. T. A. M. and R. F. helped design the study and identify target sequences. A. T. sequenced fungal transcripts and helped with study design and NMR structure determination. L. B. S. helped with study design and interpretation of results. D. W. A. conceived the study, contributed to interpretation of the results, and wrote the initial draft of the paper. All authors reviewed the results and approved the final version of the manuscript.