Structural and Functional Characterization of a Ruminal β-Glycosidase Defines a Novel Subfamily of Glycoside Hydrolase Family 3 with Permuted Domain Topology

Metagenomics has opened up a vast pool of genes for putative, yet uncharacterized, enzymes. It widens our knowledge of enzyme diversity and discloses new families for which a clear classification is still needed, as is exemplified by glycoside hydrolase family-3 (GH3) proteins. Herein, we describe a GH3 enzyme (GlyA1) from resident microbial communities in strained ruminal fluid. The enzyme is a β-glucosidase/β-xylosidase that also shows β-galactosidase, β-fucosidase, α-arabinofuranosidase, and α-arabinopyranosidase activities. Short cello- and xylo-oligosaccharides, sophorose and gentiobiose, are among the preferred substrates, with the large polysaccharide lichenan also being hydrolyzed by GlyA1. The determination of the crystal structure of the enzyme in combination with deletion and site-directed mutagenesis allowed identification of its unusual domain composition and the active site architecture. Complexes of GlyA1 with glucose, galactose, and xylose made it possible to picture the catalytic pocket and illustrated the molecular basis of the substrate specificity. A hydrophobic platform defined by residues Trp-711 and Trp-106, located in a highly mobile loop, appears able to accommodate differently β-linked bioses. GlyA1 includes an additional C-terminal domain previously unobserved in GH3 members, but crystallization of the full-length enzyme was unsuccessful. Therefore, small angle x-ray scattering experiments have been performed to investigate the molecular flexibility and overall putative shape. This study provided evidence that GlyA1 defines a new subfamily of GH3 proteins with a novel permuted domain topology. Phylogenetic analysis indicates that this topology is associated with microbes inhabiting the digestive tracts of ruminants and other animals, feeding on chemically diverse plant polymeric materials.

Despite the high number of known GH3 sequences, structural knowledge on members of the GH3 family was absent until 1999, when the three-dimensional structure of the β-D-glucan exohydrolase Exo1 from Hordeum vulgare (barley) was reported (7). This study showed the core structure of most GH3 enzymes, consisting of an N-terminal (α/β)8 barrel domain 1, which houses the active site pocket and the nucleophile, and a C-terminal (α/β)6-sandwich domain 2, containing the acid/base catalyst. The contribution of different domains in supplying crucial catalytic residues was a highly unusual feature of GH3 enzymes. Furthermore, in the last few years many new structural studies have shown a great variety in domain composition and arrangement of typical GH3 β-glycosidases, which have up to four separate domains (8-15). Although this variety produces a shift in the sequence position of the acid/base catalyst, the known structures revealed that its structural location is well conserved among the different members. In contrast, several reported structures have revealed a more uniform pattern among the β-N-acetylglucosaminidase (NagZ) members: although a few have two domains, most Gram-negative bacteria encode single-domain enzymes, and all of them have the acid/base catalyst in an unusual histidine/aspartate dyad located in a flexible loop of the (α/β)8 barrel (16). This highly mobile loop has been shown to participate in substrate distortion to a 1S3 conformation, thereby forming a productive Michaelis complex during catalysis (17). Such distortion has not been observed in other GH3 enzymes, in which the substrate is in a relaxed chair conformation, although a Michaelis complex has been recently reported for the Listeria innocua β-glucosidase (18). Among all GH3 β-glycosidases with available structures, insights into substrate specificity have been reported for the H. vulgare Exo1 (7, 19-21) and the β-glucosidases from Thermotoga neapolitana (8) and Kluyveromyces marxianus (9). However, the great variety in structure and composition found among the different enzymes makes it difficult to extrapolate general rules explaining function, and a clear classification of different subfamilies is still needed.
A proper classification of GH3 glycosidases may require extensive biochemical and structural characterization of new enzymes. In this context, nature provides an inexhaustible reservoir from which enzymes can be isolated (22), because they are continuously changing and evolving as a consequence of natural processes of selection. Genomics and metagenomics have made this enormous reserve of uncharacterized enzymes accessible. Thus, we and others have recently taken advantage of sequencing and extensive screening technologies to develop enzyme discovery strategies and to identify microbial enzymes with improved and unusual activities and specificities (23-25), as well as distinct active site architectures and substrate preferences relative to other structurally characterized enzymes (26). These elegant studies demonstrated that nature contains proteins with novel and/or altered sequences and protein structures, the analysis of which represents one of the major challenges in postgenomic biology (27).
Here, activity screening of a metagenomic library created from rumen fluid led us to the isolation of a novel β-glycosidase, GlyA1, which was assigned to the GH3 family. Detailed biochemical characterization of the new enzyme revealed its substrate specificity, whereas its sequence and crystal structure analysis revealed a novel permuted domain topology, defining a new subgroup within the GH3 family. The enzyme contains an additional, previously unidentified C-terminal domain, whose molecular flexibility was explored by small angle x-ray scattering (SAXS) analysis. The structural and biochemical analysis of the GlyA1 hydrolase presented in this study sheds new light on comparative catalysis and evolutionary model studies as well as phylogenetic relationships.

Results
Library Screening - A subset of 14,000 clones from resident microbial communities of strained ruminal fluid (SRF) collected from rumen-fistulated, non-lactating Holstein cows (28) was screened for its ability to hydrolyze p-nitrophenyl-β-D-glucoside (pNPβGlc) and p-nitrophenyl-β-D-cellobioside (pNPβCel). We identified a positive clone (designated SRF4) that is highly active against both substrates. The fosmid with insert SRF4 (38,710 bp; G + C 41.89%) was fully sequenced. A gene herein designated as glyA1, encoding a potential GH3 β-glycosidase (GlyA1), was identified among the 38 distinct genes on the hit fosmid. The deduced molecular mass and estimated pI value were 101,849 Da and 4.86, respectively. This 921-amino acid-long putative protein exhibited a maximum amino acid sequence identity of 59% to a similar protein in public databases (top hit EDO57841.1 from Clostridium sp.). A search of oligonucleotide patterns against the GOHTAM database (29) and TBLASTX analysis revealed compositional similarities between the DNA fragment (38,710 bp) containing the gene for GlyA1 and genomic sequences of Eubacterium, Butyrivibrio, and Coprococcus spp. BLASTN revealed similarities of short DNA fragments to Prevotella and Paenibacillus spp. BLASTX (search by translated DNA sequences) showed similarity to glycosidases of unknown Clostridia (phylum Firmicutes). A BLASTP search with the identified protein sequences showed good matches for many of them against corresponding proteins in Eubacterium and Prevotella and members of Lachnospiraceae, Clostridium, Ruminococcus, and Bacteroides. Most likely, GlyA1 thus has its origin in the phylum Firmicutes, although the presence of a phage gene may indicate a horizontal gene transfer of the carbohydrate metabolism genes from Firmicutes to Bacteroidetes. Those microbes are known to be abundant in the ruminal environment and are thought to play key roles in the breakdown of proteins and carbohydrate polymers (30,31).
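The G + C percentage quoted for the fosmid insert is a simple base-composition statistic. A minimal sketch of how such a value can be computed is given below; the function name and the toy sequence are illustrative assumptions, not the pipeline used in this study, and the last lines only convert the reported 41.89% back into an approximate base count for the 38,710-bp insert.

```python
def gc_content_percent(sequence):
    """Return the G + C percentage of a DNA sequence.

    Only unambiguous bases (A, C, G, T) are counted; anything else
    (e.g. N) is ignored. This is an illustrative helper, not the
    authors' software.
    """
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


# Toy sequence (hypothetical): 4 of 6 bases are G or C.
toy_gc = gc_content_percent("ATGCGC")

# Values reported in the text for the SRF4 insert.
INSERT_LENGTH_BP = 38710
REPORTED_GC_PERCENT = 41.89
estimated_gc_bases = round(INSERT_LENGTH_BP * REPORTED_GC_PERCENT / 100)
```

Under these numbers, roughly 16,216 of the 38,710 positions would be G or C.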
Biochemical Characterization of GlyA1 - The gene encoding the putative GH3 β-glycosidase (GlyA1) was cloned, expressed in E. coli BL21 (DE3), and purified. The hydrolytic activity was analyzed using 18 synthetic model p-nitrophenyl (pNP) derivatives with different sugars as well as a series of 11 additional oligosaccharides. Their specific activities (units/g protein) (Table 1) and the half-saturation (Michaelis) coefficient (Km), the catalytic rate constant (kcat), and the catalytic efficiency (kcat/Km) values (Table 2) were determined. As shown in Table 1, activity was confirmed for 18 substrates, which revealed that GlyA1 is a GH3 member with clear β-glucosidase and β-xylosidase activities that also possesses β-galactosidase, β-fucosidase, α-arabinofuranosidase, and α-arabinopyranosidase activities at a low level, in this order (Table 1). The activity toward pNP-N-acetyl-β-D-glucosaminide (pNPGlcNAc) and pNP-N-acetyl-β-D-galactosaminide (pNPGalNAc) was below detection limits, and thus the enzyme has neither β-N-acetylglucosaminidase nor β-N-acetylgalactosaminidase activity. As shown in Table 2, in terms of catalytic efficiencies, pNPβCel was the preferred substrate, mainly due to the higher affinity for this substrate as compared with other pNP sugars. The purified recombinant hydrolase was also assayed for its activity toward different polymeric substrates. Using specific activity determination, GlyA1 hydrolyzed all short cello- and xylo-oligosaccharides tested (degree of polymerization (DP) from 2 to 5), with longer substrates being slightly preferred (Table 1). The catalytic efficiencies (kcat/Km) for the non-activated substrates cellobiose and xylobiose were lower than those found for pNPβCel and pNP-β-xylobiose (pNPβXylb), respectively, mainly due to a significant decrease of kcat values for the natural disaccharides (Table 2).
A comparison of kinetic parameters using the natural substrates xylobiose and cellobiose and the synthetic pNPβXylb and pNPβCel substrates confirmed the ~2-fold higher affinity for oligosaccharides containing β-linked glucosyl versus xylosyl substrates. In contrast, the affinities for the monosaccharides pNPβGlc and pNPβXyl were essentially similar, suggesting that affinity constraints are higher as the size of the oligosaccharides increases. However, due to the differences in kcat values, no major differences in catalytic performance were observed when comparing β-Xyl- and β-Glc-containing sugars. The catalytic performance (kcat/Km) found for other substrates ranges from low to very low, mainly due to lower catalytic rates. The enzyme also exhibited activity against lichenan, suggesting that it is able to hydrolyze substrates with mixed β-1,3/1,4 linkages. No activity was detected using avicel or filter paper, or toward substrates without β-1,4 linkages such as β-1,3 glucan or with mixed β-1,3/1,6 linkages such as laminarin. Accordingly, the enzyme showed a clear preference for short cello-oligosaccharide substrates, which are likely produced in natural settings from the cellulose components of plant cell walls by the action of glucanases in the ruminal fluid. Other substrates such as gentiobiose (containing D-glucoses joined by a β-1,6-linkage) and sophorose (or 2-O-β-D-glucopyranosyl-α-D-glucose) were also hydrolyzed to a similar extent as cellobiose and xylobiose. The optimum activity for GlyA1 was observed within a mesophilic range (45-65°C) and at a neutral or slightly acidic pH (6.0-7.0), the enzyme being most active at 55°C and a pH close to 6.5 (Fig. 1).
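The kinetic descriptors used above (Km, kcat, and kcat/Km) are related through the Michaelis-Menten rate law, v = kcat[E][S]/(Km + [S]). The sketch below shows that relationship under the usual steady-state assumptions. The function names and the numerical parameters are hypothetical and chosen only for illustration; they are not values from Table 2.

```python
def michaelis_menten_rate(substrate_mM, kcat_per_s, km_mM, enzyme_uM):
    """Initial rate (uM product per second) from the Michaelis-Menten equation."""
    if substrate_mM < 0 or km_mM <= 0:
        raise ValueError("substrate must be >= 0 and Km must be > 0")
    return kcat_per_s * enzyme_uM * substrate_mM / (km_mM + substrate_mM)


def catalytic_efficiency(kcat_per_s, km_mM):
    """Catalytic efficiency kcat/Km in s^-1 mM^-1."""
    if km_mM <= 0:
        raise ValueError("Km must be > 0")
    return kcat_per_s / km_mM


# Hypothetical parameters for illustration only (not taken from Table 2).
KCAT_DEMO = 10.0   # s^-1
KM_DEMO = 2.0      # mM
ENZYME_DEMO = 1.0  # uM

# At [S] = Km the rate is half of Vmax = kcat * [E].
half_vmax = michaelis_menten_rate(KM_DEMO, KCAT_DEMO, KM_DEMO, ENZYME_DEMO)
efficiency = catalytic_efficiency(KCAT_DEMO, KM_DEMO)
```

Because the efficiency is a ratio, a substrate with a lower Km can still show a similar kcat/Km if its kcat is proportionally lower, which is the situation described for the β-Glc- and β-Xyl-containing sugars.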
Biochemical Characterization of GlyA1-ΔCt - A mutant lacking the C-terminal region, herein referred to as GlyA1-ΔCt, was created in the vector pQE80L. After purification, activity was determined for the 18 sugars hydrolyzed by the wild-type enzyme to test the effect of the C-terminal region. As shown in Table 1, the specific activity of the mutant was 2- to 18.4-fold lower than that of the wild type, suggesting the importance of this region in the overall activity of the enzyme. The negative effect of the elimination of the C-terminal domain (compared with the full-length protein) was most notable for the hydrolysis of sugars containing β-glucose (from 11.3- to 17.1-fold activity reduction) as compared with those containing β-xylose (from 4.6- to 5.5-fold lower activity).
Crystal Structure Determination - Preliminary crystals from the wild-type GlyA1 were obtained after more than 3 months with PEG3350 as the precipitant, and they were cryoprotected in 25% D-glucose to obtain the complex with this sugar. The structure was solved by molecular replacement using the domains from T. neapolitana β-glucosidase as independent search models. Refinement and analysis of electron density maps allowed modeling of the chain containing residues 3-798 but did not show any density to build the C-terminal segment 800-921, suggesting a putative cleavage of this region during the slow crystallization step. The low number of crystals impeded analysis of the intact protein by mass spectrometry, but SDS-PAGE analysis of protein solution samples revealed the presence of two bands after incubation at room temperature or treatment with proteases. Therefore, the sample was incubated with subtilisin prior to the crystallization step, which accelerated formation of many good quality crystals, under similar conditions and with the same space group. These crystals were cryoprotected in 20% glycerol, and this molecule was found bound at the active site. Furthermore, crystals from a truncated construct containing residues 2-799 (GlyA1-ΔCt) also grew in a week with ammonium sulfate as the precipitant and, despite having a different shape, yielded the same cell and space group, which is consistent with the hypothesis that the wild-type sample was cleaved. These crystals were used to obtain the complexes with D-xylose and D-galactose. Many attempts to crystallize the complete enzyme were unsuccessful. Also, a construct with residues 800-921, containing the isolated C-terminal region (GlyA1-Ct), failed to crystallize. Crystallographic data and refinement statistics for the four structures presented here are given in Table 3.
Permuted Domain Topology of GlyA1 - The first solved structure, that of barley β-D-glucan glucohydrolase (7), showed the core structure common to GH3 enzymes, composed of an N-terminal (α/β)8 barrel domain 1 linked to an (α/β)6-sandwich domain 2 (Fig. 2A); both domains provide residues that make up the active site. The later reported structures of the T. neapolitana (8), Trichoderma reesei (12), Aspergillus (13,14), and L. innocua (18) β-glucosidases, and of a β-glucosidase isolated from soil compost (32), showed the presence of an additional fibronectin type III (FnIII) domain (also designated fibronectin-like domain or FLD) located at the C terminus. This three-domain arrangement is shared by other reported β-glucosidases from K. marxianus (9) and Streptomyces venezuelae (11) that also contain an additional PA14 domain inserted within the same loop of their (α/β)6-sandwich, although the two PA14 domains are arranged in different orientations. Moreover, the structure of the Pseudoalteromonas sp. exo-1,3/1,4-β-glucanase has been reported to have a C-terminal domain attached to the core structure, structurally related to family 30 carbohydrate-binding modules (CBM30), although its function is unknown (10).
Table 3 footnote (fragment): Fc is the calculated and Fo is the observed structure factor amplitude of reflection hkl for the working/free (5%) set.
To expand this diverse landscape even further, GlyA1 presents a novel structural arrangement showing permuted sequence and topology, in which the (α/β)6-sandwich (previous domain 2) is located at the N terminus and the FnIII domain is sequentially inserted between this and the (α/β)8 barrel (Fig. 2A). Additionally, a 120-residue segment attached to the C terminus most likely folds into an additional domain. Fig. 2B displays the 3D structure of the solved 3-798 region of GlyA1, which presents overall dimensions of 85 × 65 × 45 Å. The N-terminal (α/β)6-sandwich domain (red, residues 10-219) is followed by the FnIII domain (beige, residues 278-419) and the (α/β)8 barrel domain (green, residues 468-780). Two long segments connect the three domains (Fig. 2B, gray). Linker 1 (residues 220-277) and half of linker 2 (residues 411-443) are tightly wrapped over the core structure, whereas the rest of linker 2 (444-467) forms an extended arm that clasps the (α/β)8 barrel. Finally, the regions at the beginning and the end of the chain form a two-stranded β-sheet that laces the core structure at the top.
As in GlyA1, all these domains present a deviation from the canonical (α/β)8 barrel topology, which was first observed in the T. neapolitana β-glucosidase. Thus, the first α-helix of the eight β-α motifs is missing, with the consequence that strand β2 is reversed and antiparallel to the other seven strands. The different deviation from the canonical topology found at this domain is consistent with the higher deviations found in the structural comparison of GlyA1 with other GH3 enzymes, in the range 2.5-3 Å (16-20% identity).
Interestingly, the GlyA1 core is structurally rather well conserved with respect to known β-glucosidases with equivalent domain architecture (Fig. 2C). The superposition of the T. neapolitana β-glucosidase onto the structure of GlyA1 reported here shows small differences in the orientation of some of the helices (Fig. 2D). The main difference is the long arm that links the FnIII to the (α/β)8 domain in GlyA1, which is missing in T. neapolitana β-glucosidase. There are also significant differences in the loops surrounding the active site, both in length and orientation, which must be related to the different substrate specificity, as discussed below.
Architecture of the Active Site - The active site of GlyA1 is located at the molecular surface, at the interface between the (α/β)8 barrel domain, which provides the nucleophile Asp-709, and the (α/β)6-sandwich domain, which contributes the acid/base catalyst Glu-143 (Fig. 3A). The participation of Asp-709 in substrate hydrolysis was confirmed by site-directed mutagenesis (D709A) in GlyA1 and GlyA1-ΔCt: the activity of both mutants was below the detection limit, so Km and kcat values could not be determined. The active site is a pocket 12 Å deep with a narrow entrance 4-6 Å wide. A detailed structural comparison with the T. neapolitana β-glucosidase (Fig. 3A) reveals the main differences in loop conformation around the active site that are responsible for the deeper catalytic pocket in GlyA1. First, loop β7-α7 of the (α/β)8 barrel, following the nucleophile Asp-709 (residues 711-726), has an 11-residue insertion that extends away from the pocket and interacts with the long segment linking the FnIII domain to the barrel, which is missing in the T. neapolitana β-glucosidase.
Here, Arg-717 makes an ion pair with Glu-447 at the small helix located in the middle of the extended linker, which helps in stabilizing this region. An important feature of this β7-α7 loop is the presence of Trp-711, close to the nucleophile Asp-709, that protrudes from the surface and delineates a narrow catalytic pocket. Moreover, and despite loop β3-α3 (residues 536-550) being shorter in GlyA1, Arg-538 clearly bulges into the pocket, constricting it even more.
With respect to the (α/β)6-sandwich, similarly to that observed in T. neapolitana β-glucosidase, this domain shapes the active site by means of two loops, residues 139-152 containing the acid/base catalyst Glu-143 and residues 100-113 enclosing Trp-106, which clearly projects into the catalytic pocket. Interestingly, the latter loop is markedly flexible, as deduced from the fact that it could only be fully traced in the ligand-free crystal, containing only glycerol in the active site, and in the galactose-soaked crystal of the truncated form. In contrast, the crystals of the full-length and truncated forms, soaked in glucose and xylose, respectively, showed poor density that precluded tracing residues 104-107. Furthermore, the traced loops showed significant conformational changes in the different crystals at Trp-111, coupled to a change in Phe-147 from the adjacent 139-152 loop (Fig. 3A), reinforcing its intrinsic mobility. The loop equivalent to 100-113, which is highly variable within GH3 enzymes, was proposed to be involved in recognition of large substrates from the crystal structure of T. neapolitana β-glucosidase, which showed some disorder that precluded tracing of a segment equivalent to that not observed in some GlyA1 crystals. Notably, the non-visible region of T. neapolitana β-glucosidase includes Trp-420 that, consequently, may be defining additional binding subsites, similarly to Trp-106. However, the remaining sequence is not conserved, with both Phe-147 and Trp-111 being unique to GlyA1, and therefore the substrate recognition mode used by the two enzymes to accommodate the substrate may be different.
Soaking with xylose and glucose showed a clear density indicating that both sugars occupy the catalytic pocket subsite −1 in a relaxed chair conformation (Fig. 3B). This subsite is well conserved among known GH3 β-glucosidases and, with the exception of the acid/base catalyst, is made up entirely by residues from the (α/β)8 barrel domain. Thus, residues from the loops emerging from the central β-strands make a tight net of hydrogen bonds that accommodate the glycon, with all its OH groups making at least two polar interactions. The glycon moiety is located by stacking to Trp-710, and the acid/base catalyst Glu-143 and the nucleophile Asp-709 interact with the O1 and O2 hydroxyls, as is expected in GH enzymes. The other residues making subsite −1 are Asp-532, Arg-597, Lys-630, His-631, Arg-641, and Tyr-677. Xylose and glucose are bound in an identical position, and the glycerol molecules observed in the ligand-free crystals mimic the positions occupied by C2, C3, C4, and C5 from both sugars. The additional polar interaction made by the glucose O6 hydroxyl appears consistent with the higher affinity observed in GlyA1 toward glucosides as compared with xylosides. Thus, as shown in Table 2, the affinity for cellobiose (Km = 2.4 ± 0.3 mM) was ~2-fold higher than that for xylobiose (Km = 4.7 ± 0.2 mM). Interestingly, soaking of crystals with galactose showed that this sugar displays a semi-chair conformation at subsite −1 by flattening of the C4 atom that has the axial hydroxyl substituent (Fig. 3B, inset). In this way, galactose is accommodated by essentially the same polar interactions observed in the glucose complex, thereby explaining the activity of the enzyme on β-galactosides. However, the energy cost of the substrate ring distortion is reflected by the lower β-galactosidase activity, as given in Tables 1 and 2.
Accordingly, the low β-fucosidase and α-arabinosidase activities must reflect some degree of deviation from the glucose-binding pattern, through ring distortion and/or loss of polar interactions, but in any case the plasticity of the catalytic site provides a notable capacity of GlyA1 to accept different sugars (from high to low and very low specificity).
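The "~2-fold" statement can be checked directly from the two Km values quoted above. The sketch below computes the ratio and a first-order propagated uncertainty; treating the two reported errors as independent standard deviations is an assumption made here for illustration, and the helper name is hypothetical.

```python
import math


def ratio_with_error(numerator, numerator_err, denominator, denominator_err):
    """Ratio of two measured values with first-order error propagation.

    Assumes the two uncertainties are independent (an assumption of this
    sketch, not a statement about how the reported errors were obtained).
    """
    ratio = numerator / denominator
    relative_err = math.sqrt(
        (numerator_err / numerator) ** 2 + (denominator_err / denominator) ** 2
    )
    return ratio, ratio * relative_err


# Km values quoted in the text (mM): xylobiose 4.7 +/- 0.2, cellobiose 2.4 +/- 0.3.
km_ratio, km_ratio_err = ratio_with_error(4.7, 0.2, 2.4, 0.3)
```

With these inputs the Km ratio comes out at about 1.96 with an uncertainty of about 0.26, consistent with the approximately 2-fold difference stated in the text.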
As noted above, and in contrast to that observed in T. neapolitana β-glucosidase, which presents an active site open to the solvent with only subsite −1 being defined, more subsites are apparent in GlyA1. To delineate a putative +1 subsite, we modeled the position of the non-hydrolysable substrate analogs thiocellobiose and thiogentibiose by structural superimposition on the previously reported experimental barley complexes (34). As shown in Fig. 3C, Trp-106 and Trp-711 define a hydrophobic patch that may accommodate the oligosaccharides at a putative subsite +1, leaving a range of possible ring orientations compatible with the observed activity of GlyA1 against differently β-linked bioses, as given in Table 1. Also, the long side chain of Arg-538, protruding into the catalytic pocket as noted above, is in a good position to stabilize the sugar unit by making hydrogen bonds to one or possibly two of its hydroxyl groups. The important contribution of subsite +1 to GlyA1 substrate binding efficiency (both glucosides and xylosides) is manifested by the lower Km value with pNPβCel compared with pNPβGlc and by the lower Km value with pNPβXylb compared with pNPβXyl (Table 2).
Structure and Function of a Ruminal β-Glycosidase. NOVEMBER 11, 2016 • VOLUME 291 • NUMBER 46
Furthermore, inspection of the molecular surface of the active site cavity shown in Fig. 3D suggests the possible existence of additional subsites, which is illustrated by several β-1,4/1,3-linked oligosaccharides that have been modeled at the active site as follows: a glucotetraose (green), a Glc-4Glc-3Glc-4Glc chain (purple), and a Glc-4Glc-4Glc-3Glc chain (yellow). These sugars have been docked by superimposition of their non-reducing units onto the observed glucose in the GlyA1 complex. The hydrophobic patch defined by Trp-106 and Trp-711 may fit the oligosaccharides at subsites +1 and +2, and the long side chain of Lys-723 seems available to make polar interactions with the hydroxyl groups defining a possible subsite +3. The putative existence of at least three subsites in the GlyA1 active site would be in agreement with the tendency toward increased activity against longer cello- and xylo-oligosaccharides (see Table 1), which suggests interactions at more distal positions and therefore the possibility of additional subsites. Moreover, the shape of the active site seems compatible with the mixed β-1,4/1,3 links of the modeled tetrasaccharides, thereby explaining the observed activity on the medium size polymer lichenan.
Fig. 3 legend (continued): Xylose binds in the same relaxed chair conformation, and only the interaction of the glucose O6 hydroxyl is missing. Inset, binding mode of galactose in a semi-chair conformation by flattening of the C4 atom that has the axial hydroxyl substituent and keeping the same interaction pattern. C, thiocellobiose (cyan) and thiogentibiose (pink) modeled at the active site by structural superimposition to the previously determined β-D-glucan glucohydrolase barley complexes (PDB entries 1IEX and 3WLP (34)), delineating putative subsite +1. D, molecular surface of the GlyA1 active site, with relevant residues as sticks. Three different β-1,4/β-1,3-linked tetraglucosides have been manually docked by superposition of their non-reducing end to the experimental glucose: a cellotetraose, as found in PDB entry 2Z1S (green); a Glc-4Glc-3Glc-4Glc (purple), and a Glc-4Glc-4Glc-3Glc (yellow), as built by the on-line carbohydrate-building program GLYCAM (45) and exported in its minimum energy state. E, superposition of the GlyA1-Glc structure (beige) with those reported for T. reesei β-glucosidase (purple) (12) and barley β-D-glucan glucohydrolase complexed with thiocellobiose (cyan) (34).
Comparison of the GlyA1-Glc structure with those reported for T. reesei β-glucosidase (12) and barley β-D-glucan glucohydrolase complexed with thiocellobiose (Fig. 3E) (7) displays the different hydrophobic platforms found at each active site. The barley β-D-glucan glucohydrolase structure showed a narrow channel with the glucose tightly arranged at subsite +1, being sandwiched between the Trp-286 and Trp-434 side chains. In contrast, the GlyA1 Trp-711 is perpendicular and oriented similarly to Trp-37 found in T. reesei β-glucosidase, although both residues are provided by different loops from the (α/β)8 barrel domain. At the opposite face, GlyA1 Trp-106 is structurally equivalent to Trp-434 and Tyr-443 from the barley and T. reesei enzymes, respectively, although all of them come from different loops within the (α/β)6-sandwich domain. Interestingly, other enzymes present an aromatic residue in a position identical to Trp-106, but it is provided by the PA14 domain, Phe-508 in the case of the K. marxianus β-glucosidase, or by a long loop coming from the other subunit, Tyr-583 in the case of the L. innocua β-glucosidase dimer (data not shown) (18). This feature illustrates that these highly diverse enzymes have evolved a common topology and molecular mechanisms, and yet precise structural differences underlie the regulation of their specificity.
SAXS Analysis of GlyA1 - Because the full-length GlyA1 could not be crystallized, we explored its overall flexibility and putative shape in solution by SAXS experiments. Thus, we compared the molecular descriptors of the complete construct with those of the truncated construct GlyA1-ΔCt, lacking the C-terminal domain. For this purpose, several solutions with varying concentrations were measured for each sample, and their scattering curves were merged to extrapolate idealized data. Analysis of the scattering curves shows a good fit to the Guinier approximation, which indicates that the samples are not aggregated. Also, the calculated radii of gyration (Rg) are consistent across the range of measured concentrations. Therefore, the overall size descriptors can be properly determined for each construct.
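The Guinier approximation mentioned above states that, at low scattering angles, ln I(q) = ln I(0) − (Rg²/3)·q², so Rg follows from the slope of a straight-line fit of ln I against q². The sketch below implements that fit with an ordinary least-squares line. It is only an illustration: the function name, the synthetic curve, and the chosen Rg of 30 Å are assumptions and do not reproduce the software or the values used in this work.

```python
import math


def guinier_fit(q_values, intensities):
    """Estimate Rg and I(0) from a linear fit of ln I(q) versus q^2.

    Callers should pass only the low-q region (commonly q*Rg below
    about 1.3), where the Guinier approximation is expected to hold.
    """
    xs = [q * q for q in q_values]
    ys = [math.log(i) for i in intensities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope >= 0:
        raise ValueError("non-negative slope: data do not follow a Guinier decay")
    rg = math.sqrt(-3.0 * slope)
    i_zero = math.exp(intercept)
    return rg, i_zero


# Synthetic, noise-free demonstration data (hypothetical values).
TRUE_RG = 30.0    # angstrom
TRUE_I0 = 100.0   # arbitrary units
q_demo = [0.005 + 0.001 * k for k in range(36)]  # up to q*Rg = 1.2
i_demo = [TRUE_I0 * math.exp(-((q * TRUE_RG) ** 2) / 3.0) for q in q_demo]

rg_fit, i0_fit = guinier_fit(q_demo, i_demo)
```

On real data, an upward curvature of ln I versus q² at the lowest angles is the usual sign of aggregation, which is why a good linear fit is taken as evidence that the samples are not aggregated.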
First of all, the calculated molecular masses from both samples are close to the expected values (Table 4), indicating the presence of monomers, and also show a 15-kDa higher mass in the complete protein, which excludes proteolysis of the analyzed sample in the short time of the experiment. Furthermore, the Rg and the maximum distance (Dmax) for the complete protein are only slightly higher than those of the truncated protein, which may indicate that the extra C-terminal domain is not too extended from the core structure. In support of this hypothesis, the pairwise distance distribution function P(r) calculated for both constructs shows a similar unimodal pattern consistent with a single domain protein in both cases. Furthermore, the analysis of the scattering function by Kratky plots is consistent with the expected profile for a folded protein with a clear peak, in contrast to what is observed in multidomain proteins with flexible linkers, which present several peaks or smoother profiles. Consequently, we do not observe in the data calculated from the complete protein any of the signs that may be indicative of molecular flexibility, i.e. large Rg and Dmax, absence of correlation in the P(r) function, or smooth Kratky plots. Therefore, SAXS analysis appears consistent with a compact overall shape of the complete GlyA1, in which the extra C-terminal region would not define a marked separate or flexible domain but rather could be folded over the core three-domain structure.
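To make the "clear peak" criterion concrete, the sketch below evaluates a dimensionless Kratky curve, (q·Rg)²·I(q)/I(0) plotted against q·Rg, for an idealized compact particle. The idealization is an assumption of this example: the intensity is taken to follow the Guinier form I(q) = I(0)·exp(−(q·Rg)²/3) over the whole range, which real data only do at low q. Under that assumption the curve has a single maximum at q·Rg = √3 with height 3/e.

```python
import math


def dimensionless_kratky(q_rg):
    """(q*Rg)^2 * I(q)/I(0) for the Guinier form I(q) = I(0) * exp(-(q*Rg)^2 / 3)."""
    return q_rg ** 2 * math.exp(-(q_rg ** 2) / 3.0)


# Scan q*Rg from 0.001 to 5.000 and locate the maximum numerically.
grid = [k * 0.001 for k in range(1, 5001)]
peak_position = max(grid, key=dimensionless_kratky)
peak_height = dimensionless_kratky(peak_position)
```

A bell-shaped curve with a maximum close to this position is the behavior usually associated with compact folded particles, whereas flexible or extended multidomain proteins tend to give broader curves that peak at larger q·Rg or do not return toward the baseline.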
To test the feasibility of this hypothesis, ab initio models were generated for complete GlyA1 from SAXS data. First, two models of the last 120 residues (GlyA1-Ct) were obtained, as explained under "Experimental Procedures," with both showing an overall β-sandwich topology. This topology is related to carbohydrate-binding domains within families CBM6 and CBM35, to which GlyA1-Ct presents 15-20% sequence identity, although the equivalent carbohydrate-binding motifs, typically clusters of conserved aromatic residues, are not evident on its surface. Then, three runs of CORAL were computed by considering the experimental structure of the truncated protein and each of the two models. The six models obtained are shown in Fig. 4. Analysis of these models reveals that all of them cluster around a reduced area that would locate the C-terminal region relatively distant from the catalytic pocket but quite near the mobile loop (residues 100-113). Overall, these models are consistent with the hypothesis proposed above, suggesting that GlyA1-Ct may be somewhat packed between the two domains making the core structure and, interestingly, with a putative linker somehow exposed to solvent. This feature might possibly explain the proteolysis observed in the complete protein.
GlyA1 Phylogenetic Analysis-Our structural analysis illustrated that the permuted domain architecture of GlyA1 keeps the location of the active site at the interface between the (α/β)8 barrel and the (α/β)6-sandwich domains. As mentioned above, N-acetylglucosaminidases consist of a single domain, with its (α/β)8 barrel holding both the nucleophile and the acid/base catalyst. Interestingly, the Bacillus subtilis NagZ shows the two-domain composition but still keeps the catalytic residues at the (α/β)8 barrel (16). Therefore, this domain may be considered the characteristic signature of GH3 enzymes. To examine the phylogenetic positioning of β-glucosidases with inverted topology (represented by GlyA1) within the GH3 family, we carried out a phylogenetic analysis based on the sequence of its (α/β)8 barrel domain (ABB in this analysis). Sequences representative of each of the domain architectures found in the GH3 family were selected (details under "Experimental Procedures"). The five topologies selected for this study are ABB, ABB-ABS, ABB-ABS-FLD, ABS-FLD-ABB, and ABB-ABS(PA14)-FLD (ABS is the (α/β)6-sandwich; FLD is the fibronectin-like type III domain). The resulting phylogenetic tree given in Fig. 5 shows an apparent correlation between ABB sequence divergence and domain architecture. Most single domain sequences (ABB) cluster together and correspond to N-acetylglucosaminidases (Fig. 5, salmon area of the tree). Insertion of the ABS module is associated with three different nodes (a, b, and c in Fig. 5). Insertion at node a was not accompanied by a significant divergence in the ABB sequence because both ABB and ABB-ABS architectures appear mixed at this node. In fact, these ABB-ABS sequences also correspond to N-acetylglucosaminidases, and crystallographic data of B. subtilis NagZ show that the two modules are quite independent from a structural point of view.
ABS insertion at nodes b and c would correspond to the divergence of GH3 enzymes giving rise to other activities, mainly β-glucosidase. Within node c, other modules (FLD and PA14) were appended after ABS. At node b, fusion of the C-terminal FLD seems to have occurred close to ABS addition because most sequences contain both modules. GlyA1 and the other GH3 enzymes with inverted topology arose within this cluster. The phylogenetic analysis shows that the inverted topology is predominantly found in Firmicutes, although it is also present in at least one other phylum (Actinobacteria) and even in Archaea. Furthermore, it appears clearly associated with enzymes belonging to bacteria dwelling in the digestive tract of animals.

Discussion
In this work, a functional metagenome library analysis was used to identify a β-glycosidase from a plant polymer-degrading microorganism populating the rumen of a dairy cow. The enzyme most likely originated from the genome of a representative of the Firmicutes phylum, known to be abundant in the ruminal environment (30,31).
The structural and biochemical analysis of the GlyA1 hydrolase presented in this study sheds new light on the mechanisms of catalysis and the evolutionary patterns of the GH3 family. Our data demonstrated that GlyA1 has a permuted domain topology. It is well documented that the formation of new domain combinations is an important mechanism in protein evolution. The major molecular mechanism that leads to multidomain proteins and novel combinations is non-homologous recombination, sometimes referred to as "domain shuffling." This may cause recombination of domains to form different domain architectures. Proteins with the same series of domains or domain architecture are related by descent (i.e. evolved from one common ancestor) and tend to have the same function (35), which is rarely the case if the domain order is switched. Indeed, a detailed analysis of the structures of proteins containing Rossmann fold domains demonstrated that the N- to C-terminal order of the domains is conserved because the proteins have descended from a common ancestor. For pairs of proteins in the PDB in which the order is reversed, the interface and functional relationships of the domains are altered (36). This is also supported by this study, which revealed that the altered domain architecture in GH3 mostly evolved in a distinct ecological niche, most likely digestive tracts, including that of ruminants. Also, the substrate specificity of the GlyA1 protein is markedly different from that of reported GH3 members. Indeed, GlyA1 is an uncommon multifunctional GH3 with β-glucosidase, β-xylosidase, β-galactosidase, β-fucosidase, α-arabinofuranosidase, α-arabinopyranosidase, and lichenase co-activities, with the ability to degrade β-1,2-, β-1,3-, β-1,4-, and β-1,6-glucobioses.
From an ecological point of view, the rumen compartment provides stable and favorable conditions for microbial growth and is also permanently exposed to plant biomass; for this reason, it contains specialized microorganisms that are permanently competing or collaborating for the degradation of the plant fibers. The data herein suggest that this factor, namely the high exposure to plant biomass, which is less common in other habitats, may be a strong force driving the establishment of gut microbiota with GH3 proteins with permuted structures that may provide ecological advantages. Indeed, the permuted domain topology may confer on the protein different functionalities, such as the ability to expand the pool of biomass-like substrates being hydrolyzed. Overall, our results (analysis of oligonucleotide pattern and phylogenetic tree) strongly suggest that GlyA1 and related GH3 enzymes with inverted topology emerged in Firmicutes, where their presence is rather frequent, and were transferred by horizontal gene transfer to bacteria from other phyla and even to another kingdom (Archaea). It is well documented that these wide ranging gene transfer events take place at high frequency in the rumen (37,38). Probably, GlyA1 topology arose from a sequence encoding a GH3 enzyme with ABB-ABS-FLD domain architecture by gene inversion. Although the inversion surely rendered a nonfunctional gene, further mutations that would restore some sort of glycolytic activity would be strongly favored by selective pressure.

FIGURE 4. Six ab initio models were generated for complete GlyA1 from SAXS data, using the experimental structure of the truncated protein and two different models of the last 120 residues (GlyA1-Ct). The two templates were obtained from Swiss-Model (red) (48) or CPHmodel (blue) (49) servers, which predict different lengths of the linker attaching this domain to the core protein, 32 or 5 residues, respectively. CORAL (47) modeling of this linker in each run is represented in spheres. The active site pocket is indicated by the galactose found in the crystal (yellow), and the mobile loop (residues 100-113), as observed in the galactose-soaked crystals, is highlighted in green.
Structural analysis illustrates the permuted domain composition of GlyA1, which is composed of an N-terminal (α/β)6-sandwich domain, followed by the FnIII domain and the (α/β)8 barrel domain. Based on sequence data, a C-terminal domain was expected after the (α/β)8 barrel domain. However, attempts to crystallize the C-terminal region of the protein were unsuccessful, and its functional role was unclear. Biochemical characterization of the GlyA1 and GlyA1-ΔCt proteins revealed that the C-terminal domain does not affect the overall substrate profile of the protein, but rather it affects the catalytic performance, which is significantly lower in the truncated GlyA1-ΔCt protein. This suggests that the C-terminal domain most likely has no direct role in substrate binding, but it might still disturb the dynamics of the proximate mobile loop (residues 100-113), which seems directly involved in catalysis.
According to available structure-prediction tools, this C-terminal region is expected to adopt a lectin-like topology, related to the CBM6/CBM35 domains. However, it does not seem to be an obvious carbohydrate-binding domain, and in fact, binding to xylan, cellulose, and barley glucan was not observed by affinity gel electrophoresis assays (data not shown). Nevertheless, although its involvement in binding small substrates does not seem apparent, this domain might play a role in positioning or locating the enzyme at distal positions of a yet unknown polymeric substrate by recognizing specific but still unidentified substitutions. Alternatively, it could play a role in keeping the enzyme attached to the cell surface, facilitating the intake of its products and conferring on the bacteria an advantage over competing organisms. Interestingly, the analysis of the GlyA1-Ct homologous sequences shows that these domains are attached to GH3 β-glucosidases from a ruminal environment, and this feature points to a possible function related to this ecosystem. However, its presence is not related to the permuted domain topology, as only half of the sequences included in the GlyA1 cluster (Fig. 5) contain segments equivalent to GlyA1-Ct.
In conclusion, the analysis of GlyA1 presented here uncovers new features of GH3 enzymes and provides a template for a novel subfamily, including members with permuted domain topology. It also allows picturing the GlyA1 active site architecture and the molecular basis of its substrate specificity. More work is needed to obtain a complete picture of the intricate molecular mechanisms that these highly diverse enzymes have evolved to tailor specificity. Such work will contribute to improving our knowledge about enzymatic carbohydrate degradation and open up new avenues for biocatalysis.

Experimental Procedures
Reagents and Strains-Chemicals and biochemicals were purchased from Sigma and Megazyme (Bray, Ireland) and were of pro-analysis (p.a.) quality. The oligonucleotides used for DNA amplification were synthesized by Sigma Genosys Ltd. (Pampisford, Cambs, UK). The E. coli Rosetta2 (Novagen, Darmstadt, Germany) for cloning and expression of wild-type protein and the genetic constructs in pQE80L vector were cultured and maintained according to the recommendations of the suppliers.
Metagenomic Library Screening and Positive-insert Sequencing-A pCC1FOS fosmid metagenomic library created from microbial communities from SRF of rumen-fistulated non-lactating Holstein cows was used. The construction and characteristics of the library were described previously (28). A subset of 14,000 clones was plated onto large (22.5 × 22.5 cm) Petri plates with Luria Bertani (LB) agar containing chloramphenicol (12.5 μg/ml) and an arabinose-containing induction solution (Epicenter Biotechnologies) at a concentration (0.01% w/v) recommended by the supplier to induce a high fosmid copy number. After overnight incubation at 37°C, the clones were screened for the ability to hydrolyze pNPβGlc and pNPβCel. For screens, the plates (22.5 × 22.5 cm; each containing 2,304 clones) were covered with an agar-buffered substrate solution (40 ml of 50 mM sodium acetate buffer, pH 5.6, 0.4% w/v agar and 5 mg/ml of pNPβGlc and pNPβCel as substrates). Positive clones were detected by the formation of a yellow color. One positive clone, herein designated as SRF4, was selected, and its DNA insert was fully sequenced with a Roche 454 GS FLX Ti sequencer (454 Life Sciences, Branford, CT) at Life Sequencing S.L. (Valencia, Spain), and the predicted genes were identified as described previously (28).
Cloning of glyA1 and Genetic Constructs in pQE80L Plasmid-The full coding sequence of GlyA1 (residues 2-921) and a deleted version (residues 2-799) lacking the C-terminal domain (GlyA1-ΔCt) were amplified by PCR with 4GF (CACGAGCTCAATATTGAAAAAGTGATACTTGATTGG) as forward oligonucleotide and 4GR1 (AGCCGTCGACTTACTGCTGCTTTTTAAACTCTATTCG) or 4GR2 (AGCCGTCGACTTACACTCTTCCTGCTATCTCAACC) as reverse oligonucleotides, respectively. The SRF4 fosmid was used as the template. The PCR conditions were as follows: 95°C for 120 s, followed by 30 cycles of 95°C for 30 s, 55°C for 45 s, and 72°C for 120 s, with a final extension at 72°C for 500 s. The PCR products were analyzed and agarose gel-purified using the Mini Elute gel purification kit (Qiagen, Hilden, Germany). The PCR products were digested with SacI/SalI and cloned in vector pQE80L to generate plasmids GlyA1-pQE and GlyA1ΔCt-pQE, respectively. The coding sequence of the C-terminal domain (GlyA1-Ct, residues 800-921) was amplified with oligonucleotides CT1F (CACGAGCTCATAGAAGAGGATGCATTCGATATAG) and 4GR1 and cloned in the SacI/SalI sites of pQE80L (plasmid Ct-pQE). GlyA1-pQE was used as a template to introduce the mutation D709A by PCR with primers M1 (TGGTGGGCTCAGGTTAATGACC) and M2 (GGCAGTCATCACAATACCCTTAAAGCC), as described previously (39). The coding region of the resulting plasmids was fully sequenced to check for the absence of undesired mutations. The E. coli strain Rosetta2 (Novagen, Darmstadt, Germany) was transformed with the selected plasmids; the clones were selected on LB agar supplemented with ampicillin (100 μg/ml) and chloramphenicol (68 μg/ml) and stored with 20% (v/v) glycerol at −80°C.
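As a quick sanity check on the cloning strategy described above, the following minimal sketch confirms that each forward primer carries a SacI recognition site (GAGCTC) and each reverse primer a SalI site (GTCGAC). The primer sequences are taken from the text and written as continuous strings; the recognition sequences are the standard ones for these two enzymes, and the helper name `has_site` is hypothetical, introduced only for illustration.

```python
# Standard recognition sequences of the two restriction enzymes used for cloning.
SITES = {"SacI": "GAGCTC", "SalI": "GTCGAC"}

# Primers quoted in the text (5'->3'), each paired with the site it should carry.
PRIMERS = {
    "4GF": ("CACGAGCTCAATATTGAAAAAGTGATACTTGATTGG", "SacI"),
    "4GR1": ("AGCCGTCGACTTACTGCTGCTTTTTAAACTCTATTCG", "SalI"),
    "4GR2": ("AGCCGTCGACTTACACTCTTCCTGCTATCTCAACC", "SalI"),
    "CT1F": ("CACGAGCTCATAGAAGAGGATGCATTCGATATAG", "SacI"),
}


def has_site(primer: str, enzyme: str) -> bool:
    """Return True if the primer contains the enzyme's recognition sequence."""
    return SITES[enzyme] in primer.upper()


for name, (sequence, enzyme) in PRIMERS.items():
    print(f"{name}: {enzyme} site present = {has_site(sequence, enzyme)}")
```

A check of this kind only verifies the presence of the sites; it says nothing about reading frame or primer melting temperature.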
Site-directed Mutagenesis-Mutation D709A was introduced into the corresponding pQE80L plasmids containing genes encoding GlyA1 and GlyA1-ΔCt, using the QuikChange II XL mutagenesis kit from Agilent Technologies, Inc. (Santa Clara, CA), with TGGTGGGCTCAGGTTAATGACC and GGCAGTCATCACAATACCCTTAAAGCC as forward and reverse oligonucleotides, respectively. The resulting variant plasmids were transferred into E. coli strain Rosetta2 (Novagen, Darmstadt, Germany) and selected on LB agar supplemented with the same antibiotics as the parental plasmids.
Gene Expression and Protein Purification-For enzyme expression and purification of wild-type and mutant GlyA1 and GlyA1-ΔCt variants, as well as GlyA1-Ct in the pQE80L vector, a single colony (E. coli Rosetta2) was grown overnight at 37°C with shaking at 200 rpm in 100 ml of 2× TY medium (1% yeast extract, 1.5% tryptone, 0.5% NaCl) containing ampicillin (100 μg/ml) and chloramphenicol (68 μg/ml), in a 1-liter flask. Afterward, 25 ml of this culture was used to inoculate 1 liter of 2× TY medium, which was then incubated to an A600 nm of ~0.6 (range from 0.55 to 0.75) at 37°C. Protein expression was induced by 0.9 mM isopropyl β-D-thiogalactopyranoside followed by incubation for 16 h at 16°C. The cells were harvested by centrifugation at 5000 × g for 15 min to yield 2-3 g/liter pellet (wet weight). The cell pellet was frozen at −80°C overnight, thawed, and resuspended in 3 ml of 20 mM phosphate buffer, pH 7.4, 500 mM NaCl/g of wet cells. Lysonase bioprocessing reagent (Novagen, Darmstadt, Germany) was then added (4 μl/g wet cells), and the suspension was incubated for 30 min on ice with rotary mixing. The cell suspension was then sonicated for a total of 1.2 min and centrifuged at 15,000 × g for 15 min at 4°C; the supernatant was retained. The His6-tagged enzyme was purified at 4°C after binding to a nickel-nitrilotriacetic acid His·Bind resin (Novagen, Darmstadt, Germany). The columns were prewashed with 20 mM phosphate buffer, pH 7.4, 500 mM NaCl, and 50 mM imidazole, and the enzyme was eluted with the same buffer but containing 500 mM imidazole. The enzyme elution was monitored by SDS-PAGE and/or activity measurements, using standard assays (see below).
After elution, the protein solution was extensively dialyzed with 20 mM Tris, pH 7.5, 50 mM NaCl by ultrafiltration through low adsorption hydrophilic 10,000 nominal molecular weight limit cutoff membranes (regenerated cellulose, Amicon), after which the protein was maintained at a concentration of 10 mg/ml; the protein stock solution was stored at −20°C until used in assays. The purity was assessed as >95% using SDS-PAGE, which was performed with 12% (v/v) polyacrylamide gels, using a Bio-Rad Mini Protean system. Prior to crystallization assays, 2 mM dithiothreitol (DTT) was added.
Biochemical Assays-Specific activity (units/g) and kinetic parameters (Km and kcat) were first determined using pNP sugars (read at 405 nm) in 96-well plates, as described previously (28). pNP substrates tested included those containing α-glucose (pNPαGlc), α-maltose (pNPαMal), β-glucose (pNPβGlc), β-cellobiose (pNPβCel), α-arabinofuranose (pNPαAraf), β-arabinopyranose (pNPβArap), α-xylose (pNPαXyl), β-xylose (pNPβXyl), β-xylobiose (pNPβXylb), α-fucose (pNPαFuc), α-rhamnose (pNPαRha), α-mannose (pNPαMan), β-mannose (pNPβMan), α-galactose (pNPαGal), β-galactose (pNPβGal), β-lactose (pNPβLac), N-acetyl-β-D-glucosaminide (pNPGlcNAc), and N-acetyl-β-D-galactosaminide (pNPGalNAc). For cello-oligosaccharides (DP from 2 to 5), gentiobiose, and sophorose, the level of released glucose was determined using a glucose oxidase kit (Sigma). The level of released xylose from xylo-oligosaccharides (DP from 2 to 5) was determined using the D-xylose assay kit from Megazyme (Bray, Ireland). Substrate specificity was also investigated using carboxymethylcellulose, lichenan, barley glucan, laminarin, and avicel (all from Sigma), and filter paper (Whatman, UK). Specific activity for all these sugars was quantified by measuring the release of reducing sugars according to Miller (50). For Km determinations, assay reactions were conducted by adding a protein concentration of 0.23 μM to an assay mixture containing from 0 to 30 mM sugar in 50 mM sodium acetate buffer, pH 5.6, T = 40°C. Total reaction volume was 200 μl. For kcat determinations, under the same conditions, sugar concentration was set to 2 times the Km value, and the protein concentration was from 0 to 0.23 μM. For specific activity determinations (units/g), a protein concentration of 0.23 μM and 10 mg/ml of the sugar or polysaccharide were used in 50 mM sodium acetate buffer, pH 5.6, T = 40°C.
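To illustrate how Km can be extracted from initial rates measured over a substrate series such as the 0-30 mM range above, the following minimal sketch uses a Hanes-Woolf linearization, [S]/v = [S]/Vmax + Km/Vmax. This is an assumption chosen for simplicity and is not necessarily the fitting procedure used in the study (which follows Ref. 28); the data are synthetic and noise-free, and the function names are hypothetical. The conversion to kcat uses the standard relation kcat = Vmax/[E].

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hanes_woolf(substrate_mM, rate_uM_per_min):
    """Estimate (Km in mM, Vmax in uM/min) from [S]/v = [S]/Vmax + Km/Vmax."""
    # Points with zero substrate or zero rate cannot be used in this transform.
    pairs = [(s, v) for s, v in zip(substrate_mM, rate_uM_per_min) if s > 0 and v > 0]
    xs = [s for s, _ in pairs]
    ys = [s / v for s, v in pairs]
    slope, intercept = linear_fit(xs, ys)
    vmax = 1.0 / slope
    km = intercept * vmax
    return km, vmax


def kcat_per_second(vmax_uM_per_min, enzyme_uM):
    """Turnover number from the standard relation kcat = Vmax / [E]."""
    return vmax_uM_per_min / enzyme_uM / 60.0


# Synthetic Michaelis-Menten data (NOT measurements from the study).
TRUE_KM_MM, TRUE_VMAX = 2.0, 12.0
substrate = [0.0, 0.5, 1.0, 2.0, 5.0, 10.0, 20.0, 30.0]
rates = [TRUE_VMAX * s / (TRUE_KM_MM + s) for s in substrate]

km, vmax = hanes_woolf(substrate, rates)
print(f"Km = {km:.3f} mM, Vmax = {vmax:.3f} uM/min")
print(f"kcat = {kcat_per_second(vmax, 0.23):.3f} s^-1 at 0.23 uM enzyme")
```

With real, noisy data a direct nonlinear fit of the Michaelis-Menten equation is generally preferable, because linear transforms distort the error structure.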
The pH and temperature optima were determined in the range of pH 4.0-8.5 (50 mM Britton-Robinson buffer, BR) and 20-65°C in assays containing a protein concentration of 0.23 μM and 10 mg/ml pNPβGlc, which was used as the standard substrate. BR buffer is a "universal" pH buffer used for the range pH 2-12. It consists of a mixture of 0.04 M H3BO3, 0.04 M H3PO4, and 0.04 M CH3COOH that has been titrated to the desired pH with 0.2 M NaOH. Optimal pH was measured at 40°C, and the optimal temperature was determined in the same buffer used in the kinetic assays. In all cases, absorbance was determined immediately after reagent and enzyme were mixed using a microplate reader every 1 min for a total time of 15 min (Synergy HT Multi-Mode Microplate Reader, BioTek). All reactions were performed in triplicate. One unit of enzyme activity was defined as the amount of enzyme required to transform 1 μmol of substrate in 1 min under the assay conditions, with extinction coefficients as in Ref. 21. All values were corrected for non-enzymatic hydrolysis (background rate). The protein concentration was determined spectrophotometrically (at 280 nm) using a BioTek EON microplate reader (Synergy HT Multi-Mode Microplate Reader, BioTek) according to the extinction coefficient of the protein (108,485 M−1 cm−1) corresponding to its amino acid sequence. Note that the detection limit, using a microplate reader with a filter for 405 nm, for the yellow chromogen is about 1 × 10−6 mol/liter p-nitrophenol. Because the concentration of substrate in the assay ranges from 0 to 30 mM, it is expected that detection of the activity under our assay conditions is well above the detection limit.
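The two quantitative definitions in this paragraph can be written out as a short sketch: the Beer-Lambert relation c = A/(ε·l) with the extinction coefficient quoted above, and the unit definition (1 unit = 1 μmol of substrate transformed per minute). The absorbance reading and the 1-cm path length are hypothetical values chosen for illustration; the effective path length in a microplate well depends on the fill volume and would have to be measured or corrected for.

```python
EPSILON_280 = 108_485.0  # M^-1 cm^-1, extinction coefficient quoted in the text


def molar_concentration(a280: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law: concentration (mol/liter) = A / (epsilon * path length)."""
    if path_cm <= 0:
        raise ValueError("path length must be positive")
    return a280 / (EPSILON_280 * path_cm)


def units_of_activity(umol_transformed: float, minutes: float) -> float:
    """One unit is 1 umol of substrate transformed per minute."""
    if minutes <= 0:
        raise ValueError("time must be positive")
    return umol_transformed / minutes


# Hypothetical reading: A280 = 0.5 in an assumed 1-cm path.
concentration = molar_concentration(0.5)
print(f"Protein concentration: {concentration * 1e6:.2f} uM")

# Hypothetical assay: 3 umol of product formed over the 15-min read window.
print(f"Activity: {units_of_activity(3.0, 15.0):.2f} units")
```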
Crystallization, Data Collection, and Crystal Structure Determination-Initial crystallization conditions for the complete GlyA1 (10 mg/ml) were explored by high-throughput techniques with a NanoDrop robot (Innovadyne Technologies Inc.), using different commercial screens as follows: PACT and JCSG+ Suites from Qiagen; JBScreen Classic 1-4 from Jena Bioscience; and Index, Crystal Screen, and SaltRx packages from Hampton Research. These assays were carried out using the sitting drop vapor-diffusion method in MRC 96-well crystallization plates (Molecular Dimensions).
Elongated bars grew after 3 months in 20% polyethyleneglycol (PEG) 3350, 0.2 M ammonium sulfate, BisTris, pH 5.5. For data collection, crystals were cryoprotected in mother liquor supplemented with 25% (w/v) D-glucose before being cooled in liquid nitrogen. Diffraction data were collected at the German Electron Synchrotron (Hamburg, Germany). Diffraction images were processed with XDS (40) and scaled using Aimless from the CCP4 package (41), leading to space group P2₁2₁2₁. The structure was solved by molecular replacement using MOLREP (42) with reflections up to 2.5 Å resolution and a Patterson radius of 54 Å. The template model was the β-glucosidase from T. neapolitana (PDB code 2X42), but the search was made in two steps. First, the region containing residues 2-315 was used for finding a partial solution. Then, another round of molecular replacement, with the region 321-721, was computed. Preliminary rigid-body refinement was carried out using REFMAC (43). Subsequently, several rounds of extensive model building with COOT (44), combined with automatic restrained refinement with flat bulk solvent correction and using maximum likelihood target features, led to a model covering residues 3-798. However, no density was found for the loop 103-108 or for the last 123 residues of the protein. At the latter stages, β-glucose, sulfate ions, and water molecules were included in the model, which, combined with more rounds of restrained refinement, led to a final R-factor of 15.7 (Rfree 17.8).
The free R-factor was calculated using a subset of 5% randomly selected structure-factor amplitudes that were excluded from automated refinement. Many attempts to reproduce and improve these crystals were unsuccessful, until in situ proteolysis of the sample with subtilisin was tried. The resulting crystals grew after 15 days in the same conditions but at pH 7.0; they were cryoprotected in 20% (v/v) glycerol and showed the same space group and cell content. Then, the truncated GlyA1-ΔCt construct (residues 1-798) was tested. Initial crystallization assays were accomplished as described above, and several hits were obtained. The best crystals were grown in 2.0 M ammonium sulfate, 0.1 M BisTris, pH 5.5, and belonged to the same space group. The asymmetric unit contains a single molecule, with a Matthews coefficient of 2.73 and a 54% solvent content within the cell.
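The relation between the Matthews coefficient and the solvent content quoted above can be sketched as follows. The Matthews coefficient is the unit cell volume divided by the total protein mass in the cell, and the solvent fraction is commonly approximated as 1 − 1.23/VM, where the constant 1.23 assumes a typical protein partial specific volume. The cell volume and molecular mass in the example are hypothetical, because the cell parameters are not given in this section; only the value 2.73 is taken from the text.

```python
PROTEIN_CONSTANT = 1.23  # A^3/Da, conventional value for a typical protein


def matthews_coefficient(cell_volume_A3: float, molecules_in_cell: int, mass_Da: float) -> float:
    """Matthews coefficient VM (A^3/Da): cell volume per dalton of protein."""
    return cell_volume_A3 / (molecules_in_cell * mass_Da)


def solvent_fraction(vm: float) -> float:
    """Approximate solvent fraction of the crystal from the Matthews coefficient."""
    return 1.0 - PROTEIN_CONSTANT / vm


# Value quoted in the text for the truncated construct.
print(f"Solvent fraction for VM = 2.73: {solvent_fraction(2.73):.3f}")

# Hypothetical cell: 1,000,000 A^3 holding 4 molecules of 100 kDa
# (an orthorhombic P212121 cell with one molecule per asymmetric unit has 4).
vm = matthews_coefficient(1_000_000.0, 4, 100_000.0)
print(f"Hypothetical VM = {vm:.2f} A^3/Da, solvent = {solvent_fraction(vm):.2f}")
```

With this approximation, a coefficient of 2.73 gives a solvent fraction close to 0.55, in line with the roughly 54% reported.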
Soaking experiments with D-xylose or D-galactose were performed with the truncated construct in mother liquor solution supplemented with 5-50 mM ligand. Then, the crystals were flash-frozen in liquid nitrogen using mother liquor plus 20% (v/v) glycerol or ethylene glycol as cryoprotectants. The ligands were manually modeled into the electron density maps and were refined as described above. Although a mixture of α- and β-anomers may exist in solution, only the β-form of the monosaccharides was observed at the active site of the different complexes. For the docked glucotetraose coordinates, not present in the Protein Data Bank, a model was built by the on-line carbohydrate-building program GLYCAM (45).
Many attempts to crystallize the C-terminal section of the protein using the available construct were unsuccessful, and therefore, a model was built as explained below. The figures were generated with PyMOL (46). The atomic coordinates have been deposited in the RCSB Protein Data Bank under the accession codes 5K6I, 5K6M, 5K6N, and 5K6O.
SAXS Measurements-GlyA1 and GlyA1-ΔCt stock solutions (10 mg/ml) were dialyzed against the same buffer (20 mM Tris-HCl, pH 7.5, 50 mM NaCl, 2 mM DTT, and 5% glycerol) for 18 h. SAXS measurements were performed at ESRF on beamline BM29, equipped with a Pilatus 1M detector. Each sample concentration, prepared by dilution of these stock solutions, was measured in 10 frames, 1-s exposure time per frame, at 4°C, at a sample-to-detector distance of 2.867 m, using an x-ray wavelength of 0.991 Å. No radiation damage was observed during the measurements. The SAXS curves for buffer solutions were subtracted from the protein solution curves before analysis.
The scattering curves from six gradual concentrations, from 0.3 to 5 mg/ml, were scaled and averaged to obtain the I(q) function using the ATSAS software package (47). The radius of gyration (Rg) for each protein was calculated by Guinier plot using the program PRIMUS, and the pair distribution function P(r) and the maximum particle size Dmax were obtained by the program GNOM. Then, POROD was used to calculate the excluded volume of the particle, as well as the molecular weight of each sample.
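The Guinier analysis mentioned above rests on the low-angle approximation ln I(q) = ln I(0) − (Rg²/3)·q², so that Rg follows from the slope of ln I versus q². The following minimal sketch shows that calculation on synthetic, noise-free data; it is not a reimplementation of PRIMUS, the function name `guinier_fit` is hypothetical, and the chosen q range simply respects the usual rule of thumb q·Rg < 1.3.

```python
import math


def guinier_fit(q_values, intensities):
    """Fit ln I(q) = ln I0 - (Rg^2 / 3) * q^2 by least squares; return (Rg, I0)."""
    xs = [q ** 2 for q in q_values]
    ys = [math.log(i) for i in intensities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope >= 0:
        raise ValueError("non-negative Guinier slope: data do not describe a compact particle")
    return math.sqrt(-3.0 * slope), math.exp(intercept)


# Synthetic data (NOT from the study): Rg = 30 A, I(0) = 100, q in A^-1.
RG_TRUE, I0_TRUE = 30.0, 100.0
q = [0.005 + 0.002 * k for k in range(15)]  # q * Rg stays below 1.3
intensity = [I0_TRUE * math.exp(-((RG_TRUE * qi) ** 2) / 3.0) for qi in q]

rg, i0 = guinier_fit(q, intensity)
print(f"Rg = {rg:.2f} A, I(0) = {i0:.2f}")
```

On experimental curves, the fitted range has to be chosen so that the plot is linear; curvature at the lowest angles usually points to aggregation or interparticle effects.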
Several homology and threading modeling programs were tried to obtain a model of the last 123 residues of GlyA1. All of them predicted a topology corresponding to carbohydrate-binding domains of families CBM6/CBM35, but they differed in the length of the linker attaching this domain to the core protein.
Finally, models obtained from Swiss-Model (48) and CPHmodel (49) servers were used (templates from PDB entries 2W46 and 1UYX), each predicting a loop of 32 or 5 residues, respectively. Both entries share less than 20% identity with the C-terminal region of GlyA1.
Subsequently, CORAL (47) was used for several rounds of two-domain rigid body fitting, using the GlyA1-ΔCt coordinates and both templates, alternately; linkers were built as dummy atoms. The fit of the CORAL models to the SAXS experimental data was evaluated by the χ² value calculated from the program CRYSOL (47).
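A goodness-of-fit value of this kind compares the experimental intensities with the scaled model intensities, weighted by the experimental errors. The sketch below uses one commonly quoted form, χ² = 1/(N−1) · Σ[(Iexp − c·Icalc)/σ]², with the scale factor c obtained by weighted least squares. It is an illustrative assumption and not the CRYSOL source code; the function names and the numbers are hypothetical.

```python
def scale_factor(i_exp, i_calc, sigma):
    """Weighted least-squares scale c that minimizes sum(((Iexp - c * Icalc) / sigma)^2)."""
    numerator = sum(e * c / s ** 2 for e, c, s in zip(i_exp, i_calc, sigma))
    denominator = sum(c * c / s ** 2 for c, s in zip(i_calc, sigma))
    return numerator / denominator


def chi_squared(i_exp, i_calc, sigma):
    """Chi-squared between experimental and scaled model intensities, divided by N - 1."""
    c = scale_factor(i_exp, i_calc, sigma)
    n = len(i_exp)
    total = sum(((e - c * m) / s) ** 2 for e, m, s in zip(i_exp, i_calc, sigma))
    return total / (n - 1)


# Hypothetical intensities: the first model differs from the data only by a scale factor.
experimental = [2.0, 4.0, 6.0, 8.0]
errors = [0.1, 0.1, 0.1, 0.1]
perfect_model = [1.0, 2.0, 3.0, 4.0]
poor_model = [1.0, 1.0, 1.0, 1.0]

print(f"chi^2 (perfect shape): {chi_squared(experimental, perfect_model, errors):.3f}")
print(f"chi^2 (flat model):    {chi_squared(experimental, poor_model, errors):.3f}")
```

Values near 1 indicate agreement within the experimental errors, whereas much larger values indicate a model that does not reproduce the curve.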
Sequence Analysis and Construction of a Neighbor-Joining Tree-The positioning of the sequence of the GlyA1 (α/β)8 barrel domain was analyzed in a phylogenetic tree. The predicted protein sequences were aligned against the National Center for Biotechnology Information non-redundant (NCBI nr) database using the BLASTP algorithm. We downloaded all 27,499 GH3 sequences deposited in public databases. They were grouped within five different domain architectures as follows: ABB (9,196), ABB_ABS (3,392), ABB_ABS_FLD (11,910), ABB_ABS_PA14_FLD (2,673), and ABS_FLD_ABB (328), where ABB, ABS, FLD, and PA14 refer to the (α/β)8 barrel domain, the (α/β)6-sandwich, the fibronectin-like type III domain, and the protective antigen PA14 domain, respectively. We discarded those sequences (848) from the ABB_ABS group longer than 700 amino acids, as they represent enzymes with unidentified domains downstream from the ABS module. Subsequently, the sequence corresponding to the ABB domain was extracted from all of the five sub-groups. An additional filter was applied to remove ABB sequences with coverage lower than 60% of the consensus domain defined by Interpro or Pfam databases (i.e. with less than 200 amino acids). The final number of sequences was the following: ABB (8,109), ABB_ABS (2,312), ABB_ABS_FLD (7,335), ABB_ABS_PA14_FLD (1,664), and ABS_FLD_ABB (289). For each of the five sub-groups, redundant sequences (those sharing more than 50% identity) were eliminated to select sequences that belong to different taxonomic groups. Following this procedure, the final selected sequences were as follows: ABB (132), ABB_ABS (54), ABB_ABS_FLD (45), ABB_ABS_PA14_FLD (20), and ABS_FLD_ABB (22). Multiple protein alignment was performed using the ClustalW program, built into the software version 2.1. Phylogenetic analysis was conducted with the Ape package implemented for the R programming language.
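The two length-based filters described above can be sketched as follows. The record type `GH3Record`, its field names, and the example entries are hypothetical and introduced only for illustration; the thresholds (700 residues for ABB_ABS proteins, 200 residues for the extracted ABB segment) and the group counts come from the text. The 50% identity redundancy step is not reproduced here, because it requires pairwise sequence comparison.

```python
from dataclasses import dataclass

MAX_ABB_ABS_LENGTH = 700  # longer ABB_ABS proteins carry unidentified extra domains
MIN_ABB_LENGTH = 200      # about 60% coverage of the consensus ABB domain

# Initial group sizes quoted in the text.
INITIAL_COUNTS = {
    "ABB": 9196,
    "ABB_ABS": 3392,
    "ABB_ABS_FLD": 11910,
    "ABB_ABS_PA14_FLD": 2673,
    "ABS_FLD_ABB": 328,
}


@dataclass
class GH3Record:
    """Hypothetical record for one GH3 sequence."""
    accession: str
    architecture: str
    length: int       # full protein length (amino acids)
    abb_length: int   # length of the extracted ABB segment (amino acids)


def keep(record: GH3Record) -> bool:
    """Apply the two length filters described in the text."""
    if record.architecture == "ABB_ABS" and record.length > MAX_ABB_ABS_LENGTH:
        return False
    return record.abb_length >= MIN_ABB_LENGTH


# Made-up entries, one per filter outcome.
examples = [
    GH3Record("seq1", "ABB_ABS", 750, 260),      # removed: ABB_ABS longer than 700
    GH3Record("seq2", "ABS_FLD_ABB", 800, 250),  # kept
    GH3Record("seq3", "ABB", 340, 150),          # removed: ABB segment too short
]
kept = [record.accession for record in examples if keep(record)]

print("Total sequences before filtering:", sum(INITIAL_COUNTS.values()))
print("Kept examples:", kept)
```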
Author Contributions-J. S. A., M. F., and J. P. conceived and coordinated the study. M. V. P. and M. F. contributed to screening, gene cloning, and enzyme production and characterization. P. N. G. contributed to metagenomics clone resources. J. S. A., B. G. P., and M. R. E. designed the crystallographic work and the SAXS experiments and interpreted the results. M. R. E. performed all the crystallography and SAXS experiments. J. M. N. and J. P. performed the phylogenetic analysis. J. S. A. and M. F. wrote the paper, and all authors read and commented on the manuscript.