Substrate specificity, regiospecificity, and processivity in glycoside hydrolase family 74

Glycoside hydrolase family 74 (GH74) is a historically important family of endo-β-glucanases. On the basis of early reports of detectable activity on cellulose and soluble cellulose derivatives, GH74 was originally considered to be a “cellulase” family, although more recent studies have generally indicated a high specificity toward the ubiquitous plant cell wall matrix glycan xyloglucan. Previous studies have indicated that GH74 xyloglucanases differ in backbone cleavage regiospecificities and can adopt three distinct hydrolytic modes of action: exo, endo-dissociative, and endo-processive. To improve functional predictions within GH74, here we coupled in-depth biochemical characterization of 17 recombinant proteins with structural biology–based investigations in the context of a comprehensive molecular phylogeny, including all previously characterized family members. Elucidation of four new GH74 tertiary structures, as well as one distantly related dual seven-bladed β-propeller protein from a marine bacterium, highlighted key structure–function relationships along protein evolutionary trajectories. We could define five phylogenetic groups, which delineated the mode of action and the regiospecificity of GH74 members. At the extremes, a major group of enzymes diverged to hydrolyze the backbone of xyloglucan nonspecifically with a dissociative mode of action and relaxed backbone regiospecificity. In contrast, a sister group of GH74 enzymes has evolved a large hydrophobic platform comprising 10 subsites, which facilitates processivity. Overall, the findings of our study refine our understanding of catalysis in GH74, providing a framework for future experimentation as well as for bioinformatics predictions of sequences emerging from (meta)genomic studies.

XyGs. Depending on the plant tissue, the xylosyl branches may be further substituted with a variety of other saccharides (4). Therefore, total saccharification of XyGs requires the concerted action of several side chain-debranching and backbone-cleaving enzymes (9-12).
Endo-xyloglucanases, which cleave the XyG backbone, are found in glycoside hydrolase (GH) families GH5, GH9, GH12, GH16, GH44, and GH74 (11). Of these, GH74 currently comprises ~500 members, ranking it among the smaller GH families. GH74 is further distinguished from these other poly-specific families by a nearly singular specificity for XyG (13). The first GH74 enzyme to be biochemically characterized, from Aspergillus aculeatus, was described in 1989 as an "avicelase" (Avicel is a brand of microcrystalline cellulose) (14). As a result, GH74 is sometimes myopically referred to as a cellulase family, and its members are often annotated as such in (meta)genomics studies (15). However, numerous studies since the turn of this century have shown that many GH74 enzymes are in fact highly specific xyloglucanases (16-35). The biological importance of this family is underscored by (meta)genomic studies, which have revealed the ubiquity of GH74 members in diverse ecological niches, including soil, termite and human guts, and hot springs (36-40).
To unify disparate studies on GH74 members and resolve gaps in our current understanding of the distribution of the distinct modes of action in the family, molecular phylogeny was coupled with detailed enzymology to elucidate the substrate specificity, backbone cleavage regiospecificity, and processivity of 17 recombinant GH74 proteins in the present study. The determination of crystal structures of four GH74s and one distantly related dual seven-bladed β-propeller protein, together with analysis of existing GH74 structures, highlighted key structure-activity relationships across this family. Overall, this study refines our understanding of catalysis in GH74 and reveals the evolutionary trajectory of this enzyme family from dissociative toward processive modes of action.

Production and biochemical characterization
A molecular phylogeny using isolated GH74 catalytic modules from the CAZy database (13) was generated to guide protein production, enzymology, and structural biology. Previously characterized GH74 enzymes that were absent from the CAZy database were also included in this analysis (GenBank™ accession numbers CCG35167 (23) and XP_747057 (20)). In addition, two proteins (GenBank accession numbers AFV00434 and AFV00474) encoded in the Simiduia agarivorans genome, which are distantly related to GH74 enzymes based on hydrophobic cluster analysis (HCA) (48), were included as an outgroup. Thirty candidates were selected across the phylogenetic tree, of which 17 proteins were successfully recombinantly produced and purified (Fig. 1).
Proteins were first screened for activity on a range of substrates, including polysaccharides and pNP substrates. The recombinant AFV00434 and AFV00474 proteins from S. agarivorans were not active on the range of substrates tested, including XyG (data not shown). All other recombinant GH74 modules showed a strict preference for tamarind XyG. No endo-mannanase activity toward konjac glucomannan, no endo-xylanase activity toward wheat flour arabinoxylan and beechwood xylan, and no endo-glucanase activity using CM-cellulose were observed. Endo-glucanase activity on HE-cellulose and on barley β-glucan was generally estimated to be less than 1% compared with xyloglucanase activity (data not shown). As such, we did not perform further biochemical characterization on these substrates. Overall, these results, together with previous studies (16-35), suggest that GH74 enzymes are, in general, very specific for XyG.
To further investigate the biochemical properties of GH74 enzymes, optimum pH (Fig. S1) and temperature (Fig. S2) ranges were evaluated using XyG as a substrate (Table 1). Generally, recombinant enzymes were active at pH values ranging from 5 to 8, with optimum activities observed around pH 6 (except for Niastella koreensis GH74, which displayed maximum activity at pH 4.5). The highest activities were observed at temperatures ranging from 45 to 65°C for most recombinant enzymes, except for the thermophilic Caldicellulosiruptor lactoaceticus GH74a and GH74b and Caldicellulosiruptor bescii GH74, whose highest activities were recorded at 80°C. Michaelis-Menten analysis confirmed the high specificity of the GH74 catalytic domains for XyG, with Km and kcat values generally ranging from 0.02 to 0.31 mg/ml and from 18.1 to 170.2 s⁻¹, respectively (Table 1 and Fig. S3). These values are in the same range as previously characterized GH74 enzymes (16,22,29,35). Exceptionally, recombinant Streptomyces venezuelae GH74b was very unstable and precipitated rapidly in solution, which did not allow accurate kinetic characterization.
Among previously characterized GH74 enzymes, Thermotoga maritima Xeg74 showed higher activity for mixed linkage barley β-glucan than tamarind XyG (19), which constitutes an anomaly in this family. Unfortunately, we were unable to recombinantly produce T. maritima Xeg74 to verify this finding independently.

Regiospecificity and processivity of GH74 members
The mode of action of GH74 xyloglucanases has been described for a limited number of enzymes. Oligoxyloglucan reducing end-specific cellobiohydrolases (OXG-RCBHs) (

EDITORS' PICK: GH74 structure-function analysis
wide distribution of product chain lengths (23,26,33), or they can act in a processive fashion to rapidly release small xyloglucan oligosaccharides (XyGOs) at early stages of XyG hydrolysis (20,26,28,29,35). To investigate the mode of action of our recombinant GH74 enzymes, we analyzed the time-course hydrolysis of tamarind XyG at early stages of the reaction by HPAEC-PAD (Fig. 2 and Fig. S4). Also, most previously characterized GH74 endo-xyloglucanases release XXXG-type XyGOs via the exclusive hydrolysis of XyG at unbranched glucosyl residues (16,22,28,33,50); however, some cleave exclusively at xylosylated glucosyl units (21,23), and others can cleave at both G and X motifs (20,26,32,35). To investigate the backbone regiospecificity of GH74 enzymes, the limit digests of tamarind XyG, of XXXGXXXG, and of XXXG were analyzed by HPAEC-PAD (Fig. 3). The details of these analyses for 17 enzymes are discussed below in the context of five GH74 phylogenetic groups that delineate processivity and cleavage regiospecificity (Fig. 1).
Distantly related proteins-Proteins AFV00434 and AFV00474 from the marine bacterium S. agarivorans have very low sequence similarity to GH74 members, yet a distant relationship was detected by HCA (48) (data not shown). To investigate the structural basis for their lack of activity on XyG, we solved the tertiary structure of SaAFV00434 (PDB code 6P2K) using SelMet-derivatized protein and single-wavelength anomalous dispersion (SAD) phasing. This protein shares the canonical GH74 structure (16, 35, 49-52), comprising two seven-bladed β-propeller domains forming a long and wide cleft (Fig. 4A). This structure validates the HCA prediction of a distant relationship to GH74 and suggests a superfamily or "clan" (53). Two aspartic acid residues (Asp35 and Asp419), located 8.3 Å apart in the putative catalytic site, have an adequate spatial position to catalyze the hydrolysis of the glycosidic bond via an inverting mechanism. However, comparison of the protein backbone in the crystal structure of AFV00434 and Group 5 Paenibacillus odorifer GH74 (PDB entry 6MGL) (35) revealed that AFV00434 loops Trp121-Ala136, Ser468-Asn476, and Ser706-Tyr716 obstruct the active cleft at subsites −3/−2, +1/+2, and −4/−3, respectively, providing a possible explanation for the lack of polysaccharide hydrolysis (Fig. 4A). Furthermore, SaAFV00434 lacks apparent +3, +4, and +5 subsites (i.e. aromatic amino acids available for interactions with xyloglucan or other polysaccharides; see below).
Group 1-As defined here by the limits of our ability to discriminate members on the basis of enzyme activity, phylogenetic Group 1 (Fig. 1) encompasses a very sequence-diverse set of enzymes. These 123 bacterial enzymes belong mainly to the phyla Proteobacteria and Firmicutes, as well as Cyanobacteria, but low sequence conservation (identity <50%) results in long branches on the phylogenetic tree (Fig. 1).
Some of the most divergent enzymes we attempted to study could not be recombinantly produced. However, we successfully produced N. koreensis GH74, Ruminococcus albus GH74a, and C. lactoaceticus GH74a. These three enzymes acted as endo-dissociative xyloglucanases (Fig. 2 and Fig. S4 (A, B, and C)) like the previously characterized Xanthomonas citri pv. mangiferaeindicae GH74 (23), which also belongs to the phylogenetic Group 1. Notably, this group also contains T. maritima Cel74, which was previously shown to be 4 times more active on barley β-glucan than on tamarind XyG (19); this difference of specificity is not easily rationalized in light of the phylogenetic relationship with C. lactoaceticus GH74a (Fig. 1).
Recent reports showed that two tryptophan residues found in the +3 and +5 subsites in the active site cleft are necessary for the processivity of GH74 enzymes (32,35) and are conserved in all previously reported endo-processive xyloglucanases in this family (26, 28) (see below; Group 5). Very few sequences from the Group 1 enzymes possess one or both +3 and +5 subsite Trp residues (15 and 4%, respectively), consistent with the lack of processivity observed in our examples.

Remarkably, C. lactoaceticus GH74a has both positive-subsite Trp residues (Fig. S5) but nonetheless acted as a dissociative enzyme. Thus, the presence of this pair of Trp residues is necessary but not sufficient for processivity in GH74. N. koreensis GH74, R. albus GH74a, and C. lactoaceticus GH74a all had relaxed regiospecificity and were thus able to cleave the backbone of XyG at both xylosylated (X) and unbranched glucosyl (G) units (Fig. 3). In contrast, X. citri pv. mangiferaeindicae GH74 cleaved specifically after X motifs (23). These four enzymes have a Gly residue in the −1 subsite, as do 60% of enzymes from Group 1. This residue has been shown to be responsible for the ability of previously characterized GH74 endo-xyloglucanases to cleave at X units (35,54). However, some Group 1 members have an Ala (20%), a Trp (10%), or a Gln (7%) residue in the corresponding position, suggesting that some Group 1 members may have a strict preference for XyG hydrolysis at G units.
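The residue percentages quoted above amount to a tally over one column of the aligned catalytic modules. The following is a minimal sketch of such a tally, not the procedure used in this study: the function name, the 0-based column index, the choice to ignore gap characters, and the toy alignment are all illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List


def residue_frequencies(aligned_seqs: List[str], column: int) -> Dict[str, float]:
    """Percentage of each residue at a 0-based alignment column.

    Gap characters ('-') are ignored (an assumption of this sketch).
    """
    residues = [seq[column] for seq in aligned_seqs if seq[column] != "-"]
    if not residues:
        return {}
    counts = Counter(residues)
    return {aa: 100.0 * n / len(residues) for aa, n in counts.items()}


# Hypothetical toy alignment, NOT data from the study; column 1 stands in
# for the position that maps to the -1 subsite.
toy_alignment = ["AGW", "AGW", "AAW", "A-W", "AWW"]
minus1_frequencies = residue_frequencies(toy_alignment, 1)
```

In practice the column index would have to be derived from a structure-guided mapping of the −1 subsite onto the alignment, which this sketch does not attempt.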
To investigate the structural determinants for the mode of action of enzymes belonging to Group 1, we solved the tertiary structure of C. lactoaceticus GH74a in complex with the XyG fragment LLG (PDB code 6P2M), and of N. koreensis GH74 in complex with two XyG fragments, XXLG and XXXG (PDB code 6P2L; Fig. 4B). These represent the first three-dimensional structures described in Group 1. Vis-à-vis SaAFV00434 in the distantly related sister clade (Fig. 1), these structures reveal a broad, active-site cleft poised to accept the highly branched XyG polysaccharide chain. The structure of C. lactoaceticus GH74a clearly demonstrates the positioning of consecutive Trp residues, Trp328 and Trp329, comprising the +3 and +5 subsites (Fig. 5 and Fig. S5). Remarkably, the N. koreensis GH74 active-site cleft also harbors two Trp residues in homologous +3 and +5 subsite positions (Trp328 and Trp337), but instead of being found consecutively in the primary structure, they are interspersed with a loop comprising Ser329-Thr336 (Fig. 5 and Fig. S5).
Active-site aromatic residues, in particular tryptophan residues, are important for substrate recognition and processivity in glycoside hydrolases (55-57). Across the active-site cleft, we found only five hydrophobic residues positioned to interact with the XyG backbone from the −4 to the +5 subsite in C. lactoaceticus GH74a (Tyr122, Trp126, Trp328, Trp329, and Trp375) and N. koreensis GH74 (Tyr117, Phe118, Trp328, Trp337, and Trp376) (Fig. 5). In comparison, the active-site cleft of the processive xyloglucanase P. odorifer GH74 (PDB code 6MGL) of Group 5 (see below) is lined with 12 aromatic residues (35), which create a large hydrophobic platform extending from the −4 to the +6 subsites (Fig. 5). C. lactoaceticus GH74a and N. koreensis GH74 completely lack a corresponding +6 subsite. Overall, these results suggest that Group 1 comprises enzymes with the first sequence features allowing for dissociative endo-xyloglucanase activity but that the limited number of hydrophobic interactions in their active cleft does not enable processivity.
Group 3-Group 3 is currently comprised of 22 bacterial enzymes belonging to the genus Streptomyces as well as one Proteobacteria enzyme. All enzymes from Group 3 carry the Trp residue in subsite +3, which is found in some members of Group 1 but is ubiquitous in Groups 4 and 5. At the same time, Group 3 members lack the +5 subsite Trp found in Groups 4 and 5 (Fig. 1). In addition, enzymes from Group 3 have also acquired hydrophobic residues in the −4 and −3 subsites that are conserved in the Group 5 processive xyloglucanase P. odorifer GH74 (35) (see below) (Fig. S5). Within Group 3, Streptomyces atroolivaceus GH74 acted as an endo-dissociative enzyme (Fig. S4D), analogous to the previously characterized Streptomyces avermitilis GH74b (26). Both enzymes were able to cleave the XyG backbone at both G and X motifs, yet with a clear preference for the unbranched G unit (Fig. 3) (26), reflective of the presence of a Gly residue in the −1 subsite.
Despite the lack of three-dimensional structural representatives from phylogenetic Group 3, sequence analysis indicates the presence of hydrophobic residues in subsites −4, −3, +2, and +3 in the active cleft of these enzymes (Fig. S5). As in Group 1, these, and especially the limited aromatic platform in the positive subsites, are apparently insufficient to enable processivity (Fig. S4D). The current data indicate that Group 3 members are endo-dissociative enzymes that preferentially hydrolyze the XyG backbone at unbranched glucosyl units (Fig. 3).
Group 4-Group 4 is comprised of 19 bacterial enzymes belonging to the family Streptomycetaceae. Sequence alignment indicates that Group 4 members have retained all the active-site aromatic residues characteristic of Group 3 and additionally acquired the +5 subsite Trp residue found in Group 5 members (Fig. S5). Unfortunately, instability of recombinant S. venezuelae GH74a precluded detailed enzymology. Nonetheless, time-course hydrolysis of XyG analyzed by HPAEC-PAD clearly indicated that this enzyme acted as an endo-dissociative enzyme (Fig. S4E) and hydrolyzed the polysaccharide backbone at both X and G units (Fig. 3). The presence of a conserved Gly in subsite −1 of all enzymes from Group 4 is consistent with this relaxed regiospecificity (Fig. S5). However, the presence of an extended positive subsite platform is insufficient to support processivity (Fig. S4E).
Group 5-Group 5 is comprised of 173 bacterial and fungal enzymes that form a monophyletic group supported by a high bootstrap value of 75. Notably, most GH74 catalytic modules of this group are appended to a carbohydrate-binding module (CBM) (16,26,35), whereas CBMs are generally absent in enzymes from other phylogenetic groups (Fig. 1).
Nearly all enzymes from Group 5 (166 of 173) contain the subsite +3/+5 Trp pair, which constitutes an extended substrate-binding platform also observed in Group 4 (Fig. 1). This platform appears to be a prerequisite for processivity, as all presently (Fig. S4, F, G, and I-O) and previously characterized processive GH74 endo-xyloglucanases belong to Group 5 (20,24,26,28,29,35). Indeed, previous work on Group 5 members P. odorifer GH74 (35) and Paenibacillus sp. strain KM21 (29) used site-directed mutagenesis to define the critical role of both +3 and +5 aromatic residues in processivity. Further, R. albus GH74b is a rare instance of a natural variant in this phylogenetic group, in which the conserved +5 subsite Trp has been substituted with Ala. Accordingly, R. albus GH74b is an endo-dissociative xyloglucanase (Fig. S4H). Thus, although the two Trp residues are not sufficient for processivity (as in Group 4), they are nonetheless necessary (as in Group 5).
These observations prompted us to reevaluate our previous analysis of Cellvibrio japonicus GH74, in which we described this Group 5 enzyme as endo-dissociative (16). However, the presence of the pair of +3/+5 subsite Trp residues (Trp353 and Trp354) in this enzyme predicts an endo-processive mode of action. A more refined time-course analysis of XyG degradation showed that the WT C. japonicus GH74 had an endo-processive mode of action, consistent with its active-site composition and placement in Group 5, whereas the subsite variants W353A and W354A acted as endo-dissociative enzymes (Fig. S6), analogous to homologous mutants (29,35).
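The "necessary but not sufficient" relationship between the +3/+5 Trp pair and processivity can be written as a one-way filter. The sketch below is illustrative only: the function name and the one-letter residue encoding are assumptions, and a True result means only that processivity is not excluded by this criterion.

```python
def processivity_possible(plus3: str, plus5: str) -> bool:
    """Return True only when both the +3 and +5 subsites carry Trp ('W').

    True means processivity is not ruled out; it does not predict it,
    because the Trp pair is necessary but not sufficient.
    """
    return plus3.upper() == "W" and plus5.upper() == "W"


# R. albus GH74b: the +5 Trp is replaced by Ala; the enzyme is endo-dissociative.
ra_gh74b = processivity_possible("W", "A")

# C. lactoaceticus GH74a (Group 1): both Trp residues are present, yet the
# enzyme is dissociative, so True must be read as "possible", not "predicted".
cl_gh74a = processivity_possible("W", "W")
```

A fuller predictor would also need the auxiliary aromatic residues and loop insertions that distinguish Group 5, which are not encoded here.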
As in other phylogenetic groups, the residue occupying the −1 subsite in the active cleft of GH74 xyloglucanases affects the backbone cleavage regiospecificity of Group 5 enzymes, yet it is not the only determinant. The vast majority (90%) of enzymes from Group 5 have a Gly residue in subsite −1, whereas the remainder have either a Tyr, a Leu, an Ala, or an Arg residue in this position. Among this latter group, the previously characterized Phanerochaete chrysosporium Xgh74B has a Leu in the −1 subsite (28), whereas C. bescii GH74 and C. lactoaceticus GH74 have a Tyr here (Fig. 1 and Fig. S5). These three enzymes showed a strict specificity for XyG backbone hydrolysis at unbranched G units (Fig. 3).
Among enzymes with a Gly residue in the −1 subsite, the data were more equivocal. Whereas the regiospecificities of R. albus GH74b and Paenibacillus mucilaginosus GH74 are relaxed, Paenibacillus graminis GH74 and Paenibacillus borealis GH74 were the only Group 5 enzymes that could efficiently hydrolyze XXXG to XX + XG. On the other hand, Paenibacillus jamilae GH74 and Paenibacillus polymyxa GH74 showed a clear, but not exclusive, preference for cleavage at G units and a propensity to hydrolyze XXXG. Last, Streptomyces rapamycinicus GH74 and S. venezuelae GH74 strictly cleave XyG backbone at the unbranched glucosyl unit (Fig. 3).
To further investigate the determinants for the cleavage pattern of GH74 enzymes, we used P. odorifer GH74 (35) as a platform for site-directed mutagenesis. This enzyme shares over 90% sequence identity with P. graminis GH74 and P. borealis GH74 and likewise hydrolyzes XXXG to XX + XG (Fig. 3). P. odorifer GH74 has a mobile loop (Asn642-Ala651) that is conserved in P. graminis GH74 and P. borealis GH74 (Fig. S5). In the closed conformation, this loop protrudes into the active site, covering subsite −4 and hindering subsite −3 (35). Thus, we first eliminated the possibility that this loop might force XX|XG into a −2 to +2 binding mode in these enzymes, thereby promoting hydrolysis between two X units (as indicated here with the vertical bar). Indeed, the P. odorifer GH74 deletion variant ΔAsn642-Ala651 behaved like the WT enzyme (Fig. S7).
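The vertical-bar notation maps directly onto subsite numbering: backbone units to the left of the bar (the non-reducing side) occupy the negative subsites and those to the right occupy the positive subsites. A minimal sketch of this bookkeeping follows; the function names are hypothetical, and each letter is treated as one backbone glucosyl unit.

```python
from typing import Dict, Tuple


def subsite_map(notation: str) -> Dict[int, str]:
    """Map each backbone unit of e.g. 'XX|XG' to its subsite number."""
    left, sep, right = notation.partition("|")
    if not sep or not left or not right or "|" in right:
        raise ValueError("expected exactly one '|' with units on both sides")
    mapping: Dict[int, str] = {}
    # Units left of the bar fill subsites -len(left) .. -1.
    for offset, unit in enumerate(left):
        mapping[offset - len(left)] = unit
    # Units right of the bar fill subsites +1 .. +len(right).
    for offset, unit in enumerate(right, start=1):
        mapping[offset] = unit
    return mapping


def cleavage_products(notation: str) -> Tuple[str, str]:
    """Return the two fragments released by hydrolysis at the bar."""
    mapping = subsite_map(notation)  # also validates the notation
    left = "".join(mapping[i] for i in sorted(mapping) if i < 0)
    right = "".join(mapping[i] for i in sorted(mapping) if i > 0)
    return left, right


xxxg_binding = subsite_map("XX|XG")          # the -2 to +2 binding mode
xxxg_products = cleavage_products("XX|XG")   # XXXG hydrolyzed to XX + XG
```

Reading `subsite_map(...)[-1]` gives the unit in the −1 subsite, which is X for cleavage between two X units and G for the canonical cleavage, as in `XXXG|XXXG`.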
Hence, we investigated the role of the residue found in the −1 subsite in the active site cleft of P. odorifer GH74. In a previous study, we showed that a G476Y mutation in the −1 subsite switched the mode of action to exclusively cleave the XyG backbone at the G unit (35). Analogously, here we produced three single-point mutations representing the other amino acid variants found in the −1 subsite of GH74 enzymes (viz. G476A, G476W, and G476Q). Like the G476Y mutant, G476A, G476W, and G476Q variants all showed strict specificity for XyG hydrolysis at the G motif (Fig. S7). Thus, even the relatively small methyl side chain of the Ala residue hinders the accommodation of a xylose side chain in the subsite −1 and shifts the register of XyG backbone hydrolysis to the canonical unbranched G unit (11,58).
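The −1 subsite observations can be summarized as a lookup: Gly permits, but does not guarantee, cleavage at X units, whereas the non-Gly residues examined here restrict cleavage to G units. The sketch below encodes only what is stated in the text; the function name is hypothetical, and residues without characterized examples (e.g. Arg) deliberately return no prediction.

```python
from typing import FrozenSet, Optional

# Non-Gly residues at the -1 subsite reported in the text to give strict
# cleavage at unbranched G units: the G476Y/A/W/Q variants of P. odorifer
# GH74, the natural Tyr enzymes, and Leu in P. chrysosporium Xgh74B.
STRICT_G_RESIDUES = frozenset({"Y", "A", "W", "Q", "L"})


def permitted_cleavage_units(minus1_residue: str) -> Optional[FrozenSet[str]]:
    """Backbone units at which cleavage is permitted, or None if unknown."""
    residue = minus1_residue.upper()
    if residue == "G":
        # Permissive, not predictive: some Gly-containing enzymes
        # nonetheless cleave strictly at G units.
        return frozenset({"G", "X"})
    if residue in STRICT_G_RESIDUES:
        return frozenset({"G"})
    return None  # no characterized example to support a call


wild_type = permitted_cleavage_units("G")
g476a_variant = permitted_cleavage_units("A")
arg_variant = permitted_cleavage_units("R")
```

Because Gly at −1 is compatible with every regiospecificity observed, the lookup is most informative when the residue is not Gly.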
A striking feature of P. odorifer GH74 was the presence of 12 aromatic residues that lined the active-site cleft of the enzyme, which formed a large hydrophobic platform that extended from the −4 to the +6 subsites (35). Consistent with the conserved binding position of the xyloglucan fragments noted above, these residues are conserved in S. rapamycinicus GH74 (Fig. 5) as well as in P. graminis GH74 and nearly all enzymes from Group 5 (Fig. S5 and File S1 (GH74_CatalyticModules_Aligned.mfa)). In comparison, Group 1 members N. koreensis GH74 and C. lactoaceticus GH74a only have up to five of these active-site cleft aromatic residues (Fig. 5).
As might be expected, sequence analysis revealed that the acquisition of some of these key aromatic residues by Group 5 enzymes occurred through single point mutations. For instance, a Tyr residue is found in the P. graminis GH74 (Tyr295) and S. rapamycinicus GH74 (Tyr307) +1 subsites, whereas an Asn or a Ser occupies the corresponding positions in C. lactoaceticus GH74a and N. koreensis GH74, respectively (Fig. S5). However, loop extensions have also played a major role in building the hydrophobic platform. In particular, loops Tyr206-Gly215, Gly320-Tyr325, and Gly371-Ala381 provided the scaffold for the insertion of Tyr214, Trp325, and Tyr373 in the subsites −2, +5, and +6 in P. graminis GH74 (Fig. 4D). These loops are conserved across members of Group 5 but are absent in other phylogenetic groups (Fig. S5). Notably, the loop composed of Gly371-Ala381 added the +6 subsite, which is found only in Group 5. The insertion of these aromatic residues created a network of stacking interactions with the XyG backbone that contribute to the processivity of GH74 enzymes. For example, residues Trp406 (+2 subsite) and Tyr372 (+6 subsite) contribute to processivity in P. odorifer GH74 (35), whereas Trp61 (−4 subsite) and Trp64 (−3 subsite) contribute to the processivity of Paenibacillus sp. KM21 XEG74 (29), beyond the essential requirement of Trp residues in subsites +3 and +5 in these enzymes. Most of these auxiliary aromatic residues are conserved in Group 5 enzymes but are not found in the other phylogenetic groups.

Discussion
Enzymes from the same GH family share a common structural fold and catalytic mechanism (13,53). However, many CAZyme families harbor members with diverse specificities (poly-specific families), which makes functional annotation challenging due to a general lack of detailed biochemical characterization (13). For a handful of larger GH families examined to date, phylogeny-based subfamily classification has enabled further refinement of activities into monospecific clades in some cases (59-62). Thus, phylogenies highlight different structural trajectories within GH families that correlate with conserved sequence residues and substrate specificities. Not least, such delineation guides functional and structural analyses toward the characterization of enzymes significantly divergent from those previously studied and thus can resolve knowledge gaps.
Through the largest systematic experimental analysis to date, this study provides a broad overview of structure-function relationships in GH74. Enzymes from this family have evolved a unique tertiary structure comprising a large cleft to accommodate the highly branched XyG chain. From this scaffold, we observe different evolutionary trajectories that delineate the mode of action and backbone cleavage regiospecificity. Notably, GH74 is sister to a group of distantly related, dual seven-bladed β-propeller proteins, of which we were able to solve the first tertiary structure, but for which we were unable to find polysaccharide hydrolase activity.
Across the GH74 phylogeny, the characterized members of the diverse Group 1 generally evidence a relaxed backbone cleavage specificity, with the ability to hydrolyze at X or G units through an endo-dissociative (i.e. nonprocessive) mode of action. Although we were only able to observe strict XyG specificity in the examples we characterized, the observation that T. maritima Cel74 is 4 times more active on β-glucan than on XyG (19) might imply that broader specificity exists among the sequence-diverse Group 1 members. At the same time, the C. lactoaceticus GH74a in a closely related sister clade was a strict xyloglucanase (Fig. 1). Regrettably, we were unable to produce T. maritima Cel74 to explore this further, but certainly functional characterization of additional Group 1 members, including from completely uncharacterized major clades (Fig. 1), is warranted.
Phylogenetic Groups 3 and 4 are individually dominated by single genera or phyla and therefore may simply reflect speciation and not functional evolution. Nonetheless, characterized members of these clades possess unique constellations of active-site residues (as well as CBM modularity) (Fig. 1). In particular, the stepwise gain of key active-site aromatic residues, which are necessary for processivity in Group 5 enzymes, may suggest that these groups represent extant evolutionary intermediates. However, generally low bootstrap values for many clades preclude definitive conclusions from being drawn in this regard. Most distinctly, members of Group 5 have evolved a large hydrophobic platform of 10 subsites through a series of point mutations and loop insertions, which engender a processive mode of action.
The biological basis of the molecular selection for processivity across a wide range of Group 5 members is not immediately intuited. Processivity is generally considered to be advantageous for enzymes acting on crystalline substrates such as cellulose or chitin, where initial chain engagement is thought to be rate-limiting (42-47). However, this would not be expected for soluble polysaccharides, such as XyG, especially under dilute assay conditions in vitro. In the plant cell wall, XyG associates with crystalline cellulose microfibrils and other matrix glycans in an amorphous, hydrated state (63-65).
Hence, we hypothesize that processivity in GH74 may be utilized in the context of substrate sensing, in which the initial, rapid release of short, highly diffusible XyG oligosaccharides acts as a signal to up-regulate the production of cognate enzymes (66,67). In contrast, classical endo-dissociative activity predominantly generates large polysaccharide fragments during early stages of attack, which would remain associated with the cell wall. Supporting this proposal, recent transcriptomics analysis revealed that the gene encoding C. japonicus GH74 (a highly efficient, secreted, processive endo-xyloglucanase (16) (Fig. S6)) is constitutively expressed at a low level and is not up-regulated in the presence of XyG (9). This regulation contrasts with other highly specific exoglycosidases (GH3, GH31, GH35, and GH95), an endo-xyloglucanase (GH5_4), and a transporter (9).
Signal peptide analysis (68) suggests that all of the GH74 enzymes in our study are extracellular, whereas many of those from Group 5 also have CBMs, which is indicative of cell wall targeting. In this context, we might speculate that processivity in GH74 enzymes is independent of XyG type (i.e. side-chain composition); processivity appears to be primarily driven by polysaccharide backbone interactions with key active-site aromatic residues, and inspection of the several crystallographic complexes now available reveals little capacity for interaction with distal side-chain residues on xylosyl branches.
Although they are generally associated with saprotrophic organisms (16, 21, 22, 26, 30-33, 35), the extent to which GH74 enzymes might play a role in beneficial plant-microbe interactions remains to be studied (5). We also note that microbial processive glycanases operating on amorphous polysaccharides have been identified among several GH families (69-74), which may also imply a wider deployment of "sensing" (67) enzymes than generally appreciated.
The resulting 342 nonredundant sequences were screened for the presence of a signal peptide using SignalP version 4.0 (68). Modular architecture was inferred from BLASTP analysis (76) and the CAZy database (13). The sequences were aligned with MAFFT G-INS-i (77), and the quality of the alignment was manually inspected in Jalview (78). A maximum-likelihood phylogenetic tree was estimated by RAxML version 8 (79) using File S1, GH74_CatalyticModules_Aligned.mfa as input on the CIPRES gateway (80), using 100 bootstrap replicates and S. agarivorans SA1 sequences AFV00434 and AFV00474 as an outgroup. The resulting phylogeny was visualized with FigTree (http://tree.bio.ed.ac.uk/software/figtree/).

Cloning and site-directed mutagenesis
Cloning of target genes was performed as described previously (81).  Table S1).
The PCRs were designed such that only the GH74 catalytic module was amplified, thus removing signal peptides and other modules (e.g. CBMs), and the sequence was flanked by ligation-independent cloning (LIC) adaptors, following the recommendations given previously (81). LIC was performed in the vector pMCSG53 as described (81) to fuse the recombinant proteins with an N-terminal His6 tag, with a tobacco etch virus protease cleavage site. Alternatively, LIC was performed in the vector pMCSG-GST or pMCSG69 to fuse the recombinant proteins with an N-terminal GST-His6 tag or an N-terminal MBP-His6 tag, respectively (see Table S1).

Gene expression and protein purification
Constructs were individually transformed into chemically competent E. coli BL21 DE3 cells. Colonies were grown on lysogeny broth solid medium supplemented with ampicillin (100 g/ml). Isolated colonies of the transformed E. coli cells were inoculated in lysogeny broth medium containing ampicillin (100 g/ml) and grown overnight at 37°C with rotary shaking at 200 rpm. Precultures were used to inoculate ZYP5052 autoinducing medium (82) containing ampicillin (100 g/ml). Cultures were grown at 37°C for 4.5 h and transferred at 16°C for overnight incubation with rotary shaking at 200 rpm until reaching an A 600 nm of approximately 11. Cultures were then centrifuged at 4500 ϫ g for 30 min, and pellets were resuspended in 50 mM sodium phosphate buffer, pH 7.4, 500 mM NaCl, 20 mM imidazole, and the suspension was frozen at Ϫ20°C. Frozen cells were thawed and lysed by the addition of lysozyme (0.5 mg/ml) and benzonase (25 units) followed by incubation at 37°C for 1 h. In addition, cells were disrupted by sonication, and the cell-free extract was separated by centrifugation at 4°C (14,500 ϫ g for 45 min).
Recombinant proteins were purified from the cell-free extract with an ÄKTA Purifier FPLC system using a Ni2+ affinity column. A gradient up to 100% elution buffer (50 mM sodium phosphate buffer, pH 7.4, 500 mM NaCl, 500 mM imidazole) was applied. The purity of the recombinant proteins was determined by SDS-PAGE and staining with Coomassie Brilliant Blue. Pure fractions were pooled, concentrated, and buffer-exchanged against 50 mM sodium phosphate buffer, pH 7.0. Removal of the GST tag for RaGH74a and the MBP tag for ClGH74b and CbGH74 was performed overnight at 4°C using 1 mg of tobacco etch virus protease per 50 mg of recombinant protein. Untagged proteins were purified using a Ni2+ affinity column as described above. The final purification step was performed on a size-exclusion Superdex 200 column eluted with 50 mM sodium phosphate buffer, pH 7.0. Protein concentration was estimated using the Epoch Micro-Volume

Carbohydrate analytics
HPAEC-PAD and MALDI-TOF MS were performed exactly as described previously (35).

Enzyme kinetics and product analysis
For all enzyme assays on polysaccharides, the activity was determined using the BCA assay as described previously (84). Substrate specificity was determined in 50 mM sodium phosphate buffer, pH 7.0, using 0.5 mg/ml substrate and 1 μg/ml enzyme overnight at 37°C. The optimum pH was established in 50 mM citrate buffer, pH 3.0, 4.0, 5.0, 5.5, and 6.0, or 50 mM sodium phosphate buffer, pH 6.0, 6.5, 7.0, and 8.0. The optimum temperature was determined in a 50 mM concentration of the optimum buffer (citrate or phosphate at the optimum pH; see Fig. S1), using tamarind seed XyG at a concentration of 0.5 mg/ml and an appropriate concentration of recombinant protein (typically around 0.5 μg/ml) at temperatures ranging from 25 to 98°C.
To determine Michaelis-Menten parameters of recombinant proteins for XyG, substrate solutions at different concentrations over the range 0.02-2 mg/ml were used. The reactions were performed at 37°C (or 65°C for thermostable enzymes ClGH74a, ClGH74b, and CbGH74 or 20°C for SvGH74a) in a 50 mM concentration of their optimum buffer (citrate or phosphate at the optimum pH; see Fig. S1), typically using 0.1 μg/ml enzyme.
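The text does not state how the kinetic parameters were extracted from the initial-rate data. As a minimal sketch of one common approach, the snippet below fits Vmax and Km by linear regression on the Hanes-Woolf transform, [S]/v = [S]/Vmax + Km/Vmax. The function name, the substrate series, and the "true" parameter values are hypothetical and serve only to generate noise-free illustration data within the 0.02-2 mg/ml range; they are not results from this study, and a nonlinear least-squares fit would usually be preferred for real, noisy measurements.

```python
def fit_michaelis_menten(substrate, rate):
    """Estimate (Vmax, Km) by ordinary least squares on the Hanes-Woolf
    transform: S/v = (1/Vmax) * S + Km/Vmax.

    substrate: substrate concentrations (e.g. mg/ml), all > 0
    rate:      initial rates measured at those concentrations, all > 0
    """
    if len(substrate) != len(rate) or len(substrate) < 2:
        raise ValueError("need at least two paired (substrate, rate) points")

    x = list(substrate)
    y = [s / v for s, v in zip(substrate, rate)]  # S/v values

    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx                    # equals 1/Vmax
    intercept = mean_y - slope * mean_x  # equals Km/Vmax

    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


# Hypothetical substrate series spanning the 0.02-2 mg/ml range.
substrate_mg_ml = [0.02, 0.05, 0.1, 0.2, 0.5, 1.0, 2.0]

# Hypothetical parameters used only to simulate noise-free rates.
TRUE_VMAX = 10.0  # arbitrary rate units
TRUE_KM = 0.25    # mg/ml

rates = [TRUE_VMAX * s / (TRUE_KM + s) for s in substrate_mg_ml]

vmax_fit, km_fit = fit_michaelis_menten(substrate_mg_ml, rates)
print(f"Vmax = {vmax_fit:.3f}, Km = {km_fit:.3f} mg/ml")
```

Because XyG is a polydisperse polysaccharide, Km is expressed here in mg/ml rather than in molar units, matching the substrate concentrations quoted above.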
To determine the products released by recombinant GH74 enzymes, tamarind seed XyG was incubated at 37°C (or 65°C for ClGH74a, ClGH74b, and CbGH74) in a 50 mM concentration of their optimum buffer (citrate or phosphate at the optimum pH; see Fig. S1) at a concentration of 0.5 mg/ml in the presence of 0.1 μg/ml enzyme (or 1 μg/ml for SvGH74a). After various incubation times (0, 5, 10, 30, and 60 min), 100 μl of the reaction were sampled and transferred into 100 μl of boiling water for 15 min. The reaction solution was then analyzed by HPAEC-PAD. Limit digestion products were obtained similarly after 72 h using 10 μg/ml enzyme (or 100 μg/ml for SvGH74a) and 0.1 mg/ml tamarind seed XyG. Limit digestion products of XXXGXXXG and of XXXG were obtained similarly after overnight incubation of 5 μM substrate with 1 μg/ml enzyme (or 10 μg/ml for SvGH74a).
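The difference between the time-course and limit-digestion conditions is easiest to see as a substrate-to-enzyme mass ratio. The short calculation below is an illustrative sketch only: it assumes that enzyme concentrations are in micrograms per milliliter and that sample volumes are in microliters, and the function names are hypothetical helpers, not part of any published protocol.

```python
def substrate_to_enzyme_mass_ratio(substrate_mg_per_ml, enzyme_ug_per_ml):
    """Return the substrate:enzyme mass ratio (w/w) for one reaction."""
    substrate_ug_per_ml = substrate_mg_per_ml * 1000.0  # mg/ml -> ug/ml
    return substrate_ug_per_ml / enzyme_ug_per_ml


def concentration_after_quench(concentration, sample_ul=100.0, diluent_ul=100.0):
    """Return the concentration after mixing a sample into a diluent
    (here, a reaction aliquot added to boiling water)."""
    return concentration * sample_ul / (sample_ul + diluent_ul)


# Conditions as read from the text, under the unit assumptions stated above.
conditions = {
    "time course": (0.5, 0.1),       # mg/ml XyG, ug/ml enzyme
    "limit digestion": (0.1, 10.0),  # mg/ml XyG, ug/ml enzyme
}

ratios = {
    name: substrate_to_enzyme_mass_ratio(xyg, enzyme)
    for name, (xyg, enzyme) in conditions.items()
}

# XyG concentration in a quenched time-course sample (1:1 dilution).
quenched_xyg_mg_per_ml = concentration_after_quench(0.5)

for name, ratio in ratios.items():
    print(f"{name}: about {ratio:.0f}:1 substrate:enzyme (w/w)")
print(f"XyG after quench: {quenched_xyg_mg_per_ml} mg/ml")
```

Under these assumptions the limit digestion uses a far higher enzyme loading relative to substrate than the time course, which is consistent with its purpose of driving hydrolysis to completion rather than capturing early products.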
X-ray diffraction data were collected at beamline 19-ID/BM of the Structural Biology Center, Advanced Photon Source, Argonne National Laboratory (Argonne, IL) (for PgGH74 SelMet, SrGH74 native, and AFV00434 SelMet and native), beamline 08-ID at the Canadian Macromolecular Crystallography Facility, Canadian Light Source (Saskatoon, Saskatchewan, Canada) (for native NkGH74), or on a Rigaku HF-007 home source with an R-AXIS IV detector (for native ClGH74a). Data for PgGH74 and AFV00434 SelMet crystals were collected at the selenomethionine absorption peak wavelength. X-ray diffraction data were reduced using HKL-3000 (85).
The structure of AFV00434 SelMet was solved by SAD phasing with Phenix.solve (86) and Phenix.autobuild; this initial model was then refined against data from higher-resolution crystals of the native AFV00434 protein. The structure of PgGH74 was also solved by SAD phasing with Phenix.solve. The structures of NkGH74, SrGH74, and ClGH74a were solved by molecular replacement with Phenix.phaser, using search models constructed by the Phyre2 server (87) from PoGH74 (PDB code 6MGL), a putative xyloglucanase from Streptomyces sp. SirexAA-E (PDB code 5JWZ), and C. japonicus GH74 (PDB code 5FKQ), respectively.
Phenix.autobuild, Phenix.refine, and Coot (88) were used for refinement and model building. The presence of xyloglucan was readily apparent in Fo − Fc maps after resolving the positions of the protein atoms. All B-factors were refined, and TLS parameterization was included in the final rounds of refinement. All geometry was verified using Phenix and the wwPDB server, and structures were deposited in the Protein Data Bank with accession numbers 6P2K, 6P2M, 6P2L, 6P2N, and 6P2O for S. agarivorans AFV00434, C. lactoaceticus GH74a in complex with the XyG fragment LLG, N. koreensis GH74 in complex with two XyG fragments (XXLG and XXXG), P. graminis GH74, and S. rapamycinicus GH74 in complex with two XyG fragments (XLLG and XXXG), respectively. All X-ray crystallographic statistics are provided in Table S3.
Author contributions-G. A. performed sequence and phylogenetic analysis; cloned, produced, and purified GH74 catalytic modules and site-directed mutants; performed biochemical characterization; produced and purified XyG oligosaccharides XXXGXXXG and XXXG; generated figures; and co-wrote the manuscript. P. S. solved all crystal structures, produced structure figures and data, and co-wrote the manuscript. J. A. cloned, expressed, and purified GH74 catalytic