The Mechanism by Which Arabinoxylanases Can Recognize Highly Decorated Xylans

The enzymatic degradation of plant cell walls is an important biological process of increasing environmental and industrial significance. Xylan, a major component of the plant cell wall, consists of a backbone of β-1,4-xylose (Xylp) units that are often decorated with arabinofuranose (Araf) side chains. A large penta-modular enzyme, CtXyl5A, was shown previously to specifically target arabinoxylans. The mechanism of substrate recognition displayed by the enzyme, however, remains unclear. Here we report the crystal structure of the arabinoxylanase and the enzyme in complex with ligands. The data showed that four of the protein modules adopt a rigid structure, which stabilizes the catalytic domain. The C-terminal non-catalytic carbohydrate binding module could not be observed in the crystal structure, suggesting positional flexibility. The structure of the enzyme in complex with Xylp-β-1,4-Xylp-β-1,4-Xylp-[α-1,3-Araf]-β-1,4-Xylp showed that the Araf decoration linked O3 to the xylose in the active site is located in the pocket (−2* subsite) that abuts onto the catalytic center. The −2* subsite can also bind to Xylp and Arap, explaining why the enzyme can utilize xylose and arabinose as specificity determinants. Alanine substitution of Glu68, Tyr92, or Asn139, which interact with arabinose and xylose side chains at the −2* subsite, abrogates catalytic activity. Distal to the active site, the xylan backbone makes limited apolar contacts with the enzyme, and the hydroxyls are solvent-exposed. This explains why CtXyl5A is capable of hydrolyzing xylans that are extensively decorated and that are recalcitrant to classic endo-xylanase attack.

The plant cell wall is an important biological substrate. This complex composite structure is depolymerized by microorganisms that occupy important, highly competitive ecological niches, and the process makes an important contribution to the carbon cycle (1). Lignocellulosic degradation is also of continued interest to environmentally sensitive industries such as the biofuels and biorefinery sectors, where the use of sustainable or renewable substrates is of increasing importance. Given that the plant cell wall is the most abundant source of renewable organic carbon on the planet, this macromolecular substrate has substantial industrial potential (2).
An example of the chemical complexity of the plant cell wall is provided by xylan, which is the major hemicellulosic component. This polysaccharide comprises a backbone of β-1,4-D-xylose residues in their pyranose configuration (Xylp) that are decorated at O2 with 4-O-methyl-D-glucuronic acid (GlcA) and at O2 and/or O3 with α-L-arabinofuranose (Araf) residues, whereas the polysaccharide can also be extensively acetylated (3). In addition, the Araf side chain decorations can also be esterified to ferulic acid that, in some species, provides a chemical link between hemicellulose and lignin (3). The precise structure of xylans varies between plant species, in particular in different tissues and during cellular differentiation (4). In specialized plant tissues, such as the outer layer of cereal grains, xylans are extremely complex, and side chains may comprise a range of other sugars including L- and D-galactose and β- and α-Xylp units. Indeed, in these cereal brans, xylans have very few backbone Xylp units that are undecorated, and the side chains can contain up to six sugars (5).
Reflecting the chemical and physical complexity of the plant cell wall, microorganisms that utilize these composite structures express a large number of polysaccharide-degrading enzymes, primarily glycoside hydrolases, but also polysaccharide lyases, carbohydrate esterases, and lytic polysaccharide monooxygenases. These carbohydrate active enzymes are grouped into sequence-based families in the CAZy database (6). With respect to xylan degradation, the backbone of simple xylans is hydrolyzed by endo-acting xylanases, the majority of which are located in glycoside hydrolase (GH) families GH10 and GH11, although they are also present in GH8 (1,7). The extensive decoration of the xylan backbone generally restricts the capacity of these enzymes to attack the polysaccharide prior to removal of the side chains by a range of α-glucuronidases, α-arabinofuranosidases, and esterases (8). Two xylanases, however, utilize the side chains as essential specificity determinants and thus target decorated forms of the hemicellulose. The GH30 glucuronoxylanases require the Xylp bound at the −2 subsite to contain a GlcA side chain (9) (the scissile bond targeted by glycoside hydrolases is between subsites −1 and +1, and subsites that extend toward the non-reducing and reducing ends of the substrate are assigned increasing negative and positive numbers, respectively (10)). The GH5 arabinoxylanase (CtXyl5A) derived from Clostridium thermocellum displays an absolute requirement for xylans that contain Araf side chains (11). In this enzyme, the key specificity determinant is the Araf appended to O3 of the Xylp bound in the active site (−1 subsite). The reaction products generated from arabinoxylans, however, suggest that Araf can be accommodated at subsites distal to the active site.
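The subsite numbering convention described above can be illustrated with a short sketch. The function name and its arguments are hypothetical and introduced only for illustration; the sketch labels backbone residues only, so a side-chain pocket such as −2* falls outside it.

```python
def subsite_labels(n_residues: int, n_nonreducing_side: int) -> list[int]:
    """Label backbone sugars, listed from non-reducing to reducing end.

    n_nonreducing_side is the number of residues on the non-reducing side
    of the scissile bond. Those residues occupy subsites -n ... -1, the
    remaining residues occupy +1, +2, ... There is no subsite 0.
    """
    if not 0 < n_nonreducing_side < n_residues:
        raise ValueError("the scissile bond must lie between two residues")
    labels = []
    for i in range(n_residues):
        offset = i - n_nonreducing_side
        # Negative offsets are used directly; non-negative ones skip zero.
        labels.append(offset if offset < 0 else offset + 1)
    return labels


# A hexasaccharide cleaved after its fourth residue spans -4 to +2.
example = subsite_labels(6, 4)
print(example)
```

In this scheme the scissile bond always sits between the residues labeled −1 and +1, regardless of chain length.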
CtXyl5A is a multimodular enzyme containing, in addition to the GH5 catalytic module (CtGH5), three non-catalytic carbohydrate binding modules (CBMs) belonging to families 6 (CtCBM6), 13 (CtCBM13), and 62 (CtCBM62); a fibronectin type 3 (Fn3) domain; and a C-terminal dockerin domain (Fig. 1). Previous studies of Fn3 domains have indicated that they might function as ligand-binding modules, as a compact form of peptide linkers or spacers between other domains, as cellulose-disrupting modules, or as proteins that help large enzyme complexes remain soluble (12). The dockerin domain recruits the enzyme into the cellulosome, a multienzyme plant cell wall degrading complex presented on the surface of C. thermocellum (13,14). CtCBM6 stabilizes CtGH5 (11), and CtCBM62 binds to D-galactopyranose and L-arabinopyranose (15). The function of the CtCBM13 and Fn3 modules remains unclear. Similarly, the mechanism of substrate recognition and its impact on specificity are key unresolved issues. This report exploits the crystal structure of mature CtXyl5A lacking its C-terminal dockerin domain (CtXyl5A-Doc), and the enzyme in complex with ligands, to explore the mechanism of substrate specificity. The data show that the plasticity in substrate recognition enables the enzyme to hydrolyze highly complex xylans that are not accessible to classical GH10 and GH11 endo-xylanases.

Results
Substrate Specificity of CtXyl5A
Previous studies showed that CtXyl5A is an arabinoxylan-specific xylanase that generates xylooligosaccharides with an arabinose linked O3 to the reducing end xylose (11). The enzyme is active against both wheat and rye arabinoxylans (abbreviated as WAX and RAX, respectively). It was proposed that arabinose decorations make productive interactions with a pocket (−2*) that abuts onto the active site or −1 subsite. The distribution of arabinose side chains on the other backbone xylose units in the oligosaccharides generated by CtXyl5A was essentially random. These data suggest that O3, and possibly O2, on the xylose residues at subsites distal to the active site and −2* pocket are solvent-exposed, implying that the enzyme can access highly decorated xylans. To test this hypothesis, the activity of CtXyl5A against xylans from cereal brans was assessed. CtXyl5A was incubated with a range of xylans for 16 h at 60°C, and the limit products were visualized by TLC. These xylans are highly decorated not only with Araf and GlcA units but also with L-Gal, D-Gal, and D-Xyl (5). Indeed, very few xylose units in the backbone of bran xylans lack side chains. The data presented in Table 1 showed that CtXyl5A was active against corn bran xylan (CX). In contrast, typical endo-xylanases from GH10 and GH11 were unable to attack CX, reflecting the lack of undecorated xylose units in the backbone (the active site of these enzymes can only bind to non-substituted xylose residues (8,16)). The limit products generated by CtXyl5A from CX consisted of an extensive range of oligosaccharides. These data support the view that in subsites outside the active site the O2 and O3 groups of the bound xylose units are solvent-exposed and will thus tolerate decoration.
To explore whether substrate bound only at −2* and −1 in the negative subsites was hydrolyzed by CtXyl5A, the limit products of CX digested by the arabinoxylanase were subjected to size exclusion chromatography using Bio-Gel P-2, and the smallest oligosaccharides (largest elution volume) were chosen for further study. HPAEC analysis showed that the smallest oligosaccharide fraction (pool 4) contained two species with retention times of 14.0 min (oligosaccharide 1) and 20.8 min (oligosaccharide 2) (Fig. 2). Positive mode electrospray mass spectrometry showed that pool 4 contained exclusively molecular ions with an m/z = 305 [M + Na]+, which corresponds to a pentose-pentose disaccharide (molecular mass = 282 Da) as a sodium ion adduct, whereas a dimer of the disaccharide with a sodium adduct (m/z = 587 [2M + Na]+) was also evident. Monosaccharide composition analysis of pool 4 by TFA hydrolysis showed xylose and arabinose in a 3:1 ratio. This suggests that pool 4 consists of two disaccharides: one consisting of two xylose residues and the other consisting of an arabinose linked to a xylose. Treatment of pool 4 with the nonspecific arabinofuranosidase, CjAbf51A (17), resulted in the loss of oligosaccharide 2 and the production of both xylose and arabinose, indicative of a disaccharide of xylose and arabinose. Incubation of pool 4 with a β-1,3-xylosidase (XynB) converted oligosaccharide 1 into xylose, demonstrating that this molecule is the disaccharide β-1,3-xylobiose. This view is supported by the inability of a β-1,4-specific xylosidase to hydrolyze oligosaccharide 1 or oligosaccharide 2 (data not shown). The crucial importance of occupancy of the −2* pocket for catalytic competence is illustrated by the inability of the enzyme to hydrolyze linear β-1,4-xylooligosaccharides.
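The mass assignments and the 3:1 sugar ratio quoted above can be checked with simple arithmetic. The following is a minimal sketch, not part of the original analysis: it assumes standard monoisotopic masses, ignores the electron mass when forming the sodium adduct, and assumes an equimolar mixture of the two disaccharides when computing the ratio. All function and variable names are illustrative.

```python
# Standard monoisotopic masses in Da (assumed values, not taken from the source).
PENTOSE = 150.0528   # C5H10O5, xylose or arabinose
WATER = 18.0106      # lost on formation of each glycosidic bond
SODIUM = 22.9898     # Na, electron mass ignored


def oligosaccharide_mass(n_pentoses: int) -> float:
    """Mass of a linear or branched pentose oligosaccharide."""
    return n_pentoses * PENTOSE - (n_pentoses - 1) * WATER


def sodium_adduct_mz(mass: float, n_molecules: int = 1) -> float:
    """m/z of a singly charged [nM + Na]+ ion."""
    return n_molecules * mass + SODIUM


def xylose_to_arabinose_ratio(xylobiose: float, ara_xyl: float) -> float:
    """Molar Xyl:Ara ratio after complete acid hydrolysis of a mixture of
    Xyl-Xyl and Ara-Xyl disaccharides (amounts in arbitrary molar units)."""
    xylose = 2 * xylobiose + ara_xyl
    arabinose = ara_xyl
    return xylose / arabinose


disaccharide = oligosaccharide_mass(2)
mz_monomer = sodium_adduct_mz(disaccharide)
mz_dimer = sodium_adduct_mz(disaccharide, n_molecules=2)
ratio = xylose_to_arabinose_ratio(1.0, 1.0)

print(round(disaccharide), round(mz_monomer), round(mz_dimer), ratio)
```

Under these assumptions the nominal values match those reported: a 282 Da disaccharide, ions at m/z 305 and 587, and a 3:1 xylose to arabinose ratio when the two disaccharides are present in equal amounts.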
The generation of Araf-Xylp and Xyl-β-1,3-Xyl as reaction products demonstrates that occupancy of the −2 subsite is not essential for catalytic activity, which is in contrast to all endo-acting xylanases where this subsite plays a critical role in enzyme activity (18,19). Indeed, the data demonstrate that −2* plays a more important role in productive substrate binding than the −2 subsite. Unfortunately, the inability to generate highly purified (Xyl-β-1,4)n-[β-1,3-Xyl/Ara]-Xyl oligosaccharides from arabinoxylans prevented the precise binding energies at the negative subsites from being determined.
Crystal Structure of the Catalytic Module of CtXyl5A in Complex with Ligands
To understand the structural basis for the biochemical properties of CtXyl5A, crystal structures of the enzyme with ligands that occupy the substrate binding cleft and the critical −2* subsite were sought. The data presented in Fig. 3A show the structure of the CtXyl5A derivative CtGH5-CtCBM6 in complex with arabinose bound in the −2* pocket. Interestingly, the bound arabinose was in the pyranose conformation rather than in the furanose form found in arabinoxylans. O1 was facing toward the active site −1 subsite, indicative of the bound arabinose being in the right orientation to be linked to the xylan backbone via an α-1,3 linkage. As discussed below, the axial O4 of the Arap did not interact with the −2* subsite, suggesting that the pocket might be capable of binding a xylose molecule. Indeed, soaking apo crystals with xylose showed that the pentose sugar also bound in the −2* subsite in its pyranose conformation (Fig. 3B). These crystal structures support the biochemical data presented above showing that the enzyme generated β-1,3-xylobiose from CX, which would require the disaccharide to bind at the −1 and −2* subsites. A third product complex was generated by co-crystallizing the nucleophile inactive mutant CtGH5E279S-CtCBM6 with a WAX-derived oligosaccharide (Fig. 3C).
The data revealed a pentasaccharide bound to the enzyme, comprising β-1,4-xylotetraose with an Araf linked α-1,3 to the reducing end xylose. The xylotetraose was positioned in subsites −1 to −4 and the Araf in the −2* pocket. Analysis of the three structures showed that O1, O2, O3, and the endocyclic oxygen occupied identical positions in the Arap, Araf, and Xylp ligands bound in the −2* subsite and thus made identical interactions with the pocket. O1 makes a polar
FIGURE 2. Identification of the disaccharide reaction products generated from CX. The smallest reaction products were purified by size exclusion chromatography and analyzed by HPAEC (A) and positive mode ESI-MS (B), respectively. The samples were treated with a nonspecific arabinofuranosidase (CjAbf51A) and a GH3 xylosidase (XynB) that targeted β-1,3-xylosidic bonds. X, xylose; A, arabinose. The m/z = 305 species denotes a pentose disaccharide as a sodium adduct [M + Na]+, whereas the m/z = 587 signal corresponds to an ESI-MS dimer of the pentose disaccharide also as a sodium adduct [2M + Na]+.
The importance of the interactions between the ligands and the side chains of the residues in the −2* pocket was evaluated by alanine substitution of these amino acids. The mutants E68A, Y92A, and N139A were all inactive (Table 1), demonstrating the importance of the interactions of these residues with the substrate and reinforcing the critical role the −2* subsite plays in the activity of the enzyme. N135A retained wild type activity because the O2 of the sugars interacts with the backbone N of Asn135 and not with the side chain. Because the hydroxyls of Xylp or Araf in the −2* pocket are not solvent-exposed, the active site of the arabinoxylanase can only bind to xylose residues that contain a single xylose or arabinose O3 decoration. This may explain why the kcat/Km for CtXyl5A against WAX was 2-fold higher than against CX (Table 1). WAX is likely to have a higher concentration of single Araf decorations compared with CX and thus contain more substrate available to the arabinoxylanase.
In the active site of CtXyl5A the α-D-Xylp, which is in its relaxed ⁴C₁ conformation, makes the following interactions with the enzyme (Fig. 4). The Xylp in the active site makes strong parallel apolar interactions with Phe310. Substrate recognition in the active site is conserved between CtXyl5A and the closest GH5 structural homolog, the endoglucanase BaCel5A (PDB code 1qi2), as noted previously (11).
The capacity of CtXyl5A to act on the highly decorated xylan CX indicates that O3 and possibly O2 of the backbone Xylp units are solvent-exposed. This is consistent with the interaction of the xylotetraose backbone with the enzyme distal to the active site. A surface representation of the enzyme (Fig. 4B) shows that O3 and O2 of xylose units at subsites −2 to −4 are solvent-exposed and are thus available for decoration. Indeed, these pyranose sugars make very weak apolar interactions with the arabinoxylanase. At −2, Xylp makes planar apolar interactions with the Araf bound to the −2* subsite (Fig. 4C). Xylp at subsites −2 and −3 make weak hydrophobic contact with Val318, the −3 Xylp makes planar apolar interactions with Ala137, whereas the xylose at −4 forms parallel apolar contacts with Trp69. Comparison of the distal negative subsites of CtXyl5A with BaCel5A and a typical GH10 xylanase (CmXyn10B, PDB code 1uqy) highlights the paucity of interactions between the arabinoxylanase and its substrate outside the active site (Fig. 4). Thus, the cellulase contains three negative subsites and the sugars bound in the −2 and −3 subsites make a total of 9 polar interactions with the enzyme (Fig. 4, D and E). The GH10 xylanase also contains a −2 subsite that, similar to the cellulase, makes numerous interactions with the substrate (Fig. 4, F and G).
The Influence of the Modular Architecture of CtXyl5A on Catalytic Activity
CtXyl5A, in addition to its catalytic module, contains three CBMs (CtCBM6, CtCBM13, and CtCBM62) and a fibronectin domain (CtFn3). A previous study showed that although the CBM6 bound in an exo-mode to xylo- and cellulooligosaccharides, the primary role of this module was to stabilize the structure of the GH5 catalytic module (11). To explore the contribution of the other non-catalytic modules to CtXyl5A function, the activities of a series of truncated derivatives of the arabinoxylanase were assessed. The data in Table 1 show that removal of CtCBM62 caused a modest increase in activity against both WAX and CX, whereas deletion of the Fn3 domain had no further impact on catalytic performance. Truncation of CtCBM13, however, caused a 4-5-fold reduction in activity against both substrates. Members of CBM13 have been shown to bind to xylans, mannose, and galactose residues in complex glycans (20-23), hinting that the function of CtCBM13 is to increase the proximity of substrate to the catalytic module of CtXyl5A. Binding studies, however, showed that CtCBM13 displayed no affinity for a range of relevant glycans including WAX, CX, xylose, mannose, galactose, and birchwood xylan (BX) (data not shown). It would appear, therefore, that CtCBM13 makes a structural contribution to the function of CtXyl5A.
Crystal Structure of CtXyl5A-D
To explore further the role of the non-catalytic modules in CtXyl5A, the crystal structure of CtXyl5A extending from CtGH5 to CtCBM62 was sought. To obtain a construct that could potentially be crystallized, the protein was generated without the C-terminal dockerin domain because it is known to be unstable and prone to cleavage. Using this construct (CtXyl5A-D), the crystal structure of the arabinoxylanase was determined by molecular replacement to a resolution of 2.64 Å with Rwork and Rfree at 23.7% and 27.8%, respectively. The structure comprises a continuous polypeptide extending from Ala36 to Trp742 displaying four modules, GH5-CBM6-CBM13-Fn3. Although there was some electron density for CtCBM62, it was not sufficient to confidently build the module (Fig. 5). Further investigation of the crystal packing revealed a large solvent channel adjacent to the area the CBM62 occupies. We postulate that the poor electron density reflects CtCBM62 being mobile compared with the rest of the protein. The structures of CtGH5 and CtCBM6 have been described previously (11,15).
The Fn3 module displays a typical β-sandwich fold with the two sheets comprising, primarily, three antiparallel strands in the order β1-β2-β5 in β-sheet 1 and β4-β3-β6 in β-sheet 2. Although β-sheet 2 presents a cleft-like topology, typical of endo-binding CBMs, the surface lacks aromatic residues that play a key role in ligand recognition, and in the context of the full-length enzyme, the cleft abuts into CtCBM13 and thus would not be able to accommodate an extended polysaccharide chain (see below).

OCTOBER 14, 2016 • VOLUME 291 • NUMBER 42 JOURNAL OF BIOLOGICAL CHEMISTRY 22153

In the structure of CtXyl5A-D, the four modules form a three-leaf clover-like structure (Fig. 5). Between the interfaces of CtGH5-CBM6-CBM13 there are a number of interactions that maintain the modules in a fixed position relative to each other. The interaction of CtGH5 and CtCBM6, which buries a substantial apolar solvent-exposed surface of the two modules, has been described previously (11). The polar interactions between these two modules comprise 14 hydrogen bonds and 5 salt bridges. The apolar and polar interactions between these two modules likely explain why they do not fold independently, in contrast to other glycoside hydrolases that contain CBMs (24,25). CtCBM13 acts as the central domain, which interacts with CtGH5, CtCBM6, and CtFn3 via 2, 5, and 4 hydrogen bonds, respectively, burying a surface area of ~450, 350, and 500 Å², respectively, to form a compact heterotetramer. With respect to the CtCBM6-CBM13 interface, the linker (SPISTGTIP) between the two modules, extending from Ser514 to Pro522, adopts a fixed conformation. Such sequences are normally extremely flexible (26); however, the two Ile residues make extensive apolar contacts within the linker and with the two CBMs, leading to conformational stabilization. The interactions between CtGH5 and the two CBMs, which are mediated by the tip of the loop between β-7 and α-7 (loop 7) of CtGH5, not only stabilize the trimodular clover-like structure but also make a contribution to catalytic function. Central to the interactions between the three modules is Trp285, which is intercalated between the two CBMs. The Nε of this aromatic residue makes hydrogen bonds with the backbone carbonyls of Val615 and Gly616 in CtCBM13, and the indole ring makes several apolar contacts with CtCBM6 (Pro440, Phe489, Gly491, and Ala492) (Fig. 5).
Indeed, loop 7 is completely disordered in the truncated derivative of CtXyl5A comprising CtGH5 and CtCBM6, demonstrating that the interactions with CtCBM13 stabilize the conformation of this loop. Although the tip of loop 7 does not directly contribute to the topology of the active site, it is only ~12 Å from the catalytic nucleophile Glu279. Thus, any perturbation of the loop (through the removal of CtCBM13) is likely to influence the electrostatic and apolar environment of the catalytic apparatus, which could explain the reduction in activity associated with the deletion of CtCBM13.
Similar to the interactions between CtCBM6 and CtCBM13, there are extensive hydrophobic interactions between CtCBM13 and CtFn3, resulting in very little flexibility between these modules. As stated above, the absence of CtCBM62 in the structure suggests that the module can adopt multiple positions with respect to the rest of the protein. The CtCBM62, by binding to its ligands (D-Galp and L-Arap) in plant cell walls (15), may be able to recruit the enzyme onto its target substrate. Xylans are not generally thought to contain such sugars. D-Galp, however, has been detected in xylans in the outer layer of cereal grains and in eucalyptus trees (5), which are substrates used by CtXyl5A. Thus, CtCBM62 may direct the enzyme to particularly complex xylans containing D-Galp at the non-reducing termini of the side chains, consistent with the open substrate binding cleft of the arabinoxylanase that is optimized to bind highly decorated forms of the hemicellulose. In general CBMs have little influence on enzyme activity against soluble substrates but have a significant impact on glycans within plant cell walls (27,28). Thus, the role of CBM62 will likely only be evident against insoluble composite substrates.
Exploring GH5 Subfamily 34
CtXyl5A is a member of a seven-protein subfamily of GH5, GH5_34 (29). Four of these proteins are distinct, whereas the other three members are essentially identical (derived from different strains of C. thermocellum). To investigate further the substrate specificity within this subfamily, recombinant forms of three members of GH5_34 that were distinct from CtXyl5A were generated. AcGH5 has a similar molecular architecture to CtXyl5A with the exception of an additional carbohydrate esterase family 6 module at the C terminus (Fig. 1).
The GH5_34 from Verrucomicrobiae bacterium, VbGH5, contains the GH5-CBM6-CBM13 core structure, but the C-terminal Fn3-CBM62-dockerin modules, present in CtXyl5A, are replaced with a Laminin_3_G domain, which, by analogy to homologous domains in other proteins that have affinity for carbohydrates (30), may display a glycan binding function. The Verrucomicrobiae enzyme also has an N-terminal GH43 subfamily 10 (GH43_10) catalytic module. The fungal GH5_34, GpGH5, unlike the two bacterial homologs, comprises a single GH5 catalytic module lacking all of the other accessory modules (Fig. 1). GpGH5 is particularly interesting because, among the several hundred fungal genomes, that of Gonapodya prolifera is the only one that encodes a GH5_34 enzyme. In fact there are four potential GH5_34 sequences in the G. prolifera genome, all of which show high sequence homology to Clostridium GH5_34 sequences. G. prolifera and Clostridium occupy similar environments, suggesting that the GpGH5_34 gene was acquired from a Clostridium species, which was followed by duplication of the gene in the fungal genome. The sequence identity of the GH5_34 catalytic modules with CtXyl5A ranged from 55 to 80% (supplemental Fig. S1). All the GH5_34 enzymes were active on the arabinoxylans RAX, WAX, and CX but displayed no activity on BX (Table 1 and Fig. 6) and are thus defined as arabinoxylanases. The limit products generated by CtXyl5A, AcGH5, and GpGH5 comprised a range of oligosaccharides with some high molecular weight material. The oligosaccharides with low degrees of polymerization were absent in the VbGH5 reaction products. However, the enzyme generated a large amount of arabinose, which was not produced by the other arabinoxylanases. Given that GH43_10 is predominantly an arabinofuranosidase subfamily of GH43 (31), the arabinose generated by VbGH5 is likely mediated by the N-terminal catalytic module (see below).
Kinetic analysis showed that AcGH5 displayed similar activity to CtXyl5A against both WAX and RAX and was 2-fold less active against CX. When the activity of wild type VbGH5 against the different substrates was initially measured, no clear data could be obtained; regardless of the concentration of enzyme used, the reaction appeared to cease after a few minutes. We hypothesized that the N-terminal GH43_10 rapidly removed single arabinose decorations from the arabinoxylans, depleting the substrate available to the arabinoxylanase and explaining why this activity was short-lived. To test this hypothesis, the conserved catalytic base (Asp45) of the GH43_10 module of VbGH5 was substituted with alanine, which is predicted to inactivate this catalytic module. The D45A mutant did not produce arabinose, consistent with the arabinofuranosidase activity displayed by the GH43_10 module in the wild type enzyme (Fig. 6). The kinetics of the GH5_34 arabinoxylanase catalytic module were now measurable, and activities were determined to be between ~6- and 10-fold lower than that of CtXyl5A. Interestingly, the fungal arabinoxylanase displays the highest activities against WAX and RAX, ~4- and 6-fold higher, respectively, than CtXyl5A; however, there is very little difference in the activity between the eukaryotic and prokaryotic enzymes against CX. Attempts to express individual modules of a variety of truncations of AcGH5 and VbGH5 were unsuccessful. This may indicate that the individual modules can only fold correctly when incorporated into the full-length enzyme, demonstrating the importance of intermodule interactions to maintain the structural integrity of these enzymes.

Discussion
A characteristic feature of enzymes that attack the plant cell wall is their complex molecular architecture (1). The CBMs in these enzymes generally play a role in substrate targeting (25,28) and are appended to the catalytic modules through flexible linker sequences (26). CtXyl5A provides a rare visualization of the structure of multiple modules within a single enzyme. The central feature of these data is the structural role played by two of the CBMs, CtCBM6 and CtCBM13, in maintaining the active conformation of the catalytic module, CtGH5. The crystallographic data described here are supported by biochemical data showing either that these two modules do not bind to glycans (CtCBM13) or that the recognition of the non-reducing end of xylan or cellulose chains (CtCBM6) is unlikely to be biologically significant. It should be emphasized, however, that glycan binding and substrate targeting may only be evident in the full-length enzyme acting on highly complex structures such as the plant cell wall, as observed recently for a CBM46 module in the Bacillus xyloglucanase/mixed linked glucanase BhCel5B (27).
CtXyl5A is a member of GH5, which contains 6644 members. These proteins have been subdivided into 51 subfamilies based on sequence similarity (29). CtXyl5A is a member of subfamily GH5_34. Here we have explored the substrate specificity of the other members of this subfamily. Despite differences in sequence identity, all of the homologs were shown to be arabinoxylanases. Consistent with the conserved substrate specificity, all members of GH5_34 contained the specificity determinants Glu68, Tyr92, and Asn139, which make critical interactions with the xylose or arabinose in the −2* subsite that is 1,3-linked to the xylose positioned in the active site. The presence of a CBM62 in CtXyl5A and AcGH5 suggests that these enzymes target highly complex xylans that contain D-galactose in their side chains. The absence of a "non-structural" CBM in GpGH5 may indicate that this arabinoxylanase is designed to target simpler arabinoxylans present in the endosperm of cereals. Although the characterization of all members of GH5_34 suggests that this subfamily is monospecific, differences in specificity are observed in other subfamilies of GHs including GH43 (31) and GH5 (29). Thus, as new members of GH5_34 are identified from genomic sequence data and subsequently characterized, the specificity of this family may require reinterpretation.
An intriguing feature of VbGH5 is that the limit products generated by this enzyme are much larger than those produced by the other arabinoxylanases. This suggests that although arabinose decorations contribute to enzyme specificity (VbGH5 is not active on xylans lacking arabinose side chains), the enzyme requires other specificity determinants that occur less frequently in arabinoxylans. This has some resonance with a recently described GH98 xylanase that also exploits specificity determinants that occur infrequently and are only evident in highly complex xylans (e.g. CX) (5).
To conclude, this study provides the molecular basis for the specificity displayed by arabinoxylanases. Substrate specificity is dominated by the pocket that binds single arabinose or xylose side chains. The open xylan binding cleft explains why the enzyme is able to attack highly decorated forms of the hemicellulose. It is also evident that appending additional catalytic modules and CBMs onto the core components of these enzymes generates bespoke arabinoxylanases with activities optimized for specific functions. The specificities of the arabinoxylanases described here are distinct from the classical endoxylanases and thus have the potential to contribute to the toolbox of biocatalysts required by industries that exploit the plant cell wall as a sustainable substrate.

Experimental Procedures
Cloning, Expression, and Purification of Components of CtXyl5A
All recombinant forms of CtXyl5A used in this study were expressed in the cytoplasm of Escherichia coli because they lacked a signal peptide. DNA encoding CtGH5-CtCBM6 and CtXyl5A-D (CtXyl5A lacking the C-terminal dockerin domain (CtGH5-CtCBM6-CtCBM13-Fn3-CtCBM62)) were described previously (11). DNA encoding CtGH5-CtCBM6-CtCBM13-Fn3 and CtGH5-CtCBM6-CtCBM13 and mature Acetivibrio cellulolyticus GH5 (AcGH5) were amplified by PCR using plasmid encoding the full-length C. thermocellum arabinoxylanase or A. cellulolyticus genomic DNA as the respective templates. DNA encoding the G. prolifera GH5 (GpGH5) and V. bacterium GH5 (VbGH5) were initially generated by GeneArt gene synthesis (Thermo Fisher Scientific). DNA encoding VbGH5 lacking the C-terminal cell surface anchoring residues was also amplified by PCR using the synthesized nucleic acid as the template. All the primers used in the PCRs, the required restriction sites, and the plasmids used are listed in supplemental Table S1. All constructs were cloned such that the encoded proteins contain a C-terminal His6 tag. Site-directed mutagenesis was carried out using the PCR-based QuikChange method (Stratagene) deploying the primers listed in supplemental Table S1.
To express the recombinant proteins, E. coli strain BL21(DE3), harboring appropriate recombinant plasmids, was cultured to mid-exponential phase in Luria broth at 37°C. Isopropyl β-D-thiogalactopyranoside at 1 mM was then added to induce recombinant gene expression, and the culture was incubated for a further 18 h at 16°C. The recombinant proteins were purified to >90% electrophoretic purity by immobilized metal ion affinity chromatography using Talon™ (Clontech), a cobalt-based matrix, and elution with 100 mM imidazole, as described previously (33). When preparing the selenomethionine derivative of CtXyl5A-D for crystallography, the protein was expressed in E. coli B834 (DE3), a methionine auxotroph, cultured in medium comprising 1 liter of SelenoMet Medium Base™, 50 ml of SelenoMet™ nutrient mix (Molecular Dimensions), and 4 ml of a 10 mg/ml solution of L-selenomethionine. Recombinant gene expression and protein purification were as described above except that all purification buffers were supplemented with 10 mM β-mercaptoethanol.
Enzyme Assays-CtXyl5A-D and its derivatives were assayed for enzyme activity using the method of Miller (34) to detect the release of reducing sugar. The standard assay was carried out in 50 mM sodium phosphate buffer, pH 7.0, containing 0.1 mg/ml BSA and at substrate concentrations ranging from 1 to 6 mg/ml. The pH and temperature optima were previously determined to be 7 and 60°C, respectively, for CtXyl5A-D and its derivatives. The optimum temperature for the other enzymes was found to be 37°C, and pH optima of 5, 7, and 4 were determined for AcGH5, GpGH5, and VbGH5, respectively. All enzymes were assayed for activity at their individual temperature and pH optimum. A FLUOstar Omega microplate reader (BMG Labtech) was used to measure activity in 96-well plates. Overnight assays to assess end point products were carried out with 6 mg/ml substrate and 1 μM enzyme. Potential reaction products were also identified by HPAEC or TLC using methodology described previously (34).
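As an illustration of how reducing-sugar readings of this kind are commonly converted into reaction rates, the following Python sketch fits a linear standard curve and applies it to a time course. It is a minimal sketch only: the function names, the standard concentrations, and all absorbance values are hypothetical and are not data from this study, and a linear response of absorbance to reducing sugar is assumed.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def absorbance_to_sugar(absorbance, slope, intercept):
    """Invert the standard curve to get reducing sugar from absorbance."""
    return (absorbance - intercept) / slope


# Hypothetical standard curve: reducing sugar (ug/ml) against absorbance.
std_conc = [0.0, 100.0, 200.0, 300.0, 400.0]
std_abs = [0.05, 0.25, 0.45, 0.65, 0.85]
slope, intercept = fit_line(std_conc, std_abs)

# Hypothetical time course (minutes) for one substrate concentration.
times_min = [0.0, 2.0, 4.0, 6.0]
abs_readings = [0.05, 0.15, 0.25, 0.35]
sugar = [absorbance_to_sugar(a, slope, intercept) for a in abs_readings]

# The slope of product against time is the initial rate (ug/ml per min).
rate, _ = fit_line(times_min, sugar)
print(f"initial rate: {rate:.2f} ug/ml/min")
```

Repeating the rate calculation at each substrate concentration in the 1 to 6 mg/ml range would give the rate-versus-substrate data from which kinetic parameters are usually estimated.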
Oligosaccharide Analysis-Approximately 5 g of CX or WAX were digested to completion (no further increase in reducing sugar and no change in the HPAEC product profile) with 3 μM CtXyl5A-D at 60°C for 48 h. The oligosaccharide products were purified by size exclusion chromatography using a Bio-Gel P2 column as described previously (35). The structures of the oligosaccharides were analyzed by positive ion-mode infusion/offline electrospray ionization (ESI)-MS following either dilution with 30% acetonitrile or desalting as described previously (36).
Crystallography-Purified SeMet CtXyl5A-D was concentrated and stored in 5 mM DTT, 2 mM CaCl2. Crystals of seleno-L-methionine-containing protein were obtained by hanging drop vapor diffusion in 40% (v/v) 2-methyl-2,4-pentanediol. The data were collected on Beamlines ID14-1 and ID14-4 at the European Synchrotron Radiation Facility (Grenoble, France) to a resolution of 2.64 Å. The data were processed using the programs iMOSFLM (37) and SCALA (38) from the CCP4 suite (Collaborative Computational Project, Number 4, 1994). The crystal belongs to the orthorhombic space group P2₁2₁2 (39). The structure was solved by molecular replacement using independently solved structures of some of the modules of CtXyl5A: CtGH5-CBM6 (PDB code 2y8k) (11), Fn3 (PDB code 3mpc) (12), and CtCBM62 (PDB codes 2y8m, 2yfz, and 2y9s) (15) using PHASER (41). The CtCBM13 domain was built de novo. BUCCANEER (42) and PHENIX (43) were initially used for auto building. The structure was completed by iterative cycles of manual rebuilding in COOT (44) in tandem with refinement with RefMac5 (45). The final values for Rwork and
To obtain structures of CtGH5-CBM6 in complex with ligand, the protein was crystallized using the sitting drop vapor phase diffusion method with an equal volume (100 nl) of protein and reservoir solution (unless otherwise stated), using a robotic nanodrop dispensing system (mosquito® LCP; TTP Labtech).
Crystals of the protein (10 mg/ml) co-crystallized with arabinose (300 mM) were obtained in 1 M ammonium sulfate, 0.1 M Bis-Tris, pH 5.5, and 1% PEG 3350. Crystals with xylose (300 mM) grew in 100 mM sodium/potassium phosphate, 100 mM MES, pH 6.5, and 2 M sodium chloride. To obtain crystals of the arabinoxylanase in complex with an oligosaccharide, the nucleophile mutant E279S was used and mixed with a range of arabinoxylooligosaccharides that was generated by digestion of WAX with CtGH5-CBM6 (see above) and thereafter by 100 nM of the Cellvibrio japonicus GH43 exo-1,4-β-xylosidase (47). Only the inclusion of the largest purified oligosaccharide generated crystals of the arabinoxylanase. Crystals of CtGH5E279S-CBM6 were obtained by mixing an equal volume (100 nl) of the protein (11 mg/ml)/oligosaccharide (10 mM) solution and mother liquor solution consisting of 100 mM Tris-Bicine, pH 8.5, 12.5% (w/v) polyethylene glycol with an average molecular mass of 1,000 Da, 12.5% (w/v) polyethylene glycol with an average molecular mass of 3,350 Da, and 12.5% (R,S)-2-methyl-2,4-pentanediol (racemic). Crystallographic data were collected on Beamlines I02, I04-1, and I24 at the DIAMOND Light Source (Harwell, UK). The data were processed using XDS (48). The crystal structures were solved by molecular replacement using MolRep (49) with CtGH5-CtCBM6 (PDB code 5AK1) as the search model. Refinement was carried out in RefMac5 (27), and COOT (26) was used for model (re)building. The final models were validated using Molprobity (32). The data collection and refinement statistics are listed in Table 2.
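Because the drops were set up by mixing equal volumes, each component starts at half its stock concentration. The short Python sketch below works through that arithmetic for the oligosaccharide complex; the helper name after_mixing is hypothetical, volumes are assumed to be additive, and the values describe only the initial drop, not its composition after vapor diffusion has equilibrated it against the reservoir.

```python
def after_mixing(stock_conc, vol_stock, vol_other):
    """Concentration of a component after mixing vol_stock of its stock
    with vol_other of a solution that does not contain it.
    Assumes the two volumes are additive."""
    return stock_conc * vol_stock / (vol_stock + vol_other)


DROP_NL = 100.0  # volume of each solution added to the drop (nl)

# Components of the protein/oligosaccharide solution.
protein_mg_ml = after_mixing(11.0, DROP_NL, DROP_NL)
ligand_mm = after_mixing(10.0, DROP_NL, DROP_NL)

# Components of the mother liquor (each precipitant was at 12.5%).
precipitant_pct = after_mixing(12.5, DROP_NL, DROP_NL)
buffer_mm = after_mixing(100.0, DROP_NL, DROP_NL)

print(f"protein: {protein_mg_ml} mg/ml, oligosaccharide: {ligand_mm} mM")
print(f"each precipitant: {precipitant_pct}%, Tris-Bicine: {buffer_mm} mM")
```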