Structure and Function of an Arabinoxylan-specific Xylanase*

The enzymatic degradation of plant cell walls plays a central role in the carbon cycle and is of increasing environmental and industrial significance. The enzymes that catalyze this process include xylanases that degrade xylan, a β-1,4-xylose polymer that is decorated with various sugars. Although xylanases efficiently hydrolyze unsubstituted xylans, these enzymes are unable to access highly decorated forms of the polysaccharide, such as arabinoxylans that contain arabinofuranose decorations. Here, we show that a Clostridium thermocellum enzyme, designated CtXyl5A, hydrolyzes arabinoxylans but does not attack unsubstituted xylans. Analysis of the reaction products generated by CtXyl5A showed that all the oligosaccharides contain an O3 arabinose linked to the reducing end xylose. The crystal structure of the catalytic module (CtGH5) of CtXyl5A, appended to a family 6 noncatalytic carbohydrate-binding module (CtCBM6), showed that CtGH5 displays a canonical (α/β)8-barrel fold with the substrate binding cleft running along the surface of the protein. The catalytic apparatus is housed in the center of the cleft. Adjacent to the −1 subsite is a pocket that could accommodate an l-arabinofuranose-linked α-1,3 to the active site xylose, which is likely to function as a key specificity determinant. CtCBM6, which adopts a β-sandwich fold, recognizes the termini of xylo- and gluco-configured oligosaccharides, consistent with the pocket topology displayed by the ligand-binding site. In contrast to typical modular glycoside hydrolases, there is an extensive hydrophobic interface between CtGH5 and CtCBM6, and thus the two modules cannot function as independent entities.

The plant cell wall, which is an important biological and industrial resource, primarily consists of interlocking polysaccharides (for review see Ref. 1). The biological conversion of the polysaccharides within the plant cell wall to their constituent monosaccharides is central to its biological and industrial exploitation (2,3). An example of the chemical complexity of these polysaccharides is provided by xylan, which is the major hemicellulosic component of the wall. This polysaccharide includes a backbone of β-1,4-xylose residues in their pyranose configuration (Xylp), which are decorated at O2 with 4-O-methyl-D-glucuronic acid and at O2 and/or O3 with arabinofuranose (Araf) residues; the polysaccharide can also be extensively acetylated. In addition, the Araf side chain decorations can be esterified to ferulic acid that, in some species, provides a chemical link between hemicellulose and lignin (Fig. 1) (1). The precise structure of xylans varies between plant species and tissues and during cellular differentiation (4).
Reflecting the chemical complexity of plant structural polysaccharides, plant cell wall-degrading microorganisms express a large number of enzymes, often in excess of 100 biocatalysts, that target specific linkages within these carbohydrate polymers (5-8). The majority of plant cell wall-degrading enzymes are glycoside hydrolases, although polysaccharide lyases and carbohydrate esterases also contribute to the catabolic process. These enzymes are grouped into families based on sequence, structural, and catalytic conservation within the CAZy database (9). As discussed in the accompanying article (48), many of these enzymes are appended to noncatalytic carbohydrate-binding modules (CBMs) that are also grouped into families in the CAZy database. The xylan backbone is hydrolyzed by xylanases, the majority of which are located in glycoside hydrolase families 10 and 11 (GH10 and GH11), although they are also present in GH8 and GH30 (10,11). The extensive decoration of the xylan backbone generally restricts the capacity of these enzymes to attack the polysaccharide prior to removal of the side chains (12).
Here, we report the biochemical properties and crystal structure of a GH5 enzyme that is appended to a family 6 CBM (CtCBM6). The enzyme (defined as CtXyl5A) is an arabinoxylan-specific xylanase that utilizes Araf decorations, appended to O3 of the Xylp bound at the active site, as an essential specificity determinant. The capacity of CtXyl5A to also accommodate arabinose side chains in all the other subsites (in addition to the active site) within the substrate binding cleft enables the enzyme to hydrolyze highly decorated arabinoxylans. The functional significance of the specificity of the arabinoxylanase, in the context of the plant cell wall degrading apparatus of the host bacterium, is discussed.

EXPERIMENTAL PROCEDURES
Cloning, Expression, and Purification of Components of CtXyl5A-DNA encoding CtGH5, CtGH5-CBM6, and CtCBM6 were amplified using primers, containing NheI and XhoI restriction sites, which are listed in supplemental Table S1. The amplified DNAs were cloned into NheI/XhoI-restricted pET21a such that the encoded recombinant proteins contained a C-terminal His6 tag. To express the Clostridium thermocellum proteins, Escherichia coli strain BL21(DE3), harboring appropriate recombinant plasmids, was cultured to mid-exponential phase in Luria broth at 37°C, followed by the addition of isopropyl β-D-1-thiogalactopyranoside at 1 mM to induce recombinant gene expression, and incubated for a further 5 h at 37°C. The recombinant proteins were purified to >90% electrophoretic purity by immobilized metal ion affinity chromatography using Talon™ (Clontech), a cobalt-based matrix, and elution with 100 mM imidazole, as described previously (13). When preparing the selenomethionine derivative of CtGH5-CBM6 for crystallography, the protein was expressed in E. coli B834 (DE3), a methionine auxotroph, cultured in media comprising 1 liter of SelenoMet Medium Base™, 50 ml of SelenoMet Nutrient Mix™ (Molecular Dimensions), and 4 ml of a 10 mg/ml solution of L-selenomethionine. Recombinant gene expression and protein purification were as described above except that all purification buffers were supplemented with 10 mM β-mercaptoethanol.
Mutagenesis-Site-directed mutagenesis was carried out using the PCR-based QuikChange method (Stratagene) deploying the primers listed in the supplemental Table S1.
Enzyme Assays-CtXyl5A and its derivatives were assayed for enzyme activity using the method of Miller (14) to detect the release of reducing sugar. The standard assay was carried out in 50 mM sodium phosphate buffer, pH 7.0, and the potential polysaccharide substrate was at 1 mg/ml. The reactions were initiated by the addition of enzyme up to 10 μM and incubated at 60°C (unless otherwise stated) for up to 16 h. Potential reaction products were also identified by HPAEC using methodology described previously (15). The capacity of CtGH5 and CtGH5-CBM6 to hydrolyze xylooligosaccharides was assessed by HPAEC using 100 μM of oligosaccharide and 5 μM of protein.
Oligosaccharide Analysis-Rye arabinoxylan (5 g) was digested to completion (no further increase in reducing sugar and no change in the HPAEC product profile) with 3 μM of CtXyl5A at 60°C for 48 h. The oligosaccharide products were partially purified by size exclusion chromatography using a Bio-Gel P2 column as described previously (16). The structures of the oligosaccharides were analyzed by NMR, electrospray ionization mass spectrometry (ESI-MS), and HPAEC in combination with selective enzyme treatment. Partially methylated alditol acetate derivatives of the glycosyl residues of the oligosaccharides were prepared and analyzed by gas chromatography-electron impact mass spectrometry (GC-EI-MS).
Preparation of the Partially Methylated Alditol Acetates-The mixture of oligosaccharides (~500 μg) was per-O-methylated using the method of Ciucanu and Kerek (17). The per-O-methylated oligosaccharides were hydrolyzed with 2 N TFA, reduced, and acetylated to generate partially methylated alditol acetate derivatives (18).
Preparation of Per-O-methylated Oligoglycosyl Alditols-The sample (~500 μg) was reduced with sodium borohydride to generate oligoglycosyl alditols, which were per-O-methylated as described previously (19).
MALDI-TOF Mass Spectrometry (MALDI-TOF-MS)-Positive ion MALDI-TOF mass spectra were recorded using an Applied Biosystems Voyager-DE biospectrometry workstation. Samples (1 μl of a mg/ml solution) were mixed with an equal volume of matrix solution (0.1 M 2,5-dihydroxybenzoic acid and 0.03 M 1-hydroxyisoquinoline in aqueous 50% MeCN) and dried on a MALDI target plate. Typically, spectra from 200 laser shots were summed to generate a mass spectrum.
ESI-MS-The multiple stage ESI mass spectra were recorded in a Thermo Scientific LTQ XL ion trap mass spectrometer. Per-O-methylated oligoglycosyl alditols in methanol were diluted with 50% acetonitrile/water containing 0.1% TFA. Samples were infused through a fused silica capillary (150 μm inner diameter × 363 μm outer diameter × ~60 cm, Thermo Finnigan) into the source at a flow rate of 3 μl/min using the syringe pump provided with the instrument. The electrospray source was operated at a voltage of 5.0 kV, and the capillary heater was set to 275°C. All the experiments were performed in the positive-ion mode.
NMR Spectroscopy-Oligosaccharides (~2 mg) were dissolved in D2O (0.5 ml, 99.9%; Cambridge Isotope Laboratories). ¹H NMR spectra were recorded with a Varian Inova NMR spectrometer operating at 500 MHz at 298 K. All two-dimensional spectra were recorded using standard Varian pulse programs.
Isothermal Titration Calorimetry-The binding of CtCBM6 to ligands was quantified by isothermal titration calorimetry (ITC), as described previously (20). Titrations were carried out in 50 mM Na-HEPES buffer, pH 7.5, containing 5 mM CaCl2 at 25°C. The reaction cell contained protein at 145 μM, and the syringe contained the monosaccharide or oligosaccharide at 5-15 mM; polysaccharide, when used as the titrant, was at 3-5 mg/ml. The titrations were analyzed using Microcal Origin version 7.0 software to derive n, Ka, and ΔH values, and ΔS was calculated using the standard thermodynamic equation, −RT ln Ka = ΔG = ΔH − TΔS.
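The thermodynamic relationship used to obtain ΔS from the fitted ITC parameters can be illustrated with a minimal Python sketch. The function name and the example Ka and ΔH values are hypothetical and are not taken from the data reported here; the sketch assumes energies expressed in kcal/mol and the titration temperature of 25°C.

```python
import math

# Molar gas constant in kcal mol^-1 K^-1
R_KCAL = 1.987204e-3


def itc_thermodynamics(ka_per_molar, delta_h_kcal, temp_celsius=25.0):
    """Return (dG, TdS, dS) from a fitted association constant and enthalpy.

    Uses -RT ln Ka = dG = dH - TdS. Energies are in kcal/mol and
    dS is in kcal mol^-1 K^-1.
    """
    temp_kelvin = temp_celsius + 273.15
    delta_g = -R_KCAL * temp_kelvin * math.log(ka_per_molar)
    t_delta_s = delta_h_kcal - delta_g
    return delta_g, t_delta_s, t_delta_s / temp_kelvin


# Hypothetical example values (illustration only, not measured data)
dg, tds, ds = itc_thermodynamics(ka_per_molar=1.0e4, delta_h_kcal=-10.0)
print(f"dG = {dg:.2f} kcal/mol, TdS = {tds:.2f} kcal/mol")
```

In this illustration a favorable enthalpy that exceeds the free energy change gives a negative TΔS term, that is, an unfavorable entropic contribution to binding.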
Crystallography-Proteins were crystallized using the hanging drop vapor diffusion technique at 20°C with an equal volume (1 μl) of protein and reservoir solution. Native (10 mg/ml) and selenomethionine (3 mg/ml) CtGH5-CBM6 were crystallized in 16-24% PEG 3000, 150 mM sodium citrate, pH 5.5. A CtGH5-CBM6 construct containing two additional methionines, W391M/W397M, was produced to facilitate structure solution by selenomethionine single wavelength anomalous x-ray scattering. Crystals were cryoprotected by the inclusion of 25% glycerol in the crystallization solution and flash-frozen in liquid nitrogen. Diffraction data were collected at ID14.4 ESRF, Grenoble, France, at the selenium K absorption edge to enable structure solution by single wavelength anomalous x-ray scattering. The diffraction data were processed in MOSFLM (21) and SCALA (22), the heavy atom substructure was solved using SHELXCDE (23) as part of CCP4i, and an initial model was built in Arp/wArp (24) and completed manually in COOT (25). The complete initial model was used to determine the structure of the wild type protein by molecular replacement, which was refined at higher resolution from data collected at the Diamond Light Source, UK. The crystal of the reported structure had been soaked in 20 mM "Fraction 1" in an attempt to obtain a structure of the enzyme in complex with carbohydrate, although no sugar molecules, other than glycerol, were observed in the electron density.
All structures were refined to convergence using REFMAC5 (26) with manual corrections being applied in COOT (25). The data collection, phasing, and refinement statistics are displayed in supplemental Table S2, and the PDB code for the protein structure is 2y8k.

RESULTS
Expression and Purification of CtXyl5A-To investigate the function of the CtGH5 and CtCBM6 components of CtXyl5A, the modules were expressed as either individual entities or covalently linked. Although CtCBM6 and CtGH5-CBM6 were expressed in soluble form at high levels in E. coli, CtGH5 was predominantly insoluble, and only a small amount of soluble protein was generated in the enteric bacterium. All three proteins were purified by immobilized metal ion affinity chromatography to electrophoretic homogeneity.
CtXyl5A Is an Arabinoxylanase-Screening the capacity of CtXyl5A to hydrolyze plant structural polysaccharides revealed that the enzyme was able to degrade rye and wheat arabinoxylan and displayed limited activity against oat spelt xylan but was unable to act on glucuronoxylan, birch, or beech xylan (Table 1). The enzyme displayed no activity against a range of mannans, pectins, galactans, arabinans, and β-glucans (data not shown). The individual kinetic constants of CtXyl5A against rye and wheat arabinoxylan could not be determined as the Km value was greater than the maximum concentration of soluble substrate; however, the catalytic efficiency of the enzyme was similar for both rye and wheat arabinoxylan. The high Km value may reflect weak affinity for the substrate, or it may indicate that the glycosidic bonds targeted by CtXyl5A occur rarely in the arabinoxylan substrates. The enzyme displayed trace activity against xylohexaose with a kcat/Km value estimated to be <10¹ min⁻¹ M⁻¹. These data indicate that CtXyl5A hydrolyzes arabinoxylans but does not act on xylans that contain few arabinofuranose side chains. This is in sharp contrast to typical xylanases, located mainly in GH10 and GH11, which display a preference for the poorly decorated xylans from birch and beech (12). These data show that CtXyl5A displays specificity for arabinoxylans and as such is defined as an arabinoxylanase, an activity not previously reported.
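When Km exceeds the highest attainable substrate concentration, the Michaelis-Menten equation reduces to v ≈ (kcat/Km)[E][S], so the catalytic efficiency can be estimated from a single initial rate. The following minimal Python sketch illustrates this approximation; the function names and all numerical values are hypothetical and chosen only to show how small the error becomes when [S] is well below Km.

```python
def michaelis_menten_rate(kcat, km, enzyme, substrate):
    """Initial rate from the full Michaelis-Menten equation."""
    return kcat * enzyme * substrate / (km + substrate)


def kcat_over_km_from_rate(rate, enzyme, substrate):
    """Estimate kcat/Km assuming [S] << Km, where v = (kcat/Km)[E][S]."""
    if enzyme <= 0 or substrate <= 0:
        raise ValueError("enzyme and substrate concentrations must be positive")
    return rate / (enzyme * substrate)


# Hypothetical parameters (illustration only): kcat in min^-1, concentrations in M
KCAT = 100.0
KM = 1.0e-2
ENZYME = 1.0e-6
SUBSTRATE = 1.0e-4  # 100-fold below Km

rate = michaelis_menten_rate(KCAT, KM, ENZYME, SUBSTRATE)
estimate = kcat_over_km_from_rate(rate, ENZYME, SUBSTRATE)
true_value = KCAT / KM
relative_error = abs(estimate - true_value) / true_value
print(f"estimated kcat/Km = {estimate:.0f} min^-1 M^-1 "
      f"(true {true_value:.0f}, error {relative_error:.1%})")
```

The estimate always slightly underestimates the true kcat/Km, with a fractional error of [S]/(Km + [S]); a linear dependence of rate on substrate concentration is the experimental sign that the approximation holds.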
Characterization of the Reaction Products Generated by CtXyl5A from Arabinoxylan-To explore the substrate specificity of CtXyl5A in more detail, the reaction products generated by treating rye arabinoxylan with the enzyme were partially purified by size exclusion chromatography to remove high molecular weight polymers. The fractions containing the majority of the products were pooled (designated henceforth as fraction 1). Fraction 1 was reduced and per-O-methylated, and the products were analyzed by MALDI-TOF-MS. The data revealed that the major reaction products were pentose-containing oligosaccharides with degrees of polymerization (DPs) of 3 (m/z 565), 4 (m/z 725), and 5 (m/z 885), respectively (Fig. 2A). Partially methylated alditol acetate derivatives were then prepared from per-O-methylated fraction 1 and analyzed by GC-EI-MS (Fig. 2B). This semi-quantitative analysis revealed terminal Araf (methylated at O2, O3, and O5), terminal Xylp (methylated at O2, O3, and O4), 3-linked Xylp, 4-linked Xylp, and 3,4-linked Xylp. No Xylp residues decorated at O2 or at

Catalytic activity of CtXyl5A and its variants
The enzymes were assayed at 60°C in 50 mM sodium phosphate buffer, pH 7.0, containing substrate at a concentration of 1 mg ml⁻¹. The reaction was monitored by the release of reducing sugar (14). The catalytic rate could be used to determine kcat/Km values as the substrate concentration was ≪ Km (the rate of reaction was directly proportional to substrate concentration up to 2 mg ml⁻¹). Note that CtXyl5A is the full-length enzyme, and CtGH5-CBM6 and CtGH5 are derivatives of the enzyme containing the catalytic module appended to CBM6 and the catalytic module alone, respectively.

both O2 and O3 were observed. These data indicate that the oligosaccharides consist of a backbone of (1→4)-linked Xylp residues decorated with Araf side chains at O3 of internal or reducing Xylp residues (3,4-linked Xylp), or at O3 of nonreducing terminal Xylp residues (3-linked Xylp). Fraction 1 was also treated with CjAbf51A, an arabinofuranosidase that releases Araf residues from O2 or O3 of singly branched Xylp residues in the xylan backbone (27). HPAEC analysis of the CjAbf51A digestion products revealed the presence of arabinose, xylobiose, xylotriose, and xylotetraose (Fig. 2C), indicating that the predominant CtXyl5A products are xylooligosaccharides in which at least one of the Xylp residues bears a mono-Araf side chain. By contrast, GH10 and GH11 xylanases generate predominantly xylose and xylobiose from wheat arabinoxylan, reflecting a preference for undecorated regions of the polysaccharide (12). The oligosaccharides in fraction 1 were analyzed by several two-dimensional NMR methods, including gCOSY, HSQC, TOCSY, and ROESY. These analyses provided scalar and dipolar correlations that allowed the resonances of the most abundant spin systems to be assigned to specific sugar residues (supplemental Table S3; for a more detailed description of this approach, see, for example, Refs. 19, 28, 29). Upfield shifts typical of reducing residues (19,28,29) were observed for two C1 resonances (δ 92.4 and 96.6) in the HSQC spectrum of the CtXyl5A-generated oligosaccharides (Fig. 3A). In combination with other two-dimensional NMR data, this allowed these two resonances to be assigned to α-Xylp and β-Xylp residues at the reducing end of the oligosaccharides. However, the exact ¹H and ¹³C shifts of these reducing residues indicate that they are structurally distinct from the unbranched (4-linked) sugars at the reducing end of oligosaccharides generated by more typical endoxylanases (19,28,29).
The data reveal the presence of an Araf side chain at O3 (along with a β-Xylp at O4) of the reducing Xylp residues of the CtXyl5A-generated oligosaccharides. For example, the C3 resonances of the reducing α-Xylp and β-Xylp units exhibit diagnostic downfield glycosylation shifts (δC 77.7 and 77.8), relative to the corresponding unbranched reducing residues produced by more typical endoxylanases (δC 71.2 and 73.8). Furthermore, the ROESY spectrum of fraction 1 (Fig. 3B) revealed strong dipolar interactions between the two most intense α-Araf H1 resonances (δH 5.342 and 5.391) and the reducing α-Xylp and β-Xylp H3 resonances (δH 3.906 and 3.736, respectively), indicating that most of the α-Araf residues are linked to O3 of reducing Xylp moieties. The identification of branched, reducing Xylp residues in fraction 1 is consistent with the detection of 3,4-linked Xylp residues in the partially methylated derivatives (Fig. 2B). Resonances corresponding to unbranched 4-linked β-Xylp residues at the reducing end of the oligosaccharides (e.g. H1 at δ 4.584, see Fig. 3A) were not detectable in the NMR spectra. Integration of the Xylp and Araf H1 resonances in the one-dimensional spectrum of the CtXyl5A-generated oligosaccharides (Fig. 3A) allowed the following quantitative conclusions to be drawn: the oligosaccharides have an average backbone DP of 2.76 and an average overall DP of 4.04; >99% of the oligosaccharides have an α-L-Araf side chain on O3 of the reducing Xylp residue; and ~30% of the oligosaccharides have a second α-L-Araf side chain.
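The quantitative conclusions drawn from the NMR integration are mutually consistent, as the short Python check below illustrates. It assumes, as a simplification that is not stated in the text, that no oligosaccharide carries more than two Araf side chains and that the reducing-end Araf is present on 99% of molecules (the reported lower bound); the variable names are illustrative.

```python
# Values reported from integration of the one-dimensional NMR spectrum
BACKBONE_DP = 2.76   # average number of Xylp residues per oligosaccharide
OVERALL_DP = 4.04    # average number of all pentose residues per oligosaccharide

# Assumption: fraction of oligosaccharides with Araf on the reducing Xylp
FRACTION_REDUCING_END_ARAF = 0.99

# Average number of Araf side chains per oligosaccharide
araf_per_oligosaccharide = OVERALL_DP - BACKBONE_DP

# Araf not accounted for by the reducing-end decoration; with at most two
# Araf per molecule this equals the fraction bearing a second side chain
fraction_second_araf = araf_per_oligosaccharide - FRACTION_REDUCING_END_ARAF

print(f"Araf per oligosaccharide: {araf_per_oligosaccharide:.2f}")
print(f"Fraction with a second Araf: {fraction_second_araf:.2f}")
```

The difference between the overall and backbone DP gives about 1.28 Araf per oligosaccharide, which leaves roughly 0.29 side chains per molecule beyond the reducing-end decoration, in line with the ~30% figure.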
To analyze fraction 1 by ESI-MSⁿ, the oligosaccharides in this sample were treated with NaBH4, and the resulting oligoglycosyl alditols were methylated prior to fragmentation. This procedure imparts a distinctive mass label to the newly formed alditol end of the oligosaccharide, facilitating ESI-MSⁿ analysis (19). The data, examples of which are shown in Fig. 4, provided unambiguous evidence supporting the presence of branched reducing residues in the oligosaccharides in fraction 1. This conclusion is exemplified by the analysis of the possible tetrasaccharides in fraction 1. Thus, based on the structure of the polysaccharide substrate, linkage, and NMR analysis of fraction 1, only five different tetrasaccharide structures (Ia, Ib, IIa, IIb, and III) are theoretically possible (Fig. 5). The ESI-MSⁿ analysis provided information regarding the topology of the oligomers but did not define the stereochemistry (identity) of the individual pentose residues. Therefore, the terminal pentose residues at the nonreducing end of the main chain in structures Ia, Ib, IIa, and IIb, displayed in Fig. 5, are indicated by the letter P (as the sugar can be either Araf or Xylp). However, in Fig. 5, nonterminal backbone residues, and sugars attached to branched backbone units (backbone sugars that are linked at O4 and O3 to other sugars), are known to be Xylp and Araf, respectively. Thus, structure I could be (Araf)-Xylp-Xylp-Xylol (Ia) or Xylp-Xylp-Xylp-Xylol (Ib) in which Araf is an arabinose decoration appended to the following xylose residue, whereas Xylol is the alditol form of the xylose at the reducing end. Structure II could be Araf-Xylp-(Araf)-Xylol (IIa) or Xylp-Xylp-(Araf)-Xylol (IIb), and III is Xylp-(Araf)-Xylp-Xylol. The quasimolecular (M + Na)⁺ ion at m/z 725, corresponding to these DP4 structures, was selected for MS² (Fig. 4A). The fragmentation pattern is dominated by y ions (19,30), which contain the alditol end of the oligomer.
The y ion (m/z 551) generated by loss of a single terminal pentosyl residue was selected as the precursor for MS³ fragmentation (Fig. 4B). Comparison of this MS³ spectrum (Fig. 4, A and B) to the theoretical fragmentation pattern for all possible m/z 551 ions (Fig. 5) indicates that structures I and III are not present, as these would fragment to form ions at m/z 231, which were not observed. This was confirmed by MS⁴ analysis (Fig. 4, C and D), in which MS³ fragment ions at m/z 391 and 377 were selected as precursors. Here, the extremely low abundance of ions at m/z 231 confirms the absence of significant amounts of structures I and III (Fig. 5). However, all ions predicted for structure II were observed, notably the high abundance ion at m/z 217, which consists of the alditol residue with two unmethylated hydroxyl groups that were exposed by cleavage of glycosidic bonds during MS² and MS³ (Fig. 4).
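The m/z values quoted for the per-O-methylated oligoglycosyl alditols and their y ions can be reproduced from monoisotopic masses, as in the following minimal Python sketch. The function names are illustrative, and the bookkeeping is a simplification: each O-methylation is treated as a net gain of CH2, each lost residue is treated as an anhydropentose carrying its O-methyl groups, and the electron mass is neglected.

```python
# Monoisotopic atomic masses in Da
C, H, O, NA = 12.0, 1.00782503, 15.99491462, 22.98976928

PENTOSE = 5 * C + 10 * H + 5 * O   # C5H10O5
WATER = 2 * H + O
CH2 = C + 2 * H                    # net mass gain per O-methylation


def permethylated_alditol_mz(dp):
    """[M + Na]+ m/z of a reduced, per-O-methylated pentose oligomer."""
    glycan = dp * PENTOSE - (dp - 1) * WATER
    alditol = glycan + 2 * H       # NaBH4 reduction of the reducing end
    free_hydroxyls = 2 * dp + 3    # same for linear and branched oligomers
    return alditol + free_hydroxyls * CH2 + NA


def residue_loss(exposed_hydroxyls=0):
    """Neutral mass lost when one pentosyl residue is cleaved off.

    A fully methylated terminal residue carries three O-methyl groups;
    each hydroxyl exposed by an earlier cleavage removes one of them.
    """
    return PENTOSE - WATER + (3 - exposed_hydroxyls) * CH2


for dp in (3, 4, 5):
    print(f"DP{dp}: m/z {permethylated_alditol_mz(dp):.2f}")

y_551 = permethylated_alditol_mz(4) - residue_loss(0)   # MS2 y ion
y_391 = y_551 - residue_loss(1)   # loses a residue with one exposed OH
y_377 = y_551 - residue_loss(0)   # loses a fully methylated terminal residue
y_217 = y_391 - residue_loss(0)   # alditol with two exposed hydroxyls
y_231 = y_391 - residue_loss(1)   # alditol with one exposed hydroxyl
print(round(y_551), round(y_391), round(y_377), round(y_217), round(y_231))
```

Under these assumptions the calculation returns nominal masses of 565, 725, and 885 for DP3 to DP5 and 551, 391, 377, 217, and 231 for the fragment ions, matching the values discussed in the text; the 14 Da difference between m/z 217 and 231 corresponds to one O-methyl group, which is why the two ions distinguish a branched from an unbranched alditol.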
When the DP5 oligoglycosyl alditols in fraction 1 were analyzed by MSⁿ, virtually all of the alditol moieties were branched (supplemental Figs. S1 and S2). ESI-MSⁿ data for the DP5 oligoglycosyl alditols also provide further insight into the extent to which Araf side chains can decorate the xylooligosaccharides produced by CtXyl5A. Notably, MS⁴ of the m/z 537 ion (derived from the alditol pentasaccharide) generates an m/z 363 y ion that yields an m/z 217 ion at MS⁵. As shown in the schematic of supplemental Fig. S2, these species can only be generated if the xylosyl alditol and the adjacent Xylp are both branched. The detection of an m/z 377 ion at MS³, however, demonstrates that the structure Xylp-Xylp-(Araf)-Xylol is also present. Fragmentation of DP3 oligoglycosyl alditols yields an m/z 217 y ion (supplemental Figs. S3 and S4). This again demonstrates that the xylosyl alditol contains a branch, and thus the structure of the trisaccharide is predicted to be Xylp-(Araf)-Xylol.
Binding of CtXyl5A to Arabinoxylan-The terminal reaction products produced by endo-acting glycoside hydrolases reflect an iterative process in which the products from initial hydrolytic reactions serve as substrates in subsequent rounds of catalysis. Analysis of the structure of the terminal reaction products (which cannot be hydrolyzed further) provides insight into the possible modes of substrate binding to both the negative and positive subsites (see below). The subsite nomenclature of glycoside hydrolases was defined previously by Davies et al. (31). Briefly, the scissile bond is positioned between subsites −1 and +1, and subsites that extend toward the nonreducing and reducing ends of the substrate are assigned increasing negative and positive numbers, respectively. The Xylp residues at the reducing and the nonreducing end of the oligosaccharide products are derived from substrate bound at the −1 and +1 subsites, respectively. As ~99% of the reducing end Xylp residues contain an O3 Araf branch, it is evident that the arabinose decoration of the xylose bound at the −1 subsite is a key specificity determinant of the enzyme. The detection of terminal Xylp (in which O2, O3, and O4 are methylated) and 3-linked Xylp residues, both of which occur at the nonreducing end of the oligosaccharide backbone, indicates that a Xylp with an Araf side chain at O3 can be accommodated in the +1 subsite of CtXyl5A, but a side chain in this position is not a specificity determinant. As both (Araf)-Xylp-(Araf)-Xylol and Xylp-Xylp-(Araf)-Xylol were identified in the tetrasaccharide, an O3-Araf side chain is present on some, but not all, of the Xylp residues bound in the −2 subsite. Thus, although an O3-Araf side chain can be accommodated at the −2 subsite, the arabinose decoration does not define enzyme specificity.
The identification of Xylp-(Araf)-Xylp-(Araf)-Xylol in the pentasaccharide reaction products not only confirms that Araf can be present at the −1 and −2 subsites, but it also demonstrates that the +2 and +3 (if it exists) subsites can accommodate Xylp residues bearing arabinose side chains. It should be noted, however, that Xylp-(Araf)-Xylp-(Araf)-Xylp is a potential substrate for the enzyme (binding from subsites −2 to +1), suggesting that this molecule is only hydrolyzed very slowly by the enzyme, possibly because it is unable to access the +2 subsites. This is consistent with the absence of Xylp or (Araf)-Xylp in the reaction products; xylose or decorated xylose can only be generated if the substrate is hydrolyzed when it occupies only +1 of the positive subsites of the enzyme. Thus, to summarize, subsites −2 to +2 of CtXyl5A can accommodate Xylp residues that contain an O3-Araf side chain; however, only at the −1 subsite does the arabinose decoration act as an essential specificity determinant.
CtCBM6 Specificity-To investigate whether CtCBM6 is a functional CBM, the capacity of CtGH5-CBM6 to bind to various carbohydrates was assessed by ITC. The data showed that CtGH5-CBM6 bound to cellohexaose and cellobiose with similar affinity (Table 2; example titrations are shown in Fig. 6). By contrast, binding to glucose was too weak to quantify. The protein also displayed affinity for the reaction products generated by CtXyl5A and for undecorated xylooligosaccharides. The protein did not appear to bind to various xylans or to β-1,3-β-1,4-glucans. This indicates that CtGH5-CBM6 recognizes the terminal region of these polysaccharides, as the concentration of ligand available to the protein in these polymers, which have DPs >300, would be very low, and thus binding would not be detected. It is possible that the catalytic module, rather than CBM6, mediates binding to the xylo- and cello-oligosaccharides. To test this hypothesis, the ligand binding profile of variants of CtGH5-CBM6, in which either Trp-424 or Phe-478 had been substituted with Ala, was assessed. As discussed below, these two aromatic residues are highly conserved in the CBM6 family and form part of the primary binding site in this protein family (32). Both CtGH5-CBM6:W424A and CtGH5-CBM6:F478A, although catalytically active (Table 1), displayed no binding to the xylan- and cellulose-derived oligosaccharides (Table 2). It is therefore evident that the CBM6 component of CtGH5-CBM6 mediates the observed binding to oligosaccharides.
Crystal Structure of CtGH5-CBM6-The structure of CtGH5-CBM6 was solved by selenomethionine single wavelength anomalous x-ray scattering, and the resulting structure was used as a starting model for refinement against native data extending to 1.5 Å resolution (supplemental Table S2) (PDB code 2y8k). The polypeptide chain is visible from Ser-37 to Ile-516.
CtGH5-As expected, the N-terminal CtGH5 module displays a (β/α)8 barrel architecture, although α-helix 8 points away from the barrel and toward the CtCBM6 module (discussed below) (Fig. 7). GH5 enzymes are members of clan GH-A in which the two catalytic residues are invariant glutamates presented at the end of β-strands 4 and 7 (33,34). From the structure of CtGH5-CBM6, the catalytic acid-base is likely to be Glu-171 (end of β-strand 4) and the catalytic nucleophile Glu-279 (end of β-strand 7). The catalytic role of these two residues is confirmed by the observation that the mutants E171A and E279A are inactive (Table 1). A narrow V-shaped cleft, ~25 Å in length, extends along the full length of the protein and sits over the top of the β-barrel. The dimensions of the cleft, in the center of which is the catalytic apparatus, suggest that the protein contains ~5 subsites extending from −3 to +2.
FIGURE 5 (caption, partial). ... Fig. 4, the structures of the tetrasaccharides in fraction 1 were identified. The sugars labeled P can be Araf or Xylp. The data showed that the oligosaccharide ions colored green were present, and those colored red were not evident. The solid arrows between oligosaccharides show the conversion of one oligosaccharide into another through ESI-MS fragmentation. Dotted arrows between oligosaccharides identify theoretical ESI-MS-mediated oligosaccharide conversions that did not occur in these analyses. The dotted arrow between sugar linkages within the oligosaccharides shows the fragmentation site and the ion identified. Arrows pointing at sugars (but that do not link two sugars together) identify hydroxyl groups that were not methylated as they composed a glycosidic linkage in a parental ion. Xylol is the reducing end xylose that has been reduced to its alditol form by NaBH4.
An analysis of structural homologues of the CtGH5 component of CtGH5-CBM6 by the DaliLite webserver identified a large number of GH5 and clan GH-A enzymes that displayed significant structural similarity to CtGH5. The Pseudoalteromonas haloplanktis cellulase Cel5G (PDB 1tvn), with a root mean square deviation of 2.8 Å over 253 Cα atoms and a Z-score of 24.1, and the Bacillus agaradhaerens cellulase BaCel5A (PDB 1qi2), with a root mean square deviation of 2.9 Å over 254 Cα atoms and a Z-score of 23.6, are representative and close structural homologues. The critical −1 subsite, where the transition state is formed, is similar in the arabinoxylanase and the GH5 cellulases. In addition to the two catalytic glutamates, CtGH5 contains several key residues that have been identified as "strictly conserved" in family GH5 enzymes (35). These residues in the CtGH5 module, which superimpose with amino acids in the active site of BaCel5A (the cellulase residues are shown in parentheses), are as follows: Asn-170 (Asn-138), Glu-171 (Glu-139), Tyr-255 (Tyr-202), Glu-279 (Glu-228), and Phe-310 (Trp-262) (Fig. 8A). The catalytic acid-base, Glu-171, makes hydrogen bonds with Asn-139 and His-253, and these interac-

Binding of CtXyl5A derivatives to polysaccharides and oligosaccharides
The binding of derivatives of CtXyl5A to ligands was measured by ITC. The protein was at 145 μM in the cell, and polysaccharide (3-5 mg/ml) or oligosaccharide (5-15 mM) was in the syringe. ITC was carried out in 50 mM Na/HEPES buffer, pH 7.5, at 25°C. The concentration of the oligosaccharides generated by the digestion of wheat arabinoxylan (WAX) was fitted to give an n value close to 1.
Despite numerous attempts, no structure of CtGH5-CBM6 in complex with its substrate or reaction products has been obtained, in part due to the preference of this protein to crystallize with the N-terminal residues of a symmetry-related molecule positioned in the substrate binding cleft, and because co-crystallization experiments did not yield diffracting crystals. Consequently, it is difficult to define precisely the structural basis for the unusual substrate specificity displayed by the arabinoxylanase. Superimposing BaCel5A in complex with 2-deoxy-2-fluorocellotriose with CtGH5 provides some insight into the specificity displayed by the arabinoxylanase. As discussed above, the catalytic apparatus, the residues that interact with O2 and the endocyclic oxygen of the −1 sugar, and the hydrophobic platform are conserved in CtGH5 (Fig. 8A). It is evident, however, that the arabinoxylanase lacks the residues that in other GH5 enzymes hydrogen bond with O3 of the active site sugar. For instance, His-101 and Tyr-66 in BaCel5A hydrogen bond with O3 of the −1 Glc, whereas the equivalent residues in CtGH5 are Gly-134 and Cys-95, respectively (Fig. 8A). Indeed, in the −1 subsite of the arabinoxylanase, there is a large pocket around the O3 of the superimposed Glc that could accommodate a sugar decoration such as Araf (Fig. 8B). The pocket contains a tyrosine (Tyr-92) that may make hydrophobic interactions with the arabinose and contains several polar residues, Glu-68, Asn-135, Asn-139, and Asn-170, that could make polar contacts with the sugar.
Based on the presence of glycerol and water molecules within this region of the enzyme, an Araf molecule was modeled into the pocket (Fig. 8C).
CtCBM6-The structure of the CtCBM6 module displays a β-sandwich fold typical of other CBM6 family members (Fig. 7) (32,37,38). The twisted pair of β-sheets, which can be viewed as forming an extended barrel, consists of five and four anti-parallel β-strands, respectively. The structure of CtCBM6 shows strong similarity with numerous CBM6 members. The closest homologue is the CBM6 module (designated CmCBM6) from the Cellvibrio mixtus lichenase CmLic5A (PDB 1uz0; root mean square deviation 1.5 Å over 123 Cα atoms and a Z-score of 18.1). The major binding site in the CBM6 family is in the loops connecting the two β-sheets. This region, referred to as site A (32,37), may comprise a pocket if terminal sugars are recognized (39) or a cleft for the binding of internal regions of polysaccharides (32). A central feature of site A is a pair of aromatic residues, which bind to the α- and β-face, respectively, of the terminal sugar (or central sugar in the case of xylan-binding modules), and an asparagine, located at the base of the site, that makes critical hydrogen bonds with O2, O3, or O4. Specificity is conferred by additional polar and hydrophobic interactions (37). Site A in CtCBM6 displays a pocket-like topology and contains all the key ligand binding residues present in CmCBM6 (Fig. 9) (40). The pair of aromatic residues in CmCBM6, Trp-92 and Tyr-33, which straddle the nonreducing terminal sugar, correspond to Phe-478 and Trp-424, respectively, in CtCBM6. Furthermore, Glu-20 and Asn-121 in CmCBM6, which make polar contacts with O3 and O4 of the nonreducing terminal sugar in cello- and xylooligosaccharides, superimpose with Glu-411 and Asn-507, respectively, in CtCBM6. Finally, the amide nitrogen of Tyr-33 in CmCBM6 makes a polar contact with O2 and O3 of the terminal sugar, a contact that is likely to be replicated by that of Trp-424 in CtCBM6.
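For readers unfamiliar with the root mean square deviation quoted above for the CtCBM6/CmCBM6 comparison, the following is a minimal sketch of how an RMSD over paired Cα atoms can be computed after optimal rigid-body superposition (the Kabsch procedure). It is an illustration only: the text does not state which program produced the reported value, the function name `kabsch_rmsd` is hypothetical, NumPy is assumed to be available, and the residue pairing (which Cα in one structure corresponds to which in the other) is assumed to have been determined beforehand by a structural alignment.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """Return the RMSD between two (N, 3) coordinate arrays after
    optimally rotating and translating P onto Q (Kabsch procedure).

    Row i of P is assumed to be paired with row i of Q, e.g. equivalent
    C-alpha atoms taken from a prior structural alignment.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")

    # Remove the translational component by centring on the centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)

    # 3x3 covariance matrix between the two centred coordinate sets.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)

    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Apply the rotation to P and measure the residual deviation.
    diff = (R @ Pc.T).T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

In practice the two arrays would hold the coordinates of the aligned Cα atoms (123 pairs in the comparison above) extracted from the two coordinate files; extracting and pairing those atoms is outside the scope of this sketch.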
The structural conservation between site A in CtCBM6 and CmCBM6 is consistent with the similar ligand specificities displayed by this binding site in the two proteins (Table 2) (39). Thus, both proteins bind to xylo- and gluco-configured oligosaccharides but do not display affinity for the corresponding polysaccharides. The structural similarity between CmCBM6 and CtCBM6 is therefore consistent with the view that the Clostridium module targets the terminal regions of oligosaccharides. In CmCBM6, cellooligosaccharides can bind to site A in both orientations, consistent with the targeting of O1/O4, O2, and O3, but not the endocyclic oxygen or O6, which would adopt different positions in the two orientations. Therefore, it is highly likely that CtCBM6 will also bind to xylo- and cellooligosaccharides in both orientations. Given that the key interactions with the ligand at site A are with the terminal sugar, it is perhaps surprising that CtCBM6 does not display measurable binding to xylose or glucose. It is possible that the entropic cost of locking the sugar into a pyranose ring conformation may contribute to the weak binding, although it is also possible that the protein makes indirect, water-mediated interactions with the penultimate sugar in the oligosaccharides, as observed in CmCBM6-ligand complexes (40).

Fig. 8 (legend). A shows the superimposition of the residues in the active site (−1 subsite) of BaCel5A (PDB 1qi2; colored green), which interact with the substrate, with the equivalent amino acids (colored yellow) in CtGH5. B shows the solvent-accessible surface of CtGH5 in which 2-deoxy-2-fluorocellotriose, derived from BaCel5A, has been superimposed. C depicts a model of xylotriose, containing Araf appended to O3 of Xylp-1, bound to CtGH5. The tetrasaccharide ligand is modeled on the superimposed structure of 2-deoxy-2-fluorocellotriose and the glycerol and water molecules in the putative arabinose binding pocket. In A and B, the bound ligand is colored silver (carbons), and the Xylp and Araf residues in C are colored salmon pink and blue (carbons), respectively.
Linker Connecting CtGH5 with CtCBM6-CtCBM6 is connected to CtGH5 by a sequence extending from residues Gly-336 to Thr-373. This linker, which adopts a stable conformation based on its B-factor, makes numerous internal polar contacts and forms hydrogen bonds with β-strand 3 and the loop connecting β-strands 3 and 4 of CtCBM6 and α-helices 7 and 8 of CtGH5. Furthermore, the C-terminal region of α-helix 8 and the internal region of α-helix 7 make hydrogen bonds with β-strands 3 and 7 of CtCBM6. The polar contacts between the two modules are augmented by a large number of apolar interactions mediated by the linker sequence. The resultant burial of a significant hydrophobic surface, at the interface between CtGH5 and CtCBM6, likely explains why these two modules (or domains) do not fold independently, as occurs in other glycoside hydrolases that contain catalytic modules and CBMs (41). This view is consistent with the observation that CtCBM6, when expressed as a discrete entity (Thr-373 to Ile-516), does not bind to cellohexaose or xylohexaose, and CtGH5 (Asn-32 to Thr-373) exhibits very low catalytic activity and is considerably more thermolabile than CtGH5-CBM6 (supplemental Fig. S5).

DISCUSSION
This study reveals a C. thermocellum protein that displays arabinoxylanase activity, an activity not previously reported. The vast majority of xylanases are derived from GH10 and GH11 and target the β-1,4-D-xylose polymeric backbone. These enzymes do not generally distinguish between different xylans, although highly decorated forms of the polysaccharide, such as rye arabinoxylan, are poorly degraded as steric constraints restrict enzyme access (12). Indeed, the only other examples of xylanases that utilize side chains as essential specificity determinants are glucuronoxylan-specific enzymes from GH30. These enzymes make critical interactions with the 4-O-methylglucuronic acid (linked α-1,2 to the xylan backbone) that decorates the xylose at the −2 subsite (42). CtXyl5A is highly unusual in that its essential Araf decoration is attached to the xylose positioned in the active site. The only other example of an active site side chain specificity determinant is the α-1,6-Xylp that decorates the −1 Glc in the xyloglucan cellobiohydrolase, OXG-RCBH, from Geotrichum sp. (43).
The function of CtXyl5A within the context of C. thermocellum, which has the genetic capacity to recruit 72 different enzymes into the cellulosomes (44), including seven GH10 and GH11 xylanases, is intriguing. It is likely that the GH10 and GH11 enzymes target xylans that are sparsely decorated with arabinose side chains. By contrast, CtXyl5A most likely hydrolyzes xylans where tandem Xylp residues contain Araf decorations. The recognition of the termini of xylo- and gluco-configured polymers by CtCBM6 suggests that the arabinoxylanase is targeted to regions of the plant cell wall that are undergoing degradation and are therefore accessible to enzyme attack. Although the primary function of CBMs is to bring their cognate enzymes into close contact with appropriate substrates (45), there is increasing evidence that a subset of these modules, from CBM families 6, 9, and 35, target the termini of polysaccharides and thus may play a similar function to CtCBM6 (37,46,47). In conclusion, CtXyl5A displays a specificity that is complementary to endoxylanases from GH10, GH11, and GH30. As such, the enzyme will contribute to the toolbox of biocatalysts required to degrade plant cell walls to their constituent sugars, which can then be used in the biofuel and bioprocessing industries.

Figure legend (beginning not recovered): ... show the solvent-accessible surface of CtCBM6 in complex with cellotriose (superimposed from CmCBM6). Amino acids whose side chains are predicted to contribute to ligand recognition are colored magenta. In both panels the ligand is shown in silver (carbon) stick representation.