A carbohydrate-binding family 48 module enables feruloyl esterase action on polymeric arabinoxylan

Feruloyl esterases (EC 3.1.1.73), belonging to carbohydrate esterase family 1 (CE1), hydrolyze ester bonds between ferulic acid (FA) and arabinose moieties in arabinoxylans. Recently, some CE1 enzymes identified in metagenomics studies have been predicted to contain a family 48 carbohydrate-binding module (CBM48), a CBM family associated with starch binding. Two of these CE1s, wastewater treatment sludge (wts) Fae1A and wtsFae1B, isolated from wastewater treatment surplus sludge, have a cognate CBM48 domain and are feruloyl esterases, and wtsFae1A binds arabinoxylan. Here, we show that wtsFae1B also binds to arabinoxylan and that neither enzyme binds starch. Surface plasmon resonance analysis revealed that wtsFae1B's Kd for xylohexaose is 14.8 μM and that it does not bind to the starch mimics β-cyclodextrin and maltohexaose. Interestingly, in the absence of CBM48 domains, the CE1 regions from wtsFae1A and wtsFae1B did not bind arabinoxylan and were also unable to catalyze FA release from arabinoxylan. Pretreatment with a β-D-1,4-xylanase did enable CE1 domain-mediated FA release from arabinoxylan in the absence of CBM48, indicating that CBM48 is essential for CE1 activity on the polysaccharide. Crystal structures of wtsFae1A (at 1.63 Å resolution) and wtsFae1B (1.98 Å) revealed that both are folded proteins with structurally-conserved hydrogen bonds that lock the CBM48 position relative to that of the CE1 domain. Docking to wtsFae1A indicated that both enzymes accommodate the arabinoxylan backbone in a cleft at the CE1–CBM48 domain interface. Binding at this cleft appears to enable CE1 activity on polymeric arabinoxylan, illustrating an unexpected and crucial role of CBM48 domains in accommodating arabinoxylan.

Arabinoxylans (AXs) are hemicellulose components of vascular plant cell walls. AXs are abundant in many key lignocellulosic biomass feedstocks and cereal-processing residues, such as wheat bran, distiller's dried grains, and brewer's spent grain. Enzymatic modification and degradation of AXs are important in several cereal processes, and efficient bioconversion of AXs is also imperative in the development of new sustainable biomass and biorefinery processes, ranging from production of biofuels and biobased materials to xylan-based prebiotic food and feed ingredients (1-3). The AX backbone is composed of β-1,4-linked xylopyranose (Xylp) residues that may be acetylated, substituted with glucuronic acid, and/or either single-substituted with α-L-1,2- or α-L-1,3-linked arabinofuranosyl (Araf) moieties or double-substituted with both α-L-1,2- and α-L-1,3-Araf. The amount and pattern of the Araf substitutions vary among species and tissues; e.g. single α-L-1,3-Araf substitutions predominate in grass (monocot) cell walls, whereas double substitutions are also present in the endosperm. In contrast, dicots predominantly have single α-L-1,2-Araf substitutions (4), although double substitutions have been observed in flax mucilage (5), and single α-L-1,3-Araf substitutions have been found in psyllium seeds (6). In grasses, and also in other plants, the Araf moieties can be further substituted at O-5 with, e.g., FA and other hydroxycinnamic acids (4).
Complete degradation of AX requires a battery of enzymes, including feruloyl esterases (7). Feruloyl esterases catalyze the hydrolysis of ester bonds between FA and arabinose (8). They are found in both bacteria and fungi and are grouped in carbohydrate esterase family 1 (CE1) in the Carbohydrate-Active EnZyme database (CAZy; www.cazy.org) (9). All CE1 feruloyl esterases with known structure share the α/β-hydrolase fold of a central β-sheet flanked by α-helices (8). The catalytic triad is located in a hydrophobic binding pocket that is often capped by a flexible lid (10-13).
Carbohydrate-binding modules (CBMs) are noncatalytic, individually-folded domains that are appended to the catalytic enzyme module via a linker. Like the carbohydrate-active enzymes, CBMs are categorized into families in the CAZy database (9). CBMs have been shown to be important for the functionality of some catalytic domains by bringing the enzymes into contact with their polymeric substrate; this contact can increase both the hydrolytic rate of the enzyme and the overall conversion rate of substrate because the effective concentration of the enzyme-substrate complex is increased (14-17). Generally, CBMs are small domains of ~100-150 residues, often with a β-sandwich fold (18). Recently, CE1 feruloyl esterases with appended CBMs annotated as members of carbohydrate-binding module family 48 (CBM48) were identified in two metagenomics studies, of beaver and moose droppings and of wastewater treatment surplus sludge, neither of which is considered particularly rich in starch (19,20). CBM48, however, is usually associated with starch binding (21), and starch polysaccharides are not known to contain hydroxycinnamic acids. We therefore hypothesized that these newly discovered CBM48s represent a novel type of CBM48 that might enable the feruloyl esterase to act on polymeric nonstarch substrates and thus positively affect the function and kinetics of the feruloyl esterase. Another, albeit less likely, possibility is that the identified CE1 feruloyl esterases have other, hitherto unknown, non-FA ester linkage targets. We selected feruloylated AX as the primary substrate type to test the hypothesis that the two newly discovered CE1 feruloyl esterases and their CBM48 appendages would in fact act on AXs. We also determined the crystal structures of these two CE1 feruloyl esterases and their appended CBM48 domains to examine the structure-function relations underlying such action.
The objective of this study was therefore to resolve the structure-function relations of two CE1s and their cognate CBM48s identified in a metagenomic study of anaerobic digesters fed with surplus sludge from wastewater treatment (19). To this end, the two CE1 feruloyl esterases with CBMs classified as belonging to CBM48, together with truncated variants lacking the CBM48 domains, were characterized to demonstrate their specificity for AX and that the CBM48 domain is essential for activity on polymeric AX. The crystal structures of the two CE1 feruloyl esterases, wtsFae1A and wtsFae1B, were solved and subjected to docking with a feruloylated arabinoxylooligosaccharide (AXOS) to identify the putative binding site, particularly for the AX backbone. Additionally, molecular dynamics analysis was performed to assess the rigidity of the CE1 and CBM48 domains. Finally, the relationship of these CBMs to known starch-binding CBMs was investigated to determine whether these CBM48s could belong to a new, nonstarch-binding CBM48 CAZy subfamily.

Sequence analysis and expression
Both wtsFae1A and wtsFae1B were identified in a metagenomic study of an anaerobic digester fed with surplus wastewater treatment sludge (wts, wastewater treatment sludge) (19). wtsFae1A is the N-terminal part of a chimeric enzyme we previously studied; in addition to the N-terminal wtsFae1A, this larger chimeric, or triple, enzyme contains a glycoside hydrolase family 62 α-L-arabinofuranosidase (EC 3.2.1.55) and a glycoside hydrolase family 10 β-1,4-D-xylanase (EC 3.2.1.8) (22). wtsFae1A and wtsFae1B are 343 and 386 amino acids in length, including 28 and 24 predicted signal peptide residues, respectively. The molecular masses of mature wtsFae1A and wtsFae1B, including their cognate CBMs, were calculated to be 39 and 43 kDa, respectively, and the migration of the bands in SDS-PAGE corresponded to these masses (Fig. S1). Previous attempts to purify recombinant truncated wtsFae1A lacking CBM48 (wtsFae1AΔCBM48) failed due to precipitation (22). This precipitation was avoided, however, by keeping wtsFae1AΔCBM48 in the elution buffer from the His-tag purification step.
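As a rough cross-check of calculated masses such as those quoted above, the average mass of a mature protein can be summed from its sequence after removing the predicted signal peptide. The sketch below is illustrative only: it assumes the commonly tabulated average residue masses (as used by ExPASy-type tools), ignores purification tags and post-translational modifications, and the helper `mature_mass_kda` is hypothetical.

```python
# Average residue masses in Da (amino acid mass minus one water), as commonly
# tabulated for ExPASy-style average-mass calculations.
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # added once for the free N- and C-termini


def mature_mass_kda(sequence: str, signal_peptide_length: int) -> float:
    """Average molecular mass (kDa) after cleaving an N-terminal signal peptide."""
    mature = sequence[signal_peptide_length:].upper()
    if not mature:
        raise ValueError("signal peptide removes the whole sequence")
    mass_da = sum(AVERAGE_RESIDUE_MASS[aa] for aa in mature) + WATER
    return mass_da / 1000.0
```

For example, `mature_mass_kda(full_sequence, 28)` would apply the 28-residue signal peptide predicted for wtsFae1A; the sequence itself is not reproduced in this section.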
Domain analysis using dbCAN (23) suggested that wtsFae1B contains an N-terminal CBM48 and a C-terminal CE1 domain; this domain organization was previously suggested for wtsFae1A, but the family of its CBM was unknown (22). wtsFae1A and wtsFae1B display 43% pairwise sequence identity with each other, and both share the highest sequence identity with XynZ from Hungateiclostridium thermocellum. Compared with other structure-determined CE1 feruloyl esterases, their sequence identities were only 39 and 38%, respectively (see Fig. S2 for multiple alignment).
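Percent identity values such as the 43% above depend on how gapped alignment columns are counted, so numbers from different tools are not always directly comparable. A minimal sketch, assuming identity is computed over the ungapped columns of an existing pairwise alignment (the alignment step itself is not shown, and `percent_identity` is a hypothetical helper, not the procedure used for Fig. S2):

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gapped columns are excluded from the denominator
        compared += 1
        identical += a == b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared
```

Other conventions divide by the alignment length or by the length of the shorter sequence, which generally gives lower values for gapped alignments.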

Substrate specificity and kinetics
The ability of wtsFae1A and wtsFae1B to bind wheat starch and insoluble wheat arabinoxylan (WAX-I) was investigated by adsorption assays. Neither enzyme showed affinity for wheat starch, as no binding was observed. The apparent dissociation constant (Kd,app) of wtsFae1A for WAX-I was previously determined to be 0.204 ± 0.017 mg ml⁻¹ (22), whereas wtsFae1B displayed a Kd,app of 1.3 ± 0.16 mg ml⁻¹ for WAX-I (Fig. 1A). To further investigate the substrate-binding properties of wtsFae1A and wtsFae1B, they were subjected to surface plasmon resonance (SPR) analysis using xylobiose, xylotriose, xylotetraose, xylohexaose, maltotetraose, maltohexaose, and β-cyclodextrin as binding targets. Unfortunately, SPR analysis of wtsFae1A did not yield meaningful data, most likely reflecting the enzyme's poor stability when immobilized on the SPR chip. wtsFae1B displayed weak affinity for the above-mentioned oligosaccharides, except for the starch analogs β-cyclodextrin and maltohexaose, to which it did not bind. A meaningful Kd value was, however, obtained for xylohexaose (Fig. 1B): 14.8 μM.
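Both the adsorption Kd,app (in mg ml⁻¹ of polysaccharide) and the SPR Kd (in μM of oligosaccharide) are commonly obtained by fitting a one-site hyperbola, response = Bmax·c/(Kd + c). The sketch below assumes that model and a plain least-squares fit; it is not the software used for Fig. 1, and `fit_one_site` is a hypothetical helper. Because the model is linear in Bmax, the fit scans Kd on a logarithmic grid and solves for Bmax in closed form at each trial Kd.

```python
import math


def fraction_bound(ligand: float, kd: float) -> float:
    """One-site (Langmuir) model: fraction of binding sites occupied."""
    return ligand / (kd + ligand)


def fit_one_site(conc, response, kd_min=1e-3, kd_max=1e3, steps=20001):
    """Least-squares fit of response = Bmax * c / (Kd + c); returns (Kd, Bmax).

    Kd is scanned on a log-spaced grid between kd_min and kd_max (same units
    as conc); for each trial Kd the optimal Bmax has a closed form because
    the model is linear in Bmax.
    """
    if len(conc) != len(response) or len(conc) < 2:
        raise ValueError("need at least two paired concentration/response values")
    best_sse, best_kd, best_bmax = math.inf, None, None
    log_lo, log_hi = math.log10(kd_min), math.log10(kd_max)
    for i in range(steps):
        kd = 10 ** (log_lo + (log_hi - log_lo) * i / (steps - 1))
        basis = [c / (kd + c) for c in conc]
        bmax = sum(b * y for b, y in zip(basis, response)) / sum(b * b for b in basis)
        sse = sum((y - bmax * b) ** 2 for b, y in zip(basis, response))
        if sse < best_sse:
            best_sse, best_kd, best_bmax = sse, kd, bmax
    return best_kd, best_bmax
```

Under this model, `fraction_bound(14.8, 14.8)` returns 0.5: half of the immobilized wtsFae1B sites would be occupied at 14.8 μM xylohexaose.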
wtsFae1A was previously shown to display minimal activity on WAX-I (22). However, at higher enzyme concentrations, achieved by redissolving crystals that formed during storage at 4 °C, wtsFae1A was shown to catalyze the release of FA from WAX-I (Table 1). wtsFae1B also catalyzed the release of FA from WAX-I, but no such activity was detected for wtsFae1AΔCBM48 and wtsFae1BΔCBM48 (i.e. the enzymes devoid of the CBM) (Table 1). WAX-I may contain ferulate dehydrodimers (di-FAs) (4), but none of the enzymes released any di-FA from WAX-I. Interestingly, wtsFae1AΔCBM48 and wtsFae1BΔCBM48 displayed about 6- and 4-fold higher specific activity, respectively, than the corresponding full-length enzymes toward soluble ferulated AXOS obtained by pretreating WAX-I with a glycoside hydrolase family 10 (GH10) β-D-1,4-xylanase (EC 3.2.1.8) (Table 1). The same trend was observed for the full-length and truncated enzymes assayed on 5-O-trans-feruloyl-α-L-Araf (Table 1).
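The specific activities in Table 1 are given in milliunits per mg. A minimal sketch of that conversion and of the fold-change comparison, assuming the usual definition 1 U = 1 μmol FA released per minute; the helper names and the numbers in the example are hypothetical and are not the Table 1 data:

```python
def specific_activity_mu_per_mg(fa_released_umol: float, minutes: float, enzyme_mg: float) -> float:
    """Specific activity in milliunits per mg of enzyme.

    Assumes 1 U = 1 umol FA released per minute under the assay conditions.
    """
    if minutes <= 0 or enzyme_mg <= 0:
        raise ValueError("reaction time and enzyme amount must be positive")
    units = fa_released_umol / minutes  # umol per min
    return 1000.0 * units / enzyme_mg   # mU per mg


def fold_change(truncated: float, full_length: float) -> float:
    """Ratio of the truncated-variant activity to the full-length activity."""
    if full_length <= 0:
        raise ValueError("full-length activity must be positive")
    return truncated / full_length
```

For instance, 0.03 μmol FA released in 10 min by 0.01 mg of enzyme corresponds to 300 mU mg⁻¹; a truncated variant at 60 mU mg⁻¹ against a full-length enzyme at 10 mU mg⁻¹ would be a 6-fold difference.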

Overall structures of wtsFae1A and wtsFae1B
The crystal structures of wtsFae1A and wtsFae1B were determined to 1.63 and 1.91 Å resolution, respectively, and the data-processing statistics are summarized in Table 2. Both crystals belonged to space group P2₁2₁2₁, with two molecules in the asymmetric unit forming noncovalent homodimers (Fig. 2, A and B). The interfacial surface areas of the wtsFae1A and wtsFae1B dimers were 1675 and 1667 Å², respectively (Fig. 2, A and B). According to PISA analyses (24), these interactions are strong enough to be of biological relevance; ΔG was estimated to be −19.9 and −22.1 kcal mol⁻¹, respectively. Furthermore, both enzymes eluted from a gel-filtration column at a volume consistent with a dimer in solution (Fig. S3), which suggests that the enzymes are dimers in their active form.
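The oligomeric state from gel filtration is usually inferred from a calibration of log10(molecular mass) against the partition coefficient Kav = (Ve − V0)/(Vt − V0), using standards of known mass. A minimal sketch of that calibration, assuming a linear calibration; the column volumes and standards in any call are hypothetical, the helper names are illustrative, and the Fig. S3 data are not reproduced here:

```python
import math


def kav(ve: float, v0: float, vt: float) -> float:
    """Partition coefficient Kav = (Ve - V0) / (Vt - V0)."""
    return (ve - v0) / (vt - v0)


def calibrate(standards, v0: float, vt: float):
    """Fit log10(M) = slope * Kav + intercept by ordinary least squares.

    standards: iterable of (elution_volume, mass_kda) pairs.
    """
    pairs = list(standards)
    if len(pairs) < 2:
        raise ValueError("need at least two standards")
    xs = [kav(ve, v0, vt) for ve, _ in pairs]
    ys = [math.log10(mass) for _, mass in pairs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def apparent_mass_kda(ve: float, v0: float, vt: float, slope: float, intercept: float) -> float:
    """Apparent mass (kDa) of a protein eluting at volume ve."""
    return 10 ** (slope * kav(ve, v0, vt) + intercept)
```

An apparent mass close to twice the monomer mass (39 kDa for mature wtsFae1A, 43 kDa for wtsFae1B) would be consistent with a dimer, bearing in mind that molecular shape also affects elution.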
The root-mean-square deviation (r.m.s.d.) of the Cα atomic coordinates between chains A and B was 0.2 Å (290 atom pairs) for wtsFae1A and 0.1 Å (276 atom pairs) for wtsFae1B. Overall, the electron density is well-defined in both enzymes (Fig. 2, C and D), with the exception of residues 198-203 in wtsFae1A chain B and residues 223-237 and 300-310 in wtsFae1B chains A and B, where density is missing (Fig. S4, A-F), which suggests that these residues constitute flexible loops. Furthermore, electron density for the first 22 residues of wtsFae1B is missing.
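Cα r.m.s.d. values such as those above are computed after optimally superposing the paired atoms. A minimal sketch of that calculation using the Kabsch algorithm, assuming the two coordinate arrays are already matched atom-for-atom (pairing residues between chains is not shown); `kabsch_rmsd` is a hypothetical helper, not the program used in this study:

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition.

    Rows must be paired atoms, e.g. matched C-alpha atoms of chains A and B.
    """
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two arrays of shape (N, 3)")
    p_c = p - p.mean(axis=0)  # remove translation
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c  # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against an improper rotation
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    p_rot = p_c @ rotation.T
    return float(np.sqrt(np.mean(np.sum((p_rot - q_c) ** 2, axis=1))))
```

Because only proper rotations are allowed, a mirror image of a structure does not superpose to zero r.m.s.d.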
The CE1 and CBM48 domains appear to interact extensively with one another, with an estimated interfacial surface area of 336.4 Å² for wtsFae1A and a smaller area of 257.5 Å² for wtsFae1B. Nevertheless, for both wtsFae1A and wtsFae1B, the data indicate that multiple hydrogen bonds contribute to fixing the position of CBM48 relative to the CE1 domain (Fig. 3, A and B; Table 3). Interestingly, three residues in the linker connecting the CE1 and CBM48 domains that participate in hydrogen bonds (wtsFae1A: Asp106, Lys109, and Val111; wtsFae1B: Asp119, Lys124, and Val127) are structurally-conserved (Fig. 3A). However, these residues are not conserved among wtsFae1A and wtsFae1B homologs (Fig. 3B). Additionally, several hydrogen bonds formed by peptide backbone atoms are present in the structurally-conserved helical region of the loop connecting the CE1 and CBM48 domains (Table 3). Moreover, potential hydrogen bonds diametrically opposite the linker, one in wtsFae1A (between Asn90 and Gly158) and two in wtsFae1B (between Arg102 and Asp175 and between Arg102 and Asp177), may also be involved in forming the rigid CE1-CBM48 integral units (Fig. 3, A and B; Table 3). None of these residues are conserved among the CE1-CBM48 homologs (Fig. 3B). Differential scanning calorimetry (DSC) of wtsFae1B showed a single unfolding event with a Tm of ~61 °C (Fig. S4), as previously found for wtsFae1A (22), which corroborates the view that the CE1 and CBM48 domains form an integrated unit.
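Interdomain hydrogen bonds like those listed in Table 3 are often first screened by a simple donor/acceptor distance criterion. A minimal sketch, assuming a 3.5 Å N/O-to-N/O cutoff and no hydrogen positions or angle checks (so it lists candidates, not confirmed hydrogen bonds); the `Atom` records and residue labels are hypothetical, and this is not the PISA-based analysis used in the study:

```python
import math
from typing import NamedTuple, Tuple


class Atom(NamedTuple):
    residue: str                     # hypothetical label, e.g. "Asp106"
    element: str                     # element symbol; only N and O are screened
    xyz: Tuple[float, float, float]  # coordinates in angstrom


def hbond_candidates(domain_a, domain_b, cutoff=3.5):
    """Polar N/O atom pairs across two domains within `cutoff` angstrom.

    Distance-only screen: donors and acceptors are not distinguished and
    no angular criterion is applied.
    """
    pairs = []
    for a in domain_a:
        if a.element not in ("N", "O"):
            continue
        for b in domain_b:
            if b.element not in ("N", "O"):
                continue
            d = math.dist(a.xyz, b.xyz)
            if d <= cutoff:
                pairs.append((a.residue, b.residue, round(d, 2)))
    return sorted(pairs, key=lambda pair: pair[2])
```

Candidates found this way would still need checking for donor/acceptor chemistry and geometry before being counted as hydrogen bonds.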

Arabinoxylan-binding CBM48
Inspection of the structures shows that the core α/β-hydrolase fold and the position of the catalytic triad are structurally-conserved in wtsFae1A and wtsFae1B (Fig. 2, A and B). The serine nucleophile (Ser242 in wtsFae1A and Ser272 in wtsFae1B) and the histidine general acid-base catalyst (His325 in wtsFae1A and His368 in wtsFae1B) are conserved, but the acidic residue of the triad is Glu296 in wtsFae1A and Asp339 in wtsFae1B (Fig. 2, A and B; see Fig. S2 for multiple alignment). Both Glu and Asp are commonly observed as the acidic triad residue in feruloyl esterases (8). However, a number of loop regions potentially involved in substrate binding differ significantly among the feruloyl esterases (Fig. 4, A-D).

Ferulic acid-binding pocket flexibility
The electron density is missing in two loops surrounding the active sites of both enzymes, indicating that these regions are flexible. Fortunately, in one molecule (chain A of wtsFae1A), all loops were well-defined, which allowed us to perform a molecular dynamics simulation to better understand the dynamics of these loops. A 200-ns molecular dynamics simulation of wtsFae1A chain A showed that the core structures of both the CE1 and CBM48 domains are rigid and do not undergo noticeable movements (Fig. 5). However, the per-residue root-mean-square fluctuation (RMSF) of the Cα atoms shows that one region in particular (residues 198-207) is apparently involved in concerted fluctuations (Fig. 5) that could result in the formation of a lid over the otherwise open active site of the apo structure (Fig. 2A). Thus, this flexible loop could act as a hinge promoting substrate binding and thus catalysis. Additionally, the Cα atoms of the residues around residue 300, close to the active site, undergo fluctuations (Fig. 5), suggesting that a rearrangement of the regions surrounding the active site could take place upon substrate binding. A multiple alignment of structure-determined CE1 feruloyl esterases showed that the residues constituting the flexible loops of XynZ and wtsFae1A are not conserved at the sequence level (Fig. 6). Furthermore, this analysis showed that the presumably flexible loop of wtsFae1B, with its 36 residues, differs significantly from those of the other CE1 FAEs (Fig. 6). Concerted movements of the regions surrounding the active site in the absence of FA have also been suggested for several other CE1 feruloyl esterases, including the closest structural homolog AmFae1A (10), the closest sequence homolog XynZ (11), and BiFae1A and Est1E from the rumen bacteria Bacteroides intestinalis (12) and Butyrivibrio proteoclasticus (13), respectively. However, the topology of the FA-binding pocket lid varies significantly between the CE1 FAEs: in AmFae1A, a β-clamp closes the FA-binding pocket, and α-helix 2 is slightly extended compared with wtsFae1A (Fig. 4A); in BiFae1A, an α-helix not present in wtsFae1A forms a lid on top of the FA-binding pocket (Fig. 4B); in Est1E, α-helices and loops not present in wtsFae1A form a flexible hinge on top of the FA-binding pocket (Fig. 4C). The FA-binding pocket of XynZ is most similar to that of wtsFae1A, and if the flexibility of the regions surrounding the FA-binding pocket is taken into account, the FA-binding pockets of these two enzymes are practically identical (Fig. 4D). XynZ and wtsFae1A are structurally very similar (r.m.s.d. 0.7 Å for 153 Cα atom pairs).

Table 1. Specific activities (milliunits mg⁻¹) of CE1 feruloyl esterases wtsFae1A and wtsFae1B on different types of substrates. Data are given as averages of duplicate measurements ± S.D. wtsFae1AΔCBM48 and wtsFae1BΔCBM48 denote the two CE1 feruloyl esterases devoid of their CBM.
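The per-residue flexibility discussed for the 200-ns simulation is typically quantified as the RMSF of each Cα atom about its mean position. A minimal sketch of that definition, assuming the trajectory frames have already been superposed on a reference (trajectory loading and alignment are not shown); the helper names and the threshold are hypothetical, not the analysis pipeline behind Fig. 5:

```python
import numpy as np


def rmsf(trajectory: np.ndarray) -> np.ndarray:
    """Per-atom RMSF from a (n_frames, n_atoms, 3) array of superposed coordinates."""
    if trajectory.ndim != 3 or trajectory.shape[2] != 3:
        raise ValueError("expected shape (n_frames, n_atoms, 3)")
    mean_structure = trajectory.mean(axis=0)      # average position of each atom
    deviations = trajectory - mean_structure      # displacement in every frame
    return np.sqrt((deviations ** 2).sum(axis=2).mean(axis=0))


def flexible_residues(values, residue_numbers, threshold):
    """Residue numbers whose RMSF exceeds a user-chosen threshold (angstrom)."""
    return [r for r, v in zip(residue_numbers, values) if v > threshold]
```

Applied to Cα atoms only, a stretch such as residues 198-207 would show up as a contiguous run of values above the background of the rigid core.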

Role of CBM48 in relation to substrate distortion to assist catalysis
To determine the role of the CBM48, we docked the feruloylated AXOS … Xylp] (XA5f2X) (Fig. 7A) to wtsFae1A chain A (Fig. 7B). As mentioned above, electron density is missing in certain regions near the active sites of wtsFae1B chains A and B and wtsFae1A chain B, which precludes meaningful docking experiments with these. Compared with XynZ in complex with FA (PDB code 1JT2), a slight shift in the position of the FA on XA5f2X was observed (Fig. 7B). The displacement of the flexible loop and α-helix 6 (Fig. 4D) and the fact that the FA on XA5f2X is constrained by its attachment to the AXOS may cause the observed differences. Furthermore, the stacking interaction with Trp157 in wtsFae1A (Fig. 7B), compared with Ile90 in XynZ, may also contribute to the altered orientation of FA.

Figure 3. A, structurally-conserved residues involved in hydrogen bonds forming the rigid wtsFae1A (cyan) and wtsFae1B (green) structures (hydrogen bonds are shown as yellow dashed lines with their length given in Å, and the residues involved are shown as sticks). B, multiple alignment of CE1-CBM48 homologs (see Fig. S6 for complete alignment). The asterisks indicate the residues involved in hydrogen bonds keeping the CE1 and CBM48 domains in the correct relative orientation (wtsFae1B above and wtsFae1A below the multiple alignment). The protein sequences are identified by their GenBank accession numbers. The multiple alignment is visualized using ESPript 3.0 (57).

The docking of XA5f2X to wtsFae1A chain A further suggests that the xylan main chain is accommodated in the cleft formed at the interface between the CE1 and CBM48 domains (Fig. 7B). Interestingly, hydrogen bonds are formed only between the Xylp moieties and residues on the CBM48 (Fig. 7B), and these residues are not conserved among the CE1-CBM48 homologs (Fig. S6). Furthermore, no stacking interactions with Xylp moieties were observed, but Araf forms a hydrogen bond with the conserved general acid-base catalyst His325 (Fig. 7B).
Surprisingly, the active sites and binding clefts of wtsFae1A and wtsFae1B display very distinct topology and properties (Fig. 8, A-E). The binding cleft and active-site pocket of wtsFae1A are negatively charged, whereas those of wtsFae1B are neutral (Fig. 8, A and B). Interestingly, wtsFae1B has no aromatic residue corresponding to Trp157 of wtsFae1A, which can form a stacking interaction with FA; instead, Phe340 in wtsFae1B may be able to form a stacking interaction with FA (Fig. 8C). We suggest that the role of both these aromatic residues may be to distort and pull the FA moiety toward them and thus destabilize the 5-O-linkage to Araf.
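Whether an aromatic side chain such as Trp157 or Phe340 is positioned for a stacking interaction with the FA ring can be screened geometrically from ring centroids and ring-plane normals. A minimal sketch, assuming ring atom coordinates are available and using a centroid-distance cutoff of 5.5 Å and an interplanar-angle cutoff of 30°, which are common heuristics rather than thresholds from this study; the helper names are hypothetical:

```python
import numpy as np


def ring_plane(coords):
    """Centroid and unit normal of a (near-)planar ring from its atom coordinates."""
    pts = np.asarray(coords, dtype=float)
    centroid = pts.mean(axis=0)
    # The right-singular vector with the smallest singular value is normal to the best-fit plane.
    _, _, vt = np.linalg.svd(pts - centroid)
    return centroid, vt[-1]


def stacking_geometry(ring_a, ring_b, max_distance=5.5, max_angle_deg=30.0):
    """Return (is_stacked, centroid_distance, interplanar_angle_deg).

    The cutoffs are a heuristic screen, not a universal definition of stacking.
    """
    ca, na = ring_plane(ring_a)
    cb, nb = ring_plane(ring_b)
    distance = float(np.linalg.norm(ca - cb))
    cos_angle = min(1.0, abs(float(np.dot(na, nb))))  # normal sign is arbitrary
    angle = float(np.degrees(np.arccos(cos_angle)))
    return distance <= max_distance and angle <= max_angle_deg, distance, angle
```

A parallel ring 3.8 Å above another passes the screen, whereas an edge-on ring at the same centroid distance (interplanar angle 90°) does not.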
Similar to what has been reported for other CE1 feruloyl esterases, the FA-binding pockets of both wtsFae1A and wtsFae1B are hydrophobic, whereas the binding clefts accommodating the xylan main chain are not (Fig. 8, D and E). This is surprising because many carbohydrates form stacking interactions with aromatic residues at binding sites in carbohydrate-active enzymes.
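A simple, sequence-level way to express the contrast between hydrophobic FA-binding pockets and more polar (and, in wtsFae1A, negatively charged) xylan-binding clefts is to average a hydropathy scale over the residues lining each region. A minimal sketch, assuming the Kyte-Doolittle scale and a crude formal-charge count with His treated as neutral; the residue selections would have to be taken from the structures and are not given here, and these hypothetical helpers are not the surface-property analysis behind Fig. 8:

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}
# Crude formal charges near neutral pH; His is treated as neutral.
FORMAL_CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def mean_hydropathy(residues: str) -> float:
    """Mean Kyte-Doolittle hydropathy of one-letter residue codes lining a site."""
    codes = residues.upper()
    if not codes:
        raise ValueError("empty residue list")
    return sum(KYTE_DOOLITTLE[aa] for aa in codes) / len(codes)


def net_formal_charge(residues: str) -> int:
    """Crude net charge of a residue set: Asp/Glu -1, Lys/Arg +1."""
    return sum(FORMAL_CHARGE.get(aa, 0) for aa in residues.upper())
```

Positive mean hydropathy indicates a hydrophobic lining; electrostatic surfaces such as those in Fig. 8 additionally depend on 3D arrangement and local pKa values, which this sketch ignores.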

Relation of feruloyl esterase CBMs to starch binding CBMs
Structural searches of the Protein Data Bank (26) using the DALI server (27) revealed that the closest structural homolog to the wtsFae1A and wtsFae1B CBM domains is the CBM48 (residues 253-338) appended to the starch phosphatase Starch Excess4 from Arabidopsis thaliana (AtSEX4) (PDB code 4PYH) (28). The Cα r.m.s.d. values between AtSEX4 CBM48 and the CBM48s of chain A of wtsFae1A and wtsFae1B were 3.2 Å (43 atom pairs) and 6.2 Å (48 atom pairs), respectively, and the Cα r.m.s.d. between the chain A CBM48s of wtsFae1A and wtsFae1B was 1.3 Å (52 atom pairs) (see Fig. 9, A-C, for structural alignment). A superimposition of AtSEX4 in complex with maltohexaose and of wtsFae1A with XA5f2X showed that the catalytic domains of the two enzymes are fixed to the CBM48 domains at different angles; however, the substrates interact with the CBM48 domains in the same region (Fig. 9, B-D). The superimposition also revealed that AtSEX4 is unable to accommodate XA5f2X (Fig. 9C), whereas wtsFae1A can accommodate maltohexaose (Fig. 9D). Maltohexaose interacts with Trp278, Lys307, Trp314, His330, and Asn332 on the AtSEX4 CBM48 (28), all of which have been shown to be important for maintaining both binding and activity (28,29). None of these residues are structurally-conserved in wtsFae1A (Fig. S7). It is particularly interesting that Trp278 and Trp314, which form a conserved aromatic platform at the binding site of CBM48s appended to starch-acting enzymes (21), are missing in wtsFae1A and wtsFae1B (see Fig. S8 for multiple alignment). The multiple alignment of the closely related starch-binding CBM20, CBM48, and CBM69 (21,30) and homologs of the CBMs of wtsFae1B and wtsFae1A (Fig. S8) was used to construct a maximum-likelihood phylogenetic tree, which showed that the CBMs appended to wtsFae1A and wtsFae1B belong to the CBM48 family (Fig. 10). Interestingly, the CBM48s appended to wtsFae1A and wtsFae1B and their homologs cluster with three CBM48s appended to starch-acting enzymes that also lack the aromatic platform at the binding site (Fig. 10; Fig. S8). Hence, it is questionable whether these so-called starch-binding CBM48 domains actually bind to starch. Based on this analysis, we propose that the CBM48 modules identified in this study may constitute a separate CBM48 subfamily not belonging to the group of starch-binding CBM48 modules.

Figure 7. A, schematic drawing of XA5f2X (xylopyranosyl moieties, black; arabinofuranosyl moiety, green; ferulic acid, purple), and B, wtsFae1A chain A (cyan) docked to XA5f2X (yellow) with ferulic acid from H. thermocellum XynZ (PDB code 1JT2) (white) superimposed. wtsFae1A residues interacting directly with XA5f2X and Trp157 are labeled; hydrogen bonds are shown as yellow dotted lines with their length given in Å.

Discussion
The recent major efforts in metagenomics and genomics continue to reveal genes encoding novel carbohydrate-active enzymes (31). Such enzymes often differ from previously characterized enzymes in domain organization, or differ sufficiently at the sequence level, to suggest that they are capable of acting on substrates not previously associated with specific enzyme families (31). Two studies have revealed CE1s with appended CBMs annotated as CBM48s (19,20), and similar enzymes were present in GenBank (32); however, their CBMs, like that of wtsFae1A, were not annotated as CBM48s (22). The presence of a CBM suggests that these feruloyl esterases are capable of acting on complex, polymeric substrates and potentially also on insoluble substrates, a capability that would hold great potential for the conversion of recalcitrant lignocellulosic biomass fractions (10).
Unfortunately, the commercially available polymeric feruloyl esterase substrates are limited to WAX-I in which only a small fraction of the Araf is ferulated (33). Despite this, it is clear that wtsFae1A and wtsFae1B depend on their cognate CBM48 to act on WAX-I (Table 1). Hence, both wtsFae1A and wtsFae1B could potentially act on more complex substrates and thus be tools for unlocking the potential of the recalcitrant lignocellulosic biomass fractions.
The crystal structures of wtsFae1A and wtsFae1B reveal CE1 domains that, although similar to known CE1 domains at the fold level, have a significantly different active-site topology, in particular with respect to the loops forming a lid on top of the active sites (Figs. 4, A-D, and 6). These differences have been suggested to affect the enzymes' ability to accommodate the mono-, di-, tri-, and tetra-FAs (10,34,35) that exist in planta (36). The FA-binding pocket of wtsFae1A resembles that of XynZ from H. thermocellum (Fig. 4D), which is exposed and thus can accommodate both mono- and di-FAs (10). Surprisingly, no di-FA was released from WAX-I by either wtsFae1A or wtsFae1B. wtsFae1B, moreover, has a considerably longer loop, presumably forming a lid on top of the active site (Fig. 6), which may affect its substrate specificity.
The catalytic domains of wtsFae1A and wtsFae1B and their CBM48s form rigid integral units, and the main chain of the substrate binds at a cleft formed at the interface of the two domains (Figs. 2, A and B, and 7B). Our DSC, crystallographic, and molecular dynamics data suggest that the wtsFae1A and wtsFae1B domains form a rigid structure (Fig. 5), which is kept in place by conserved hydrogen bonds (Fig. 3, A-C). Our docking data suggest that the CE1 and CBM48 domains act in concert, with the CBM48 responsible for binding the Xylp moieties, whereas the CE1 domain positions the Araf-FA in the catalytic site for hydrolysis (Fig. 7B). This is supported by the complete loss of activity of wtsFae1AΔCBM48 and wtsFae1BΔCBM48 toward WAX-I (Table 1). This result was somewhat surprising because two CE1 feruloyl esterases from B. intestinalis that lack a CBM have been reported to release FA from WAX-I (12). However, the lack of interactions between the CE1 domain and the XA5f2X main chain in the docking experiment suggests that the CE1 domain recognizes only the FA and potentially also the Araf moiety. This is in line with other feruloyl esterase crystal structures, in which only FA is observed despite being linked to an AXOS (11,37). One can speculate that the helical structure of xylan (38) somehow prevents these feruloyl esterases from accommodating the FA in the active-site pocket. The weak binding observed for the shorter xylooligosaccharides and AXOSs compared with xylohexaose suggests that a minimum of six Xylp moieties is required for productive binding. Surprisingly, an increase in activity toward soluble ferulated AXOS and 5-O-trans-feruloyl-α-L-Araf was observed for wtsFae1AΔCBM48 and wtsFae1BΔCBM48 compared with the full-length enzymes. The reason for this remains elusive; perhaps removal of the CBM grants small, soluble substrates better access to the active-site pocket.
Overall, wtsFae1A and wtsFae1B are structurally very similar (Fig. 2, A and B), but the properties and topology of their active-site pockets and binding clefts are very different (Fig. 8, A-E). This may suggest that they act on different substrates in nature. The negatively charged binding cleft of wtsFae1A (Fig. 8, A and B) could indicate that negatively charged substrates, such as the glucuronic acid-substituted stretches of glucuronoarabinoxylan, cannot be accommodated, whereas this restriction would not apply to the neutral cleft of wtsFae1B. Furthermore, both the differences in length of the flexible loops that potentially form a lid on the active-site pocket (Fig. 6) and the different positions of the aromatic residues that potentially form a stacking interaction with the FA (wtsFae1A Trp157; wtsFae1B Phe340) (Fig. 8C) also suggest differences in specificity. BiFae1A from B. intestinalis (12) and Est1E from B. proteoclasticus (13) both have an aromatic residue, Trp67 and Phe33, respectively, that structurally resembles that of wtsFae1A. As already mentioned, XynZ from H. thermocellum (11) and AmFae1A from A. mucronatus (10) do not have aromatic residues that form stacking interactions with the FA. The flexibility of the active site seems to be common among feruloyl esterases (10,12,13). The wtsFae1A and wtsFae1B homodimers probably do not affect the activity of the enzymes because this would require the helical xylan structure to be turned 180°.
The phylogenetic tree of the three starch-binding CBM families, CBM20, CBM48, and CBM69, and of the CBMs homologous to those appended to wtsFae1A and wtsFae1B unambiguously supports the classification of these CBMs as CBM48s (Fig. 10). Our binding data thus suggest that family CBM48 contains not only starch-binding CBMs but also xylan-binding ones. The phylogenetic tree also shows that the xylan-binding CBM48s cluster with the CBM48s appended to starch-acting enzymes that lack the starch-binding site (Fig. 10). Unfortunately, no binding data for these starch-acting enzymes have been published. However, we did demonstrate weak binding of wtsFae1B to maltotetraose. Hence, the starch-active and related enzymes lacking an aromatic platform at the starch-binding site might bind short maltooligosaccharides, which would not be expected from the functions of maltooligosyltrehalose trehalohydrolase, glycogen-debranching enzyme, and branching enzyme. The relation of the wtsFae1A and wtsFae1B CBMs to CBM48 is further strengthened by their structural similarity to the CBM48 appended to AtSEX4 (Fig. 9, A-D).
This study provides the first structural characterization of CE1 enzymes appended to a CBM and adds to our understanding of the functional role of CBMs. The CBMs were identified as belonging to family CBM48, yet our findings also point to the need for a renewed view of CBM48 functionality and classification. In addition, the crystal structures enabled a molecular dynamics simulation, which showed that the two domains form a rigid unit, and a docking study, which showed that the two domains act in concert. Both wtsFae1A and wtsFae1B were active on WAX-I, and this activity was lost in the absence of CBM48. Altogether, these results present feruloyl esterases as a potential means to advance the exploitation of plant biomass.

Genes, cloning, expression, and purification
An ORF encoding a CE1 with an appended CBM48 (wtsFae1B; GenBank™ accession no. BK010854.1) was identified on contig 6388 (bp 3157-4389) in a metagenomic study of the Randers anaerobic digester (Whole Genome Shotgun accession no. MTKZ00000000) (19). wtsFae1B was analyzed for the presence of a signal peptide with SignalP 4.1 (39), and disulfide bonds were predicted with DiANNA 1.1 (40). The predicted pI and molecular mass, calculated with Compute pI (41), were 8.82 and 42.7 kDa, respectively. The theoretical molar extinction coefficient, calculated with ProtParam (http://web.expasy.org/protparam), was 59,820 M⁻¹ cm⁻¹. wtsFae1A is part of a …

The alignment used to construct the phylogenetic tree is shown in Fig. S8, and the tree was visualized using iTOL (58).
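ProtParam derives the extinction coefficient from the numbers of Trp, Tyr, and cystine residues (5500, 1490, and 125 M⁻¹ cm⁻¹ each), and the same coefficient converts A280 readings into protein concentrations via the Beer-Lambert law. The hypothetical Python sketch below illustrates both steps; the function names are ours, the sequence is a toy example because the wtsFae1B sequence is not reproduced here, and only the 59,820 M⁻¹ cm⁻¹ and 42.7 kDa values come from the text.

```python
# Hypothetical sketch: ProtParam-style extinction coefficient and Beer-Lambert conversion.
# Coefficients (M^-1 cm^-1 at 280 nm, in water) as used by ProtParam.
TRP_COEFF = 5500
TYR_COEFF = 1490
CYSTINE_COEFF = 125  # per disulfide-bonded Cys pair


def extinction_coefficient(sequence: str, n_disulfides: int = 0) -> int:
    """Molar extinction coefficient at 280 nm from Trp/Tyr counts and cystines."""
    seq = sequence.upper()
    return seq.count("W") * TRP_COEFF + seq.count("Y") * TYR_COEFF + n_disulfides * CYSTINE_COEFF


def molar_concentration(a280: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law, c = A / (epsilon * l), in mol/L."""
    return a280 / (epsilon * path_cm)


def mg_per_ml(a280: float, epsilon: float, mass_kda: float, path_cm: float = 1.0) -> float:
    """Convert A280 to mg/ml: (mol/L) * (g/mol) = g/L = mg/ml."""
    return molar_concentration(a280, epsilon, path_cm) * mass_kda * 1000.0


# Toy sequence only (not wtsFae1B): 2 Trp + 1 Tyr, no disulfides.
print(extinction_coefficient("MWWYK"))
# wtsFae1B values from the text: epsilon = 59,820 M^-1 cm^-1, 42.7 kDa.
print(round(mg_per_ml(1.0, 59820, 42.7), 3))
```

With these values, an A280 of 1.0 in a 1 cm cuvette corresponds to roughly 0.71 mg ml⁻¹ wtsFae1B.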

Affinity for wheat arabinoxylan
WAX-I (Megazyme) and wheat starch (Sigma) were washed three times in 50 mM NaOAc, pH 6. The ability of wtsFae1A and wtsFae1B to bind WAX-I and wheat starch was determined by mixing 5 μl of enzyme (wtsFae1A, 4 μM; wtsFae1B, 1.35 μM) with 95 μl of 1 mg ml⁻¹ WAX-I or wheat starch in 50 mM NaOAc, pH 6, 0.005% (w/v) BSA. Samples were incubated in triplicate at 4 °C for 30 min and centrifuged (20,000 × g, 4 °C, 10 min), and protein concentrations in the supernatants were determined from A280 readings. Similarly, the apparent Kd (Kd,app) was obtained for wtsFae1B and WAX-I by …
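In this depletion assay, bound protein is inferred from the drop in supernatant A280 after the polysaccharide is pelleted. A minimal sketch, assuming a matching reference incubated without polysaccharide supplies the total protein signal (such a control is not described in this excerpt) and using hypothetical triplicate readings:

```python
from statistics import mean, stdev


def fraction_bound(a280_total, a280_supernatant):
    """Fraction of protein depleted from solution, per replicate, relative to the mean total signal."""
    total = mean(a280_total)
    fractions = [1.0 - a / total for a in a280_supernatant]
    return mean(fractions), stdev(fractions)


# Hypothetical triplicate A280 readings (not data from this study).
control = [0.100, 0.102, 0.098]   # protein incubated without polysaccharide
with_wax = [0.040, 0.042, 0.038]  # supernatant after incubation with WAX-I and centrifugation
frac, sd = fraction_bound(control, with_wax)
print(f"bound fraction: {frac:.2f} +/- {sd:.2f}")
```

Because the buffer contains 0.005% BSA, the reference in such a sketch must contain the same BSA so that its absorbance cancels.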

Surface plasmon resonance
wtsFae1B was diluted to 0.1 mg ml⁻¹ in 10 mM sodium acetate, pH 4.5, and amine-coupled to a CM5 chip (GE Healthcare) to a level of 2872.6 resonance units (RU) for SPR analysis (Biacore T100, GE Healthcare). Sensorgrams (RU versus time) were recorded at 25 °C in running buffer (10 mM sodium acetate, 150 mM NaCl, 0.005% Tween 20, pH 6) at a flow rate of 30 μl min⁻¹, with 120 s contact time followed by 120 s dissociation, and were baseline-corrected by subtracting data from a parallel flow cell without enzyme. The ability of wtsFae1B to bind 1 mM β-cyclodextrin, maltotetraose, maltohexaose, xylobiose, xylotriose, xylotetraose, xylopentaose, xylohexaose, A3X, and A2XX was determined. [ligand] is the oligosaccharide concentration, and Rmax is the maximum binding capacity in RU.
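The definitions of [ligand] and Rmax refer to a binding equation that is not reproduced in this excerpt. A common choice for steady-state SPR affinity is the one-site model Req = Rmax·[ligand]/(Kd + [ligand]); the sketch below assumes that model (it may differ from the authors' exact equation) and fits Kd by a log-spaced grid search, solving Rmax in closed form at each trial Kd. The concentration series is synthetic, generated with the 14.8 μM Kd reported for xylohexaose.

```python
def one_site_response(ligand_um: float, kd_um: float, rmax: float) -> float:
    """Steady-state one-site binding: Req = Rmax * [L] / (Kd + [L])."""
    return rmax * ligand_um / (kd_um + ligand_um)


def fit_one_site(ligand_um, response_ru, log10_kd_min=-1.0, log10_kd_max=5.0, steps=6000):
    """Least-squares fit of (Kd, Rmax) by grid search over log10(Kd).

    For a fixed Kd the model is linear in Rmax, so the best Rmax has a closed form.
    """
    best = None
    for i in range(steps + 1):
        kd = 10 ** (log10_kd_min + (log10_kd_max - log10_kd_min) * i / steps)
        occupancy = [conc / (kd + conc) for conc in ligand_um]
        rmax = sum(r * f for r, f in zip(response_ru, occupancy)) / sum(f * f for f in occupancy)
        sse = sum((r - rmax * f) ** 2 for r, f in zip(response_ru, occupancy))
        if best is None or sse < best[2]:
            best = (kd, rmax, sse)
    return best[0], best[1]


# Synthetic, noise-free data (hypothetical Rmax = 50 RU, Kd = 14.8 uM).
conc_um = [1, 3, 10, 30, 100, 300, 1000]
resp_ru = [one_site_response(c, 14.8, 50.0) for c in conc_um]
kd_fit, rmax_fit = fit_one_site(conc_um, resp_ru)
print(f"Kd ~ {kd_fit:.1f} uM, Rmax ~ {rmax_fit:.1f} RU")
```

Under this model, a 14.8 μM Kd implies about 98.5% occupancy at 1 mM (1000/(14.8 + 1000)). A single 1 mM injection therefore mainly reports whether binding occurs; estimating the affinity requires a concentration series spanning the Kd.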

Differential scanning calorimetry
DSC was used to analyze the conformational stability of wtsFae1B with a Nano DSC calorimeter (TA Instruments). Protein samples (1 mg ml⁻¹) were dialyzed against 200 volumes of 10 mM sodium acetate, pH 6, for 24 h, degassed, loaded into the sample cell, and scanned from 25 to 90 °C at 1 °C min⁻¹ with the dialysis buffer in the reference cell. Baseline scans, collected with buffer in both the reference and sample cells, were subtracted from the sample scans. The reference- and baseline-corrected thermograms were then fitted with a two-state scaled model in the Universal Analysis software (TA Instruments) with a DSC add-on to determine Tm (the unfolding temperature, defined as the temperature of maximum apparent heat capacity) and the calorimetric heat of unfolding, ΔHcal.
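Tm is defined above as the temperature of maximum apparent heat capacity, and ΔHcal corresponds to the area under the excess heat-capacity peak. The sketch below makes these two definitions concrete with a model-free estimate on a synthetic, already baseline-corrected thermogram. It is a hypothetical cross-check, not the two-state scaled model fitted in the TA Instruments software.

```python
import math


def tm_and_dh_cal(temps_c, excess_cp_kj_per_mol_k):
    """Model-free Tm (temperature of maximum excess Cp) and calorimetric enthalpy (peak area).

    Assumes the thermogram is already baseline-corrected; trapezoid rule for the area.
    A 1 degree C interval equals 1 K, so the area is in kJ/mol.
    """
    cp = excess_cp_kj_per_mol_k
    i_max = max(range(len(cp)), key=cp.__getitem__)
    area = sum(
        (temps_c[i + 1] - temps_c[i]) * (cp[i] + cp[i + 1]) / 2.0
        for i in range(len(temps_c) - 1)
    )
    return temps_c[i_max], area


# Synthetic Gaussian peak over the 25-90 degree C scan range (hypothetical values).
temps = [25 + i / 10 for i in range(651)]  # 0.1 degree C steps
peak = [20.0 * math.exp(-((t - 60.0) ** 2) / (2 * 3.0 ** 2)) for t in temps]
tm, dh = tm_and_dh_cal(temps, peak)
print(f"Tm = {tm:.1f} C, dH_cal = {dh:.0f} kJ/mol")
```

A two-state fit additionally models the peak shape from the van't Hoff enthalpy, so its Tm and ΔHcal can differ somewhat from these direct estimates for asymmetric peaks.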

Crystallization, data collection, and data processing
wtsFae1A and wtsFae1B were crystallized in 48-well MRC Maxi plates (Jena Bioscience) by mixing 2 μl of protein with 2 μl of reservoir solution. wtsFae1B was used at 80 mg ml⁻¹ in 10 mM sodium acetate. wtsFae1A had spontaneously formed a crystalline precipitate during storage at 4 °C; the saturated supernatant was diluted 4-fold in 10 mM sodium acetate, pH 6, to 0.8 mg ml⁻¹, and this sample was used for crystallization trials. The commercial screens PACT (Jena Bioscience) and MCSG-1 (Anatrace) were used, and crystals were obtained in several conditions for both proteins. Crystals were cryocooled in liquid nitrogen and tested at ESRF beamlines ID30-A3 and ID23-2. Data for wtsFae1B were collected on a crystal from MCSG-1 condition F9 (50 mM ammonium sulfate, 50 mM Bis-Tris/HCl, pH 6.5, 30% (v/v) pentaerythritol ethoxylate (15/4 EO/OH)); the drop containing the crystals had first been supplemented with PEG 400 for cryoprotection. This dataset was processed to 1.9 Å resolution with autoPROC (42) (see Table 2 for statistics). Several wtsFae1A crystals diffracted, and the best dataset was collected on a crystal from PACT condition F1 (20% (w/v) PEG 3350, 100 mM Bis-Tris propane, pH 6.5, 200 mM sodium fluoride). Because the diffraction limit of this crystal was underestimated during data collection, the square detector was positioned to record data to 1.8 Å at its edge. During processing with EDNAproc (43), however, useful data to 1.63 Å could be obtained from reflections recorded toward the detector corners. Although the completeness in the highest-resolution shell was only 68%, CC1/2 remained above 0.4, indicating that significant signal was still present at this resolution.
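On a flat square detector, the corners lie √2 times farther from the beam center than the edge midpoints, so they record higher-resolution reflections than the edge. This geometry is consistent with recovering data to 1.63 Å from a collection set up for 1.8 Å at the edge, and with the incomplete outermost shell. The sketch applies Bragg's law with a hypothetical wavelength and detector size, not the actual ESRF settings.

```python
import math


def resolution_at_radius(radius_mm: float, distance_mm: float, wavelength_a: float) -> float:
    """Bragg's law d = lambda / (2 sin(theta)), with 2*theta = atan(r / D).

    Assumes a flat detector perpendicular to the beam, with the beam at its center.
    """
    two_theta = math.atan(radius_mm / distance_mm)
    return wavelength_a / (2.0 * math.sin(two_theta / 2.0))


def distance_for_resolution(radius_mm: float, wavelength_a: float, d_a: float) -> float:
    """Crystal-to-detector distance that puts resolution d at radius r."""
    theta = math.asin(wavelength_a / (2.0 * d_a))
    return radius_mm / math.tan(2.0 * theta)


# Hypothetical setup: 1.0 A X-rays, square detector with 80 mm from center to edge.
wavelength = 1.0
half_width = 80.0
distance = distance_for_resolution(half_width, wavelength, 1.8)
edge = resolution_at_radius(half_width, distance, wavelength)
corner = resolution_at_radius(half_width * math.sqrt(2), distance, wavelength)
print(f"D = {distance:.0f} mm, edge = {edge:.2f} A, corner = {corner:.2f} A")
```

With these hypothetical numbers, the corners reach about 1.4 Å. Only part of each resolution shell beyond 1.8 Å falls on the detector, so shells in that range are necessarily incomplete.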

Structure solution, model building, and refinement
To solve the structures of wtsFae1A and wtsFae1B, a search model was prepared from the closest homolog in the Protein Data Bank (www.pdb.org). This was the CE1 feruloyl esterase domain of XynZ from H. thermocellum (PDB code 1JT2), identified by a BLAST (44) search against the PDB, with approximately 65% coverage and 35% sequence identity to both targets. Sculptor (45) from the PHENIX package (46) was used to generate the search model, and PHASER (47) was used for molecular replacement, searching for two molecules in the asymmetric unit in both cases; this also established the space group of wtsFae1B as P2₁2₁2₁. Even though the search model covered only the CE1 domains, TFZ scores of 21.3 (wtsFae1B) and 23.7 (wtsFae1A) were obtained. The structures were initially built with PHENIX.autobuild (48) and then refined with PHENIX.refine (49), alternating with manual rebuilding in Coot (50), to final Rwork/Rfree values of 0.17/0.20 (wtsFae1A) and 0.18/0.21 (wtsFae1B).
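The Rwork/Rfree values quoted above are crystallographic R-factors computed on the working reflections and on a held-out test set, respectively. The following is a minimal sketch of the standard definition R = Σ||Fobs| − |Fcalc|| / Σ|Fobs|. It assumes Fcalc is already on the Fobs scale, whereas refinement programs such as phenix.refine also handle scaling and bulk solvent. The amplitudes and test-set flags are hypothetical.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|); assumes Fcalc is already on the Fobs scale."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def r_work_r_free(f_obs, f_calc, is_free):
    """Split reflections by the test-set flag and compute R for each subset."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if free]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test])
    return r_work, r_free


# Hypothetical amplitudes for five reflections; the last two are in the free set.
f_obs = [100.0, 200.0, 50.0, 150.0, 80.0]
f_calc = [90.0, 210.0, 50.0, 120.0, 90.0]
flags = [False, False, False, True, True]
print(r_work_r_free(f_obs, f_calc, flags))
```

Because the test-set reflections are never used in refinement, Rfree is typically somewhat higher than Rwork. A small gap, as in the 0.17/0.20 and 0.18/0.21 pairs above, indicates limited overfitting.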

Structural alignments, electrostatic plots, and hydrophobicity plots
Structural alignments were obtained using PyMOL 2.2 (Schrödinger, LLC, New York), which was also used to render structural models; the overall r.m.s.d. for Cα atoms ranged from 0.152 to 0.188 Å. Electrostatic maps were calculated with the APBS plugin in PyMOL 2.2 using default settings. Hydrophobicity plots were generated with the color_h.py script in PyMOL 2.2 and colored according to the Eisenberg hydrophobicity scale (51). jsPISA version 2.1.1 (24) was used to calculate interface areas and interaction strengths for the dimer interfaces and for the interface between the CE1 and CBM48 domains.
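PyMOL's align command reports the r.m.s.d. after a sequence alignment and iterative outlier rejection, so its values can differ from a plain least-squares fit over all matched atoms. As a hedged illustration of the underlying superposition step, the Kabsch algorithm below computes the minimal r.m.s.d. between two already-matched Cα coordinate sets with NumPy; the function name and coordinates are hypothetical.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """Minimal RMSD between matched (N, 3) coordinate sets after optimal rigid superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p_c = p - p.mean(axis=0)  # remove translation
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c  # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against an improper rotation (reflection)
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c @ rotation.T - q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Hypothetical C-alpha coordinates (Angstrom) and a rotated + translated copy.
ca_a = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [3.8, 3.8, 0.0], [0.0, 3.8, 1.2], [1.9, 1.0, 3.5]])
rz90 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
ca_b = ca_a @ rz90.T + np.array([5.0, -3.0, 2.0])
print(f"{kabsch_rmsd(ca_a, ca_b):.3f}")
```

A rigid copy superposes with an r.m.s.d. of essentially zero. Values of 0.15-0.19 Å therefore indicate nearly identical backbones in the compared models.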

Molecular dynamics
The molecular dynamics simulation was performed on wtsFae1A chain A in YASARA Structure (18.4.24) with the built-in macro "md_run.mcr", using the AMBER14 force field. The simulation cell extended 10 Å beyond the protein on all sides and was filled with water at a density of 0.997 g/ml. Initial energy minimization was carried out by steepest descent under relaxed constraints. The system was neutralized at pH 6 with counterions (0.9% (w/v) NaCl), and the simulation was run for 100 ns at constant pressure and 298 K, with snapshots saved every 250 ps. Snapshots were analyzed with the built-in "md_analyzeres.mcr" macro for the r.m.s.d. of the Cα atoms of the α-helices and β-strands and of the ligand heavy atoms, and for the distances of the catalytic residues from the scissile bond of the ligand.

… (Fig. 7A), was built using YASARA Structure (18.4.24) and subjected to steepest descent energy minimization prior to the docking studies. Docking was performed with wtsFae1A chain A in YASARA Structure (18.4.24) with the built-in macro "dock_run.mcr", using AutoDock Vina (52) for 25 runs with the AMBER03 force field and default parameters. The simulation cell was set to 15 Å centered on Ser242 and then manually adjusted to include the expected binding cleft. The results were compared with the structure of XynZ from H. thermocellum (11) in complex with FA (PDB code 5JT2) to identify the most probable ligand pose, which was subsequently energy-minimized in YASARA Structure (18.4.24).
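Saving a snapshot every 250 ps over the 100 ns molecular dynamics run gives 400 sampling intervals. The catalytic-residue distance analysis described for that simulation can be sketched in plain Python. The sketch computes a per-snapshot distance series between two atoms and summarizes it as the fraction of snapshots within a cutoff. The atom labels, coordinates, and the 3.5 Å cutoff are hypothetical, and this is not the YASARA md_analyzeres.mcr macro.

```python
import math

SIMULATION_PS = 100_000      # 100 ns
SAVE_INTERVAL_PS = 250
N_INTERVALS = SIMULATION_PS // SAVE_INTERVAL_PS  # 400 sampling intervals


def distance_series(frames, atom_a, atom_b):
    """Euclidean distance (Angstrom) between two labeled atoms in every snapshot."""
    return [math.dist(frame[atom_a], frame[atom_b]) for frame in frames]


def fraction_within(distances, cutoff_a):
    """Fraction of snapshots in which the distance is at or below the cutoff."""
    return sum(d <= cutoff_a for d in distances) / len(distances)


# Hypothetical snapshots: labels stand for a catalytic Ser O-gamma and the ligand's ester carbonyl carbon.
frames = [
    {"SER_OG": (0.0, 0.0, 0.0), "LIG_C_ESTER": (3.0, 0.0, 0.0)},
    {"SER_OG": (0.0, 0.0, 0.0), "LIG_C_ESTER": (0.0, 4.0, 0.0)},
    {"SER_OG": (1.0, 1.0, 1.0), "LIG_C_ESTER": (1.0, 1.0, 4.2)},
]
dists = distance_series(frames, "SER_OG", "LIG_C_ESTER")
print(N_INTERVALS, [round(d, 2) for d in dists], fraction_within(dists, 3.5))
```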

Multiple alignments and phylogenetic analysis
A multiple alignment was built from three sets of sequences: the CBM20 and CBM48 protein sequences previously investigated by Janeček and co-workers (21); CBM69 sequences obtained from CAZy (based on personal communication with Stefan Janeček); and CBMs homologous to the CBM48s appended to wtsFae1A and wtsFae1B. The latter were obtained from BLASTP searches against the NR database (44) using the wtsFae1A and wtsFae1B CBM48 sequences as queries. The top 100 hits from each search were pooled, and a multiple alignment was constructed with MAFFT (53) and subsequently trimmed manually, using the wtsFae1A and wtsFae1B CBM48 sequences as guides. To reduce redundancy and the number of CBM48 sequences, these sequences were clustered at 50% sequence identity with CD-Hit (54), which yielded an additional 15 CBM48 sequences. The multiple alignment was used to build a maximum likelihood phylogenetic tree under the LG substitution model with RAxML-HPC BlackBox (version 8.2.10) (55) at the CIPRES Science Gateway version 3.3 (56), with 1000 bootstrap replicates.
The multiple alignments of the CE1 domain alone (manually extracted from the multiple alignment of the full-length enzymes) and of the full-length CE1-CBM48 enzymes were based on the sequences obtained above and were constructed with MAFFT (53).
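CD-Hit reduces redundancy by greedy incremental clustering. Sequences are processed from longest to shortest, and each one either joins the first representative it matches above the identity threshold or becomes a new representative. The simplified sketch below illustrates that idea with hypothetical function names. It assumes pre-aligned sequences and counts identity over gap-free columns, whereas CD-Hit itself uses short-word filtering and its own alignment, with identity defined relative to the shorter sequence. The sequences are toy examples.

```python
def identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def greedy_cluster(seqs: dict, threshold: float = 50.0) -> dict:
    """CD-Hit-like greedy clustering: longest ungapped sequences become representatives first."""
    order = sorted(seqs, key=lambda name: len(seqs[name].replace("-", "")), reverse=True)
    representatives = []
    clusters = {}
    for name in order:
        for rep in representatives:
            if identity(seqs[name], seqs[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:  # no representative matched: start a new cluster
            representatives.append(name)
            clusters[name] = [name]
    return clusters


# Toy pre-aligned sequences (hypothetical).
toy = {"a": "MKTAYIAK", "b": "MKTAYIAR", "c": "GGGGWWWW", "d": "MKT-YIAK"}
print(greedy_cluster(toy, 50.0))
```

Keeping one representative per cluster reduces near-duplicate sequences while preserving the diversity needed for the phylogenetic analysis.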