Structural Analysis of Glucuronoxylan-specific Xyn30D and Its Attached CBM35 Domain Gives Insights into the Role of Modularity in Specificity*

Background: Xylanases are crucial in plant cell wall recycling.
Results: A glucuronoxylan-specific xylanase is attached to its binding module with moderate flexibility. This CBM35 displays novel structural features regulating specificity.
Conclusion: Depolymerization of highly substituted xylans and an oriented interaction with its target substrate are proposed.
Significance: Unraveling the mechanisms ruling modularity is essential to understanding biomass deconstruction and to producing efficient biocatalysts.

Glucuronoxylanase Xyn30D is a modular enzyme containing a family 30 glycoside hydrolase catalytic domain and an attached carbohydrate binding module of the CBM35 family. We present here the three-dimensional structure of the full-length Xyn30D at 2.4 Å resolution. The catalytic domain folds into an (α/β)8 barrel with an associated β-structure, whereas the attached CBM35 displays a jellyroll β-sandwich including two calcium ions. Although both domains fold in an independent manner, the linker region makes polar interactions with the catalytic domain, allowing a moderate flexibility. The ancillary Xyn30D-CBM35 domain has been expressed and crystallized, and its binding abilities have been investigated by soaking experiments. Only glucuronic acid-containing ligands produced complexes, and their structures have been solved. A calcium-dependent glucuronic acid binding site shows distinctive structural features as compared with other uronic acid-specific CBM35s, because the presence of two aromatic residues delineates a wider pocket. The nonconserved Glu129 makes a bidentate link to calcium and defines region E, previously identified as a specificity hot spot. The molecular surface of Xyn30D-CBM35 shows a unique stretch of negative charge distribution extending from its binding pocket that might indicate some oriented interaction with its target substrate. The binding ability of Xyn30D-CBM35 to different xylans was analyzed by affinity gel electrophoresis. Some binding was observed with rye glucuronoarabinoxylan in the presence of the calcium chelator EDTA, which would indicate that Xyn30D-CBM35 might interact with other components of xylan, such as the arabinose decorations of glucuronoarabinoxylan. A role in the depolymerization of highly substituted, chemically complex xylans is proposed.

The utilization of biomass as a renewable source of biofuels, chemicals, and added value products has increasing economic and environmental relevance (1,2). Biomass is composed mainly of plant cell walls, which contain a mixture of lignin and polysaccharides interlocked in a complex lignocellulose matrix. Among cell wall polysaccharides, xylan accounts for approximately one-third of the renewable organic carbon on earth (3,4). It is composed of a backbone of β-1,4-xylose residues, which can be decorated at O2 with 4-O-methyl-D-glucuronic acid and at O2 or O3 with arabinofuranose residues, which can also be esterified to ferulic acid. Additionally, xylan can be extensively acetylated. The composition of xylan is highly variable depending on the plant source and tissue, and the term glucuronoxylan is frequently used to describe hardwood xylans (highly substituted with 4-O-methyl-D-glucuronic acid residues), whereas (glucurono)arabinoxylan denotes xylans from grasses and cereals (with a large amount of arabinose residues) (4,5). Biodegradation of xylan requires the coordinated activity of several enzymes, among which xylanases (1,4-β-D-xylan xylanohydrolase; EC 3.2.1.8) catalyze the cleavage of internal linkages on the xylose backbone. They play a central role in xylan depolymerization and are key enzymes for xylan bioconversion to fermentable sugars and oligomers of potential value as chemicals and prebiotics (6,7). Biorefinery approaches to deconstruct the plant cell wall for the production of bioethanol and biomaterials have shown the relevance of improving cellulose accessibility to cellulases (8) and that removal of hemicelluloses can be more important than lignin removal (9). This highlights the important contribution of xylanases to the production of cellulosic ethanol from lignocelluloses with low lignin content, such as corn stover and other agricultural residues (10).
The chemical complexity and heterogeneity of xylan can account for the multiplicity of xylanases produced by microorganisms (3,11). They are grouped into glycoside hydrolase families based on amino acid sequence homologies and structural features within the CAZy database (12). Most characterized xylanases belong to families GH10 and GH11. They do not seem to be specialized for hydrolysis of a particular type of xylan, because they are able to degrade both hardwood glucuronoxylans and arabinoxylans (11,13). A small group of bacterial xylanases with specificity for glucuronoxylan has been identified and characterized. These enzymes, which require methylglucuronic decorations for activity, belong to subgroup H of the GH30 family and are considered glucuronoxylanases (14-16). In contrast to the substrate specificity shown by GH30 glucuronoxylanases, an arabinoxylan-specific xylanase has been reported (17). The enzyme belongs to family GH5 and seems to be the only reported example of a xylanase with specificity for arabinoxylan. The occurrence of a variety of xylanases with differences in substrate specificity and mode of action undoubtedly reflects the heterogeneity of plant xylan in natural habitats and contributes to its efficient degradation and utilization.
Glycoside hydrolases are frequently modular enzymes that contain catalytic modules joined by flexible linker sequences to carbohydrate binding modules (CBMs). CBMs direct the appended catalytic modules to their target substrates and potentiate the activity against the complex substrates of the cell wall. Similar to catalytic modules, CBMs are also classified into families based on amino acid sequence homologies (CAZy) (18,19). Many xylanases are modular enzymes containing one or more CBMs. These noncatalytic modules found in xylanases belong to different families and recognize a diversity of ligands, including xylans, xylose oligomers, and carbohydrates that are not substrates of the xylanases, such as cellulose, in close proximity to xylan in the plant cell wall (18,19). Analysis of family 35 CBMs suggested that they could direct the enzymes toward regions of the cell wall that are being actively degraded by pathogens (20). Additionally, the disruption of the substrate structure by CBMs of different families has been proposed in several reports (21,22). In addition to bringing the catalytic modules into close proximity to their substrates, CBMs probably have a more complex role in polysaccharide degradation, with additional functions that enable the deconstruction of the cell wall and the catalytic depolymerization of carbohydrates.
Glucuronoxylanase Xyn30D belongs to the secretome of Paenibacillus barcinonensis (23), a powerful xylan-degrading microorganism that shows a complex set of carbohydratases, including several GH10 and GH11 xylanases that have been cloned and characterized (24-26). Like the few examples of characterized glucuronoxylanases of family GH30, the enzyme requires methylglucuronic substitutions for catalysis and is not active on arabinoxylans (27). However, unlike the structurally characterized GH30 glucuronoxylanases from Bacillus subtilis and Erwinia chrysanthemi (28,29), which are single domain enzymes, Xyn30D is a modular enzyme. The glucuronoxylanase contains a carbohydrate binding module of the CBM35 family, which is rarely found in xylanases. Only recently a modular GH30 xylanase from Clostridium papyrosolvens, containing a CBM6, has been sequenced and its catalytic domain crystallized, although the biochemical characterization of the enzyme has not been reported (PDB code 4FMV). In our study, we have purified and crystallized glucuronoxylanase Xyn30D and analyzed the three-dimensional structure of the full-length enzyme. Its ancillary CBM35 domain has also been expressed and crystallized, and its binding abilities have been investigated by soaking experiments. Additionally, binding of purified CBM35 to soluble xylans has been analyzed by affinity gel electrophoresis. Xyn30D has a unique GH30-CBM35 modular assembly. The results presented here contribute to deciphering the biochemical function of GH30 xylanases and the contribution of CBM35 to the efficiency of xylan depolymerization. Further studies will be required to ascertain the role of glucuronoxylan-specific xylanases and the function of the appended CBM in the degradation of xylan in natural habitats.

EXPERIMENTAL PROCEDURES
Cloning, Expression, and Purification-Construction of the Xyn30D, Xyn30D-GH30, and Xyn30D-CBM35 expression vectors was previously described (27). The protein samples were purified from Escherichia coli BL21Star (DE3) recombinant cultures containing plasmids pET101Xyn30D, pET101Xyn30D-GH30, and pET28aXyn30D-CBM35, respectively. Exponential phase cultures (A600 of 0.8) were induced with 0.5 mM isopropyl-β-D-thiogalactopyranoside for 18 h at 303 K. Cells were collected and disrupted by French press. Concentrated extracts were subjected to immobilized metal affinity chromatography using 1-ml HisTrap HP columns (GE Healthcare) on a fast protein liquid chromatography system (ÄKTA FPLC; GE Healthcare). Washing with 50 mM Tris-HCl, pH 8, 50 mM imidazole, and 500 mM NaCl was performed before elution in a single step with 50 mM Tris-HCl, pH 8, 500 mM NaCl, and 300 mM imidazole. The immobilized metal affinity chromatography elution fractions were concentrated with Centricon centrifugal filter units of 3-kDa molecular mass cutoff (Millipore) and loaded for a second polishing step onto a gel filtration column (Tricorn Superdex 200 10/300 GL; GE Healthcare). Single injections of 1,800 μl were made, and the protein was eluted with 20 mM Tris-HCl, pH 8, 150 mM NaCl. The purity of the protein was verified by SDS-PAGE (30).
Crystallization and Data Collection-Crystals of Xyn30D were grown as described before (31). For data collection, native crystals were transferred to cryoprotectant solutions consisting of mother liquor plus 30% (v/v) glycerol before being cooled to 100 K in liquid nitrogen. Diffraction data were collected using synchrotron radiation at the European Synchrotron Radiation Facility (Grenoble, France) on the ID23-2 beamline. Diffraction images were processed with iMOSFLM (32) and merged using the CCP4 package (33). Crystals of Xyn30D were indexed in the P3₂21 space group with four molecules in the asymmetric unit and 60% solvent content within the unit cell.
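The solvent content quoted above is conventionally tied to the Matthews coefficient V_M (cell volume divided by the total protein mass in the cell). The following is a minimal Python sketch, assuming the usual approximation solvent = 1 - 1.23/V_M; the text does not state how the 60% figure was obtained, so this is an illustration and not the authors' procedure. The function names are hypothetical. The inputs are the values given in the text (60% solvent, four molecules per asymmetric unit, a chain mass of 65 kDa) plus the six general positions of space group P3₂21.

```python
# Illustrative sketch only; function names are hypothetical, not from the paper.

def matthews_from_solvent(solvent_fraction):
    """Invert the Matthews approximation: solvent = 1 - 1.23 / V_M."""
    return 1.23 / (1.0 - solvent_fraction)


def solvent_from_matthews(v_m):
    """Estimated solvent fraction for a Matthews coefficient V_M (Å^3/Da)."""
    return 1.0 - 1.23 / v_m


def implied_cell_volume(v_m, n_symops, n_mol_asu, mol_mass_da):
    """Unit cell volume (Å^3) implied by V_M = V_cell / (Z * M)."""
    return v_m * n_symops * n_mol_asu * mol_mass_da


# Values from the text: 60% solvent, four molecules per asymmetric unit,
# 65 kDa per chain. P3(2)21 has six general positions.
v_m = matthews_from_solvent(0.60)
volume = implied_cell_volume(v_m, n_symops=6, n_mol_asu=4, mol_mass_da=65000)
print(f"V_M = {v_m:.3f} Å^3/Da, implied cell volume = {volume:.3e} Å^3")
```

Working backward from the reported solvent content avoids inventing unit cell dimensions, which are not given in this section.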
Crystals of the isolated Xyn30D-CBM35 module were obtained by mixing 1 μl of 22 mg/ml protein solution (20 mM Tris, pH 8.0, and 150 mM NaCl) with 1 μl of a solution containing 20% (w/v) PEG 6000, 0.2 M CaCl2, and either 0.1 M MES, pH 6.0, or 0.1 M Hepes, pH 7.0, and equilibrating by vapor diffusion at room temperature. Complexes of Xyn30D-CBM35 were obtained by the soaking technique using glucuronic acid (GlcA; Sigma) or an aldouronic acid mixture purchased from Megazyme (containing aldotriouronic, aldotetraouronic, and aldopentaouronic acids, 2:2:1). The crystals were soaked for 30 min in solutions made of mother liquor plus 30 mM GlcA or a 1/100 dilution of the commercial aldouronic mixture. For data collection, all crystals were transferred to cryoprotectant solutions containing mother liquor supplemented with 20% (v/v) glycerol before being cooled to 100 K in liquid nitrogen. Crystals showing diverse habits belonged to the P2₁2₁2₁ space group with one molecule in the asymmetric unit and 55% solvent content within the unit cell. Diffraction data for Xyn30D-CBM35 were collected in-house with a MAR345dtb (MarResearch) detector equipped with a rotating anode generator (MicroStar, Bruker) and Helios mirrors. X-ray data from the complexes were collected using different synchrotron sources at DESY (Germany) and Diamond (UK), on the beamlines given in Table 1. Diffraction images were processed with iMOSFLM (32) and XDS (34) and merged using the CCP4 package (33). A summary of data collection and data reduction statistics is shown in Table 1.
Structure Solution and Refinement-The structure of Xyn30D was solved by molecular replacement using the MOLREP program (36). The structures of glucuronoxylanase XynC from B. subtilis (PDB code 3GTN) and CBM35 from Amycolatopsis orientalis exo-chitosanase (PDB code 2VZQ) were used to prepare the search model using the program Chainsaw (37) and a protein sequence alignment of Xyn30D onto both templates. A partial solution containing four molecules of the Xyn30D catalytic domain in the asymmetric unit (a.u.) was found using reflections up to 3.0 Å resolution and a Patterson radius of 40 Å. With this partial model fixed, a solution containing one additional Xyn30D-CBM35 molecule was found using a Patterson radius of 15 Å. Then a complete Xyn30D molecule was generated and used as a template with a Patterson radius of 52 Å, finally leading to a solution including all four Xyn30D molecules, which after rigid body fitting led to an R factor of 49%. Crystallographic refinement was performed using the program REFMAC (38) within the CCP4 suite with flat bulk solvent correction, maximum likelihood target features, and local noncrystallographic symmetry (NCS). The free R factor was calculated using a subset of 5% randomly selected structure-factor amplitudes that were excluded from automated refinement. Model building using the program COOT (39) was combined with several rounds of refinement, leading to a model showing continuous density for the whole polypeptide chain, excluding the linker region. However, the refinement was stuck at an R factor of 38% (Rfree = 39%), and lowering the space group symmetry to P3₂, with eight independent molecules in the a.u., was necessary to allow further progress to an acceptable agreement (R factors of 25/29%). Furthermore, two NCS groups composed of the catalytic GH30 and the CBM35 moieties, respectively, were defined, and the refinement was accomplished applying medium NCS restraints.
After iterative refinement and rebuilding of the linker regions, the final 2Fo − Fc map showed continuous density for the whole protein in chains A, B, and H, whereas some regions of the linker were not visible in chains C (amino acids 399-400), D/G (amino acids 398-400), and E/F (amino acids 397-401). At the later stages, water molecules were included in the model, which, combined with more rounds of restrained refinement, led to a final R factor of 17.9% (Rfree = 20.4%) for the whole data set up to 2.4 Å resolution. Refinement parameters are reported in Table 1.
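The R factor and free R factor quoted in this section compare observed and calculated structure-factor amplitudes, with the free value computed on a randomly selected 5% of reflections held out of refinement. The Python sketch below illustrates that bookkeeping with the conventional definition R = Σ||Fo| − |Fc|| / Σ|Fo|. It is a toy illustration under stated assumptions: the function names and the four amplitudes are invented for the example, and REFMAC's actual implementation (scaling, bulk solvent, maximum likelihood target) is not reproduced.

```python
import random


def r_factor(f_obs, f_calc):
    """Conventional R = sum(| |Fo| - |Fc| |) / sum(|Fo|)."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty amplitude lists of equal length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def pick_free_set(n_reflections, fraction=0.05, seed=0):
    """Randomly flag a fraction of reflection indices as the free (test) set."""
    rng = random.Random(seed)
    n_free = max(1, round(n_reflections * fraction))
    return set(rng.sample(range(n_reflections), n_free))


def r_work_and_free(f_obs, f_calc, free_indices):
    """Return (R_work, R_free) for a given set of free reflection indices."""
    work = [i for i in range(len(f_obs)) if i not in free_indices]
    free = sorted(free_indices)
    r_work = r_factor([f_obs[i] for i in work], [f_calc[i] for i in work])
    r_free = r_factor([f_obs[i] for i in free], [f_calc[i] for i in free])
    return r_work, r_free


# Hypothetical amplitudes, for illustration only.
f_obs = [100.0, 200.0, 150.0, 80.0]
f_calc = [90.0, 220.0, 150.0, 100.0]
r_work, r_free = r_work_and_free(f_obs, f_calc, free_indices={3})
print(f"R_work = {r_work:.3f}, R_free = {r_free:.3f}")
```

Because the free reflections never enter refinement, a gap between the two values that keeps widening is a common sign of overfitting.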
The structure of Xyn30D-CBM35 was solved by molecular replacement using MOLREP (36) and the coordinates of the 403-530 portion of Xyn30D as the search model. Crystallographic refinement was performed using REFMAC (38) combined with model building with COOT (39) and addition of water molecules, which led to a final R factor of 20.9% (Rfree = 26.9%). The structures of Xyn30D-CBM35 complexed with GlcA and aldouronic acid were solved by difference Fourier synthesis using these refined coordinates. The ligands were manually built into the electron density maps and were refined similarly, to reach the R factors listed in Table 1.
Stereochemistry of the models was checked with PROCHECK (40) and MolProbity (41). The figures were generated with PyMOL (42). RMS deviation analysis was done using the program SUPERPOSE within the CCP4 package (33). Coordinates for all the structures have been deposited in the Protein Data Bank under accession numbers 4QAW, 4QB1, 4QB2, and 4QB6.
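For readers unfamiliar with the RMS deviation figures used in the structural comparisons, the Python sketch below shows only the final step: the root mean square distance over matched Cα atoms once two structures have already been superimposed. It assumes the pairing of atoms and the optimal fit are given; SUPERPOSE performs those steps itself, and they are not reproduced here. The function name and coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMS deviation (same units as the input) over matched atom pairs.

    Both arguments are equal-length sequences of (x, y, z) tuples that are
    assumed to be already superimposed and listed in matching order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Two hypothetical Cα pairs (Å), for illustration only.
model_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_b = [(0.0, 0.0, 0.3), (1.0, 0.4, 0.0)]
print(f"RMSD = {rmsd(model_a, model_b):.3f} Å")
```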
Polysaccharide Chain Model Building-A PDB model of a portion of the glucuronoarabinoxylan chain, with a typical substitution pattern (5), was constructed using the online biomolecule building program GLYCAM (43) and exported in its lowest energy state. The xylan chain was modeled into the Xyn30D-CBM35 binding cleft by superimposition of its GlcA-xylose moiety onto the experimental position observed in the aldouronic acid-soaked crystals. Then only small rotations of the glycosidic bond were introduced manually in the last two xylose units at the reducing end of the xylooligosaccharide chain, to fit the ligand model to the Xyn30D-CBM35 binding cleft surface. The xylooligosaccharide conformation obtained in this way keeps the reported typical xylan 3-fold helix pattern. Furthermore, the five central xylose units are in a conformation most similar to that experimentally observed in xylopentaose bound to Pseudomonas fluorescens xylanase A (PDB code 1EZN) (44).
Affinity Gel Electrophoresis-Affinity gel electrophoresis (AGE) was performed following the method of Correia et al. (45) with some modifications. Continuous native polyacrylamide gels containing 6% acrylamide in 25 mM Tris, 200 mM glycine buffer (pH 8.3) were used. Soluble xylan (5-7 mg/ml) from beechwood (Roth), hardwoods (Sigma), or rye (Megazyme) was included in gels before polymerization. Gels with and without xylan were polymerized at the same time and were run in the same gel tank. Approximately 6 μg of target protein was loaded in each well at room temperature, and gels were run at 30 mA/gel for 1 h. When indicated, 15 mM EDTA, pH 8, was added to the samples. BSA was used as a negative, noninteracting control.

RESULTS
Xyn30D Is a Bimodular Enzyme-We have purified and crystallized the glucuronoxylan-specific xylanase from P. barcinonensis (Xyn30D), as previously reported (31). We present here the three-dimensional structure of the full-length bimodular enzyme solved by molecular replacement at 2.4 Å resolution. Experimental and structure determination details are given under "Experimental Procedures" and in Table 1. An initial solution was obtained in the P3₂21 space group containing four independent molecules in the asymmetric unit. Preliminary refinement with 4-fold NCS restraints allowed complete model building of the polypeptide chain, but full refinement was unfeasible until the space group symmetry was lowered to P3₂, which allowed the final parameters to converge to proper values. Therefore, the final model contains eight molecules in the a.u., presenting strong binary NCS along the cell axes. This feature is due to small differences in the orientation between the catalytic domain and its appended CBM domain found among the different molecules in the crystal, as will be shown below. Each chain (A-H) contains the protein after cleavage of the signal peptide and consists of 537 residues with a molecular mass of 65 kDa as calculated from its primary structure. The model contains seven residues of the C-terminal polyhistidine tag (Ala531-Ser537). An almost contiguous electron density was observed in three molecules of the a.u.; in the others, only a small fragment of the linker region showed poor or broken density, which precluded fitting a segment of one to five residues. Fig. 1 illustrates the secondary structure and the molecular shape of Xyn30D. The enzyme is ~95 Å long by 55 Å wide by 40 Å thick. The GH30 catalytic domain has a CBM35 domain appended to its C terminus, which folds in an independent manner and is located almost perpendicularly to, and on the opposite side from, the active center (Fig. 1b).
Following the general pattern found in GH30 enzymes, the Xyn30D catalytic domain folds into two subdomains. The major domain is an (α/β)8 barrel that starts at residue 11 and extends to residue 298. It has eight parallel β-strands that form the central barrel connected by eight external α-helices, with the loops at the C-terminal end of the β-strands, L1-L8, contouring the catalytic pocket that is located in the center of the barrel. A tightly associated β-structure is fused to this barrel through a hydrophobic patch at α-helices 7 and 8. This second subdomain presents a nine-stranded β-sandwich, with an immunoglobulin-like fold, composed of residues 1-10, at the N terminus, and residues 299-389, which is connected to the barrel through two segments with conserved sequence. This side β-structure is unique to the GH30 family and constitutes the main distinctive feature, as compared with the related GH5 enzymes (46).
JOURNAL OF BIOLOGICAL CHEMISTRY 31091

Table 1 footnotes: N is the redundancy for the hkl reflection. Fc is the calculated and Fo is the observed structure factor amplitude of reflection hkl for the working/free (5%) set, respectively.

The appended CBM35 noncatalytic domain (residues 402-530) displays a jellyroll type β-sandwich fold of two antiparallel sheets, formed by four and five antiparallel β-strands, respectively. Two calcium ions are observed: one is common to many lectins and CBM families and has a structural role (Ca2), whereas the other (Ca1) is particular to the CBM35 family and participates in substrate binding. An extended 12-residue segment, which is almost entirely visible in some molecules within the a.u., links this CBM35 to the catalytic GH30 domain. Although this region presents a nonconserved sequence typical of high mobility regions (LSGGNSGGGNVN), the first part of the linker (Leu390-Ser395) is fixed to the side β-structure by a network of polar interactions. As shown in Fig. 1d, Asn322 and Asn346, from the second and fourth loops of the β-structure, respectively, make several hydrogen bonds to the linker that fix this segment in a rather stable conformation conserved in all eight independent molecules within the a.u. This feature possibly restricts motion to a short segment of the linker (Gly396-Asn401), precluding the possibility of a large conformational repertoire. Consequently, only small differences in the orientation between both domains are observed in the crystal, ranging from 2° (molecules A versus C) to 5° (molecules A versus D) within the a.u. However, it must be taken into account that the linker segment Gly392-Gly393 is also involved in intermolecular packing contacts, as explained below, and therefore, the linker might be looser in the absence of these crystal restrictions. Nevertheless, and apart from this slight flexibility between domains, no other significant differences among the eight independent molecules are observed in the crystal structure. Thus, superimposition of their catalytic domains gives an RMS deviation of 0.2-0.4 Å on 390 matched Cα atoms, whereas superimposition of the CBM35 gives 0.2-0.3 Å on 127 matched Cα atoms. Despite many attempts, no structure of full-length Xyn30D in complex with substrate or reaction products has been obtained.
Co-crystallization experiments did not yield crystals, whereas ligand-soaked Xyn30D crystals resulted in very poor diffraction. An inspection of the crystals shows that the linker region Gly392-Gly393 and also the side chain of Lys532 at the beginning of the C-terminal polyhistidine tag are positioned in the substrate binding cleft of a contiguous molecule (Fig. 2). Consequently, diffusion of the ligands into the active site probably disrupts this interaction, which must be important for stabilizing the intermolecular interface. In an attempt to overcome this problem, new constructs expressing the GH30 and the CBM35 domains independently were made, but only the CBM35 moiety provided suitable crystals for ligand soaking experiments.
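The 2° to 5° differences in interdomain orientation mentioned above can be expressed as the angle of the rotation that relates one CBM35 placement to another after the catalytic domains have been superimposed. The text does not say how these angles were measured, so the Python sketch below is only one plausible way to obtain such a number: it takes two orientation matrices, forms the relative rotation, and reads the angle from its trace. The helper names and the test rotations are hypothetical.

```python
import math


def rot_z(angle_deg):
    """Rotation matrix for a rotation of angle_deg about the z axis."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(m):
    """Transpose of a 3x3 matrix (the inverse, for a rotation matrix)."""
    return [list(row) for row in zip(*m)]


def rotation_angle_deg(r):
    """Rotation angle from the trace: cos(theta) = (trace(R) - 1) / 2."""
    trace = r[0][0] + r[1][1] + r[2][2]
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # guard rounding
    return math.degrees(math.acos(cos_theta))


def relative_rotation_deg(r_a, r_b):
    """Angle of the rotation that carries orientation r_a onto r_b."""
    return rotation_angle_deg(matmul(r_b, transpose(r_a)))


# Hypothetical orientations differing by 5 degrees about one axis.
angle = relative_rotation_deg(rot_z(10.0), rot_z(15.0))
print(f"relative rotation = {angle:.2f} degrees")
```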
The Catalytic Domain Resembles Known GH30 Xylanases-The three-dimensional structure has been reported for three other members of subgroup H within the GH30 family: two glucuronoxylanases, BsXynC from B. subtilis (28) and EcXynA from E. chrysanthemi (47), and a glucuronoarabinoxylanase from C. papyrosolvens (CpC71; PDB code 4FMV). Similarly to Xyn30D, CpC71 has an attached noncatalytic CBM6 domain, but the reported crystal structure contains only the GH30 moiety.
Xyn30D shares the highest identity (80%) with BsXynC, whereas 57 and 40% identity are found with respect to CpC71 and EcXynA, respectively. However, the sequence conservation within the group is not equally distributed along the polypeptide chain, being significantly concentrated at the catalytic domain when compared with BsXynC (86%). Interestingly, CpC71 presents the highest identity with Xyn30D at the side β-structure subdomain (73%). This trend reflects a differential role of this subdomain among the different enzymes that, in turn, must be related to the domain composition presented by each enzyme within the GH30 family, as will be explained below.
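The identity percentages above, both overall and per subdomain, are the kind of figure obtained by counting identical residues in a pairwise alignment. The Python sketch below shows one common convention, assumed here because the text does not specify the one used: columns containing a gap in either sequence are skipped, and identity is the share of matching residues among the remaining columns. The function name and the toy sequences are hypothetical; per-region values follow from slicing the same alignment by column range.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Toy alignment, for illustration only (not real Xyn30D sequence).
seq_a = "ACD-EF"
seq_b = "ACDKEY"
print(f"overall identity: {percent_identity(seq_a, seq_b):.1f}%")
# Identity restricted to the first three alignment columns.
print(f"region identity:  {percent_identity(seq_a[:3], seq_b[:3]):.1f}%")
```

Other conventions divide by the length of the shorter sequence or by the full alignment length, so reported percentages can differ slightly between tools.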
Structural superimposition of Xyn30D with the structurally known GH30 enzymes, shown in Fig. 2a, provides some insight into the specificity displayed by Xyn30D. The Cα atoms of the BsXynC, CpC71, and EcXynA structures superimposed onto the Xyn30D catalytic domain with RMS deviations of 0.6, 1.3, and 1.2 Å, respectively (based on the overlap of 337, 319, and 317 matched Cα atoms). As can be seen in the figure, larger differences are observed at loops surrounding the axis of the barrel, especially L1, L2, L3, and L4, which must account for the different substrate specificities. Fig. 2 (b-e) displays the molecular surface of the different enzymes and the involvement of each loop in shaping their active site clefts. Consistent with their endo mode β-1,4-xylanase activity, all enzymes present an extended crevice able to accommodate the polymeric chain of xylan. Furthermore, the GH30 family differs from classical GH10 and GH11 xylanases in that its members are able to degrade heavily substituted xylans, this feature being reflected in quite open, solvent-exposed binding clefts. The crystal structures of ligand-bound BsXynC (Fig. 2c) and EcXynA (Fig. 2e) have also been reported (28,29). These two complex structures have shown how the MeGXn chain is attached to the active site cavity at the aglycon moiety of the substrate and allowed a detailed description of the different binding subsites. In particular, loops L1, L3, L6, and L8 present well conserved and exposed aromatic residues (Trp26, Trp84, Tyr203, Trp267, and Tyr268 in Xyn30D numbering), which fit the xylan chain (Fig. 2a). Most key determinants responsible for substrate binding at subsites −3 to −1 are also conserved in the known GH30 xylanases belonging to subfamily H, as shown in the structural alignment given in Fig. 3.
The main difference is the lack in CpC71 of the residues involved in the methylglucuronic acid (MeGA) recognition described previously in glucuronoxylanases, especially Arg271 and Tyr273 from L8, both interacting with the C6 carboxylate, which are substituted by Trp269 and Asn271 in CpC71. This ionic interaction with the uronic acid has been described as the main specificity determinant distinctive of the mode of action of bacterial GH30 xylanases (28,29). CpC71 has been described as a glucuronoarabinoxylanase, but little has been reported on its particular functionality, which precludes an understanding of the structural basis of its dual substrate specificity. Apart from this feature, the CpC71 binding cleft topology is more similar to that of Xyn30D and BsXynC than to that found in EcXynA, which is mainly due to the different conformation of loops L3 and L4. This is consistent with the two clusters described by phylogenetic analysis of the GH30 subgroup H enzymes, corresponding to Gram-positive and Gram-negative bacteria (28).
Although soaking experiments with BsXynC and EcXynA crystals did not result in any ligand bound at positively numbered subsites, some unexpected molecules have been trapped in the crystals. Thus, a histidine from the polyhistidine tag and imidazole from the buffer occupy a position in the active site cleft of BsXynC and EcXynA that is similar to that of Lys532 from the C terminus of a symmetry-related molecule found in Xyn30D crystals (Fig. 2). All these molecules stack against the conserved Tyr203, located at loop L6. There is a second aromatic residue, Tyr/Trp, at loop L4 (Fig. 3) that might make an additional stacking interaction with a putative xylopyranose ring occupying this subsite +1. The different chemical nature of the moieties bound in the different crystals suggests that binding to this subsite might be mostly controlled by hydrophobic stacking interactions, which possibly tolerate different decorations of xylose at this position.
Of all the structural determinants responsible for substrate binding found in subgroup H, only Tyr203 and Trp267 from L6 and L8 are conserved among GH30 enzymes. These two residues make up the hydrophobic floor that shapes subsites −1 and +1 within the active site cleft. This is not surprising, taking into account that GH30 is a broad specificity and structurally diverse family, sharing less than 30% identity between its eight different subgroups. Moreover, GH30 subfamilies can be grouped within two main subgroups showing significant divergences as, for example, topological differences at the fused β-structure (46). Interestingly, a DALI (48) structural alignment identifies closest relatives to Xyn30D that include enzymes from families GH39, GH44, GH51, and GH79. These Clan-A GH families show identity levels to Xyn30D between 14 and 16%, analogous to the value of 19% when compared with human glucosylceramidase (PDB code 2V3F), which belongs to GH30 subgroup A. All four related GH families present the same α/β+β fold and contain enzymes that hydrolyze polysaccharides of lignocellulosic biomass.
The Substrate Binding Ability of the Fused β-Domain Is Not a Common Trend across GH30 Enzymes-Although the dual domain fold has been suggested to be a requirement for evolved evolution within GH30 (46), a role in xylan binding has been reported for the β-side structure in BsXynC (28). This observation came from soaking experiments with MeGX2 that showed a binding site for the ligand and a MeGA-specific recognition motif. Thus, the reducing-terminal xylose was stacking against Trp376 (BsXynC numbering), whereas Arg353 interacted with the MeGA moiety attached to the adjacent xylose ring (Fig. 4). Furthermore, Leu368 was making a hydrophobic interaction with the methyl group, and this was proposed to discriminate MeGA from its unmethylated analogue. Although this specific binding motif is not conserved among subgroup H of GH30, putative MeGA binding has also been proposed for EcXynA (29). Thus, close to the site described in BsXynC, the EcXynA β-domain shows two candidates in the cleft that may keep the same function, i.e. an aromatic residue (Trp401) and a basic residue (Lys375) able to stabilize a xylose ring and the carboxylate of its attached MeGA. These residues are all located in strands βS6, βS7, and βS8, which together with the preceding βS5 are the least conserved region of the fused β-domain (Fig. 3).
Inspection of this subdomain in the Xyn30D structure shows two arginines, Arg339 and Arg359, that might be envisaged as potential candidates to target the MeGA carboxylate (Fig. 4a). Moreover, close to Arg359, Leu364 could be properly oriented to provide a hydrophobic environment for a putative adjacent methyl group. However, no aromatic residue that could target an attached xylose moiety is found on the surface. Consequently, the substrate binding ability observed in BsXynC and proposed for EcXynA might not be a common trend of family GH30. In relation to this, it is remarkable that the closest homologues of the Xyn30D β-fused subdomain (above 70% identity) present an appended separate CBM domain. Consequently, the specific role of the β-side structure seems to depend on the domain composition of each GH30 enzyme.
The Xyn30D-CBM35 Domain Binds Glucuronic Acid-Xyn30D harbors a family 35 CBM at its C-terminal end, which folds into a jellyroll type β-sandwich typical of other members of the CBM35 family. We have expressed a construct containing this domain, and the resulting purified protein has been crystallized and used for soaking experiments to investigate its ligand binding ability. Thus, several crystals were soaked with either GlcA, xylose, xylotriose, xylohexaose, or glucuronic acid-decorated xylooligosaccharides (aldouronic mixture). Only glucuronic acid-containing complexes were obtained. Fig. 5a shows the overall fold of the Xyn30D-CBM35 domain and the two observed GlcA binding sites. Site 1 (Fig. 5b) is calcium-dependent and has been observed in the crystals soaked with both GlcA and the aldouronic mixture, whereas site 2 has been observed only in the GlcA-soaked crystal. There is a second calcium ion, common to many lectins and CBM families, that has a structural role (Fig. 5d). This calcium is coordinated to the Glu18, Glu20, and Asp134 side chains and to the carbonyl groups of Thr38, Gly41, and Asp134.
As can be observed in Fig. 5b, which shows binding site 1, the GlcA moiety is stacked against Trp102, located at the cleft defined by the loops linking the β-sheets at one side of the barrel, and is recognized by Xyn30D-CBM35 through an intricate network of polar interactions, some of them mediated by the calcium ion (Fig. 6). First, the C6 carboxylate makes a bidentate interaction with the Arg79 side chain and also interacts with Ca1, which in turn interacts with the GlcA O4. The recognition of the GlcA carboxylate must be critical, because soaking of crystals in nondecorated xylooligosaccharides failed to yield observable complexes. In addition, O1, O2, O3, O4, and O5 also make many polar interactions with the Xyn30D-CBM35 residues Glu31, Tyr34, Asn44, and Asn132, both directly and through several well ordered water molecules, which keep the GlcA moiety in a fixed position common to the GlcA- and aldouronic-soaked crystals. In contrast, the attached xylose unit does not make any direct interaction with residues from the Xyn30D-CBM35 domain, and, in agreement with this observation, the electron density map shows weaker density at this position, precluding the building of additional xylose units. Finally, the only significant difference observed in the ligand-bound complexes with respect to the unbound structure is a conformational change in the Glu31 side chain, which rearranges to make a hydrogen bond to GlcA O3 (Fig. 5b).

Structure of Bimodular P. barcinonensis Xyn30D
NOVEMBER 7, 2014 • VOLUME 289 • NUMBER 45
The second GlcA molecule is bound at the loops linking the β-sheets at the opposite side of the β-sandwich, next to the linker connecting the Xyn30D-CBM35 to the catalytic domain (Fig. 5b, Site 2). The GlcA moiety, clearly seen in the electron density map, makes polar interactions with the Val138 main chain and with the side chain of Thr15 through its O2 and O3 hydroxyls. Moreover, the carboxylate at C6 interacts with Thr62 and Lys66. However, none of these residues is conserved within the CBM35 family and, consequently, binding of GlcA at this site may not have biological significance.
Xyn30D-CBM35 Shows Distinctive Structural Features Regulating Its Specificity. The closest relatives of Xyn30D-CBM35 are the uronic acid-specific CBM35s, which bind Δ4,5-anhydrogalacturonic acid released from pectin by the action of pectate lyases, a compound that is therefore a signature of plant cell wall degradation (20). Thus, Xyn30D-CBM35 shows the highest sequence identity with the CBM35 moieties of the exo-β-D-glucosaminidase from A. orientalis, Chi-CBM35 (41%), the rhamnogalacturonan acetyl esterase from Clostridium thermocellum, Rhe-CBM35 (38%), and the xylanase Xyn10B from Cellvibrio japonicus, Xyl-CBM35 (36%), all three of which can also bind GlcA, although Rhe-CBM35 only weakly. The sequence identity is lower (23%) with the other characterized uronic acid-specific CBM35, that of a pectate lyase from an environmental isolate, Pel-CBM35, which does not show GlcA binding ability. Recognition of uronic acid by these CBM35 domains is always dependent on calcium, which interacts with the carboxylate of the ligand. However, Xyn30D-CBM35 has a modified version of the (Asp/Asn)-(Tyr/Thr)-Xaa-Asn motif that is located at the end of β-strand 4, is conserved in the previously reported uronic acid-CBMs, and builds the metal site (Fig. 7, a and b). Thus, Asn44 and Phe45 bind the Ca ion through their side and main chain, respectively, in the same way observed previously (Asn32/Tyr33 in Chi-CBM35), but the terminal Asn35 is missing in the Xyn30D-CBM35 motif, and Glu129 from the loop linking β10-β11 occupies an equivalent position to make a bidentate link to Ca1. This Glu129 is a determinant of substrate specificity, as will be explained below. Topology seems to be rather conserved throughout the CBM35 family, as seen by the structural similarities revealed by the DALI server (48). Thus, the RMS deviations are between 1.0 and 1.8 Å for 122-127 aligned Cα atoms of the CBM35 members structurally characterized to date, given in Fig. 7b.
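For orientation, the RMS deviation quoted above is the square root of the mean squared distance between equivalent Cα atoms. The following is a minimal sketch of that calculation, assuming the two coordinate sets have already been superposed and paired residue by residue; the DALI server performs its own alignment and superposition, and the function name and toy coordinates below are hypothetical illustrations, not data from this study.

```python
import math


def rmsd(coords_a, coords_b):
    """RMS deviation between paired, pre-superposed atoms.

    coords_a, coords_b: equal-length lists of (x, y, z) tuples.
    The result is in the same units as the input (e.g. angstroms).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical toy coordinates, NOT taken from any deposited structure:
# every atom of `mobile` is shifted by exactly 1 unit along y.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
mobile = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
print(round(rmsd(reference, mobile), 3))
```

Because no superposition step is included, this sketch only reproduces a server-reported value if the coordinates are first transformed into a common frame.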
Moreover, five regions (A-E) have been identified in the binding site of CBM35s (45) and the related CBM6s (49) as key features determining ligand binding and specificity. Regions A and C, which are highly conserved, are constituted by a Trp, located at the end of β8 and making a stacking interaction with the pyranose ring, and an Asn residue from β11 interacting with several of its OH groups. These regions are represented by Trp102 (region A) and Asn132 (region C) in Xyn30D-CBM35 (Fig. 7a). Region B is highly variable depending on the subfamily and constitutes the calcium site in the uronate-binding CBM35s. Region D is defined by a position occupied by rather conserved polar residues located at β3, Glu31 in Xyn30D-CBM35, which, as explained before, changes its position upon GlcA binding. In the uronate-specific CBM35s, this region is expanded by an additional, conserved His that interacts with the GlcA O2 hydroxyl, but this His is replaced by Tyr34 in Xyn30D-CBM35 (Fig. 7, a and b). Finally, region E displays the largest degree of variation and has been stated to play an essential role in the ligand specificity found among CBM35s and CBM6s. This region, defined by the loop before β11, accommodates aromatic residues in some of the domains (Fig. 7b); thus, a Tyr in Gal-CBM35 has been described to block off the binding site cleft, conferring specificity for the terminal unit of complex carbohydrates, whereas two Trps in Man-CBM35 have been attributed a role in extending the binding cleft to additional subsites able to bind manno-oligosaccharides (45). In contrast to the known uronate-binding CBM35s, which do not exhibit particular features at this region, Xyn30D-CBM35 contains Glu129, which is essential in calcium binding, as noted above. Furthermore, close to this position, Tyr48 from loop β4-β5 protrudes into the binding cleft, pointing to a putative role in ligand recognition (Fig. 7, a and b).
Consequently, a distinctive feature of Xyn30D-CBM35 with respect to the other uronate-binding CBM35s is the presence of two nonconserved aromatic residues delineating the binding pocket, i.e. Tyr34 and Tyr48. This is an interesting trait considering the involvement of Tyr residues in carbohydrate recognition, which might therefore result in additional binding subsites. Fig. 7c displays the molecular envelope of Xyn30D-CBM35, together with that of Chi-CBM35, as a representative member of the uronate binding subgroup, and those of the other structurally known CBM35 domains. As can be observed in the figure, the sequence variability among the family allows for a great structural diversity of the surface shape that must account for the diverse ligands recognized by each domain. Remarkably, Xyn30D-CBM35 shows a more open pocket as compared with the uronate-binding Chi-CBM35. It is also unique in presenting a continuous stretch of negative charge distribution extending from its binding pocket, which could point to some sort of oriented interaction with its target substrate. A xylooligosaccharide chain with a decoration pattern typical of the glucuronoarabinoxylans found in grass cell walls (5) has been modeled as explained under "Experimental Procedures." This chain has been docked onto the binding site surface by superposition of its GlcA-xylose moiety onto the experimental position of this disaccharide found in the Xyn30D-CBM35 soaked crystals. As can be observed in Fig. 7c, the modeled conformation, which keeps the typical 3-fold shape of the xylan chain, appears complementary to the binding site topology, which, in addition, seems able to accommodate Ara and GlcA decorations at O2/O3 in all the xylose units. This surface topology would explain the previously reported binding properties of Xyn30D-CBM35 to different types of decorated xylans (27).
Moreover, inspection of the Xyn30D-CBM35 structure reveals that, apart from the two mentioned residues Tyr34 and Tyr48, no other aromatic residues that might shape typical substrate binding sites are evident at the binding site surface. However, many polar residues might be making stabilizing interactions with the putative substrate (Fig. 7c). Consequently, although the Xyn30D-CBM35 domain is a GlcA-specific binder, other preferred interactions with its substrate that increase affinity cannot be excluded. To evaluate the importance of GlcA for interactions with the substrate and their calcium dependence, the binding ability of Xyn30D-CBM35 in the presence of the calcium-chelating agent EDTA was analyzed by affinity gel electrophoresis. Migration of the purified module in gels containing soluble xylans was retarded when compared with the migration of the protein in gels without xylan (Fig. 8). Both glucuronoxylan, from beechwood and hardwoods, and glucuronoarabinoxylan from rye produced a retardation of the CBM35. However, retardation was significantly inhibited when EDTA was included in the gels, clearly showing the involvement of the calcium-dependent GlcA binding, as indicated above. Notably, although retardation by glucuronoxylans was almost completely reversed by EDTA, retardation by rye glucuronoarabinoxylan was less affected by the chelating agent. This would indicate that, in addition to the GlcA interaction, Xyn30D-CBM35 might establish interactions with other components of the xylan molecule, such as the arabinose decorations of glucuronoarabinoxylan, as suggested.

DISCUSSION
The biological conversion of the polysaccharides within the plant cell wall to their constituent monosaccharides is performed by a large number of enzymes that are as diverse as their complex substrates. An example of this chemical complexity is provided by xylan, the major hemicellulosic component of the cell wall. The great majority of xylanases coping with this diversity belong to GH10 and GH11, with a variety of debranching enzymes associated with them to accomplish the degradation process efficiently. More recently, new endoxylanase activities have been identified in families GH5 and GH30 that, unlike classical xylanases, are able to act on highly substituted xylans and, moreover, use decorations as specificity determinants. Xyn30D is one of the glucuronoxylan-specific endoxylanases included in GH30.
Within the complex landscape of the plant cell wall degradation process, modularity is a common feature used by the implicated enzymes to fine-tune their biological functionality through attached CBMs. It is generally observed that the linker regions between the catalytic domain and the CBM display a great deal of structural flexibility to maximize substrate accessibility. Because of this flexibility, only a few bacterial modular enzymes have been structurally characterized to date. With a few exceptions (17,35), the isolated modules can generally function independently, and this is the case for Xyn30D (27). Despite this, we have been able to crystallize the full-length enzyme, and this is the first intact structure that allows visualization of the juxtaposition of the CBM35 module relative to the GH30 catalytic domain.
A BLAST search gives ~100 entries showing more than 50% identity with the Xyn30D sequence. Six of these sequences include a CBM35 domain attached to the catalytic GH30 domain and represent hydrolytic enzymes from different Paenibacillus and Aeromonas species. Alignment of the corresponding sequences reveals that they all display a short Gly/Pro-rich linker region of 6-12 residues (Fig. 9). Consequently, the overall molecular shape of Xyn30D presented here should be a good model for these enzymes. Moreover, 11 more sequences correspond to modular enzymes that include, instead of CBM35, a topologically related CBM6 attached to the C terminus of the chain. These sequences correspond to different enzymes from Clostridium, all of them also having an additional C-terminal dockerin domain. Although the sequence is less conserved in the linker region (Fig. 9), the similar size suggests that the arrangement of the catalytic and the CBM6 domains might not be too different from that observed in Xyn30D.
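The identity threshold used in such a search can be illustrated with a short calculation. This is a minimal sketch, assuming two sequences that are already aligned and padded with gap characters, and counting identical residues over the gap-free columns only. Search programs may use a different denominator (for example the full alignment length), and the function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences.

    Columns in which either sequence has a gap are ignored
    (an assumption; other conventions exist).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Hypothetical toy alignment, not a real linker sequence:
# five gap-free columns, four of them identical.
print(percent_identity("GPGA-T", "GPSAWT"))
```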
A different situation is represented by the arabinoxylan-specific xylanase Xyl5A from Clostridium thermocellum (CtXyl5A), a bimodular xylanase with a GH5-CBM6 domain structure closely related to that of Xyn30D (17). Similarly to Xyn30D, CtXyl5A uses xylan decorations as determinants of specificity, binding arabinose at its subsite −1 within the active site. Moreover, its ancillary CBM6 domain recognizes the termini of xylo- and gluco-configured polymers and, like the Xyn30D-CBM35 homologues, has been proposed to target the enzyme to regions of the plant cell wall that are being degraded (17). However, the superposition of Xyl5A onto Xyn30D (Fig. 10) shows a very different molecular architecture in which the catalytic and the CBM domains are oppositely oriented through their distinct linker regions. This picture highlights the diverse structural scaffolds generated by the combination of, in principle, structurally related domains, which must result in great functional plasticity. Furthermore, it emphasizes the importance of studying entire modular enzymes for a comprehensive understanding of the molecular machinery involved in the degradation of the complex plant cell wall.
Members of the genus Paenibacillus are common saprophytic components of the soil microbiota. P. barcinonensis, isolated from rice fields, has been reported to be a high level producer of xylanases in media supplemented with rice straw, demonstrating the presence of a complex enzymatic system for xylan degradation. In particular, Xyn30D, in agreement with other reported GH30 enzymes, shows high activity on glucuronoxylans but none on arabinoxylans. Recognition of a methylglucuronic substitution is required for hydrolysis of the xylan chain. On the other hand, its attached CBM35 specifically binds nonmethylated glucuronic acid decorations of different xylans in a calcium-dependent manner. The binding is more efficient with glucuronoxylans and glucuronoarabinoxylans, the latter probably being the most common xylan type in the natural habitat of Xyn30D. Our results suggest the ability of Xyn30D-CBM35 to accommodate substitutions in all the xylose units, which enables this module to bind highly decorated xylans. Furthermore, we found that, apart from its GlcA target, it might establish interactions with other components of the xylan molecule, such as the arabinose decorations of glucuronoarabinoxylan. The concerted action of the different domains is intriguing, and it is difficult to envisage the precise role that Xyn30D plays in xylan degradation, but its peculiar structural features point to a role in the depolymerization of highly substituted, chemically complex xylans. The methylated/unmethylated substrate specificity of its catalytic and ancillary domains probably reflects a sophisticated enzyme delivery strategy of Paenibacillus related to the complexity of its natural substrate.
More work has to be done to increase our knowledge of the molecular mechanisms regulating modularity. Apart from its biological interest, this knowledge will help in producing new, more efficient biocatalysts for the conversion of lignocelluloses to value-added products and fuels, which will result in more environmentally sustainable industries.