Stability and Ligand Promiscuity of Type A Carbohydrate-binding Modules Are Illustrated by the Structure of Spirochaeta thermophila StCBM64C

Deconstruction of cellulose, the most abundant plant cell wall polysaccharide, requires the cooperative activity of a large repertoire of microbial enzymes. Modular cellulases contain non-catalytic type A carbohydrate-binding modules (CBMs) that specifically bind to the crystalline regions of cellulose, thus promoting enzyme efficacy through proximity and targeting effects. Although type A CBMs play a critical role in cellulose recycling, their mechanism of action remains poorly understood. Here we produced a library of recombinant CBMs representative of the known diversity of type A modules. The binding properties of 40 CBMs, in fusion with an N-terminal GFP domain, revealed that type A CBMs possess the ability to recognize different crystalline forms of cellulose and chitin over a wide range of temperatures, pH levels, and ionic strengths. A Spirochaeta thermophila CBM64, in particular, displayed plasticity in its capacity to bind both crystalline and soluble carbohydrates under a wide range of extreme conditions. The structure of S. thermophila StCBM64C revealed an untwisted, flat, carbohydrate-binding interface comprising the side chains of four tryptophan residues in a co-planar linear arrangement. Significantly, two highly conserved asparagine side chains, each one located between two tryptophan residues, are critical for the recognition of insoluble and soluble glucans but not for xyloglucan binding. Thus, the compact structure of CBM64 and its extended and versatile ligand-interacting platform illustrate how type A CBMs target their appended plant cell wall-degrading enzymes to a diversity of recalcitrant carbohydrates under a wide range of environmental conditions.

Plant cell walls are composed of structurally complex polysaccharides and comprise the most abundant source of organic carbon in the biosphere. Within plant cell walls, cellulose represents a primary target for enzymatic hydrolysis. Cellulose is synthesized as micrometer-long microfibrils consisting of parallel hydrogen-bonded β-1,4-glucan chains, containing both crystalline and less ordered regions. A wide range of microbial species and systems contribute to the degradation of plant cell walls through the production of an extensive repertoire of carbohydrate-active enzymes (CAZymes). CAZymes acting on recalcitrant carbohydrates are frequently modular proteins containing a catalytic module connected via flexible linker sequences to a variable number of non-catalytic carbohydrate-binding modules (CBMs). CBMs potentiate the efficacy of the associated catalytic modules as they contribute to substrate targeting while promoting a close proximity between enzymes and structural polysaccharides (1-5). Thus, CAZymes are highly relevant biological players in the recycling of photosynthetically fixed carbon and have recently acquired industrial and environmental significance, in particular for the production of second generation lignocellulose-based biofuels (6,7).
Based on primary sequence similarities, CAZymes and CBMs have been classified in families in the constantly updated CAZy database (8,9). CBMs have been extensively characterized and are currently grouped into 80 sequence-based families in the CAZy database. Structure/function studies have revealed that the dominant fold assumed by CBMs is the β-sandwich. Based on the topology of their carbohydrate-binding interfaces, CBMs have been classified into three types. Type A CBMs display a planar carbohydrate-binding interface that is adapted to bind the surface of crystalline polysaccharides. In contrast, type B proteins interact with internal regions of single glycan chains (endo-type) via protein surface clefts. Finally, type C modules recognize the termini of glycan chains (exo-type) through the fitting of a limited number of sugars in the binding pocket of the protein (5,10). The planar binding site of type A CBMs generally contains three hydrophobic residues that, based on site-directed mutagenesis studies, are believed to interact with the crystalline surface of cellulose microfibrils (11-13). This class of CBMs includes members of CBM families 1, 2a, 3, 5, 10, 49, 63, 64, and 79, which bind to insoluble, highly crystalline cellulose and/or chitin. Thus, type A CBMs lack significant affinity for cellulo-oligosaccharides and chemically modified soluble forms of cellulose, whereas their recognition of crystalline ligands appears to be predominantly mediated by entropic forces (3,14). The capacity of CBMs to bind a diversity of carbohydrates with a high degree of specificity and selectivity has promoted the exploration of a large number of biotechnological applications (15,16). For example, fusion of CBMs with other proteins offers the possibility of targeted immobilization of antibodies (17-19), proteins (20), bacteriophages (21), and bacteria (22) onto cellulose matrices with the goal of developing sensor, microarray, and protein purification applications.
Type A CBMs are especially attractive tools for different biotechnological applications, because they selectively bind to crystalline carbohydrates present in versatile and abundant materials like paper, cotton, and nanocellulose. However, the use of these small modules to target different peptides, proteins, enzymes, or antibodies onto cellulosic supports is limited by our lack of knowledge of their biochemical properties. Here we generated and characterized a library of type A recombinant CBMs belonging to families 1, 2a, 3, 5, 9, 10, 49, 63, and 64. Of the 40 type A CBMs characterized, a CBM64 member was shown to display a high capacity to bind cellulose under different and extreme physical-chemical conditions. The structure of StCBM64C from Spirochaeta thermophila and a detailed biochemical characterization of this family revealed why CBM64 displays such versatile biochemical properties.

Results and Discussion
Production of a Library of 96 Recombinant Type A GFP-CBMs-Type A CBMs are unique because they bind to the crystalline surfaces of cellulose and chitin through a planar carbohydrate-binding platform rich in aromatic amino acid residues that reflects the topology of the ligand surface. To characterize the capacity of type A CBMs to recognize crystalline carbohydrates under different physicochemical conditions, 96 bacterial and fungal CBMs from families 1, 2a, 3, 5, 9, 10, 49, 63, and 64 were fused to the C terminus of GFP and expressed in Escherichia coli. Although CBM9 are type C CBMs (5), one member of this family was included in the screen based on its capacity to bind crystalline cellulose. The proteins were purified through IMAC, and the estimated molecular mass was confirmed through SDS-PAGE (supplemental Fig. S1). The data, presented in supplemental Table S1, revealed that 67 GFP-CBM fusions were expressed at high levels (>100 mg/liter of culture medium), with 8 of these 67 recombinant proteins being expressed at very high levels (>300 mg/liter of culture medium). In contrast, 12 proteins presented intermediate levels of expression (ranging from 50 to 100 mg/liter of culture medium), whereas 17 were produced at levels below 50 mg/liter of culture medium, mainly as a result of accumulation in the form of inclusion bodies (supplemental Table S1). There was no correlation between the levels of expression and the CBM origin. Thus, CBMs expressed with lower (<50 mg/liter) and higher (>300 mg/liter) efficacies originated both from bacterial and fungal hosts.
Binding Efficacy of GFP-CBMs to Carbohydrates-Binding efficacy of 40 GFP-CBM fusions that were expressed at higher levels and represented the widest biodiversity in terms of families was evaluated. Although the presence of an N-terminal GFP might affect the binding properties of the fused CBM, we hypothesized that inclusion of linker sequences and the His6 tag separating the two fusion partners would minimize any steric interference of the GFP in the function of the C-terminal CBM. To probe this hypothesis, initial qualitative experiments compared the capacity of selected GFP-CBM fusions and of GFP to bind different insoluble polysaccharides by fluorescence microscopy. The qualitative data, exemplified for CBM2 from Pyrococcus furiosus (CBM #29) in Fig. 1A, suggest that the CBMs are fully functional and bind effectively to microcrystalline cellulose, chitosan, and paper. Thus, the binding of the 40 GFP-CBM fusions to Sigmacell cellulose type 20, microcrystalline cellulose powder, low molecular weight chitosan, potato starch, and chromatographic paper (Whatman no. 1) was subsequently studied by determining the binding isotherms with 100 mM phosphate buffer, pH 7, at 25°C. The equilibrium concentrations of GFP-CBM in the solid (pmol/mg) and liquid (pmol/ml) phases were determined on the basis of mass balance calculations performed with fluorescence data of solutions at loading and after 20-min incubations. As an example, Fig. 1B displays the isotherms representing the interaction of CBM64 from S. thermophila DSM 6192 (CBM #56), CBM2 from Ampullaria crossean (CBM #19), and CBM10 from Teredinibacter turnerae T7901 (CBM #41) with Sigmacell 20 cellulose, which are representative of high, medium, and low binding CBMs, respectively. The capacity of control GFP to interact with the polysaccharide was also evaluated (Fig. 1B).
For the range of GFP-CBM loads tested (up to 20 pmol/mg of Sigmacell 20), maximum concentrations in the solid phase at equilibrium of 20, 13, and 5 pmol/mg were obtained from CBM #56, CBM #19, and CBM #41, respectively. For CBM #41, the binding to Sigmacell 20 is equivalent to the binding observed for the control GFP up to a concentration in the liquid phase of 300 pmol/ml. From here on, an increase in binding is detected. The amount of CBM #56, CBM #19, and CBM #41 in the solid phase at equilibrium corresponds to 95-99%, 63-89%, and 12-47% of the total amount of CBM in the solid/liquid system, respectively.
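The mass balance behind each isotherm point can be sketched as follows. This is a minimal illustration, not the authors' actual analysis script: the function name and arguments are hypothetical, the 200 µl well volume and 5 mg substrate mass are taken from the assay description in "Materials and Methods," and the calculation assumes that fluorescence is strictly proportional to GFP-CBM concentration.

```python
def binding_point(loaded_pmol, f_initial, f_filtrate,
                  volume_ml=0.2, substrate_mg=5.0):
    """Return (free pmol/ml, bound pmol/mg, fraction bound) for one well.

    Hypothetical helper: assumes fluorescence is proportional to the
    GFP-CBM concentration (measurements taken in the linear range).
    """
    if loaded_pmol <= 0 or f_initial <= 0:
        raise ValueError("load and initial fluorescence must be positive")
    # Fraction of the protein still in solution after incubation,
    # clamped to [0, 1] to absorb small measurement noise.
    free_fraction = min(max(f_filtrate / f_initial, 0.0), 1.0)
    free_pmol = loaded_pmol * free_fraction
    # Mass balance: whatever is no longer in solution is on the solid.
    bound_pmol = loaded_pmol - free_pmol
    return (free_pmol / volume_ml,
            bound_pmol / substrate_mg,
            bound_pmol / loaded_pmol)


if __name__ == "__main__":
    # Illustrative numbers only, not measured data.
    free, bound, frac = binding_point(100.0, 1000.0, 50.0)
    print(f"free = {free:.1f} pmol/ml, bound = {bound:.1f} pmol/mg, "
          f"bound fraction = {frac:.2f}")
```

Plotting the bound concentration against the free concentration for each load gives one isotherm of the kind shown in Fig. 1B.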
For each isotherm, the average percentage of the different amounts of GFP-CBM (10-100 pmol) initially loaded that bound to each substrate at equilibrium was calculated. These results are plotted in Fig. 2 for the case of Sigmacell cellulose powder. The extremes of the thinner bars superimposed on each data point correspond to the minimum and maximum percentages of adsorption measured across the range of loaded amounts of GFP-CBM. An analysis of the data shows that of the 40 CBMs tested, 20 exhibit high binding to Sigmacell 20 (more than 75% adsorption at loadings up to 100 pmol/5 mg). On the other side of the spectrum, 11 CBMs bind poorly to Sigmacell 20 (less than 25% average adsorption). The majority of these are CBMs of fungal origin from families 1 (CBMs #43-#47), 9 (CBM #51), and 10 (CBMs #40-#42). The largest variation in adsorption percentage across the range of loaded amounts is found for CBMs that display average adsorptions in the 25-50% range.
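The summary statistics plotted for each CBM (average adsorption with minimum and maximum across loads) and the high/low binding classes can be sketched as below. The function name and the "intermediate" label are assumptions introduced for illustration; only the 75% and 25% cut-offs come from the text.

```python
def summarize_adsorption(percent_bound_by_load):
    """Summarize one isotherm.

    percent_bound_by_load maps loaded amount (pmol) to % adsorbed.
    Returns the mean, minimum, maximum, and a binding class using the
    cut-offs quoted in the text (>75% high, <25% low).
    """
    values = list(percent_bound_by_load.values())
    if not values:
        raise ValueError("at least one load is required")
    mean = sum(values) / len(values)
    if mean > 75.0:
        label = "high"
    elif mean < 25.0:
        label = "low"
    else:
        label = "intermediate"  # hypothetical label for the remaining range
    return {"mean": mean, "min": min(values), "max": max(values),
            "class": label}


if __name__ == "__main__":
    # Illustrative percentages only, not measured data.
    example = {10: 99.0, 20: 98.0, 40: 97.0, 60: 96.0, 80: 95.0, 100: 95.0}
    print(summarize_adsorption(example))
```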
The binding specificity also varied considerably across the 40 CBMs studied when using the other four substrates (see supplemental Figs. S2-S5). For microcrystalline cellulose (supplemental Fig. S2), 10 CBMs displayed high binding (>75%) and 20 low binding (<25%). Again, the low binding CBMs are those of fungal origin from families 1 (CBMs #43-#47), 9 (CBM #51), and 10 (CBMs #40-#42). The affinity of CBMs for chromatographic paper (supplemental Fig. S3) and chitosan (supplemental Fig. S4) was in general lower when compared with Sigmacell cellulose and microcrystalline cellulose. Adsorption percentages higher than 75% were observed only in one (CBM #56) and three (CBM #56, CBM #29, and CBM #33) cases for paper and chitosan, respectively. The low performance of fungal CBMs was again the norm. The binding of CBMs to starch was in general very poor (<25%; supplemental Fig. S5). A comparison of the results obtained for the CBMs that displayed the highest binding (>75%) to Sigmacell, microcrystalline cellulose, paper, and chitosan is shown in Fig. 3. Here, CBM #56 (a CBM64 from S. thermophila) shows up as a high performer in all four substrates. A set of two CBMs also perform very well in three substrates other than paper: CBM #29 (a CBM2 from P. furiosus) and CBM #33 (a CBM2 from Thermobifida fusca YX). CBM #34 (a CBM2 from T. fusca YX), CBM #37 (a CBM2 from T. fusca YX), and CBM #58 (a CBM3 from Clostridium thermocellum) also presented very good binding both to Sigmacell and microcrystalline cellulose. The binding of the control GFP to the different substrates was always lower than 13% (supplemental Figs. S2-S5). The capacities of the CBMs to interact with their target ligands were evaluated next for a diversity of temperatures, pH levels, and ionic strengths, as follows.
Effect of Temperature-The effect of temperature on the binding of the 40 GFP-CBM fusions to Sigmacell 20 was studied in the 4-90°C range. The experiments were performed by contacting 50 pmol of GFP-CBM with 5 mg of Sigmacell 20 in 100 mM phosphate pH 7 buffer. The percentage of GFP-CBM adsorbed at equilibrium is shown in Fig. 4A for the two temperature extremes evaluated, 4 and 90°C, and in supplemental Fig. S6 for the other temperatures (25, 35, 50, and 70°C). Fig. 4A shows that seven CBMs perform very well at the temperature extremes (>75% binding) and across the temperature range studied (supplemental Fig. S6). Among these, we can again identify CBMs #33, #34, #37, #56, and #58, which had previously been shown to bind very well to Sigmacell, microcrystalline cellulose, paper, and chitosan at 25°C (Fig. 3 and supplemental Figs. S2-S5). These results are consistent with the fact that these CBMs are derived from the thermophilic bacteria T. fusca YX (CBMs #33, #34, and #37), S. thermophila (CBM #56), and C. thermocellum (CBM #58). The largest differences in adsorption (>25%) across the temperature range studied were obtained for CBMs #3, #7, #10, and #60, which bound more effectively at 90°C than at 4°C, and for CBMs #1 and #2, which bound more effectively at 4°C than at 90°C (Fig. 4A).
Effect of pH-The effect of pH on the binding of the 40 GFP-CBM fusions to Sigmacell 20 was investigated. The experiments were performed by contacting 100 pmol of GFP-CBM with 5 mg of Sigmacell 20 in 100 mM phosphate at 25°C across the 5.0-9.5 pH range. The percentage of GFP-CBM adsorbed at equilibrium is shown in Fig. 4B for the two pH extremes evaluated and in supplemental Fig. S7 for the other pH values. Fig. 4B shows that four CBMs (#34, #37, #56, and #58) perform very well at the pH extremes (>75% binding) and across the pH range studied (supplemental Fig. S7). In general, the analyzed CBMs performed better at acidic pH levels, with 18 displaying more than 75% binding to Sigmacell (see Fig. 7). Notable exceptions are CBM #6, a CBM3 from Caldicellulosiruptor bescii, and CBM #7, a CBM3 from Clostridium cellulolyticum, which bind poorly to Sigmacell at pH 5 but perform very well at pH 9.5.
Effect of Ionic Strength-The effect of ionic strength on the binding of the 40 GFP-CBM fusions to Sigmacell 20 was studied in the 100 mM to 1.5 M range. The experiments were performed by contacting 50 pmol of GFP-CBM with 5 mg of Sigmacell 20 in phosphate buffer at the test ionic strength, pH 7, at 25°C. The percentage of GFP-CBM adsorbed at equilibrium is shown in Fig. 4C for 100 mM and 1 M and in supplemental Fig. S8 for the other ionic strengths. In general, the performance of the analyzed CBMs varied little with the ionic strength (Fig. 4C). The largest differences in adsorption (>25%) were obtained for CBMs #2, #3, #5, #6, and #38, which bound more effectively at 0.1 M than at 1 M. Among the best performers across the ionic strength range tested, we can again identify CBMs #33, #34, #37, #56, and #58.
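The two comparisons used throughout the temperature, pH, and ionic strength screens (binders that stay above 75% at both extremes, and CBMs whose adsorption shifts by more than 25 percentage points between extremes) can be sketched as below. The function name, the data layout, and the example values are hypothetical placeholders, not data from the study.

```python
def compare_conditions(adsorption, cond_a, cond_b,
                       robust_cutoff=75.0, shift_cutoff=25.0):
    """Compare % adsorption at two condition extremes.

    adsorption maps a CBM label to {condition: % adsorbed}.
    Returns (robust, shifted): labels binding above robust_cutoff at both
    extremes, and (label, a - b) pairs whose difference exceeds
    shift_cutoff in absolute value.
    """
    robust, shifted = [], []
    for cbm, values in adsorption.items():
        a, b = values[cond_a], values[cond_b]
        if a > robust_cutoff and b > robust_cutoff:
            robust.append(cbm)
        if abs(a - b) > shift_cutoff:
            shifted.append((cbm, a - b))
    return robust, shifted


if __name__ == "__main__":
    # Placeholder values for two made-up CBMs, for illustration only.
    data = {
        "CBM X": {"0.1 M": 90.0, "1 M": 85.0},
        "CBM Y": {"0.1 M": 70.0, "1 M": 30.0},
    }
    robust, shifted = compare_conditions(data, "0.1 M", "1 M")
    print("robust:", robust)
    print("shifted:", shifted)
```

A positive difference in the second list means stronger binding under the first condition, which mirrors how the text reports CBMs that bound more effectively at 0.1 M than at 1 M.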
Crystal Structure of StCBM64C-The data presented above revealed that CBM #56 has a high capacity to bind a diversity of insoluble polysaccharides under a variety of physical-chemical conditions. CBM #56 is the C-terminal family 64 CBM from a putative GH10 xylanase (ADN02703) of S. thermophila DSM 6192 and was termed StCBM64B. CBM64s were initially identified at the C terminus of seven S. thermophila CAZymes, which have single GH5, GH9, GH10, or GH12 catalytic domains (38). Because attempts to crystallize StCBM64B failed, the structure of StCBM64C was solved to a resolution of 1.5 Å, as described under "Materials and Methods." StCBM64C has 82% identity and 86% similarity to StCBM64B and is the C-terminal domain of the GH12 cellulase (AEJ60658/ADN02189) of S. thermophila DSM 6578. Data collection and refinement statistics for this final structure (deposited under PDB accession code 5LU3) are presented in Table 1, as well as the data collection statistics of an isomorphous crystal of StCBM64C used for de novo S-SAD phasing. During the course of this project, the structure of a second member of CBM64, the C-terminal domain of S. thermophila DSM 6192 GH5 (ADN02996), here termed StCBM64A, became known (PDB code 5E9O) (39).
StCBM64C adopts the classic β-sandwich jelly roll fold containing nine β-strands typical of the majority of CBM families (Fig. 5A). The order of the β-strands in β-sheet 1 is β1, β3, β8, β5, and β6, and the order of the β-strands in β-sheet 2 is β2, β9, β4, and β7. In the two β-sheets, the β-strands are anti-parallel with the exception of the small β-strand β1 in β-sheet 1, which runs parallel to β3. The β-strands are connected by short loops, which may reflect the enhanced stability presented by StCBM64C, although there is a small helix extending from residues Gln-57 to Val-60. The core of the β-sandwich is highly hydrophobic, with very few polar residues, and includes two phenylalanine, two tryptophan, two isoleucine, and two leucine residues. Both β-sheets present a flat, untwisted surface typical of type A CBMs. Three-dimensional structural comparison using the SSM site revealed that the closest structural homologue of StCBM64C is StCBM64A from S. thermophila (PDB code 5E9O), with a Z score of 9.7 and a root mean square deviation of 0.55 Å over 84 aligned residues. An overlay of the two structures (Fig. 5B) emphasizes the high structural similarity between the two CBM64 members, which have 78% primary sequence identity. The unique co-planar arrangement of the four exposed tryptophan side chains of StCBM64A is highly conserved in StCBM64C; only the conformation of Trp-78 differs, which can be explained by differences inherent to the crystal-packing environment. As observed for StCBM64A, the surface electrostatic potential of StCBM64C reveals an overall slightly acidic nature, with the surface of the four exposed tryptophan side chains close to neutral. StCBM64C has a lower degree of homology with other functionally relevant CBMs with a β-sandwich fold, in particular with the CBM2s of Cellulomonas fimi (PDB codes 1HEJ and 1E5B), with a Z score of 3.2-4.1 and a root mean square deviation of ~3.3 Å over 68 aligned residues.
In general, type A CBMs have a β-sandwich fold similar to StCBM64C, and the carbohydrate-binding interface has a flat surface that reflects the topology of the crystalline ligand (11-13). Usually, three aromatic residues, predominantly tryptophans, on the surface of type A CBMs participate in polysaccharide recognition (5). Here, however, the crystal structure of StCBM64C suggests that four residues, Trp-31, Trp-38, Trp-54, and Trp-78, which are linearly aligned at the surface of β-sheet 2, may constitute the ligand binding site (Fig. 5A). The fact that four instead of three tryptophan side chains, which form tighter sugar interactions than the tyrosine side chains identified in several type A CBMs, were recognized in the putative ligand-interacting surface may explain the high binding efficacy revealed by CBM64. Arg-53, Glu-82, and Glu-86 are also located at the surface of the protein in the vicinity of the aromatic residues and thus may also play a role in ligand recognition. Strikingly, the latter aromatic side chains alternate with the side chains of polar uncharged residues, namely Asn-39 (between Trp-78 and Trp-38), Asn-36 (between Trp-38 and Trp-54), and Gln-59 (between Trp-54 and Trp-31), as also observed in the StCBM64A structure (39). These side chains could act both as hydrogen bond donors and acceptors, thus extending the potential number of contacts with the carbohydrate and the hydrophilicity of the interaction. The distances between Trp-31, Trp-54, Trp-38, and Trp-78 suggest that each one of the aromatic side chains will stack against one sugar ring, whereas the side chains of Asn-39, Asn-36, and Gln-59 would participate in the recognition of three intermediate glucose OH groups (Fig. 5). These observations suggest that CBM64 possesses a large carbohydrate-binding platform, comprising an unprecedentedly large number of residues, organized in a unique structural arrangement.
Recently Georgelis et al. (37) reported the structure of EXLX1 CBM63 in complex with cellohexaose. Although the CBM, as is characteristic of type A CBMs, does not interact with cellohexaose in solution, the oligosaccharide was sandwiched between two CBM63s arranged in opposite polarity (37). The structure of cellohexaose bound to EXLX1 displays a conformation that is intermediate between the flat chains of cellulose and the twisted chains observed in oligosaccharides captured in complex with type B CBMs. When the structures of EXLX1 and StCBM64C are superposed, it is clear that although the two CBMs do not share a similar three-dimensional topology, the hydrophobic side chains of the aromatic residues of the two carbohydrate-binding platforms line up in similar positions (Fig. 6). These observations made it possible to use the structure of the CBM63 complex to model a cellohexaose molecule into the putative StCBM64C carbohydrate-binding surface (Fig. 6A). The model (Fig. 6A) suggests that the pyranose rings of G2, G4, and G6 establish hydrophobic stacking interactions with the side chains of Trp-54, Trp-38, and Trp-78, respectively (numbering of the glucose residues starts at the reducing end of cellohexaose). In addition, Asn-39 and Asn-36 are within hydrogen bonding distance of the C6 hydroxyl groups of G3 and G5. Gln-59 is positioned further away from C6 of G1, although it is likely that in a twisted cellohexaose conformation, it could also make polar interactions with the carbohydrate backbone. These observations suggest that, in contrast to other type A CBMs, StCBM64C can establish a hydrogen bond network with the crystalline ligand, which should be dominated by the side chains of Asn-39 and Asn-36 (see below). This analysis also suggests that StCBM64C contains an extended carbohydrate-binding interface that may comprise up to seven subsites.
Thus, each one of the four aromatic side chains and the three polar residues that interdigitate the four tryptophans should bind to individual sugars of the polysaccharide chain, allowing for both hydrophobic and hydrophilic protein-carbohydrate interactions (Fig. 6A).
Structural Basis for Ligand Recognition by CBM64-Quantitative isothermal titration calorimetry (ITC) data, presented in Table 2 and exemplified in Fig. 7, showed that both StCBM64B and StCBM64C bound to regenerated (non-crystalline) insoluble cellulose (RC) very tightly, with a K_A of ~1.5-2.0 × 10⁶ M⁻¹. In addition, the two CBMs were unable to recognize cellohexaose, a property widely reported for type A CBMs that results from the inability of the flat carbohydrate-binding platform to accommodate soluble, highly twisted oligosaccharides. The interaction of StCBM64B and StCBM64C with RC is enthalpically driven, although entropy makes a significant favorable contribution to ligand binding. Although type A CBMs bind regenerated cellulose through a predominantly endothermic process, which is driven by entropic forces, type B CBMs interact with soluble carbohydrates enthalpically. Thus, the thermodynamics of the binding of StCBM64B and [...] Table 2 and exemplified in Fig. 7, confirmed that StCBM64B and StCBM64C bind to xyloglucan, hydroxyethylcellulose (HEC), glucomannan, and β-glucan. The two CBM64 members bound xyloglucan slightly less tightly than RC (Table 2). In addition, StCBM64B and StCBM64C bound HEC and glucomannan with a 20-fold lower affinity than xyloglucan and bound β-glucan more weakly (K_A of ~5 × 10³ M⁻¹). The capacity of typical type A CBMs, such as CBM2, CBM3 (40), and CBM64 (this study), to bind primarily to insoluble cellulose and, with a lower degree of affinity, also to soluble glucan polysaccharides suggests that the targeting role of type A CBMs is broader than previously anticipated. Binding to soluble cellulose may allow appended enzymes to initiate the attack on the plant cell wall by targeting first the exposed glucans and hemicelluloses, and once this activity uncovers the insoluble cellulose, the primary ligand for type A CBMs, the enzymes will target the crystalline regions of the polysaccharide.
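To relate the association constants quoted above to binding free energies, the standard relations ΔG = −RT ln K_A and TΔS = ΔH − ΔG can be applied. The sketch below is illustrative only: the function names are hypothetical, and the temperature of 298.15 K is an assumption, because the ITC temperature is not stated in this section.

```python
import math

R_GAS = 8.314462618  # molar gas constant, J mol^-1 K^-1


def delta_g_kj_per_mol(k_a, temperature_k=298.15):
    """Free energy of binding (kJ/mol) from an association constant (M^-1).

    temperature_k defaults to 298.15 K, an assumed value.
    """
    if k_a <= 0:
        raise ValueError("association constant must be positive")
    return -R_GAS * temperature_k * math.log(k_a) / 1000.0


def t_delta_s_kj_per_mol(delta_h, delta_g):
    """Entropic term T*dS (kJ/mol) from dH and dG, using dG = dH - T*dS."""
    return delta_h - delta_g


if __name__ == "__main__":
    # Association constants of the order quoted in the text.
    for k_a in (1.5e6, 2.0e6, 5.0e3):
        print(f"K_A = {k_a:.1e} M^-1 -> dG = "
              f"{delta_g_kj_per_mol(k_a):.1f} kJ/mol")
```

Under the assumed temperature, a K_A of 1.5 × 10⁶ M⁻¹ corresponds to a ΔG of about −35 kJ/mol and a K_A of 5 × 10³ M⁻¹ to about −21 kJ/mol. A positive TΔS returned by the second helper indicates a favorable entropic contribution.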
The location of the CBM64 carbohydrate-interacting platform was probed by analyzing the binding efficacy of different mutant derivatives of StCBM64B and StCBM64C. Initial qualitative pulldown assays performed with StCBM64C using Avicel as the insoluble ligand suggested that alanine substitution of any of the four tryptophan residues identified at the surface of β-sheet 2 had no effect on binding (supplemental Fig. S10). Thus, the interaction of StCBM64B and StCBM64C with RC was probed by ITC. StCBM64C W31A, W38A, and W54A derivatives displayed a moderate reduction in affinity for the insoluble ligand (Table 2 and Fig. 8). Complete abolition of RC recognition was only observed when double or triple tryptophan mutant derivatives were produced (Table 2). In contrast, single tryptophan-to-alanine substitutions were sufficient to abolish xyloglucan recognition, which may be accounted for by the lower affinity observed for this polysaccharide. In addition, it is possible that the four aromatic side chains straddle both the glucose and xylose units of xyloglucan, which would make each of them essential for binding. Strikingly, when Asn-36 and Asn-39 were singly substituted by alanine, the two CBM64 mutant derivatives lost their capacity to bind RC (Table 2), suggesting that the side chains of these residues hydrogen bond to the C6 hydroxyls of glucose molecules, as suggested by the model described above (Fig. 6A). In contrast, the side chain of Gln-59 does not seem to be relevant for hydrogen bonding to the polysaccharide chains (Table 2). The critical contribution of the polar interactions to the recognition of insoluble ligands by CBM64 may result from their role in directing the CBM into a correct position in relation to the polysaccharide chain. Strikingly, N36A and N39A mutant derivatives display no diminished capacity to bind xyloglucan.
Because xyloglucan is a decorated polysaccharide, with glucose residues substituted at C6 with xylose, there is no longer a hydrogen bond partner at this position of the polysaccharide. It is possible, however, that other residues of StCBM64C may bind the decorations of xyloglucan, although mutation of Arg-53 or Glu-84 had no effect on glycan recognition (supplemental Fig. S9). Consistently, Asn-36 and Asn-39, but particularly Asn-36, are critical for the recognition of the soluble glucans that act as ligands for CBM64 (supplemental Fig. S9). Thus, taken together, these data suggest that Asn-36 and Asn-39 play a critical role in the binding of non-decorated cellulose, both in the soluble and insoluble forms, whereas hydrophobic interactions dominate xyloglucan recognition. The four tryptophan residues and the two asparagines that comprise the carbohydrate-binding platform of StCBM64C are completely conserved within CBM64, suggesting a common mechanism of ligand recognition in this CBM family.
Conclusions-This report reveals that a wide range of type A CBMs have the capacity to recognize crystalline polysaccharides under a wide range of physiological conditions. The robust properties revealed by type A CBMs may have evolved as an adaptation to function in ecological niches imposing significant selective pressures, such as the mammalian gastrointestinal tract or the soil. Of the CBMs from the various families analyzed, CBM64 displays a wider capacity to function under extreme conditions. The crystal structure of StCBM64C from S. thermophila combined with the site-directed mutagenesis experiments shows that the flat surface identified in the protein structure constitutes the ligand-binding site and that the side chains of both Trp and Asn residues play a central role in carbohydrate recognition.
The side chains of the four aromatic residues are co-planar and form an extensive hydrophobic platform that recognizes the predominantly apolar regions of the crystalline ligand. In addition, binding is assisted by polar interactions, particularly by Asn-36 and Asn-39. CBM64 has probably the most extensive binding platform described so far for a type A CBM, which is able to interact not only with crystalline carbohydrates but also with diverse soluble polysaccharides, such as xyloglucan, glucomannan, β-1,4-glucan, and β-1,3-1,4-glucan. The structure and function of CBM64 illustrate how type A CBMs have evolved toward increased ligand plasticity and biochemical stability, thereby potentiating plant cell wall degradation by complex cellulolytic systems.

Materials and Methods
Isolation of Microbial Genes through PCR and Gene Synthesis-A total of 96 genes encoding type A CBMs from different families were isolated either through PCR or gene synthesis. A high throughput pipeline was used for the efficient PCR assembly and cloning of the 96 constructs. The genes encoding 40 bacterial CBMs for which there was access to gDNA were isolated through PCR using NZYProof DNA polymerase (NZYTech Ltd., Lisboa, Portugal). Gene synthesis was used to produce 56 genes for which there was no access to a gDNA template. Artificial genes were designed with a codon usage table optimized for expression in E. coli and synthesized using standard procedures (23). Purified nucleic acids (~50 ng) were subsequently cloned into the pHTP9 E. coli expression vector using the NZYEasy cloning kit (NZYTech Ltd.), according to established protocols for the ligation-independent cloning technology. In pHTP9, genes were expressed under the control of a T7 promoter. Encoded recombinant CBMs were fused to an N-terminal GFP and include an internal His6 tag for purification through IMAC. The 96 resulting recombinant plasmids were termed pHTP9_1 to pHTP9_96 and were sequenced to ensure that no mutations accumulated during gene isolation/synthesis and cloning. Recombinant fusion GFP-CBMs were termed CBM #1-CBM #96. The primary sequences of the 96 CBMs, their respective origins, and their families are presented in supplemental Table S1. For protein crystallography, a CBM #56 derivative lacking the GFP partner was obtained by isolating the respective gene from pHTP9_56 and cloning into pET21a (Novagen), generating pVp1. The recombinant CBM #56 derivative encoded by the pET21a vector was termed StCBM64B. A homologue of StCBM64B, termed StCBM64C, was also recombinantly produced by synthesizing the artificial nucleic acid as described above and cloning the respective gene into pET21a, generating pVp2.
StCBM64B and StCBM64C, encoded by pET21a derivatives pVp1 and pVp2, contain C-terminal His6 tags to allow purification through IMAC. The recombinant sequences of StCBM64B and StCBM64C are both displayed in supplemental Table S1.
Site-directed Mutagenesis-StCBM64B and StCBM64C mutant derivatives were generated by site-directed mutagenesis using plasmids pVp1 and pVp2 as templates. The primers used to generate these mutants are shown in supplemental Table S2. The generated nucleic acids were sequenced to ensure that only the appropriate mutations had been incorporated.
Expression of Recombinant CBMs in E. coli-The 96 pHTP9 plasmid derivatives were used to transform E. coli BL21 (DE3) cells. Recombinant E. coli cells were grown in NZY autoinduction LB medium (NZYTech Ltd.) supplemented with kanamycin (50 μg/ml) at 37°C to early exponential phase (A600 nm = 1.5-2.0), and recombinant protein production occurred following a further incubation at 25°C for 16 h. Recombinant E. coli cells were harvested by centrifugation at 1500 × g at 4°C for 15 min and lysed in NZY Bacterial Cell Lysis Buffer (NZYTech Ltd.). The His6-tagged recombinant GFP-CBMs were purified from cell-free extracts through IMAC in a high throughput pipeline as described previously (24). Recombinant proteins were eluted in 50 mM NaHepes, pH 7.5, 500 mM NaCl, and 300 mM imidazole. Homogeneity of purified proteins and molecular mass of recombinant CBMs were assessed by SDS-PAGE in 14% (w/v) acrylamide gels (supplemental Fig. S1). Protein concentration of GFP-CBM stock solutions varied between 0.5 and 3 mg/ml, as determined spectrophotometrically. For crystallographic studies, StCBM64B and StCBM64C were expressed as described above (except that LB medium containing 100 mg/ml ampicillin was used), purified through IMAC, and further purified by size exclusion chromatography. Following IMAC, fractions containing the purified proteins were buffer-exchanged, using PD-10 Sephadex G-25 M gel filtration columns (GE Healthcare), into 50 mM NaHepes buffer, pH 7.5, containing 200 mM NaCl and 5 mM CaCl2 and were then subjected to gel filtration using a HiLoad 16/60 Superdex75 column (GE Healthcare) at a flow rate of 1 ml/min. Purified StCBM64B and StCBM64C were concentrated using an Amicon 10-kDa molecular mass centrifugal concentrator and washed three times with 1 mM CaCl2. StCBM64B and StCBM64C purity was assessed by SDS-PAGE, and protein concentration was determined spectrophotometrically.
Biochemical Assays Using GFP-CBM Proteins-The interactions of GFP-CBM fusions with Sigmacell cellulose type 20, microcrystalline cellulose powder, low molecular weight chitosan, potato starch, and Whatman no. 1 chromatography paper were determined experimentally following a protocol based on detection of GFP fluorescence. First, solutions of GFP-CBM with concentrations between 0.06 and 0.6 μM were prepared by diluting the corresponding stock solutions with the binding buffers under study. Most experiments were performed using 100 mM phosphate buffer at pH 7. Citrate and carbonate buffers at 100 mM were used to cover the pH ranges 5-6 and 9.5-10, respectively. The effect of ionic strength was studied by varying the concentration of phosphate buffer in the 100-1500 mM range. In preparation for the binding experiments, 5 mg of the substrate under study were loaded into the wells of MultiScreen-HV filter plates with 0.45-μm PVDF membranes (Merck-Millipore) and washed/equilibrated for 15 min with 3 × 200 μl of the relevant buffer. A MultiScreenHTS vacuum manifold (Merck-Millipore) was used to filter and discard spent buffer between washing steps. Then 200 μl of solutions containing 10, 20, 40, 60, 80, or 100 pmol of the GFP-CBM under study were loaded into each well and incubated for 20 min at the desired temperature (4-90°C) in a well plate agitator/incubator. After incubation, the supernatants were directly transferred to the 96 wells of a white microplate by filtration with the vacuum manifold. The fluorescence of the recovered filtrates was measured using a Cary Eclipse fluorescence spectrophotometer (Varian) with excitation and emission set at 490 and 510 nm, respectively. The fluorescence of the solutions prior to loading was also measured in a microplate. Appropriate dilutions were used to ensure that measurements were performed in a region of linearity between fluorescence and concentration.
Mass balance calculations were then performed to determine the equilibrium concentrations of GFP-CBM in the solid (pmol/mg) and liquid (pmol/ml) phases. Each binding condition was tested in triplicate. The binding of the GFP-CBM fusions was also analyzed qualitatively by examining the solid matrices recovered after the incubation procedure described above by fluorescence microscopy (Leica DMLB; Leica Microsystems). Control experiments were performed using GFP.
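The mass balance described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' actual calculation: the function names are hypothetical, and it assumes that fluorescence is strictly proportional to GFP-CBM concentration (the linear region mentioned above), that the 200 μl loaded is fully recovered in the filtrate, and that each well holds 5 mg of substrate.

```python
from statistics import mean


def equilibrium_partition(loaded_pmol, f_initial, f_filtrate,
                          volume_ml=0.2, substrate_mg=5.0):
    """Return (bound pmol/mg, free pmol/ml) for one well.

    Assumes fluorescence is proportional to GFP-CBM concentration, so the
    free concentration is the loaded concentration scaled by the
    fluorescence ratio. This calibration is an illustrative assumption.
    """
    if f_initial <= 0:
        raise ValueError("initial fluorescence must be positive")
    loaded_conc = loaded_pmol / volume_ml              # pmol/ml before binding
    free_conc = loaded_conc * f_filtrate / f_initial   # pmol/ml in the filtrate
    bound_pmol = loaded_pmol - free_conc * volume_ml   # mass balance
    return bound_pmol / substrate_mg, free_conc


def average_triplicate(wells):
    """Average the (bound, free) pairs obtained from replicate wells."""
    bound = mean(w[0] for w in wells)
    free = mean(w[1] for w in wells)
    return bound, free


if __name__ == "__main__":
    # Hypothetical readings (initial, filtrate) for three wells loaded
    # with 100 pmol of GFP-CBM each; the numbers are made up.
    readings = [(1000.0, 410.0), (1000.0, 400.0), (1000.0, 390.0)]
    wells = [equilibrium_partition(100.0, f0, f) for f0, f in readings]
    print(average_triplicate(wells))
```

Repeating the calculation for each of the loaded amounts (10 to 100 pmol) would give the points of a binding isotherm, with the bound amount plotted against the free concentration.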
Affinity Gel Electrophoresis-Affinity gel electrophoresis was used to evaluate the capacity of StCBM64B and StCBM64C and their respective mutant derivatives to interact with soluble polysaccharides. The method used was essentially as described (26), using the polysaccharide ligands at a concentration of 0.25% (w/v). The nonbinding negative control protein was BSA.
Binding to Insoluble Polysaccharide-Qualitative assessment of the affinity of StCBM64B and StCBM64C and their mutant derivatives for insoluble cellulose (Avicel) was carried out as follows: 150 μl of a ~800 μg/ml solution of the recombinant CBMs in 50 mM NaHepes, pH 7.5, containing 100 mM NaCl, 5 mM CaCl2, and 0.05% (v/v) Tween 20 (Buffer A) was mixed with 12 mg of insoluble polysaccharide (Avicel). The reaction mixture was incubated for 1 h at room temperature with gentle shaking, after which time the insoluble ligand was collected by centrifugation at 13,000 × g for 10 min. The supernatant, comprising the unbound fraction, was removed, and the pellet was washed five times with 150 μl of Buffer A. Bound and unbound fractions were analyzed by SDS-PAGE using a 16% acrylamide gel. Controls containing protein but no Avicel were included in parallel to ensure that no precipitation occurred during the assay period.
Isothermal Titration Calorimetry-ITC was performed essentially as described previously (27), using a Microcal VP-ITC calorimeter (Northampton, MA) at 25°C. Before the experiment, purified proteins were buffer-exchanged against 50 mM phosphate buffer, pH 7.0, containing 0.1 mM CaCl2. The reaction cell contained protein at 30 μM, whereas the syringe contained either the oligosaccharides at 0.5-10 mM or the soluble polysaccharides at 1-6 mg/ml. The ligands were dissolved separately in the dialysis buffer to minimize heats of dilution. Titrations were performed by a first injection of 2 μl followed by 28 subsequent injections of 10-μl aliquots of either polysaccharide or oligosaccharide at 220-s intervals into the ITC sample cell (volume, 1.4467 ml) containing the different protein samples. The stirring speed and reference power were set at 307 rpm and 15 μcal/s, respectively. The heat background was measured under the same conditions by injecting buffer alone, without ligand, into the protein at the same concentration as in the cell. The molar concentration of CBM binding sites present in polysaccharide ligands was determined as described previously (28). Data analysis was performed by non-linear regression using a single binding model (Microcal Origin 7.0 software), and thermodynamic parameters, such as the association constant (Ka), number of binding sites in the protein (n), and the binding enthalpy change (ΔH), were determined. The Gibbs free energy change (ΔG) and the entropy change (ΔS) were calculated according to the following equation: −RT ln Ka = ΔG = ΔH − TΔS, where R is the gas constant, and T represents the absolute temperature (Luís et al., (42)). For experiments with RC, the ligand was placed in the cell at 3.6 mg/ml, and the protein (250-500 μM), rather than the polysaccharide, was injected.
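As a worked illustration of the relation between the fitted parameters and the derived ones, the following Python sketch computes ΔG and ΔS from an association constant and a binding enthalpy at 25°C. It is a minimal sketch under stated assumptions: the function name is hypothetical, the units (kcal/mol for ΔH and ΔG, cal/mol/K for ΔS) are chosen for illustration, and the input values are invented rather than taken from the measurements.

```python
import math

# Gas constant in cal K^-1 mol^-1.
R_CAL = 1.987204


def binding_thermodynamics(ka_per_molar, delta_h_kcal, temp_c=25.0):
    """Derive dG (kcal/mol), T*dS (kcal/mol), and dS (cal/mol/K).

    Uses -RT ln Ka = dG = dH - T dS with Ka in M^-1 and dH in kcal/mol.
    """
    if ka_per_molar <= 0:
        raise ValueError("association constant must be positive")
    temp_k = temp_c + 273.15
    delta_g = -R_CAL * temp_k * math.log(ka_per_molar) / 1000.0  # kcal/mol
    t_delta_s = delta_h_kcal - delta_g                           # kcal/mol
    delta_s = t_delta_s * 1000.0 / temp_k                        # cal/mol/K
    return delta_g, t_delta_s, delta_s


if __name__ == "__main__":
    # Hypothetical fit result: Ka = 1e5 M^-1, dH = -10 kcal/mol.
    dg, tds, ds = binding_thermodynamics(1.0e5, -10.0)
    print(f"dG = {dg:.2f} kcal/mol, TdS = {tds:.2f} kcal/mol, "
          f"dS = {ds:.1f} cal/mol/K")
```

With these invented inputs the binding is enthalpically driven and carries an unfavorable entropic term, since ΔH is more negative than ΔG.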
Crystallization, Data Collection, Structure Determination, and Refinement-Crystallization experiments were performed using the hanging drop vapor diffusion method, and drops were prepared at 293.15 K by mixing equal volumes (1 μl) of protein (35 mg/ml) and reservoir solution [1% (w/v) Jeffamine ED-2001, 0.1 M HEPES, pH 7.5, and 1 M succinic acid, pH 7.0]. StCBM64C crystals were harvested from the crystallization drop and transferred into a cryo-stabilization solution mimicking the mother liquor supplemented with 30% (w/v) glycerol. Structure determination was done by the SAD method, using the anomalous diffraction of the sulfur atoms. The SAD experiment was conducted on Beamline PXIII (X06DA) of the Swiss Light Source (Paul-Scherrer Institut, Villigen, Switzerland). The crystals diffracted up to 1.5 Å resolution and belonged to the P4₃2₁2 space group. Complete native and sulfur-SAD data sets were collected and processed with XDS (29) and scaled with AIMLESS (30). The data collection and processing statistics are presented in Table 1.
De novo phasing was achieved by sulfur-SAD using synchrotron radiation with a wavelength of 1.8 Å on a single crystal using an inverse beam strategy with wedges of 10 degrees. At this wavelength, the crystal diffracted up to 2.3 Å resolution. Structure determination of StCBM64C was performed using AutoSol (31) from PHENIX (32). The SAD phasing, auto-building, and refinement using AutoSol found 2 sulfur sites and built a partial model with 83 residues and a final Rwork and Rfree of 0.209 and 0.225, respectively. This partial model was used to provide phase estimates for a higher resolution (1.5 Å) data set collected from a second isomorphous crystal at the Swiss Light Source, using radiation with a wavelength of 1.0 Å. The data collection and processing statistics for this second data set are also presented in Table 1. Restrained refinement was performed with PHENIX (32) using this higher resolution data set. Inspection of the electron density maps was carried out using COOT (33). In the final stages of refinement, Rwork and Rfree converged to 11.9 and 14.5%, respectively. Geometrical validation and model improvement were carried out using PDB_REDO (34) and several validation programs such as PROCHECK (35) and MOLPROBITY (36). Analysis of the model showed that 98.9% of the protein residues are in the most favored or additionally allowed regions of the Ramachandran plot. Refinement statistics are summarized in Table 1. Coordinates and observed structure factor amplitudes have been deposited in the Protein Data Bank under the accession code 5LU3.
Modeling of the Cellohexaose from BsCBM63 onto the StCBM64C-The structure of a type A CBM from EXLX1, a bacterial expansin from Bacillus subtilis, with bound cellohexaose (BsCBM63, PDB code 4FER (37)) was used to analyze the putative binding site of StCBM64C. BsCBM63 was shown to bind carbohydrates through hydrophobic interactions of three co-planar linearly arranged aromatic residues: Trp-125, Trp-126, and Tyr-157. Because there is little sequence or structural homology between the two structures, the three co-linear aromatics from BsCBM63 were manually superposed on the four co-linear aromatics in StCBM64C (Trp-31, Trp-38, Trp-54, and Trp-78) based on their mass-weighted ring centers using CHIMERA (41). The aromatic rings of Trp-125, Trp-126, and Tyr-157 in BsCBM63 are nearly parallel to the planes of the pyranose rings of cellohexaose (CE6), which adopts the undistorted ⁴C₁ chair conformation in an almost linear glycan chain. To optimize the interaction of CE6 with StCBM64C, the CE6 position was adjusted to keep the distance between the aromatic and the pyranose rings between 3.0 and 4.5 Å (to account for CH-π interactions) and to maximize the polar interactions between CE6 and Asn-36, Asn-39, and Gln-59. These residues are found on one side of the putative binding site between the co-linear tryptophans in StCBM64C but do not have an equivalent in the BsCBM63 structure.
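The geometric criterion used in the manual fitting can be illustrated with a short Python sketch that computes mass-weighted ring centers and checks whether two rings fall within the 3.0-4.5 Å window. This is not the CHIMERA procedure itself: the function names are hypothetical, the coordinates in the example are idealized hexagons rather than atoms from 5LU3 or 4FER, and measuring the ring separation as a center-to-center distance is an assumption made here for illustration.

```python
import math

# Standard atomic masses (u) for the elements found in the rings of interest.
ATOMIC_MASS = {"C": 12.011, "N": 14.007, "O": 15.999}


def mass_weighted_center(atoms):
    """Return the mass-weighted center of a ring.

    `atoms` is a list of (element, (x, y, z)) tuples with coordinates in Å.
    """
    total = sum(ATOMIC_MASS[element] for element, _ in atoms)
    return tuple(
        sum(ATOMIC_MASS[element] * xyz[i] for element, xyz in atoms) / total
        for i in range(3)
    )


def within_stacking_range(ring_a, ring_b, low=3.0, high=4.5):
    """True if the ring centers are between `low` and `high` Å apart."""
    separation = math.dist(mass_weighted_center(ring_a),
                           mass_weighted_center(ring_b))
    return low <= separation <= high


def hexagon(element, z, radius=1.4):
    """Idealized planar six-membered ring at height z (toy coordinates)."""
    return [
        (element, (radius * math.cos(math.radians(60 * k)),
                   radius * math.sin(math.radians(60 * k)), z))
        for k in range(6)
    ]


if __name__ == "__main__":
    aromatic = hexagon("C", 0.0)   # stand-in for an aromatic ring
    pyranose = hexagon("C", 3.8)   # stand-in for a sugar ring stacked above it
    print(within_stacking_range(aromatic, pyranose))
```

In practice the atom lists would be read from the coordinate files for the indole and pyranose rings, and the check would be repeated for each aromatic-sugar pair along the binding platform.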