The crystal structure of the family 6 carbohydrate binding module from Cellvibrio mixtus endoglucanase 5a in complex with oligosaccharides reveals two distinct binding sites with different ligand specificities.

Glycoside hydrolases that release fixed carbon from the plant cell wall are of considerable biological and industrial importance. These hydrolases contain non-catalytic carbohydrate binding modules (CBMs) that, by bringing the appended catalytic domain into intimate association with its insoluble substrate, greatly potentiate catalysis. Family 6 CBMs (CBM6) are highly unusual because they contain two distinct clefts (cleft A and cleft B) that potentially can function as binding sites. Henshaw et al. (Henshaw, J., Bolam, D. N., Pires, V. M. R., Czjzek, M., Henrissat, B., Ferreira, L. M. A., Fontes, C. M. G. A., and Gilbert, H. J. (2003) J. Biol. Chem. 279, 21552-21559) show that CmCBM6 contains two binding sites that display both similarities and differences in their ligand specificity. Here we report the crystal structure of CmCBM6 in complex with a variety of ligands that reveals the structural basis for the ligand specificity displayed by this protein. In cleft A the two faces of the terminal sugars of beta-linked oligosaccharides stack against Trp-92 and Tyr-33, whereas the rest of the binding cleft is blocked by Glu-20 and Thr-23, residues that are not present in CBM6 proteins that bind to the internal regions of polysaccharides in cleft A. Cleft B is solvent-exposed and, therefore, able to bind ligands because the loop, which occludes this region in other CBM6 proteins, is much shorter and flexible (lacks a conserved proline) in CmCBM6. Subsites 2 and 3 of cleft B accommodate cellobiose (Glc-beta-1,4-Glc), subsite 4 will bind only to a beta-1,3-linked glucose, whereas subsite 1 can interact with either a beta-1,3- or beta-1,4-linked glucose. These different specificities of the subsites explain how cleft B can accommodate beta-1,4-beta-1,3- or beta-1,3-beta-1,4-linked gluco-configured ligands.

The specific recognition of carbohydrates by proteins is central to many biological processes including cell signaling, host-pathogen interactions, and the microbial recycling of carbon from the plant cell wall (1). The microbial enzymes involved in plant cell wall degradation frequently display a modular structure in which non-catalytic carbohydrate binding modules (CBMs) are linked to the catalytic module. The role of the CBMs is mainly to target the enzymes to specific structural polysaccharides and enhance catalytic efficiency by increasing the effective concentration of the enzyme on the surface of these insoluble substrates (2,3). CBMs are grouped into sequence-based families, which may be found as part of a continuously updated carbohydrate-active enzyme data base at afmb.cnrs-mrs.fr/CAZY. The structures of CBMs from 22 of the 34 known CBM families have been determined, and the majority of these modules have a "jelly roll" fold composed almost exclusively of β-strands (Ref. 2; see also afmb.cnrs-mrs.fr/CAZY).
The specificity of CBMs varies between families. All characterized proteins in CBM1, -5, and -10 bind to crystalline cellulose (4,5), whereas other families contain modules that bind to a variety of different ligands, exemplified by CBM4, which contains proteins that recognize cellulose, xylan, mannan, and laminarin (6-9). The CBM6 family contains modules from ~35 enzymes (10) that display a diversity of substrate specificities. CBM6 proteins that bind to cellulose, xylan, mixed β-(1,3)(1,4)-glucan, and β-1,3-glucan have been described, and CBM6 sequences are also present in an α-1,6-mannanase (Swiss-Prot accession number Q9Z4P9) and several α-agarases (GenPept accession number AAF26838). The three-dimensional structures of two CBM6 modules, from Clostridium thermocellum xylanase 10A (CtCBM6) (11) and a putative Clostridium stercorarium xylanase (CsCBM6) (12), have been determined. In contrast to all other CBM families, the CBM6 modules contain two clefts that could potentially function as ligand binding sites. Cleft B is located on the concave surface of one β-sheet, whereas cleft A is found in the loop region connecting the inner and outer β-sheets of the jelly roll fold. Cleft B is in a similar location to the ligand binding sites of CBMs from several other families that include CBM22, CBM4, CBM29, and CBM17 (13-16), whereas cleft A is positioned in the loops connecting the two β-sheets and resembles the sugar binding sites of lectins, as has been discussed by Boraston et al. (12). NMR and
All crystals except those with Glc-4Glc-3Glc-4Glc-OMe belonged to the same space group as the native crystals and had approximately the same unit cell parameters (Table I). The co-crystals of CmCBM6-2 with Glc-4Glc-3Glc-4Glc-OMe belonged to space group P22₁2₁ with unit cell parameters a = 31.69 Å, b = 37.93 Å, and c = 102.41 Å, leading to a V_m value of 2.2 Å³/Da, with one molecule in the asymmetric unit.
Co-crystals of CtCBM6 (10 mg/ml) with xylopentaose were obtained in hanging drops made of 2 µl of a protein/xylopentaose mixture (~2 mM oligosaccharide) and 2 µl of reservoir solution containing 20% polyethylene glycol 1000, 0.2 M sodium acetate, and 0.1 M acetate buffer at pH 4.6. The crystals grew after 10 months at 4°C. Here, the crystals belonged to space group P2₁ with unit cell parameters a = 27.48 Å, b = 51.68 Å, c = 36.13 Å, and β = 102.78°. The V_m value was calculated to be 1.9 Å³/Da with one molecule in the asymmetric unit.
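The V_m (Matthews coefficient) values quoted above follow from the unit cell volume divided by the total protein mass in the cell. The sketch below illustrates that arithmetic only. The function names are ours, and the molecular mass of 14,000 Da is a placeholder assumption chosen for illustration, since the text does not state the mass of either module. The numbers of symmetry operators (4 for P22₁2₁, 2 for P2₁) are standard properties of these space groups.

```python
import math

def unit_cell_volume(a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Volume (in cubic angstroms) of a general unit cell; angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)

def matthews_coefficient(volume, n_symops, n_mol_asu, mw_da):
    """V_m = cell volume / (symmetry operators * molecules per ASU * mass in Da)."""
    return volume / (n_symops * n_mol_asu * mw_da)

# Cell parameters as quoted in the text.
v_ortho = unit_cell_volume(31.69, 37.93, 102.41)              # P22(1)2(1) co-crystal
v_mono = unit_cell_volume(27.48, 51.68, 36.13, beta=102.78)   # P2(1) co-crystal

# ASSUMPTION: placeholder molecular mass, not taken from the paper.
ASSUMED_MW_DA = 14000.0
vm_ortho = matthews_coefficient(v_ortho, n_symops=4, n_mol_asu=1, mw_da=ASSUMED_MW_DA)
print(f"V(ortho) = {v_ortho:.0f} A^3, V_m = {vm_ortho:.2f} A^3/Da (assumed mass)")
print(f"V(mono)  = {v_mono:.0f} A^3")
```

With the placeholder mass, the orthorhombic cell gives a value close to the reported 2.2 Å³/Da; the exact figure depends on the true molecular mass of the construct.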
Data Collection, Structure Determination, and Refinement-All crystals were transferred to a cryoprotectant solution comprising the growth buffer supplemented with 15% (v/v) glycerol. Subsequently the crystals, mounted in cryo loops, were flash-frozen in a cold nitrogen stream at 100 K. The different complex data sets were collected at beamlines ID29, ID14-EH1, and ID14-EH3 at the European Synchrotron Radiation Facility, Grenoble, France. The data collection statistics are summarized in Table I. The data were processed and reduced with MOSFLM (17), and all further computing used the CCP4 (18) suite unless otherwise stated. The crystal structure of CmCBM6-2 was solved by the molecular replacement method using the program AMoRe (19) and the coordinates of CsCBM6 (Protein Data Bank code 1nae) as the starting model. Two solutions, corresponding to the two molecules in the asymmetric unit, gave a correlation coefficient of 22% and an R-factor of 53%. The model building was performed with ARP/wARP (20) using the complete data set at 1.40 Å. The structure was then refined with REFMAC5 (18) to a final R-factor of 16.07% and an R_free of 17.76%. All complex structures having the same space group as the native protein were directly submitted to a first refinement cycle with REFMAC5 before manually constructing the substrate molecules using the program TURBO (21) and adding the water molecules. The same set of reflections was flagged for the calculation of the free R value for these complexes. Because the co-crystals of CtCBM6 with xylopentaose did not have the same space group as the native crystals, molecular replacement with AMoRe was applied to position the protein molecules correctly in the new unit cell. The same strategy was applied to solve the structure of the crystals of CmCBM6-2 in complex with Glc-4Glc-3Glc-4Glc-OMe. All structures were refined with REFMAC5. The final refinement statistics are summarized in Table I.
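The R-factor and free R value reported above compare observed and calculated structure factor amplitudes, with the free value computed only over the flagged reflections that are excluded from refinement. A minimal sketch of the conventional definition, R = Σ||F_obs| − |F_calc|| / Σ|F_obs|, is given below. The function names and the toy amplitudes are ours and purely illustrative; the amplitudes are assumed to be already on a common scale, and this is not the REFMAC5 implementation.

```python
def r_factor(f_obs, f_calc):
    """Conventional crystallographic R over paired amplitudes (pre-scaled)."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty amplitude lists of equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator

def r_work_and_free(f_obs, f_calc, free_flags):
    """Split reflections by the free flag and return (R_work, R_free)."""
    work = [(o, c) for o, c, f in zip(f_obs, f_calc, free_flags) if not f]
    free = [(o, c) for o, c, f in zip(f_obs, f_calc, free_flags) if f]
    r_work = r_factor([o for o, _ in work], [c for _, c in work])
    r_free = r_factor([o for o, _ in free], [c for _, c in free])
    return r_work, r_free

# Toy data (hypothetical amplitudes, not from the paper).
toy_obs = [100.0, 50.0, 80.0, 20.0]
toy_calc = [90.0, 55.0, 80.0, 25.0]
toy_flags = [False, False, False, True]
toy_r_work, toy_r_free = r_work_and_free(toy_obs, toy_calc, toy_flags)
```

Keeping the same flagged set across isomorphous complexes, as described above, prevents reflections that were used in refining the native model from leaking into the free set of a complex.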
All coordinates of the different structures have been deposited with the Protein Data Bank with accession codes as follows: CtCBM6 with xylopentaose, 1uxx; native CmCBM6-2, 1uxz; CmCBM6-2 with cellobiose/cellotriose, 1uxy/1uyy; CmCBM6-2 with Glc-4Glc-3Glc-4Glc-OMe, 1uz0; CmCBM6-2 with Glc-3Glc-4Glc-3Glc, 1uy0; CmCBM6-2 with xylotetraose, 1uyz.

RESULTS AND DISCUSSION
Structure of CmCBM6-2-The unliganded structure of CmCBM6-2 was solved by molecular replacement using the x-ray structure of CsCBM6-3 (RCSB Protein Data Bank code 1nae) and refined to a resolution of 1.40 Å. The final model of CmCBM6-2 comprised 2 copies of residues 1-133, 2 metal ions, and 390 water molecules. Both metal ions are common to CtCBM6, whereas CsCBM6-3 was reported to contain only one metal ion (12). In CmCBM6-2 the ions were modeled as Ca²⁺ and were both hepta-coordinated. In each case a single water molecule completes the coordination. The structure of CmCBM6-2 adopts a classical β-jelly roll fold, consisting predominantly of two five-stranded β-sheets (Fig. 2). The overall structure is very similar to the two xylan binding family 6 CBMs previously described, with a root mean square deviation value of 0.9 Å over 123 matched Cα atoms with CsCBM6-3 and a value of 1.1 Å over 125 matched Cα atoms with CtCBM6.
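The root mean square deviation values quoted above are computed over pairs of matched Cα atoms after the two structures have been superposed. The sketch below shows only the final step, assuming the coordinates are already superposed and paired; the superposition itself (for example by a least-squares fit) is not included, and the function name is ours.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD (same units as input) over paired, already-superposed 3D points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Toy example: two atoms, each displaced by 1 angstrom along z.
example = rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
               [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)])
```

Because the value depends on which atoms are matched, the number of matched Cα atoms (123 and 125 above) should always be reported alongside it.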
Interestingly, CmCBM6-2 has an exposed tryptophan residue in the region designated cleft B (Fig. 2), which is located in a shallow groove formed by one of the β-sheets, analogous to the location of the binding clefts of CBM4, CBM15, CBM17, CBM22, and CBM29 (13-16,22). Data presented below and in the accompanying paper (29) demonstrate that cleft B does indeed comprise a ligand binding site in CmCBM6-2 and that Trp-39 plays a pivotal role in carbohydrate recognition. By contrast, cleft B in both CsCBM6-3 and CtCBM6 is occluded from solvent by a proline residue in the neighboring surface loop and, thus, is not accessible to oligosaccharide or polysaccharide ligands. Furthermore, CsCBM6-3 does not contain an aromatic residue in the equivalent position to Trp-39, which would further limit the capacity of cleft B in the C. stercorarium protein to bind ligand.
The Complex of CtCBM6 with Xylopentaose-Previously, we have shown that CtCBM6 binds with high affinity to xylans (unsubstituted and highly decorated) and xylooligosaccharides, displaying the highest affinity for ligands with a degree of polymerization ≥5, suggesting the presence of five sugar binding sites (11).
The structure of CtCBM6 in complex with xylopentaose was determined to a 1.6-Å resolution (see "Experimental Procedures" for details). Electron density for all five sugar units in xylopentaose was well defined, allowing unambiguous modeling of the pentasaccharide bound to cleft A (Fig. 3A). The direction of the sugar in the binding cleft was resolved by assessing the temperature factors and the level of electron density of the sugar C5 and O5 atoms and the hydrogen bonding of O5. Xylopentaose binds in an almost 3-fold helical orientation characterized by the internal hydrogen bond from O3(n) to O5(n+1). The chair ring planes (defined by C2, C3, C5, and O5) of the three sugars Xyl2-4 (Xyl1 defines the reducing end of the ligand, and Xyl5 defines the non-reducing end) lie at angles of 89 and 84°, respectively. Xylan thus binds in its favored 3-fold helical conformation previously determined by x-ray fiber diffraction analysis of xylan (23). Such a conformation has previously been proposed for xylohexaose bound to CBM2b-1 from Cellulomonas fimi Xyn11A (24) and observed when the ligand is bound to CBM15 from Cellvibrio japonicus Xyn10C (22).
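The angles between successive chair ring planes can be estimated from the coordinates of the four atoms that define each plane (C2, C3, C5, and O5). The sketch below is one simple way to do this and is an assumption on our part, not necessarily the procedure used for the values above: the plane normal is taken as the cross product of the two ring diagonals (C2 to C5 and C3 to O5), and the angle between two normals is folded into the range 0-90°. All function names and the toy coordinates are hypothetical.

```python
import math

def _sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])

def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def ring_normal(c2, c3, c5, o5):
    """Approximate normal of the C2/C3/C5/O5 plane from its two diagonals."""
    return _cross(_sub(c5, c2), _sub(o5, c3))

def interplanar_angle(normal_a, normal_b):
    """Angle in degrees between two planes, folded into 0-90."""
    dot = sum(a * b for a, b in zip(normal_a, normal_b))
    norm_a = math.sqrt(sum(a * a for a in normal_a))
    norm_b = math.sqrt(sum(b * b for b in normal_b))
    cosine = min(1.0, abs(dot) / (norm_a * norm_b))  # clamp against rounding
    return math.degrees(math.acos(cosine))

# Toy rings: one lying in the xy plane, one in the xz plane.
ring_xy = ring_normal((0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0))
ring_xz = ring_normal((0, 0, 0), (1, 0, 0), (1, 0, 1), (0, 0, 1))
toy_angle = interplanar_angle(ring_xy, ring_xz)
```

Successive ring planes that are close to perpendicular, as reported for Xyl2-4, are what one would expect when each xylose is rotated relative to its neighbor along a 3-fold helical chain.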
Most CBMs determined to date display either a "plateau" of hydrophobic residues (called type A CBMs) that bind to crystalline surfaces of cellulose (25) or extended grooves (type B CBMs) stretching across and perpendicular to one of the two β-sheets that form the sandwich (22,26). In contrast, the binding site in both CtCBM6 (11) and CsCBM6-3 (12) is unusual; it is not located in the central groove, which contains the ligand binding site in most of the CBMs studied to date, but instead is on one edge of the sandwich. This binding site, designated "cleft A," from CsCBM6-3 binds to oligosaccharides derived from xylan and cellulose with similar affinities (12), whereas in CtCBM6 cleft A accommodates primarily xylan and xylooligosaccharides (11). The binding site (cleft A) is formed by two loops, residues 28-34 and 90-97, which contain two highly conserved aromatic residues, Tyr-33 and Trp-92, that stack against the α and β faces of the central xylose of the pentasaccharide located in subsite 3, whereas Asn-120 forms a tight hydrogen bond (2.6 Å) with O3 of this sugar. It is interesting to note that although two aromatic residues are also involved in ligand binding in the xylan binding CBMs from family 2a and 15, these residues are in a perpendicular orientation with respect to each other and, by stacking against xylose residues n and n + 2, play a key role in defining a binding site topology that can accommodate ligands with a 3-fold helical conformation (22,24). Further subsites are formed in CtCBM6 by Asp-64 and Thr-65, which bind to the reducing end of the ligand at subsite 1; Pro-118 is at subsite 2; Ile-23 and Asn-93 form subsite 4; whereas Gly-24, Ser-26 (weakly; OG 3.9 Å and N 3.8 Å), and Asn-93 interact with the non-reducing end of xylopentaose at subsite 5 (Fig. 3, B and C). The distances of hydrogen bonds and close contacts are summarized in Table II.
On the basis of NMR data we had speculated that the surface loop from Gly-24 to Gly-33 might undergo a conformational change upon binding (11), and Boraston et al. (12) observed that Phe-45 (Ile-23 in CtCBM6) in the equivalent loop in CsCBM6-3 moves by 5 Å when bound to xylotriose. Here we show that residues 25-27 undergo a positional shift, bringing Ser-26 (shifted by 4.1 Å) closer to the sugar at subsite 1 and, thus, facilitating the weak interaction between the hydroxy amino acid and O4 of the xylose at subsite 5. Although such conformational changes are likely to incur an energetic cost, which will lead to a reduction in overall affinity, the movement of the protein may reflect the heterogeneity of the xylan decorations. Thus, although the movement of Ser-26 is optimal for binding unsubstituted xylooligosaccharides, this amino acid may need to be more distant from the binding cleft to accommodate xylans that are decorated at O2 and/or O3. Previous studies show that CtCBM6 displays similar affinities for highly decorated and unsubstituted xylans (11), and the crystal structure of the protein-xylopentaose complex provides insights into the mechanism by which these side chains can be accommodated. Although the xylose at subsite 3 cannot be substituted because both O2 and O3 point toward the surface of the protein, residues on both sides of this central sugar unit may be substituted. In subsite 2, both O2 and O3 point into the solvent and, thus, can be decorated, whereas at subsite 4 O2 is solvent-exposed and, although O3 hydrogen bonds to ND2 of Asn-93 (3.1 Å), this hydroxyl group can still be decorated because it is perpendicular to the plane of the sugar ring. At subsite 1 both Thr-65 and Asp-64 make tight hydrogen bonds with O2 (2.5 and 2.7 Å, respectively) of the xylose unit, whereas decoration at O3 would also make a steric clash with the protein. At subsite 5 both O2 and O3 point into solvent, and thus, both hydroxyls can be substituted.
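The per-subsite tolerance for xylan decorations described above can be summarized as a small lookup table. The sketch below encodes our reading of the paragraph, which is an interpretation rather than a table from the paper: in particular, O2 at subsite 1 is treated as not decoratable because it is engaged in tight hydrogen bonds with Thr-65 and Asp-64. The dictionary and function names are hypothetical.

```python
# Positions (O2, O3) that can carry a side chain at each cleft A subsite of
# CtCBM6, as read from the text. ASSUMPTION: subsite 1 O2 is scored as blocked
# because of its tight hydrogen bonds with Thr-65 and Asp-64.
DECORATABLE = {
    1: set(),            # O2 hydrogen bonded, O3 would clash
    2: {"O2", "O3"},     # both point into solvent
    3: set(),            # both point toward the protein surface
    4: {"O2", "O3"},     # O2 exposed; O3 bonded to Asn-93 but still tolerated
    5: {"O2", "O3"},     # both point into solvent
}

def is_compatible(decorations):
    """decorations maps subsite number -> set of substituted positions."""
    for subsite, positions in decorations.items():
        if subsite not in DECORATABLE:
            raise ValueError(f"unknown subsite {subsite}")
        if not set(positions) <= DECORATABLE[subsite]:
            return False
    return True

# A pentasaccharide decorated at subsites 2 and 4 fits; one decorated at 3 does not.
fits = is_compatible({2: {"O2", "O3"}, 4: {"O3"}})
blocked = is_compatible({3: {"O2"}})
```

Under this reading, three of the five subsites tolerate substitution at both hydroxyls, which is consistent with the similar affinities reported for decorated and unsubstituted xylans.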
Comparison of the CtCBM6-xylopentaose complex with that of CsCBM6-3 bound to xylotriose at subsites 2, 3, and 4 reveals a similar arrangement of residues and surface loops around these three central subsites (Fig. 4). By contrast, residues involved in ligand binding at subsites 1 and 5 in the CtCBM6-xylopentaose complex have no counterpart in CsCBM6-3. In the region forming subsite 1 of CtCBM6, there is a 1-residue insert in the equivalent region of CsCBM6-3, and the backbone positions of the residues cannot be superimposed. As a consequence, Asp-64 and Thr-65 have no equivalent residues in CsCBM6-3; however, the ND2 of Asn-85 in the C. stercorarium protein is predicted to make a hydrogen bond (2.9 Å) with O3 of the xylose at subsite 1. A similar situation is observed in the region forming subsite 5. Here, the positions of the loops formed by residues 22-32 in CtCBM6 and 44-54 in CsCBM6-3 are different. CsCBM6-3, however, contains several residues that are likely to form a functional subsite 5; Pro-48 is predicted to stack against the xylopyranose ring, whereas Asp-113 OD1 and the backbone carbonyl of Ser-46 will make hydrogen bonds with O5 and O2, respectively. Thus, both CsCBM6-3 and CtCBM6 are likely to contain five subsites, although the structures of the distal subsites in the two proteins are very different.
FIG. 3. A, observed electron density map for xylopentaose bound in cleft A of CtCBM6. The map is a maximum-likelihood/σA-weighted 2Fobs − Fcalc electron density map contoured at a 1σ level, corresponding to 1.6 e Å⁻³. B, the xylopentaose bound to CtCBM6, represented as sticks. The residues interacting with the ligand are highlighted. C, schematic representation of the protein/ligand hydrogen bonds and stacking interactions within the five binding subsites.
Three-dimensional Structures of CmCBM6-2 Oligosaccharide Complexes-The ligand binding properties of CmCBM6-2 are described in the accompanying paper (29). Briefly, this CBM6 binds to cellohexaose, the β1,4-β1,3 mixed-linked glucans lichenan and barley β-glucan, and insoluble forms of cellulose, and displays weak affinity for the β1,3-glucan laminarin. Additionally, mutagenesis studies show that both cleft A and B can accommodate laminarin and cellulooligosaccharides, whereas only cleft B binds to glucans with mixed β-1,4-β-1,3 linkages such as lichenan and barley β-glucan. It also appears that the binding of CmCBM6-2 to insoluble cellulose involves synergistic interactions between cleft A and cleft B (see the accompanying paper, Ref. 29).
We were able to obtain complexes of CmCBM6-2 with five different ligands: cellobiose, cellotriose, xylotetraose, and two β1,4-β1,3 mixed-linked glucans (Glc-4Glc-3Glc-4Glc-OMe and Glc-3Glc-4Glc-3Glc; see Fig. 1). However, no crystals of complexes with pure β-1,3-linked glucans were obtained either by soaking or co-crystallization. The lack of complex formation for this substrate in soaking experiments might be due to crystal packing constraints because cleft A is in close contact with a neighboring molecule. The much lower affinity displayed by the two binding clefts for the β-1,3-linked glucans might explain the lack of complex formation in the co-crystallization trials. The asymmetric unit for the cellulooligosaccharide complexes contains two protein molecules, so there are four putative ligand binding sites in the asymmetric unit (cleft A and B in both CmCBM6 molecules). In both the cellobiose and cellotriose complexes three of these sites are occupied: cleft A and B of the first molecule and only cleft B of the second molecule. Hereafter, ligand binding described for cleft A refers to one protein molecule in the asymmetric unit, whereas binding to cleft B relates to both copies of CmCBM6 in the asymmetric unit unless otherwise specified.
Three-dimensional Structure of CmCBM6-2 in Complex with Cellobiose and Cellotriose at Cleft A-In crystals of CmCBM6-2 soaked with cellobiose or cellotriose, electron density compatible with the disaccharide was evident in subsites 3 and 4 (using the nomenclature for the subsites in CtCBM6 cleft A) of cleft A. The positions of C6 and O6 are undefined, which likely reflects ligand binding in both orientations. Electron density is only clearly defined for one sugar unit; the second sugar is disordered for both the di- and trisaccharide ligands, and no density is evident for the third sugar of cellotriose. The lack of electron density for a third sugar in the cellotriose complex suggests that one of the glucose units in this ligand is completely disordered and, thus, does not form stable interactions with cleft A. This is entirely consistent with the data of Henshaw et al. (29), which show that cleft A contains only two sugar binding subsites. The hydrogen bonds formed between protein residues and the hydroxyl groups of the sugar units as well as close contacts are summarized in Table III. Binding of cellobiose and cellotriose in cleft A occurs via stacking interactions of a terminal sugar unit between the two aromatic residues, Tyr-33 and Trp-92, at subsite 3, although the orientation of the bound ligand is unclear. In both orientations, three OH groups are stabilized by residues Glu-20 and Asn-120 (Fig. 5 and Table III). For complexes in which the non-reducing end of the ligand is bound at subsite 3, O2, O3, and O4 interact with these two amino acids, whereas O1, O2, and O3 hydrogen bond to Glu-20 and Asn-120 when Glc1 (reducing end sugar) is sandwiched between the two aromatic residues in cleft A. The second glucose molecule interacts only with the hydroxyl group of Tyr-33 via either O2 or O3, depending on the orientation of the ligand, although the CBM does make indirect water-mediated hydrogen bonds with this sugar residue.
Three-dimensional Structure of CmCBM6-2 in Complex with Cellobiose and Cellotriose at Cleft B-At cleft B the "walls" of the main binding site, designated subsite 3, are formed by Trp-39 on one side and a short loop containing two adjacent glycine residues (residues 73 to 77) on the opposite side. Glc1, Glc2, and Glc3 of cellotriose are located in subsites 1, 2, and 3, respectively (Fig. 6). O4 and O6 of Glc1 make tight hydrogen bonds with the side chain of Glu-73 and to the main-chain atoms of the loop consisting of residues 73-77. The interaction of Glu-73 with the C6OH group explains why cleft B is able to bind gluco- but not xylo-configured ligands. In subsite 2, hydrogen bonds are formed between Glc2 and both Ser-41 and Gln-113, whereas Glc1 is stabilized in subsite 1 through an interaction with Gln-13 (Fig. 6 and Table III). Cellobiose, which is located in subsites 3 and 2, makes the same interactions with the CBM as Glc3 and Glc2 of cellotriose. The structural identification of three subsites for cellulooligosaccharides agrees with the biochemical data showing that cleft B displays maximum affinity for β-1,4-linked glucans with a degree of polymerization of ≥3 (29). Unlike cleft A, the electron density for both C6 and O6 in cellobiose and cellotriose is clear, indicating that these ligands are located in a single orientation in cleft B. Indeed, the orientation of cellulose chains in CBM binding sites is a matter of some debate. The solution structure of a Cellulomonas CBM4 suggests that cellooligosaccharides can bind in both orientations (27), whereas the crystal structure of the same protein in complex with cellopentaose (15) and a CBM17 module bound to cellohexaose indicate that these ligands interact with the two CBMs in a single orientation (16). Uniquely, CmCBM6 appears to accommodate cellulooligosaccharides uni- or bi-directionally, dependent on the binding site occupied.
FIG. 4. Stereo view of the superposition of cleft A in the three proteins, CmCBM6-2 with bound xylotetraose (protein is yellow, substrate is beige), CsCBM6-3 with xylotriose (protein is dark blue, substrate is blue), and CtCBM6 with xylopentaose (protein is pink, substrate is light purple). The residues involved in subsites 2 and 4 are labeled.
Three-dimensional Structure of CmCBM6-2 in Complex with Xylotetraose-Interestingly, we were able to obtain a complex of CmCBM6-2 with xylotetraose even though the protein was shown to bind very weakly to xylo-configured ligands at cleft A but not at cleft B (29). The complex is formed by the two adjacent CmCBM6-2 molecules present in the asymmetric unit. All four sugar units are defined by electron density with Xyl1 bound in cleft A, whereas Xyl2-4 are positioned along the surface of a neighboring CBM6 molecule in a region that does not comprise either cleft A or B. Only very few hydrogen bonds are formed between this second CBM6 molecule and the three xylose units that lie parallel to the surface of the protein. The xylose residue in cleft A is sandwiched between the two aromatic residues, Tyr-33 and Trp-92, as observed for all other ligands bound at subsite 3; however, the sugar is orientated at 90° to the xylose bound in the equivalent position in CtCBM6. The O4 of Xyl1 points into solvent, and thus, the other xylose molecules in the tetrasaccharide interact with a different CBM6 molecule (Fig. 4; Fig. 5C and Table III). Although the interaction of Xyl1 is consistent with the observation that cleft A interacts weakly with xylooligosaccharides, we assume that the complex of CmCBM6-2 with xylotetraose is due to opportunistic binding in the particular conditions of co-crystallization.
Three-dimensional Structure of CmCBM6-2 in Complex with β-1,4-β-1,3-linked Glucans-The crystal structure of CmCBM6-2 in complex with the mixed β1,4-β1,3-linked ligand, Glc-4Glc-3Glc-4Glc-O-methyl, revealed that Glc4 binds in cleft A, whereas Glc1-Glc3 are located in cleft B of a second, crystallographically related CBM6 molecule (Fig. 7A). This reflects the capacity of cleft B to bind β1,4-β1,3-linked glucans and cleft A to accommodate terminal glucose molecules in subsite 3; however, the stabilization of the trimolecular complex by ligand binding to 2 protein molecules most probably is an artifact of crystallization because cleft A does not contribute to the binding of mixed-linked glucans. Interestingly, Glc4 is bound in cleft A in a manner similar to Xyl1 of xylotetraose, with O1, O2, and O3 interacting with Glu-20 and Asn-120 (molecule A in Fig. 7A). The β-1,4-linked Glc1 and Glc2 residues are bound in subsites 2 and 3 of cleft B, respectively, in a pattern identical to Glc2 and Glc3 of cellotriose (Fig. 6, A and B, and 7A). Subsite 4 is occupied by β-1,3-linked Glc3, whereas subsite 1 is unoccupied. In subsite 4, three hydrogen bonds stabilize the glucose unit: Lys-114 with O6 and O5, whereas the carbonyl of Gly-37 interacts with O6. Subsite 4 is positioned relative to the two central subsites 2 and 3 such that only a β-1,3-linked glucose can interact with this distal subsite. Furthermore, tight hydrogen bonds between subsite 3 and O6 and O4 prevent a β-1,4-linked glucose moiety binding at subsite 4 (Fig. 7B).
In the last complex, of CmCBM6-2 with the mixed β1,4-β1,3-linked oligosaccharide Glc-3Glc-4Glc-3Glc, the ligand is bound only in cleft B. In common with all other oligosaccharides located in cleft B, the primary interaction between the protein and ligand is between subsites 2 and 3 and two β-1,4-linked glucose moieties, as depicted in Fig. 6 and Table III. In one of the two molecules in the asymmetric unit only β-1,4-linked Glc2 and Glc3 are defined by electron density, whereas in the second CmCBM6-2 molecule all four binding subsites in cleft B are occupied with glucose units. Thus, subsite 1 can accommodate a β-1,3- or β-1,4-linked glucose molecule; this latter complex, therefore, represents the maximum binding that can be obtained in cleft B with mixed β-1,4-1,3-linked glucans.
Mechanisms of Ligand Recognition-The binding mode in cleft A is clearly different in CBM6 modules that display distinct specificities. The examples described here reflect the two extremes: CtCBM6, which is able to bind internal regions of xylan chains as reflected by its complex with xylopentaose (Fig. 3B), and CmCBM6-2, which binds to the terminal residues of both xylo- and gluco-configured oligosaccharides (Fig. 7A). In all CBM6 structures characterized to date, the conserved residues Asn-120, Trp-92, and Tyr-33 (numbers are from CmCBM6-2) play a pivotal role in ligand binding. These residues define the two subsites, 3 and 4, that are invariant in the three CBM6 proteins. CmCBM6-2 cleft A appears to bind only at subsites 3 and 4, whereas Boraston et al. (12) define 3 subsites in CsCBM6-3 and the complex of CtCBM6 with xylopentaose identifies 5 subsites. Overlaying the structures of CsCBM6-3 and CtCBM6 indicates that the C. stercorarium module also contains five subsites, although the structures of the distal subsites, 1 and 5, are very different in the two proteins. One of the key differences between CsCBM6-3 and CtCBM6 is that the C. thermocellum protein binds xylohexaose 100-fold tighter than cellohexaose, whereas the affinity of the C. stercorarium module for these two ligands is similar (xylohexaose binds ~5-fold tighter than cellohexaose). The complex of CtCBM6 with xylopentaose provides some insight into the mechanism by which this protein is able to discriminate between xylo- and gluco-configured ligands. Specificity for xylose-containing polymers is conferred at subsites 4 and 5, where Ile-23 and Ser-26, respectively, would make steric clashes with the C6OH of glucose. The capacity of CsCBM6-3 to bind cellohexaose more tightly than CtCBM6 may be due to differences in subsite 5, where the lack of an equivalent residue to Ser-26 in the C. stercorarium protein would facilitate the accommodation of the C6OH moiety of glucose.
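The fold differences in affinity quoted above can be translated into differences in binding free energy through the standard relation ΔΔG = RT ln(ratio of association constants). The sketch below performs that conversion; the temperature of 298.15 K is our assumption for illustration, because the measurement temperature is not given in this section, and the function name is hypothetical.

```python
import math

GAS_CONSTANT_J_PER_MOL_K = 8.314462618

def ddg_kj_per_mol(fold_tighter, temperature_k=298.15):
    """Free energy gap (kJ/mol) corresponding to a fold difference in affinity.

    ASSUMPTION: temperature defaults to 298.15 K, which is not stated in the text.
    """
    if fold_tighter <= 0:
        raise ValueError("fold difference must be positive")
    return GAS_CONSTANT_J_PER_MOL_K * temperature_k * math.log(fold_tighter) / 1000.0

ddg_ct = ddg_kj_per_mol(100.0)  # CtCBM6: xylohexaose vs cellohexaose
ddg_cs = ddg_kj_per_mol(5.0)    # CsCBM6-3: xylohexaose vs cellohexaose
print(f"100-fold: {ddg_ct:.1f} kJ/mol; 5-fold: {ddg_cs:.1f} kJ/mol")
```

Under this assumption the 100-fold preference of CtCBM6 corresponds to roughly 11 kJ/mol, a gap of the size that a small number of steric clashes or lost hydrogen bonds, such as those proposed at subsites 4 and 5, could plausibly account for.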
The accommodation of glucose in subsite 4 of CsCBM6-3 is less clear because Phe-45 (equivalent to CtCBM6 Ile-23) would likely clash with the C6 group. It is possible that the cavity behind the aromatic side chain could provide sufficient space for Phe-45 to undergo a conformational change upon cellohexaose binding. Indeed, the 200,000-fold variation in the capacity of the −1 subsite of GH10 xylanases to accommodate glucose is dependent upon the conformational freedom of an invariant tryptophan at the active site; enzymes that contain a cavity behind the indole ring display the highest activity against gluco-configured substrates (28).
Inspection of the crystal structure of CmCBM6 provides some insight into its specificity for terminal sugar residues. The surface loop from Gln-18 to Tyr-33 adopts a slightly different conformation to the equivalent region of CtCBM6 and constricts the binding cleft at subsite 5. Indeed, overlaying the liganded CtCBM6 structure with CmCBM6 indicates that Thr-23 in the Cellvibrio protein will clash with O4 of the xylose at subsite 5. Furthermore, CmCBM6 Glu-20, the equivalent residue to Phe-45 and Ile-23 in CsCBM6-3 and CtCBM6, respectively, while playing a role in ligand binding at subsite 3, is predicted to clash with a sugar residue that occupies subsite 4.
CmCBM6-2 and CsCBM6-3 (12) are CBM6s for which binding has been observed in cleft B. From sequence alignments (see alignment in Boraston et al. (15) for example), however, one can conclude that several other CBM6 proteins will also have an equivalent ligand binding cleft. Our structure identifies a maximum of 4 binding sites in cleft B of CmCBM6-2, with subsites 2 and 3 comprising the major binding sites. Discrimination between gluco- and xylo-configured ligands occurs only at subsite 2, where the C6OH of glucose makes strong hydrogen bonds with Glu-73 and the backbone carbonyl of Glu-74, explaining why the protein does not bind xylan at cleft B. Supplementary binding is possible at subsites 1 and 4. At these distal binding sites two highly flexible residues, Lys-114 in subsite 4 and Gln-13 in subsite 1, undergo conformational changes so that the side chains can interact with glucose molecules located at these subsites. It is noteworthy that discrimination between β-1,4-linked glucans (cellulose) and mixed β-1,3-1,4-linked glucans occurs at subsite 4, where only a β-1,3-linked glucose can be accommodated, whereas subsite 1 can interact with a sugar that is linked either β-1,3 or β-1,4. Thus, the order of the β-1,3-1,4 linkages in glucans is important in defining affinity because only Glc-β-1,3-Glc-β-1,4-Glc-β-1,3-Glc and Glc-β-1,3-Glc-β-1,4-Glc-β-1,4-Glc will occupy all four subsites.
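The linkage rules for cleft B summarized above can be expressed as a simple matching problem. The sketch below is an illustrative model built from the text, not an analysis from the paper: it assumes that a productive binding mode must place β-1,4-linked glucoses in subsites 2 and 3, that the linkage into subsite 4 must be β-1,3, and that the linkage into subsite 1 may be either. Linkages are listed from the non-reducing to the reducing end as 3 or 4, and all names are hypothetical. The model says nothing about affinity or about weaker binding modes, such as that of laminarin.

```python
# Allowed linkage types between adjacent cleft B subsites, ordered from
# subsite 4 (non-reducing side) to subsite 1 (reducing side).
ALLOWED_LINKAGES = [{3}, {4}, {3, 4}]  # junctions 4-3, 3-2, 2-1

def max_occupied_subsites(linkages):
    """Largest number of subsites a glucan can fill while covering subsites 2 and 3.

    linkages: list of 3 or 4 (beta-1,3 or beta-1,4), non-reducing to reducing end.
    Returns 0 if no cellobiose-anchored binding mode exists in this model.
    """
    n_residues = len(linkages) + 1
    best = 0
    for first in range(4):               # index 0 is subsite 4, index 3 is subsite 1
        for last in range(first, 4):
            if first > 1 or last < 2:    # window must include subsites 3 and 2
                continue
            size = last - first + 1
            if size > n_residues:
                continue
            required = ALLOWED_LINKAGES[first:last]
            for offset in range(n_residues - size + 1):
                window = linkages[offset:offset + size - 1]
                if all(link in ok for link, ok in zip(window, required)):
                    best = max(best, size)
    return best

full_a = max_occupied_subsites([3, 4, 3])  # Glc-3Glc-4Glc-3Glc
full_b = max_occupied_subsites([3, 4, 4])  # Glc-3Glc-4Glc-4Glc
partial = max_occupied_subsites([4, 3, 4])  # Glc-4Glc-3Glc-4Glc
triose = max_occupied_subsites([4, 4])      # cellotriose
```

In this model the two tetrasaccharides named in the text fill all four subsites, Glc-4Glc-3Glc-4Glc fills three, and cellotriose fills three, in line with the crystal structures described above.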