Molecular Basis for the Selectivity and Specificity of Ligand Recognition by the Family 16 Carbohydrate-binding Modules from Thermoanaerobacterium polysaccharolyticum ManA*

Enzymes that hydrolyze complex polysaccharides into simple sugars are modular in architecture and consist of single or multiple catalytic domains fused to targeting modules called carbohydrate-binding modules (CBMs). CBMs bind to their ligands with high affinity and increase the efficiency of the catalytic components by targeting the enzymes to their substrates. Here we utilized a multidisciplinary approach to characterize each of the two family 16 carbohydrate-binding domain components of the highly active mannanase from the thermophile Thermoanaerobacterium polysaccharolyticum. These represent the first crystal structures of family 16 CBMs. Calorimetric analysis showed that although these CBMs demonstrate high specificity toward β-1,4-linked sugars, they can engage both cello- and mannopolysaccharides. To elucidate the molecular basis for this specificity and selectivity, we have determined high resolution crystal structures of each of the two CBMs, as well as of binary complexes of CBM16-1 bound to either mannopentaose or cellopentaose. These results provide detailed molecular insights into ligand recognition and yield a framework for rational engineering experiments designed to expand the natural repertoire of these targeting modules.

The turnover of photosynthetically fixed carbon through the action of microbial glycoside hydrolases has been estimated to be of the order of 10¹¹ tons annually (1). Consequently, glycoside hydrolases play a key role in the global carbon cycle, and their properties, if well harnessed, will enhance the release of substrates (monomeric sugars) critical to the use of cellulosic materials in the biofuel industry. For most glycoside hydrolases, such as bacterial cellulases and xylanases, the polypeptides are organized in a modular arrangement that usually consists of a catalytic domain and an associated carbohydrate-binding module (CBM). These modules may be joined through linkers that are rich in proline, serine, and threonine (2). The evolutionary rationale that led to these complex molecular architectures is currently unclear (3).
It has been suggested that CBMs attain multivalency through multiplicity (4). Therefore, it could be hypothesized that multiple CBMs act synergistically to bind to their target ligand, leading to an increased accessibility of the catalytic domain to the target polysaccharide. This view is supported by studies on a recombinant double CBM, constructed by fusing family 1 CBMs of two Trichoderma reesei cellobiohydrolases via a linker peptide (5). A 5-10-fold increase in the affinity for crystalline cellulose was observed for the double CBM compared with the individual modules, and similar results were observed for two CBMs of Cellulomonas fimi xylanase 11A (3). Thus, in some arrangements, the affinity for substrates increases with multiplicity of CBMs, and this appears to occur frequently in hyperthermophiles and thermophiles (6). Contrary to these findings, a report on the product of a manA gene of Caldicellulosiruptor strain RT8B.4 with two N-terminal CBMs showed no relationship between binding affinity and multiplicity of CBMs (7). Similar conclusions were also obtained from studies based on a Thermotoga neapolitana xylanase flanked by two different CBMs (7).
An alternative hypothesis that may explain the multiplicity of CBMs in glycoside hydrolases is that it increases the diversity of their target polysaccharides or substrates, because polysaccharides found in plant cell walls are heterogeneous in nature. The Paenibacillus sp. strain W-61 xylanase 5 seems to support this hypothesis. This polypeptide contains a C-terminal family 22 CBM and an N-terminal family 9 CBM (8). The family 22 CBM binds to xylan, whereas the family 9 CBM binds to cellulose (4).
Carbohydrate-binding modules have been identified in over 100 β-1,4-glucanases (9,10), and aside from increasing the relative concentration of their respective enzymes on the surface of the substrate, they may also modify substrates into amorphous forms and thus make available to the catalytic domain a more easily degradable substrate (11-13). This is perhaps the case for the two CBMs found in the highly active mannanase (ManA) of Thermoanaerobacterium polysaccharolyticum, because their removal compromises enzyme activity (14). The widespread association of CBMs with glycoside hydrolases suggests that enzyme adsorption plays a critical role in the degradation of polysaccharides. Insoluble plant cell wall polysaccharides such as cellulose hold considerable promise as future substrates for energy production. Understanding the mechanism of CBM-mediated adsorption to β-glucan may therefore be critical to our capacity to fully utilize energy crops as substrates for industrial processes, especially for the biofuels industry.
In an effort to understand the influence of the T. polysaccharolyticum carbohydrate-binding modules on the enzymatic activity of the parental protein, we have used bioinformatics to delineate its two linker-less CBMs and carried out biochemical and structural studies on the resultant polypeptides. This represents the first comprehensive structure-thermodynamic analysis of family 16 carbohydrate-binding modules. Here we show that the first and second carbohydrate-binding modules from T. polysaccharolyticum (TpolCBM16-1 and TpolCBM16-2, respectively) display high affinity for β-1,4-glucose and β-1,4-mannose polymers and provide a quantitative assessment of ligand binding using isothermal titration calorimetry. We also present four different high resolution x-ray crystallographic structures as follows: 1) TpolCBM16-1, determined to a resolution of 1.5 Å; 2) TpolCBM16-2, determined to 1.9 Å resolution; 3) the co-crystal structure of TpolCBM16-1 bound to cellopentaose, determined to 1.2 Å resolution; and 4) the co-crystal structure of TpolCBM16-1 bound to mannopentaose, determined to 2.2 Å resolution. These structures demonstrate that each module contains a single cleft, with two aromatic residues located at the base and along one face and two stretches of polar residues on either side of this cleft, which constitutes the ligand-binding site. The molecular basis for both plasticity in ligand selectivity and specificity by these modules is discussed.

EXPERIMENTAL PROCEDURES
Protein Expression and Purification-The genes encoding the first and second carbohydrate-binding modules from T. polysaccharolyticum were cloned by PCR. To amplify the gene segment encoding TpolCBM16-1, a forward primer with the sequence 5′-CATATGgtaaacatggtgagcaaccggg, with an engineered NdeI restriction site (shown in capitals) preceding the start codon, and a reverse primer 5′-CTCGAGctaaacttccaccagatatacatcatcca, with an engineered XhoI site (shown in capitals) after a termination codon, were used with the manA gene as the template. The ManA TpolCBM16-2 was also amplified by PCR with forward primer (5′-CATATGtccaatttaatagtgaacggaacagc) and reverse primer (5′-CTCGAGctatacttcaacgagcgtgatattatcg). Note that these two primers also contained engineered NdeI and XhoI sites in the forward and reverse primers, respectively. Each TpolCBM16 encoding sequence was cloned into a TA cloning vector, pGEM-T (Promega), and sequenced to confirm the correctness of the coding sequence. Each gene was then removed and inserted into a modified pET-28 plasmid (Novagen) bearing an N-terminal polyhistidine tag and a thrombin digestion site preceding the inserted gene. The modification in the pET-28 vector was the replacement of the gene encoding kanamycin resistance with that encoding ampicillin resistance. The integrity of all constructs was confirmed by sequencing.
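The primer design described above can be checked with a short script. The following is a minimal Python sketch, not part of the original protocol: it confirms that each primer begins with the stated restriction site and that the three bases following the XhoI site in each reverse primer correspond to a stop codon on the coding strand. The function names (`leading_site`, `stop_codon_after_site`) are hypothetical helpers introduced here for illustration.

```python
# Primer sequences as given in the text (5' to 3'); capitals mark the engineered site.
PRIMERS = {
    "CBM16-1 forward": "CATATGgtaaacatggtgagcaaccggg",
    "CBM16-1 reverse": "CTCGAGctaaacttccaccagatatacatcatcca",
    "CBM16-2 forward": "CATATGtccaatttaatagtgaacggaacagc",
    "CBM16-2 reverse": "CTCGAGctatacttcaacgagcgtgatattatcg",
}
RESTRICTION_SITES = {"NdeI": "CATATG", "XhoI": "CTCGAG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (upper case)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def leading_site(primer):
    """Name of the restriction site at the 5' end of the primer, or None."""
    seq = primer.upper()
    for name, site in RESTRICTION_SITES.items():
        if seq.startswith(site):
            return name
    return None


def stop_codon_after_site(reverse_primer):
    """Stop codon encoded on the coding strand by the three bases that
    follow the 6-base restriction site of a reverse primer, or None."""
    codon = reverse_complement(reverse_primer[6:9])
    return codon if codon in STOP_CODONS else None


for name, seq in PRIMERS.items():
    stop = stop_codon_after_site(seq) if "reverse" in name else None
    print(name, leading_site(seq), stop)
```

Because a reverse primer anneals to the coding strand, its sequence must be reverse-complemented before codons are read, which is why the check operates on `reverse_complement(...)` rather than on the primer itself.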
Escherichia coli expression strain BL21(DE3) was transformed with the appropriate expression vectors, and single colonies of transformed E. coli were used to inoculate 5 ml of LB medium supplemented with ampicillin (100 μg/ml). Five hours following inoculation, the small scale culture was added to 1 liter of LB medium containing ampicillin (100 μg/ml) for growth at 37°C. When the A600 nm of the culture reached 0.3, protein expression was induced with 0.1 mM isopropyl β-D-thiogalactopyranoside, and the cells were further grown for 6 h. Bacterial cells were pelleted by centrifugation (4000 × g for 1 h) and resuspended in 100 mM KCl, 20 mM Tris-HCl, pH 8.3, 10% glycerol, and a mixture of protease inhibitors. Resuspended cells were disrupted by multiple passes through an Avestin C5 Emulsiflex French press cell, and insoluble aggregates and cellular debris were removed by centrifugation (15,000 × g for 1 h).
Recombinant proteins were purified from the clarified supernatant by virtue of the N-terminal polyhistidine tag using a Talon resin column (Clontech) charged with cobalt chloride. Following elution from the cobalt affinity resin, the cleavable polyhistidine tag was removed using thrombin (1 unit/mg protein; GE Healthcare). The protein was further purified by anion exchange (5 ml HiTrap Q; GE Healthcare) and size exclusion chromatography (Superdex 75 16/60; GE Healthcare) prior to crystallization. Site-specific variants were purified in essentially the same manner. Selenomethionine-incorporated TpolCBM16-1 was produced by the method of Van Duyne et al. (15) and purified in the manner described above except that 5 mM β-mercaptoethanol was added to all of the buffers. The protein samples were estimated to be greater than 95% pure as judged by SDS-PAGE.
Affinity Gel Electrophoretic Mobility Assays-Assays to monitor the binding of TpolCBM16-1 and TpolCBM16-2 to soluble substrates were performed as described by Tomme et al. (16) in the absence of SDS. 2-Mercaptoethanol was excluded from the loading buffer, and the proteins were not heated at 95°C prior to loading onto the gel. Complexes were resolved on a nondenaturing 12% polyacrylamide separating gel containing 0.1% of one of the following soluble polysaccharide substrates: carboxymethyl cellulose, locust bean gum (galactomannan polysaccharide), or xylan. In all experiments the same proteins were run simultaneously in a control gel without incorporated soluble substrates. The electrophoresis was carried out at 100 V at 4°C for 3 h. Proteins were visualized by staining with Coomassie Brilliant Blue R-250.
Isothermal Titration Calorimetry-Measurements were made at 25°C using a VP-ITC calorimeter (MicroCal, Inc., Northampton, MA) following the manufacturer's recommended procedures. All samples were extensively dialyzed against 50 mM potassium phosphate buffer, pH 7.0, and all ligands were dissolved in the same buffer. The protein sample (100-200 μM), in a 1.4399-ml reaction cell, was injected with 28 successive 10-μl aliquots of ligand (1-1.5 mM) at 300-s intervals. Based on the results from our crystallographic experiments, the number of binding sites was presumed to be one. Data were fitted by nonlinear regression using a single site model (MicroCal Origin), and thermodynamic parameters were calculated using the Gibbs free energy equation (ΔG = ΔH − TΔS) and the relation −RT ln Ka = ΔG.
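The two relations above suffice to derive ΔG and TΔS from a fitted association constant and binding enthalpy. The following minimal Python sketch illustrates the arithmetic at the stated temperature of 25°C. The input values are illustrative placeholders, not values from Table 2, and the function name `binding_thermodynamics` is introduced here for illustration only.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_thermodynamics(ka_per_molar, delta_h_kcal, temp_c=25.0):
    """Return (delta_G, T*delta_S) in kcal/mol.

    delta_G is obtained from -RT ln Ka = delta_G, and T*delta_S by
    rearranging delta_G = delta_H - T*delta_S.
    """
    temp_k = temp_c + 273.15
    delta_g = -R_KCAL * temp_k * math.log(ka_per_molar)
    t_delta_s = delta_h_kcal - delta_g
    return delta_g, t_delta_s


# Illustrative inputs only (hypothetical Ka and delta_H, not measured data).
dg, tds = binding_thermodynamics(ka_per_molar=1.0e5, delta_h_kcal=-10.0)
print(f"dG = {dg:.2f} kcal/mol, TdS = {tds:.2f} kcal/mol")
```

With these placeholder inputs the binding is enthalpically driven with an unfavorable entropic term, the pattern often reported for protein-carbohydrate interactions.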
Crystallization and X-ray Data Collection-Initial crystallization conditions were established by the sparse-matrix sampling methods using commercial and homemade screens. Refinement of promising conditions yielded large crystals suitable for diffraction analysis. Crystals of TpolCBM16-1, TpolCBM16-2, TpolCBM16-1-cellopentaose, and TpolCBM16-1-mannopentaose complexes were all grown using the hanging drop vapor diffusion method. For crystallization of native and SeMet-labeled TpolCBM16-1, 2 μl of protein sample (20 mg/ml, in 100 mM KCl, 10 mM HEPES, pH 7.5) was added to 2 μl of precipitant (3 M sodium formate, pH 7, 15% ethylene glycol) and equilibrated over a well containing the precipitant solution at 20°C. Crystals grew within a week and reached a maximum size of 0.3 × 0.4 × 0.6 mm. As the crystallization precipitant proved to be suitable for cryo-protection, single crystals were harvested straight from the crystallization drop and flash-cooled in liquid nitrogen prior to data collection.
Co-crystals of the TpolCBM16-1-mannopentaose complex were grown by adding 2 μl of protein sample (20 mg/ml, in 100 mM KCl, 10 mM HEPES, pH 7.5, plus 5 mM mannopentaose) to 2 μl of precipitant (1.6 M ammonium sulfate, 100 mM MES, pH 6.5, 10% dioxane) and equilibrating the mixture over a well containing the precipitant solution at 20°C. Crystals grew within 7 days and reached a maximum size of 0.2 × 0.2 × 0.4 mm. Prior to flash-cooling, crystals were briefly soaked in the precipitant solution supplemented with 25% glycerol.
Initial screening with TpolCBM16-2 established sodium formate as a viable precipitant for crystallization. These crystals grew as thin needles and could not be improved upon despite extensive screening. Sequence gazing showed that TpolCBM16-2 contained a cysteine (Cys74) at a position occupied by Phe76 in TpolCBM16-1. Reasoning that Cys74 may be fairly reactive, TpolCBM16-2 was incubated with 1 mM mercury(II) acetate, to cap this residue, prior to crystallization trials. Large single crystals of TpolCBM16-2 were grown by adding 2 μl of protein sample (20 mg/ml, in 100 mM KCl, 10 mM HEPES, pH 7.5, plus 1 mM mercury(II) acetate) to 2 μl of precipitant (3.5 M sodium formate, pH 7.0) and equilibrating over a well containing the precipitant solution at 20°C. Prior to flash-cooling, crystals were briefly soaked in the precipitant solution supplemented with 20% ethylene glycol.
Flash-cooled crystals of unliganded TpolCBM16-1 diffracted x-rays to a minimum Bragg spacing of 1.3 Å, using an insertion device x-ray beam line utilizing an ADSC Q4 CCD detector (IMCA-CAT-Sector 17ID, Advanced Photon Source, Argonne, IL). These crystals occupied space group P2₁2₁2₁ with unit cell parameters a = 57.4 Å, b = 77.8 Å, and c = 77.9 Å and contained two molecules in the crystallographic asymmetric unit. A 6-fold redundant data set was collected to a limiting resolution of 1.4 Å (overall Rmerge = 4.60, I/σ(I) = 3.6 in the highest resolution shell).
A 6-fold redundant data set was collected from crystals of the TpolCBM16-1-mannopentaose complex (overall Rmerge = 3.9, I/σ(I) = 3.5 in the highest resolution shell) to a limiting Bragg spacing of 2.2 Å at an insertion device synchrotron beam line utilizing an ADSC Q4 CCD detector (IMCA-CAT-Sector 17ID, Advanced Photon Source, Argonne, IL). Crystals occupy the trigonal space group P3₂21 with unit cell parameters a = b = 74.1 Å and c = 116.1 Å, with two molecules in the crystallographic asymmetric unit.
Flash-cooled crystals of unliganded TpolCBM16-2 occupy space group C222₁ with unit cell parameters a = 90.4 Å, b = 90.5 Å, c = 139.4 Å, with four molecules in the crystallographic asymmetric unit. The a and b unit cell edges are coincidentally close, and the crystals demonstrate tetragonal pseudo-symmetry consistent with space group P4₂22, with two molecules in the crystallographic asymmetric unit. However, scaling of the data in the higher symmetry space group resulted in a slight increase in the merging statistics, and subsequent crystallographic refinement in the tetragonal setting resulted in a free R factor that stalled at 40%. Hence, the data were scaled in the orthorhombic setting, and structure determination and refinement were carried out in space group C222₁. A 6-fold redundant data set was collected to a limiting resolution of 1.9 Å (overall Rmerge = 4.3, I/σ(I) = 12 in the highest resolution shell) utilizing a Mar 225 CCD detector (SER-CAT-Sector 22BM, Advanced Photon Source, Argonne, IL).
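The merging statistic quoted for each data set can be illustrated with a small calculation. The following Python sketch assumes the conventional definition Rmerge = Σ|I_i − ⟨I⟩| / ΣI_i, summed over all repeated measurements of each unique reflection. It is not the HKL2000 implementation, the intensities are invented toy numbers, and skipping reflections measured only once is an assumption of this sketch. The values quoted in the text appear to be percentages.

```python
def r_merge(observations):
    """Compute Rmerge from repeated intensity measurements.

    observations maps a reflection index (h, k, l) to the list of
    intensities recorded for that reflection and its symmetry mates.
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        if len(intensities) < 2:
            continue  # assumption: singly measured reflections are excluded
        mean_i = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_i) for i in intensities)
        denominator += sum(intensities)
    if denominator == 0:
        raise ValueError("no multiply measured reflections")
    return numerator / denominator


# Toy data: two reflections, each measured three times.
toy = {
    (1, 0, 0): [100.0, 104.0, 96.0],
    (0, 2, 0): [50.0, 52.0, 48.0],
}
print(f"Rmerge = {100 * r_merge(toy):.2f}%")
```

When data are merged in a space group of higher symmetry than the true one, non-equivalent reflections are averaged together and this statistic rises, which is the behavior described above for the tetragonal setting.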
Phasing and Structure Determination-Although crystals of selenomethionine-substituted TpolCBM16-1 were grown under similar conditions, SeMet-labeled TpolCBM16-1 occupied a different space group (C222₁, with one molecule in the crystallographic asymmetric unit), with unit cell parameters a = 56.4 Å, b = 96.8 Å, c = 73.6 Å. A 6-fold redundant data set was collected at a wavelength near the selenium absorption edge, to a limiting resolution of 1.7 Å (overall Rmerge = 5.5, I/σ(I) = 7 in the highest resolution shell) utilizing an ADSC Q4 CCD detector (IMCA-CAT-Sector 17ID, Advanced Photon Source, Argonne, IL). The structure of SeMet TpolCBM16-1 was solved by single wavelength anomalous diffraction utilizing anomalous scattering from the three selenium-substituted methionine residues per monomer of TpolCBM16-1. Data were indexed and scaled using the HKL2000 package (17,18). Selenium sites were identified, and heavy atom parameters were further refined using SOLVE (19,20), yielding an initial figure of merit of 0.28 to 1.7 Å resolution (Table 1). Solvent flattening with RESOLVE further improved the quality of the initial map and permitted most of the main chain and 45% of the side chain residues to be automatically built by the software (21). The remainder of the model was fitted using XtalView (22) and further improved by rounds of refinement with REFMAC5 (23) and manual building.
The structure of native TpolCBM16-1 was determined by molecular replacement using the SeMet TpolCBM16-1 structure as a search probe (24). Following rigid body refinement of the initial molecular replacement solution, the atomic model was subjected to automatic rebuilding using ARP/wARP (25), resulting in a near complete trace of both molecules in the crystallographic asymmetric unit. The remainder of the model was fitted using XtalView (22) and further improved by rounds of refinement with REFMAC5 (23). The co-crystal structures of TpolCBM16-1-cellopentaose and TpolCBM16-1-mannopentaose were determined by molecular replacement using the final refined coordinates of native TpolCBM16-1 as a search probe. Clear density for the ligands could be observed in both co-crystal structures prior to crystallographic refinement. Cycles of manual rebuilding followed by crystallographic refinement were carried out for each of the co-crystal structures. The respective ligands were manually built into the difference Fourier maps after the free R factors dropped below 30%. Cross-validation, using 5-7% of the data for the calculation of the free R factor (26,27), was utilized throughout the model building process to monitor building bias. The stereochemistry of the models was routinely monitored throughout the course of refinement using PROCHECK (28).
The structure of native TpolCBM16-2 was determined by molecular replacement (23) using the refined coordinates of the SeMet TpolCBM16-1 structure as a search probe. Multiple rounds of manual model building were interspersed with refinement using REFMAC5 (23) to complete structure refinement. Cross-validation used 5% of the data in the calculation of the free R factor (26,27). Despite extensive manual rebuilding and refinement, residues in two loop regions, bridging Tyr82 through Thr90 and Thr111 through Phe116, respectively, remain ill-defined and have not been modeled in the structure. Consequently, the free R factor of 30% for this structure is slightly higher than would be expected, given the resolution limit of 1.9 Å. The stereochemistry of the model was routinely monitored throughout the course of refinement using PROCHECK (28).
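The cross-validation scheme used above can be sketched in a few lines. The following Python example is a simplified illustration, not the REFMAC5 procedure: it sets aside a random 5% of reflections as a test set and evaluates the conventional residual R = Σ| |Fobs| − |Fcalc| | / Σ|Fobs| over a chosen subset of reflections. The amplitudes are invented toy numbers, a scale factor of 1 between observed and calculated amplitudes is assumed, and the function names are hypothetical.

```python
import random


def split_free_set(n_reflections, free_fraction=0.05, seed=0):
    """Return the set of reflection indices reserved for the free R factor."""
    rng = random.Random(seed)
    n_free = round(n_reflections * free_fraction)
    return set(rng.sample(range(n_reflections), n_free))


def r_factor(f_obs, f_calc, indices):
    """Crystallographic residual over the given reflection indices
    (assumes f_calc is already on the same scale as f_obs)."""
    numerator = sum(abs(abs(f_obs[i]) - abs(f_calc[i])) for i in indices)
    denominator = sum(abs(f_obs[i]) for i in indices)
    return numerator / denominator


# Toy amplitudes for four reflections.
f_obs = [100.0, 80.0, 60.0, 40.0]
f_calc = [90.0, 85.0, 60.0, 50.0]
print(f"R = {r_factor(f_obs, f_calc, range(len(f_obs))):.4f}")

# Reserving 5% of a 1000-reflection data set for cross-validation.
free = split_free_set(1000)
work = set(range(1000)) - free
print(len(free), len(work))
```

Because the free set is never used to adjust the model, the residual computed over it is less susceptible to overfitting than the residual computed over the working set.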
Computational Procedures-All calculations were performed with MOE 2006.08 (Chemical Computing Group, Inc.), using either the MMFF94x or OPLS-AA force fields. Starting coordinates for the energy minimization of the TpolCBM16-1-cellopentaose and TpolCBM16-1-mannopentaose complexes (this study) and the cellopentaose-CBM complex from Cellulomonas fimi (PDB 1gu3), solved by Davies and co-workers (29), were taken from the coordinates of the crystal structures, in which crystal waters were included. All atoms (solvent, CBM, and ligand) were energy-minimized until an energy gradient of 10⁻⁴ kcal/mol/Å² was achieved. The interaction energy of the ligand with the CBM and explicit solvent was then calculated using the MMFF94x force field.

RESULTS
Delineation of the CBM Domain Boundaries-In a previous report (14), our delineation of TpolCBM16-1 and TpolCBM16-2 suggested that the two modules were separated by 23 amino acids; hence each repeat started with a highly conserved N terminus characterized by the seven amino acids SAVPEAA and ended with a highly conserved sequence LVEV (Fig. 1C). We cloned and expressed each module based on this demarcation. However, neither of the products bound to Sigmacell type 50, although a construct that included both putative TpolCBM16s did bind to this substrate. A visual inspection of the 23 amino acids occurring between the putative TpolCBM16-1 and putative TpolCBM16-2 showed that the sequence shares conserved residues with the 23 amino acids preceding the putative TpolCBM16-1. Therefore, we made two new constructs that included the 23 amino acids upstream of the previously demarcated N termini of the putative TpolCBM16s. Each of the constructs reported here, which extend the N-terminal region of each TpolCBM16, demonstrates tight binding to Sigmacell type 50. Thus, although our original C-terminal delineation of the duplicated TpolCBM16 of ManA was correct, the N-terminal demarcation was erroneous.
Thermodynamic Analysis of Ligand Specificity-The binding specificity of the purified TpolCBM16-1 and TpolCBM16-2 was determined by native electrophoretic mobility assays using various polysaccharide substrates. These studies did not show retardation of either of these modules in the presence of xylans or β-1,3-linked glucose polymers. In contrast, tight affinity was observed between either TpolCBM16-1 or TpolCBM16-2 and β-1,4-linked glucose polymers such as cellulose. The affinity of these polypeptides for cellulose was such that once the individual module was bound to cellulose, the complex could only be dissociated by denaturing the protein.
The binding characteristics of each of these carbohydrate-binding modules were further quantified by isothermal titration calorimetric analysis using oligosaccharides of defined chain length. The binding isotherms for the interaction of TpolCBM16-1 or TpolCBM16-2 with each oligosaccharide were fitted by nonlinear regression with a simple bimolecular interaction model (Table 2). Calorimetric analysis using the long chain polymers hydroxyethylcellulose and glucomannan demonstrates that TpolCBM16-1 interacts with glucomannan with a nearly 9-fold greater affinity than with hydroxyethylcellulose. Glucomannan is an abundant water-soluble polysaccharide that consists of mixed β-1,4-linked polymers containing glucose and mannose. Consistent with our thermodynamic data, glucomannan is likely the natural substrate for TpolCBM16-1 and TpolCBM16-2.
For the interactions with synthetic short chain oligosaccharides, the binding stoichiometries are consistent with a 1:1 interaction. The measured binding affinities increase as a function of chain length with an optimum consistent with a binding site that can accommodate a polymer of five sugar molecules. As is typical of protein-carbohydrate interactions, the thermodynamics of the interaction of TpolCBM16-1 and Tpol-
FIGURE 1. Ribbon diagram of the overall structures of TpolCBM16-1 (A) and TpolCBM16-2 (B) colored from the N to C termini of the proteins with a color ramp from blue to red are shown. A calcium ion that confers stability to the folded protein is shown as a purple sphere. C, alignment of the amino acid sequences of TpolCBM16-1 and TpolCBM16-2 with the corresponding secondary structure elements shown above.
Overall Structures of CBM1 and CBM2-Crystals of CBM1, grown using sodium formate as a precipitant, occupied space group P2₁2₁2₁, with two molecules in the crystallographic asymmetric unit and diffracted synchrotron x-radiation to a Bragg limit of 1.5 Å. To obtain crystallographic phases, crystals of TpolCBM16-1 were grown from selenomethionine-labeled protein. However, crystals of SeMet-labeled TpolCBM16-1 occupied a different space group (C222₁, with one molecule in the crystallographic asymmetric unit), and the structure of SeMet TpolCBM16-1 was solved to a resolution of 1.7 Å by single wavelength anomalous diffraction methods. The structure of native TpolCBM16-1 was solved to a resolution of 1.4 Å by molecular replacement using the refined coordinates of the structure of SeMet-labeled TpolCBM1. The two structures provide three independent views of the molecule and allow for detailed analysis of the structure, independent of crystal packing artifacts. The structure of TpolCBM16-2 was solved to a resolution of 2.1 Å by molecular replacement utilizing the final refined coordinates of TpolCBM16-1.
Both TpolCBM16-1 and TpolCBM16-2 fold into a compact domain typical of a β-sandwich fold consisting of two layers of five antiparallel β-sheets. Given the structural similarity between TpolCBM16-1 and TpolCBM16-2, for brevity, only the topology of TpolCBM16-1 will be described in detail. The bottom layer of the β-sandwich is composed of sheets β2 (Met24-Val27), β3 (Gly37-Ile40), β10 (Leu134-Glu143), β5 (Thr59-Phe68), and β8 (Thr104-Thr111). The upper layer of the sandwich is composed of sheets β1 (Gln18-Asp19), β4 (Ala46-Asp51), β9 (Gln121-Lys126), β6 (Phe76-His83), and β7 (Tyr91-Phe98). A single structural metal ion, presumed to be calcium by analogy to the structures of other carbohydrate-binding modules, is flanked by the N-terminal loop residues, the loop region between sheet β2 and β3, and the C-terminal region of β4. The ion has a coordination geometry characteristic of calcium and is engaged by the Oε2 of Glu13, a bidentate interaction with Asp139, the main chain carbonyls of Gly11, Asn35, and Leu38, and a water molecule. A diagram of the overall polypeptide fold of TpolCBM16-1 and TpolCBM16-2 is shown in Fig. 1, A and B, respectively, and a structure-based secondary structure assignment is shown in Fig. 1C.
The core architecture of TpolCBM16-1 and TpolCBM16-2 is similar to that observed in the three-dimensional structures of other families of carbohydrate-binding modules. A DALI search against other structures deposited in the Protein Data Bank shows the closest structural homologs to be the β-1,3 xylan-binding domain from CBM22 (30) (PDB code 1DYO) (r.m.s.d. of 2.9 Å over 138 aligned residues; Z-score = 14.8), the β-1,3-binding laminarinase of CBM4-2 (29) (PDB code 1GUI) (r.m.s.d. of 2.1 Å over 123 aligned residues), and the sugar-binding domain of the F-box ubiquitin ligase (PDB code 1UMH) (r.m.s.d. of 3.1 Å over 135 aligned residues) (31). As in the structures of TpolCBM16-1 and TpolCBM16-2 presented here, a calcium ion is located at an equivalent position in the above noted structures of CBM22 and CBM4, where it contributes to the stability of the polypeptide (4,29,30).
Within the β-sandwich fold of TpolCBM16-1, the top layer of the sandwich is curved so as to create a cleft that runs along the resultant concave face, perpendicular to the direction of the β-sheets (Fig. 2A). This cleft spans about 25 Å across the face of the β-sandwich and constitutes the ligand-binding site (Fig. 2B). Two solvent-exposed tryptophan residues (Trp20 and Trp125) lie at the base and along one face of this cleft, respectively. A number of polar residues surround the periphery of this cleft, including Gln21, Asp77, and Asn97 on one side and Gln81, Gln93, and Gln121 on the other side, where they are poised to hydrogen bond with the sugar ligand.
Given the sequence identity of 62% over 144 amino acids between TpolCBM16-1 and TpolCBM16-2, it is not surprising that the two structures are nearly identical. The main chain atoms from the final refined coordinates of the two structures can be superimposed with an r.m.s.d. of 0.8 Å over 120 atoms. Although the two polypeptides share the same overall fold and contain similar structural features, such as the single structural calcium ion, significant differences exist at the ligand-binding site. A superposition of the two structures reveals that the largest deviations exist in the loop region encompassing residues 18-28 that includes the solvent-exposed Trp20. In addition, several of the polar residues that surround the ligand-binding site of TpolCBM16-1 are not conserved in TpolCBM16-2, including Gln21 (Tyr21 in TpolCBM16-2) and Asn97 (Arg97 in TpolCBM16-2). These notable differences may account for the measured 2-fold lower affinity of TpolCBM16-2 for cellopentaose, relative to that for TpolCBM16-1 (see Table 2).
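To put such fold differences in affinity on an energetic scale, a ratio of association constants can be converted to a difference in binding free energy through ΔΔG = RT ln(fold). The following minimal Python sketch, which assumes the ITC temperature of 25°C and uses a hypothetical helper name `ddg_from_fold`, applies this to the 2-fold and nearly 9-fold differences reported in the text.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_fold(fold, temp_c=25.0):
    """Free energy difference (kcal/mol) corresponding to a fold change
    in the association constant, from delta_G = -RT ln Ka."""
    return R_KCAL * (temp_c + 273.15) * math.log(fold)


for fold in (2.0, 9.0):
    print(f"{fold:g}-fold -> {ddg_from_fold(fold):.2f} kcal/mol")
```

A 2-fold difference therefore corresponds to roughly 0.4 kcal/mol at this temperature, and a 9-fold difference to roughly 1.3 kcal/mol.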
Molecular Basis for Ligand Specificity and Selectivity by CBMs-To determine the molecular basis for ligand specificity, we determined the crystal structure of TpolCBM16-1 bound to cellopentaose and to mannopentaose. The co-crystal structures of the TpolCBM16-1-oligosaccharide complexes were determined by molecular replacement using the refined coordinates of unliganded TpolCBM16-1 as a search model. Following rigid body refinement of the molecular replacement solutions, clear and continuous electron density, corresponding to each of the five β-1,4-linked sugars, could be observed in initial electron density maps (Fig. 3, A and C). Following further refinement and addition of solvent molecules, the pentasaccharide sugars were built into difference Fourier maps, and the binary complexes were refined without any constraints.
In the 1.2 Å resolution co-crystal structure of the TpolCBM16-1-cellopentaose complex, there are two identical copies of the complex in the asymmetric unit, and the structural descriptions are apt for either molecule. Cellopentaose interacts with TpolCBM16-1 in an extended conformation in a 25 Å cleft located perpendicular to the face of the β-sandwich (Fig. 2). As in co-crystal structures of other carbohydrate-binding modules, two solvent-exposed aromatic residues (Trp20 and Trp125), which lie at the base and along one face of the ligand-binding cleft, engage in hydrophobic interactions with the last three pyranosides at the reducing end of the oligosaccharide (Fig. 3B). The tryptophan residues are separated by roughly 8.5 Å, and their indole side chains are co-planar with the D-configured sugars to provide extensive hydrophobic contact. Mutational analyses show that replacement of either of these tryptophan residues compromised ligand recognition and demonstrate the importance of the hydrophobic platform created by these two residues in ligand binding.
In addition to these hydrophobic contacts, an extensive set of hydrogen bond interactions stabilizes the interaction between the protein and the remaining two sugars at the nonreducing end of the pentasaccharide ligand (Fig. 3B). Sets of polar residues flank either side of the ligand-binding cleft where they interact with the hydroxyl groups of the cellopentaose. Asn97, Asp77, and Gln21, located along one face of the cleft, make contact with the glucose residues at sites 1, 2, and 4, respectively. Gln93, Gln81, and Gln121, located along the opposite face of the cleft, engage the sugar residues at sites 3 and 4. Additional contacts are observed between the C-6 hydroxymethyl group of glucose molecules at the reducing end of the oligosaccharide and residues Gln21 and the backbone carbonyl of Trp20. The interaction between TpolCBM16-1 and cellopentaose buries a total of 499.6 Å² of surface area.
In the 2.2 Å resolution co-crystal structure of TpolCBM16-1 bound to mannopentaose, the pentasaccharide is likewise buried in an extended conformation within the 25 Å cleft located perpendicular to the face of the β-sandwich. As in the TpolCBM16-1-cellopentaose complex, hydrophobic residues Trp20 and Trp125 interact with the three pyranosides at the reducing end (Fig. 3D). Hydrogen bond interactions between the sugars at sites 1 and 4 and protein residues Asn97 and Gln21 lie along one face of the ligand-binding cleft. Along the opposite face of the cleft, residues Gln93, Gln81, and Gln121 interact with the pyranosides at sites 3 and 4. The mannosides at sites 1 and 5 are twisted away from the ligand-binding cleft to accommodate the axial location of the C-2 hydroxyls of the mannosides (Fig. 3D). As a result, TpolCBM16-1 forms fewer hydrogen bond interactions with the pyranosides of mannopentaose, relative to cellopentaose, and buries slightly less total surface area (430 Å²) upon complex formation.
Comparison of the two co-crystal structures provides insights into the molecular basis for promiscuous ligand binding by TpolCBM16-1. Of particular interest is how the protein scaffold can accommodate the C-2 hydroxyl in an equatorial configuration in the cellopentaose complex, in contrast to how the axial C-2 hydroxyl is accommodated in the mannopentaose complex. The C-2 hydroxyls of the carbohydrates at sites 2 and 5 in the pentasaccharide complexes do not directly interact with the protein. However, the C-2 hydroxyls of the carbohydrates at sites 1, 3, and 4 are characterized by distinct hydrogen bonding interactions with the protein. One of the major specificity determinants in this scaffold is Gln81, and this residue is able to interact with the C-2 hydroxyl at site 3 in either the axial or equatorial configuration. The axial configuration of the C-2 hydroxyl at site 1 of the mannopentaose complex results in the loss of a hydrogen bond to the Nδ2 of Asn99. However, in the mannopentaose complex, the axial configuration at site 4 results in an additional hydrogen bond, relative to the cellopentaose complex, with the Nε2 of Gln23. Thus, the different configurations of the C-2 hydroxyl are accommodated within the protein scaffold by compensatory hydrogen bond interactions.
FIGURE 3. A, difference Fourier electron density maps (contoured at 3σ over background) calculated with coefficients Fobs − Fcalc at the ligand-binding site for the 1.2 Å resolution TpolCBM16-1-cellopentaose complex. B, direct hydrogen bonding interactions between residues of TpolCBM16-1 and cellopentaose. C, difference Fourier electron density maps (contoured at 3σ over background) calculated with coefficients Fobs − Fcalc at the ligand-binding site for the 2.2 Å resolution TpolCBM16-1-mannopentaose complex. D, direct hydrogen bonding interactions between residues of TpolCBM16-1 and mannopentaose.
In Silico Analysis of Ligand Specificity-To examine the nature of the TpolCBM16 specificity and affinity for ligands, the crystallographic complexes with cellopentaose and mannopentaose were compared with energy-minimized TpolCBM16-polysaccharide complexes. Calculation of interaction energies (i.e. the sum of nonbonding energies) between oligosaccharide and TpolCBM16-1 showed a clear correlation with experimental ΔH values determined from isothermal titration calorimetry (Fig. 4). When structural and thermodynamic data from a similar CBM-ligand complex (CBM-cellopentaose complex from C. fimi, PDB code 1gu3) (29) are also included with the TpolCBM16-1 data from this work, a strong correlation is seen between CBM-ligand interaction energy and experimental ΔH (Fig. 4), with a slope of 6 ± 1 and r = 0.96. These calculations were all carried out using the MMFF94x force field. The energy calculations were also repeated using a different force field, OPLS-AA, which yielded an almost identical trend in interaction energies versus experimental ΔH (inset of Fig. 4).
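The slope and correlation coefficient quoted above are the kind of quantities an ordinary least-squares fit returns. The following is a minimal sketch of such a fit, assuming interaction energy is plotted on the y axis against experimental ΔH on the x axis; the function name `linear_fit` and the numbers in the usage lines are hypothetical placeholders, not values from this work or from Fig. 4.

```python
import math


def linear_fit(x, y):
    """Return (slope, intercept, Pearson r) of a least-squares line y = slope*x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain at least two distinct values")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Placeholder data only (NOT measurements from the paper):
# x = experimental enthalpies, y = calculated interaction energies, both in kcal/mol.
delta_h = [-1.0, -2.0, -3.0, -4.0]
e_interaction = [-5.0, -11.0, -17.0, -23.0]
slope, intercept, r = linear_fit(delta_h, e_interaction)
```

With real data, the same call would be made once per force field (MMFF94x and OPLS-AA) to compare the two trends.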
It is clear that there are meaningful differences in the strength of interaction between the TpolCBM16-1-cellopentaose and TpolCBM16-1-mannopentaose complexes, and these differences are also reflected in the computational analysis. However, the experimental ΔH values of TpolCBM16-1-mannopentaose and TpolCBM16-1-cellotetraose are close to within error of one another, and the difference between the calculated interaction energies of these complexes is not meaningful.
When one parses the source of the difference in the interaction energies between the TpolCBM16-1-cellopentaose and TpolCBM16-1-mannopentaose complexes, one sees a significant difference in the net van der Waals contacts: the TpolCBM16-1-cellopentaose complex has a positive (i.e. unfavorable) net van der Waals energy of +6 kcal/mol, and the TpolCBM16-1-mannopentaose complex has a net negative van der Waals energy of −14 kcal/mol. The TpolCBM16-1-cellotetraose complex has a small decrease in the unfavorable van der Waals energy. However, the majority of the loss in interaction energy in the TpolCBM16-1-cellotetraose complex, relative to the TpolCBM16-1-cellopentaose complex, is because of decreased electrostatic interactions. These same trends are also observed when performing energy calculations with the OPLS-AA force field.
Molecular Basis for Differential Affinity between TpolCBM16-1 and TpolCBM16-2-The results of our calorimetric analysis demonstrate that although TpolCBM16-1 and TpolCBM16-2 can both bind cellulose polymers, the affinity of TpolCBM16-2 for the polysaccharide is nearly 4-fold less than that of TpolCBM16-1. A comparison of the crystal structure of TpolCBM16-2 with the co-crystal structure of the TpolCBM16-1-cellopentaose complex provides a molecular rationale for this difference. A superposition of the α-carbon atoms of the two structures yields an r.m.s.d. of 0.8 Å over 120 residues, but an inspection of the alignment reveals that the most significant deviations occur at sheet β2 within a stretch of residues that span Ser18 through Asp25 and that harbor Trp20, a residue critical for ligand binding. In addition, several of the polar residues that surround the ligand-binding site of TpolCBM16-1 are not conserved in TpolCBM16-2, including Gln21 (Tyr21 in TpolCBM16-2) and Asn97 (Arg97 in TpolCBM16-2). These notable differences may account for the measured 2-fold lower affinity of TpolCBM16-2 for cellopentaose, relative to that for TpolCBM16-1 (see Table 2).
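To put the fold differences in affinity on an energy scale, the standard relation ΔΔG = RT ln(K1/K2) between two association constants can be applied. The sketch below is illustrative only: the function name `ddg_from_fold` is hypothetical, and the default temperature of 298.15 K is an assumption, since the temperature of the calorimetric measurements is not restated in this section.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_fold(fold, temperature_k=298.15):
    """Free energy difference (kcal/mol) implied by a fold difference in affinity.

    fold is the ratio of the two association constants (tighter / weaker).
    temperature_k is an assumed temperature, not one taken from the paper.
    """
    if fold <= 0:
        raise ValueError("fold must be positive")
    return R_KCAL * temperature_k * math.log(fold)


# Fold differences quoted in the text above
ddg_polymer = ddg_from_fold(4.0)       # roughly 0.8 kcal/mol at the assumed temperature
ddg_pentaose = ddg_from_fold(2.0)      # roughly 0.4 kcal/mol at the assumed temperature
```

Under this assumption, both differences are under 1 kcal/mol, which is consistent with the text attributing them to a small number of altered contacts rather than a wholesale change in the binding site.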

DISCUSSION
In this study we have characterized each of the two carbohydrate-binding modules from T. polysaccharolyticum ManA using a combination of biochemical, thermodynamic, computational, and structural biological approaches. We demonstrate that each of the TpolCBM16s can bind both β-1,4-glucose and mannose polymers and quantify the binding affinity for such ligands using isothermal titration calorimetry. Our thermodynamic analysis suggests that the natural substrate for CBM16s is likely glucomannan, a mixed β-1,4-linked polymer containing both glucose and mannose that comprises nearly half the dry weight of roots and softwoods. Hence, this scaffold has been adapted for plasticity in substrate recognition. In addition, we present four crystal structures, including those of ligand-free TpolCBM16-1 and TpolCBM16-2 and of the TpolCBM16-1-cellopentaose and TpolCBM16-1-mannopentaose complexes, which provide a molecular rationale for understanding the ligand promiscuity and specificity of the TpolCBM16s.
Despite the binding promiscuity toward both mannan and glucan polymers, the TpolCBM16s are not able to interact with other carbohydrates, including chitin, oligoxylans, and β-1,3-glucan species. Our co-crystal structures provide the chemical bases for understanding this binding specificity. Chitin is an unbranched polymer of N-acetyl-D-glucosamine that differs from cellulose by the replacement of the C-2 hydroxyl with an acetamido group. The lack of interaction between the TpolCBM16s and chitin may be explained by the steric clashes that would result between the acetamido groups and the protein scaffold at sugar residues at positions 3 and 4 of the polysaccharide. Although the chemical structures of xylan and cellulose are similar (xylans lack the C-6 hydroxymethyl group), their tertiary structures are quite distinct. Cellulose and mannan form flat polymers with a 180° rotation between consecutive sugars, whereas xylan forms helical structures distinguished by a 120° rotation between consecutive units (32). Hence, TpolCBM16s cannot engage the helical orientation of xylan polysaccharides within a ligand-binding cleft that has evolved to accommodate planar polymers such as those of cellulose and mannan. Similarly, β-1,3-glucan polymers have been shown to form helical structures in solution. Modeling studies of laminarins demonstrate an energetically favorable helical configuration with φ and ψ angles of −72° and 108°, respectively (33). The binding preference for planar β-1,4-glucan polymers over helical β-1,3-glucans may likewise be dictated by the rigid nature of the binding cleft that is optimized to accommodate planar polymers.
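The geometric distinction drawn above between a 180° and a 120° rotation per sugar can be made concrete by tracking the orientation of successive residues about the polymer axis. The sketch below is a simplified illustration that treats each residue as a pure rotation about a common axis and ignores rise and glycosidic torsions; the function names are hypothetical.

```python
from math import gcd


def residue_orientations(rotation_deg, n_residues):
    """Orientation (degrees, modulo 360) of each residue about the polymer axis."""
    return [(i * rotation_deg) % 360 for i in range(n_residues)]


def residues_per_repeat(rotation_deg):
    """Number of residues after which the orientation repeats (integer degrees only)."""
    rotation = rotation_deg % 360
    if rotation == 0:
        return 1
    return 360 // gcd(360, rotation)


# Cellulose- or mannan-like chain: alternate residues present the same face
flat = residue_orientations(180, 5)
# Xylan-like chain: the orientation repeats only every third residue
helical = residue_orientations(120, 5)
```

In this simplified picture, a 180° rotation gives a two-residue repeat in which every other sugar faces the same way, as expected for a ribbon lying in a cleft, whereas a 120° rotation gives a three-residue repeat in which consecutive sugars point in three different directions.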
The flexibility in ligand binding demonstrated by the TpolCBMs is reminiscent of the plasticity observed for the family 29 carbohydrate-binding modules from the Piromyces equi noncatalytic protein NCP1 (34). These two modules, CBM29-1 and CBM29-2, bind to a range of β-1,4-linked polysaccharides, including cellulose, mannan, xylan, and glucomannan. Structural studies of CBM29-2 bound to cellohexaose and to mannohexaose reveal very few direct hydrogen bonds, particularly between the protein and the C-2 hydroxyl of the ligand (35,36). Thus, β-1,4-linked polysaccharides with different configurations of the C-2 hydroxyl can be accommodated at the binding site of CBM29s because these hydroxyl groups do not significantly contribute to binding. Our co-crystal structures of TpolCBM16-1 bound to polysaccharides illustrate an alternative mechanism for promiscuous ligand binding, namely compensatory changes in the hydrogen bonding between the C-2 hydroxyl and the protein to accommodate differences in configuration of the carbohydrate. Thus, whereas both CBM29 and TpolCBM16 demonstrate promiscuity in ligand specificity, each utilizes a distinct strategy for achieving this plasticity.
The ubiquitous nature of carbohydrate-binding modules that share an overall common fold suggests that this scaffold is one that is effectively optimized for polysaccharide binding. Subtle differences in the ligand-binding site have resulted in the evolution of a number of different polypeptides capable of engaging a wide diversity of polysaccharides utilizing common core elements. Many such carbohydrate-binding modules are tethered to a catalytic domain, and this modular architecture serves to enhance catalytic efficacy by targeting the catalytic components toward their substrate through interactions mediated by the carbohydrate-binding domain. The modular nature of these enzymes suggests that the combinatorial assembly of different catalytic and carbohydrate-binding domains can result in the production of novel activities. Our work provides a foundation for understanding the ligand specificity of the TpolCBM16s, and these insights may be utilized in altering specificity through minor changes in the protein scaffold. This rational redesign approach can be utilized to expand the natural repertoire of enzymes that can degrade complex polysaccharides into simple sugars, and such an approach could have a broad impact in the biofuels industry.