The Structural Basis for the Ligand Specificity of Family 2 Carbohydrate-binding Modules*

The interactions of proteins with polysaccharides play a key role in the microbial hydrolysis of cellulose and xylan, the most abundant organic molecules in the biosphere, and are thus pivotal to the recycling of photosynthetically fixed carbon. Enzymes that attack these recalcitrant polymers have a modular structure comprising catalytic modules and non-catalytic carbohydrate-binding modules (CBMs). The largest prokaryotic CBM family, CBM2, contains members that bind cellulose (CBM2a) and xylan (CBM2b), respectively. A possible explanation for the different ligand specificity of CBM2b is that one of the surface tryptophans involved in the protein-carbohydrate interaction is rotated by 90° compared with its position in CBM2a (thus matching the structure of the binding site to the helical secondary structure of xylan), which may be promoted by a single amino acid difference between the two families. Here we show that by mutation of this single residue (Arg-262→Gly), a CBM2b xylan-binding module completely loses its affinity for xylan and becomes a cellulose-binding module. The structural effect of the mutation has been revealed using NMR spectroscopy, which confirms that Trp-259 rotates 90° to lie flat against the protein surface. Except for this one residue, the mutation only results in minor changes to the structure. The mutated protein interacts with cellulose using the same residues that the wild-type CBM2b uses to interact with xylan, suggesting that the recognition is of the secondary structure of the polysaccharide rather than any specific recognition of the absence or presence of functional groups.

The molecular recognition of carbohydrates by proteins is of fundamental importance in numerous biological processes, including cell-cell recognition, cellular adhesion, and host-pathogen interactions. Understanding the structural basis of the ligand specificity of carbohydrate binding proteins is therefore critical. A very important group of carbohydrates is the structural polysaccharides located in plant cell walls, as these polymers are the most abundant organic molecules in the biosphere. The plant cell wall consists largely of cellulose (β-1,4-linked glucose) fibrils, cross-linked by a mesh of hemicelluloses, of which xylan (β-1,4-linked xylopyranose) is the predominant form (1). The microbial hydrolysis of cellulose and xylan is an essential component of the recycling of photosynthetically fixed carbon and is thus of fundamental biological importance. The plant cell wall is highly recalcitrant to enzyme attack, and its degradation requires the concerted action of a large number of different enzymes that target glycosidic and ester bonds. These plant cell wall hydrolases are generally modular, consisting of a catalytic module linked to one or more non-catalytic carbohydrate-binding modules (CBMs) (2,3). The main function of CBMs is to attach the enzyme to the polymeric surface and thereby increase the local concentration of substrate, leading to more efficient catalysis (4-6), although some CBMs have been shown to assist hydrolysis directly by twisting polysaccharide strands apart (7,8). Almost all CBMs studied to date contain surface-exposed aromatic rings, which have been shown to be the main sites of interaction with the polysaccharides. These residues form face-to-face hydrophobic stacking interactions in which a tryptophan or tyrosine ring interacts with the nonpolar face of a sugar ring (9-12). CBMs have been classified into families based on primary structure similarities (2).
The largest family is CBM2, which has been further subdivided into Families 2a and 2b, largely on the basis of an 8-residue loop in CBM2a that is absent from CBM2b (Fig. 1) (2,3). This loop contains a tryptophan that is conserved among Family 2a members and has been shown to be one of the exposed aromatics that interacts with cellulose in the CBM2a from two xylanases (11,12).
It has become clear that an important difference between CBMs in Families 2a and 2b is that Family 2a proteins all bind to crystalline cellulose (13) even when they actually originate from xylanases, whereas Family 2b modules all bind to xylan (14,15). The structural basis for the specificity of recognition is therefore of considerable interest, especially when one considers that the difference between glucose and xylose is only the presence of a CH2OH group attached to C5. It has been suggested that the difference in specificity between CBM2a and CBM2b proteins may reside in the absence of the 8-residue loop containing the conserved tryptophan residue (14). However, the recent resolution of the structure of a Family 2b CBM suggested a different reason for the altered binding affinity of Family 2b (16). This CBM, Cf Xyn11A-CBM2b-1, is the internal CBM2b of Cellulomonas fimi xylanase 11A (formerly XylD). In Family 2a, there are three exposed tryptophan residues, which form a flat, planar surface ideally placed to interact with the flat surface of crystalline cellulose (17). However, in Cf Xyn11A-CBM2b-1, the two tryptophans are approximately perpendicular to each other and are separated by 12 Å. They are therefore not well oriented to interact with cellulose, but are ideal for binding xylan via a stacking interaction with pyranose rings i and i+2 of the polysaccharide, which has an approximately 120° rotation between one monomer and the next (18). Thus we proposed that the orientation of the tryptophans is responsible for the different ligand specificities of CBM families 2a and 2b. We further speculated that the different orientations of the tryptophan residues may be attributed to a single amino acid residue, which occurs three residues after one of the key tryptophans, Trp-259. These two amino acids are separated by a β-turn, and are thus next to one another on the same face of adjacent β-strands (Fig. 2).
In Family 2a, this residue is a glycine (or occasionally alanine), whereas in Family 2b it is an arginine (Fig. 1). We suggested that the glycine, which sits partially under the Trp ring, allows it to sit flat against the protein surface, whereas the greater bulk of the arginine forces the ring away from the surface and causes it to be rotated by approximately 90°. Here we show that mutation of the arginine of Cf Xyn11A-CBM2b-1 to glycine (R262G) converts the CBM from a xylan to a cellulose binder, with an affinity similar to that found in other two-tryptophan cellulose-binding modules, such as CBM2a derivatives (11,12). Resolution of the structure of the R262G mutant by NMR supports the hypothesis that the change in ligand specificity can be attributed to a reorientation of Trp-259 such that it is in a planar orientation with the other surface tryptophan, Trp-291.

MATERIALS AND METHODS
Protein Expression and Purification-Proteins were amplified using polymerase chain reaction and produced as glutathione S-transferase (GST) fusion proteins (16,19) or cloned into pET16b (Novagen) to produce proteins with His10 tags. To construct the His10-tagged protein, the region of the Cf Xyn11A gene encoding CBM2b-1 (19) was amplified using polymerase chain reaction with primers that contained NdeI and BamHI restriction sites. Protein expression in Escherichia coli BL21 (DE3):pLysS was induced by addition of isopropyl-1-thio-β-D-galactopyranoside, and proteins were purified by immobilized metal affinity chromatography, as described previously (4). To produce 15N-labeled protein, E. coli was grown in minimal medium containing (15NH4)2SO4 as sole nitrogen source to mid-exponential phase before inducing expression of the fusion protein with isopropyl-1-thio-β-D-galactopyranoside for 6 h.
Ligand Binding-Non-denaturing gel electrophoresis was carried out using 0.1% soluble oat spelt xylan (Sigma) as the ligand (16). For qualitative evaluation of cellulose binding capacity, 100 μg of protein was incubated with bacterial microcrystalline cellulose (BMCC; 2 mg) or Avicel (10 mg mixing the cellulose was washed once with 0.5 ml of 50 mM sodium phosphate pH 7.0, after which the bound protein was eluted by boiling for 10 min in 50 μl of 10% (w/v) SDS, and 20 μl was subjected to SDS-PAGE. Binding isotherms were performed on ice in 50 mM sodium phosphate buffer, pH 7.0, at protein concentrations ranging from 1-40 μM. BMCC was added to a final concentration of 1 mg/ml, and the reaction was incubated for 1 h. The polysaccharide was centrifuged at 13,000 × g for 1 min, and the A280 of the supernatant was measured to quantify the amount of protein bound to the insoluble ligand. The data were analyzed as described previously (4) and the N0 and Kd values were determined from the regressed isotherm data. At least three separate binding isotherms were carried out for each protein.
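The depletion analysis described above can be illustrated with a minimal sketch. It assumes a simple one-site Langmuir model, bound = N0 × [free] / (Kd + [free]); the source defers the actual analysis to Ref. 4, so this model, the function names, the extinction coefficient argument, and the grid search are all illustrative assumptions and not the authors' procedure.

```python
def free_conc_uM(a280, epsilon_M_cm, path_cm=1.0):
    """Free protein (uM) from supernatant A280 via Beer-Lambert.
    epsilon_M_cm is a hypothetical, user-supplied extinction coefficient."""
    return a280 / (epsilon_M_cm * path_cm) * 1e6


def bound_per_gram(total_uM, free_uM, cellulose_g_per_l=1.0):
    """Bound protein in umol per g cellulose (uM divided by g/l gives umol/g)."""
    return (total_uM - free_uM) / cellulose_g_per_l


def langmuir(free_uM, n0, kd_uM):
    """One-site isotherm (assumed model): capacity n0, dissociation constant kd."""
    return n0 * free_uM / (kd_uM + free_uM)


def fit_langmuir(free, bound, kd_grid=None):
    """Least-squares fit: scan Kd on a grid, solve N0 in closed form at each Kd."""
    if kd_grid is None:
        kd_grid = [i / 10 for i in range(1, 5001)]  # 0.1 to 500 uM
    best = None
    for kd in kd_grid:
        x = [f / (kd + f) for f in free]
        n0 = sum(b * xi for b, xi in zip(bound, x)) / sum(xi * xi for xi in x)
        sse = sum((b - n0 * xi) ** 2 for b, xi in zip(bound, x))
        if best is None or sse < best[0]:
            best = (sse, n0, kd)
    return best[1], best[2]


if __name__ == "__main__":
    # Synthetic, noise-free points generated from the model itself (not real data).
    free = [1, 2, 5, 10, 20, 40]
    bound = [langmuir(f, 10.0, 17.0) for f in free]
    n0, kd = fit_langmuir(free, bound)
    print(f"N0 = {n0:.2f} umol/g, Kd = {kd:.1f} uM")
```

With BMCC at 1 mg/ml, the difference between total and free protein in μM is numerically equal to μmol of protein bound per gram of cellulose, which is why N0 can be read directly in those units. In practice the three or more replicate isotherms would each be fitted and the spread of Kd reported.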
For one-dimensional NMR titrations, spectra at 30 °C of a 100 μM solution of CBM2b-1-His10 in 50 mM sodium phosphate buffer, pH 7.0, containing 10% 2H2O, were acquired essentially as described previously (4,16). The binding affinity of the protein was measured by following the shift of Trp-259 and Trp-291 NHε signals with increasing concentration of ligand. Fitting of the data was performed with EXCEL v5.0 (Microsoft Corporation). For the two-dimensional heteronuclear single quantum coherence (HSQC) titrations, spectra were recorded after each addition of cellohexaose, using 0, 0.5, 2.0, 8.0, 20, 40, and 95 equivalents of cellohexaose. The chemical shift changes were ordered by the weighted total shift (δN + 1.6 × δH).
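The two quantities used in these titrations can be sketched as follows. The weighted shift follows the expression given above; taking absolute values of the individual shifts is an assumption. The titration curve assumes 1:1 binding in fast exchange with no correction for dilution, which is a common model but is not stated in the source, and the 100 μM protein concentration used in the example is borrowed from the one-dimensional experiments as an illustration only.

```python
import math


def weighted_shift(delta_n_ppm, delta_h_ppm, h_weight=1.6):
    """Weighted total shift, delta_N + 1.6 * delta_H (absolute values assumed)."""
    return abs(delta_n_ppm) + h_weight * abs(delta_h_ppm)


def fraction_bound(p_total, l_total, kd):
    """Fraction of protein bound for 1:1 binding (all concentrations in the same unit)."""
    b = p_total + l_total + kd
    return (b - math.sqrt(b * b - 4.0 * p_total * l_total)) / (2.0 * p_total)


def observed_shift(p_total, l_total, kd, delta_max):
    """Fast-exchange observed shift change: population-weighted average."""
    return delta_max * fraction_bound(p_total, l_total, kd)


if __name__ == "__main__":
    protein_mM = 0.1  # illustrative: 100 uM protein
    kd_mM = 5.0       # illustrative: the cellohexaose estimate reported in this paper
    for eq in (0, 0.5, 2.0, 8.0, 20, 40, 95):
        ligand_mM = eq * protein_mM
        fb = fraction_bound(protein_mM, ligand_mM, kd_mM)
        print(f"{eq:5.1f} eq  fraction bound = {fb:.3f}")
```

Under these assumptions the protein is well short of saturation even at the highest ligand excess, which illustrates why a Kd in the millimolar range is hard to pin down when ligand solubility limits the titration.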
Structure Determination-NMR spectra were acquired at 500, 600, and 800 MHz on Bruker DRX spectrometers using 5-mm probeheads with z gradients. Assignments and structure restraints were obtained from two- and three-dimensional 15N-separated NOESY and TOCSY spectra. Additional restraints were obtained from E.COSY, HNHA, and HNHB experiments. In initial rounds of structure calculation, only unambiguous NOEs and dihedral restraints were used. In subsequent rounds, further NOE restraints, including ambiguous restraints (20), were added along with sidechain dihedral restraints, stereoassignments, and hydrogen bonding restraints (based on amide proton exchange and temperature dependence (21)). Structures were calculated by hybrid distance geometry/simulated annealing in XPLOR, as described previously (16). The structures had no distance violations greater than 0.25 Å or dihedral violations greater than 1.0°.

RESULTS AND DISCUSSION
Ligand Specificity of the R262G Mutant-Previously we showed that wild-type Xyn11A-CBM2b-1 binds to oat spelt xylan with a dissociation constant of 0.41 mM, assuming the ligand binding site comprises six consecutive xylose molecules, and to xylohexaose with a dissociation constant of 0.19 mM (16). Data derived from this study showed that the wild-type CBM2b-1 had no measurable affinity for either soluble or insoluble forms of cellulose as judged by NMR titrations using cellohexaose as the soluble ligand, qualitative evaluation of binding using SDS-PAGE to two forms of insoluble cellulose (Avicel (~50% crystalline cellulose, Ref. 3) and BMCC (~76% crystalline cellulose, Ref. 3)) and binding isotherms with BMCC (Fig. 3, A (top), C, and D, respectively). The data for Avicel are not shown but are essentially identical to the BMCC results.
FIG. 3. ... 1, 2, and 3), R262G (lanes 4, 5, and 6) and GST (lanes 7, 8, and 9). The purified proteins (lanes 1, 4, and 7) were incubated with BMCC. Unbound protein (lanes 2, 5, and 8) was removed by centrifugation, and bound material (lanes 3, 6, and 9) was eluted by boiling in SDS. The size markers (L) were low molecular weight markers. D, binding isotherms for the binding of WT (triangles), R262G (inverted triangles), CBM2a W66A (diamonds) and GST (squares) to BMCC. In C and D, the small amount of wild type CBM2b-1 associated with cellulose represents nonspecific binding, as the affinity of the CBM is not significantly different from that of the control GST alone.

To evaluate the influence of the R262G mutation on the properties of Xyn11A-CBM2b-1, the capacity of the mutant protein to bind different polysaccharides was assessed. The mutated CBM2b-1, R262G, exhibited no significant affinity for soluble xylan, as judged by non-denaturing gel electrophoresis (Fig. 3B) or for xylohexaose as evaluated by changes in the NMR spectrum of the protein titrated with the oligosaccharide (Fig. 3A, bottom). In contrast, it exhibited significant affinity for cellohexaose as evidenced by the change in chemical shift of the surface tryptophans when the protein was titrated with the oligosaccharide (Fig. 3A, bottom). The dissociation constant Kd for cellohexaose was estimated to be 5.0 ± 1.5 mM by NMR. However, this value must be treated with some caution as the titration did not approach saturation, and higher concentrations of cellohexaose could not be used because of the poor solubility of the sugar. Qualitative analysis of the capacity of R262G to bind to insoluble cellulose using SDS-PAGE gels showed that the majority of the mutant protein bound to cellulose, with only a relatively small amount retained in the unbound fraction (Fig. 3C). Binding isotherms using BMCC as the ligand (Fig. 3D) revealed that R262G bound reasonably tightly, with an estimated Kd of 17 μM (Table I).
The affinity was clearly significantly higher than the nonspecific binding that occurred between BMCC and the wild-type CBM2b or GST alone. These data also indicate that R262G binds considerably more tightly to insoluble cellulose than to cellohexaose. The weaker binding to cellohexaose compared with BMCC is typical of CBMs that interact with crystalline cellulose and has been ascribed to the loss of conformational entropy of the flexible oligosaccharide chain on binding to the protein, which does not occur for the more conformationally constrained polysaccharide chains in crystalline cellulose (22,23). These data therefore indicate that the mechanism of binding of R262G to crystalline cellulose is similar to that of Family 2a proteins.
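To put the difference between the two dissociation constants on an energy scale, the following sketch converts each Kd into a standard binding free energy, ΔG° = RT ln(Kd / 1 M). This is a rough illustration only: it assumes 0 °C for the isotherms performed on ice and 30 °C for the NMR titration, it treats the apparent Kd for the insoluble ligand as if it were a solution equilibrium constant, and it ignores the stated uncertainty in the cellohexaose value. The function name is hypothetical.

```python
import math

R_J_PER_MOL_K = 8.314462618  # molar gas constant


def binding_free_energy_kj(kd_molar, temp_k):
    """Standard binding free energy (kJ/mol) from Kd, relative to a 1 M standard state."""
    return R_J_PER_MOL_K * temp_k * math.log(kd_molar) / 1000.0


if __name__ == "__main__":
    kd_bmcc = 17e-6          # M, R262G with BMCC (binding isotherm)
    kd_cellohexaose = 5e-3   # M, R262G with cellohexaose (NMR estimate)

    dg_bmcc = binding_free_energy_kj(kd_bmcc, 273.15)          # assumed 0 degC
    dg_c6 = binding_free_energy_kj(kd_cellohexaose, 303.15)    # 30 degC

    print(f"Kd ratio (cellohexaose / BMCC): {kd_cellohexaose / kd_bmcc:.0f}")
    print(f"dG BMCC         = {dg_bmcc:.1f} kJ/mol")
    print(f"dG cellohexaose = {dg_c6:.1f} kJ/mol")
```

Because the two measurements were made at different temperatures and by different methods, the gap between the two free energies should be read as an order-of-magnitude indication of the entropic penalty discussed above, not as a measured quantity.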
The affinity of R262G for BMCC is approximately 30 times weaker than that of typical Family 2a CBMs (4,12,13). However, Family 2a CBMs have three exposed tryptophan residues, all of which are involved in stacking interactions with ligand (12,24). Replacement of any one of these tryptophans by alanine typically gives proteins with affinities for BMCC 17-50 times weaker than wild type (11,12). The affinity of R262G was therefore compared with that of the W66A mutant of Pseudomonas fluorescens Xyn10A-CBM2a, which lacks the tryptophan that is present in Family 2a but not in Family 2b proteins (Ref. 11, Trp-66 is in the insertion after β-strand 6 in Fig. 1).
The results (Fig. 3D) are given in Table I and show that the mutated CBM2b, R262G, has an affinity for insoluble cellulose that is very similar to that of the equivalent CBM2a mutant (W66A) containing only two surface tryptophans. These data are entirely consistent with the view that residue 262 plays a pivotal role in defining the ligand specificity of Family 2 CBMs. Thus, replacing the bulky sidechain of Arg-262 with an amino acid with no sidechain changes the ligand specificity of the protein from xylan to cellulose.
The Structure of R262G-The structure of the R262G mutant of Cf Xyn11A-CBM2b-1 has been determined using standard two- and three-dimensional NMR methods on 15N-labeled and -unlabeled protein (Fig. 4). Structural statistics are given in Table II. The restraints set consisted of 1065 internuclear distances (including 26 pairs of hydrogen bond restraints) and 99 dihedral angles covering residues 247-333. The resulting structural ensemble is of high quality, with good precision. In particular, the heavy atom precision is better than 1.0 Å for all residues on the ligand-binding face (Fig. 4, right-facing face). On average, one amino acid per structure was found in the generously allowed region of the Ramachandran map (25). This usually corresponds to Ser-298, which is within an unstructured loop. Asn-292 and Asn-320 both have positive backbone φ angles.
The conformation of the sidechain of Trp-259 was studied carefully because of its importance. The NMR data demonstrate unambiguously that it adopts a unique conformation in which the ring is parallel to the surface of the protein. Measurement of 3Jαβ and 3JNβ coupling constants indicates that the χ1 dihedral angle is close to +60°, whereas the large number of NOEs from both edges of the ring to protons on the surface of the protein (for example, Trp-259 HNε to Thr-316 Meγ and Met-318 Meε, and Trp-259 HCε3 to Ala-263 Hα and Glu-258 HN) indicate that the ring is lying flat against the protein surface. Both tryptophan residues implicated in ligand binding are therefore well exposed on the surface of the protein, and their planar orientation is suitable for binding to cellulose. In addition, there are a number of other amino acids well placed to interact with the polysaccharide ligand, as discussed below.
FIG. 6. Model for the complex between R262G and cellohexaose based on chemical shift changes seen on addition of cellohexaose to R262G. The residues whose sidechains are shown in green are those undergoing large chemical shift changes (weighted chemical shift change > 0.20 ppm), namely (in descending order) Asn-264, Asn-292, Glu-257, Gly-262, Trp-259, and Trp-291. The sidechain of Gln-288 is also shown (red). Gln-288 signals shift when WT binds xylohexaose, but not when R262G binds cellohexaose (see "Results and Discussion"). Cellohexaose (blue/red) in standard conformation (29) has been modeled into the binding site, maintaining a planar stacking interaction between sugar and tryptophan rings.

A Structural Comparison of R262G to the Wild Type-Changes in NMR chemical shift between wild type and R262G (Fig. 5A) show that the structure of the protein has been perturbed around the site of mutation and in the adjoining β-sheets β7 and β4. There are no other significant structural changes, as can be seen from the structural overlays in Fig. 5B. NOE, coupling constant, and chemical shift data unambiguously demonstrate that Trp-259 has rotated through approximately 90°, to lie flat against the protein surface. Thus for example in R262G, as noted above, Trp-259 has a number of long range NOE enhancements from both edges of the ring, consistent with the geometry shown in Fig. 4, whereas in the wild type, it only has NOE enhancements involving one face of the ring. The three-bond scalar couplings 3Jαβ and 3JNβ show that the sidechain χ1 angle has changed from -60° in the wild type to +60° in R262G. Finally, it is very striking that a number of residues have very different chemical shifts in the two proteins, consistent with the proximity of Trp-259. In particular, in Gly-262 itself, the two CαH protons have chemical shift values 1.21 ppm apart, consistent with the close proximity of an aromatic ring (that of Trp-259) to the higher field of the two protons.
In the wild type, the proximity of the Trp-259 ring to Glu-257 shifts its Hα to 3.74 ppm and its Hβ3 to 0.94 ppm, values that are very different from their random coil positions. In the mutant, these protons are at 4.29 and 2.24 ppm respectively, very close to their random coil values and in agreement with the Trp-259 ring having moved away from Glu-257. Fig. 5B shows the positions of Trp-259 in wild-type and mutant protein, as well as the position of Arg-262 in the wild-type CBM, from which it can be seen that the initial hypothesis is confirmed. Thus, the replacement of Arg-262 by glycine allows Trp-259 to sit flat against the protein surface, in a conformation almost parallel to that of the other surface tryptophan, Trp-291.
The Binding Site of R262G for Cellohexaose-The binding site of R262G for cellohexaose was determined by NMR spectroscopy. Cellohexaose was titrated into R262G, and changes in chemical shift were followed by two-dimensional 15N-1H HSQC spectra, using established methodology (16). Several backbone and sidechain resonances underwent large chemical shift changes on titration, as shown in Fig. 6. These shift changes cluster into a well defined area on the surface of the protein, which includes the tryptophan residues expected to be involved in the binding interaction and can therefore confidently be taken as the binding site. The chemical shift changes also indicate that the orientation of Trp-259 is unaltered on binding.
The chemical shift changes seen on addition of cellohexaose to R262G are almost identical to those seen on titration of xylohexaose into the wild-type protein (16), indicating that the binding interactions are similar in the two cases with the obvious exception of the orientation of Trp-259. This includes marked changes to Asn-292 and Asn-264 sidechains, suggesting that they are involved in hydrogen bonding interactions in both cases. There are no new residues with large chemical shift changes, as expected from our hypothesis that binding specificity arises largely from aromatic stacking interactions and not from hydrogen bonds. The most striking difference in chemical shift changes is in the sidechain amide group of Gln-288. These signals shift markedly in the wild type/xylohexaose interaction (2.63 ppm in 15N shift, the largest 15N chemical shift change on binding by a factor of more than 2), but change by less than 0.02 ppm in the R262G/cellohexaose titration. From inspection of the models of the complexes (Fig. 6 and Fig. 7 of Ref. 16), it is apparent that Gln-288 can hydrogen bond to a sugar hydroxyl in the complex with xylohexaose because the twisted structure of xylohexaose orients the sugar ring in the direction of Gln-288 but that the distance from Gln-288 to the sugar is too large in the cellohexaose complex to permit hydrogen bonding. The model of the complex shown in Fig. 6 therefore provides a ready explanation of the NMR results.
Conclusions-The ligand specificity of CBM2 has been shown to be determined largely by a single amino acid, which controls the orientation of one of the tryptophan residues that interacts with the saccharide ligand. When the tryptophans are coplanar, the CBM recognizes the planar chains of cellulose, whereas when they are twisted into a near perpendicular arrangement, the protein recognizes the helical structure of xylan. Thus, in this family of CBMs, ligand specificity is determined largely by recognition of the three-dimensional shape of the polysaccharide ligand, rather than by specific hydrogen bonding patterns, as is typically seen in proteins that recognize monosaccharides (26,27).