Circular Permutation Provides an Evolutionary Link between Two Families of Calcium-dependent Carbohydrate Binding Modules*

The microbial deconstruction of the plant cell wall is a critical biological process, which also provides important substrates for environmentally sustainable industries. Enzymes that hydrolyze the plant cell wall generally contain non-catalytic carbohydrate binding modules (CBMs) that contribute to plant cell wall degradation. Here we report the biochemical properties and crystal structure of a family of CBMs (CBM60) that are located in xylanases. Uniquely, the proteins display broad ligand specificity, targeting xylans, galactans, and cellulose. Some of the CBM60s display enhanced affinity for their ligands through avidity effects mediated by protein dimerization. The crystal structure of vCBM60 displays a β-sandwich with the ligand binding site comprising a broad cleft formed by the loops connecting the two β-sheets. Ligand recognition at site 1 is exclusively through hydrophobic interactions, whereas binding at site 2 is conferred by polar interactions between a protein-bound calcium and the O2 and O3 of the sugar. The observation that ligand recognition at site 2 requires only a β-linked sugar that contains equatorial hydroxyls at C2 and C3 explains the broad ligand specificity displayed by vCBM60. The ligand-binding apparatus of vCBM60 displays remarkable structural conservation with a family 36 CBM (CBM36); however, the residues that contribute to carbohydrate recognition are derived from different regions of the two proteins. Three-dimensional structure-based sequence alignments reveal that CBM36 and CBM60 are related by circular permutation. The biological and evolutionary significance of the mechanism of ligand recognition displayed by family 60 CBMs is discussed.

Plant cell walls are complex macromolecular structures that consist of a diverse repertoire of interlocking polysaccharides (1). The microbial deconstruction of these composite structures, which is mediated by an extensive repertoire of hydrolytic enzymes, is of considerable biological and industrial significance. The release of sugars from the plant cell wall is not only required to maintain microbial ecosystems, but the volatile fatty acids generated by these microbiota are essential nutrients for higher order organisms such as mammalian herbivores (2). Within an industrial context, the microbial enzymes that catalyze this process are integral to the exploitation of lignocellulose as an environmentally sustainable substrate for the biofuel and bioprocessing industries (3,4).
The physical complexity of plant cell walls limits the access of the hydrolytic enzymes to their target substrates. To overcome the "access problem," glycoside hydrolases, esterases, and lyases that degrade plant structural polysaccharides, in general, have a modular structure in which the catalytic module is appended to non-catalytic carbohydrate binding modules (CBMs; see Ref. 5 for review), which are grouped into sequence-based families within the CAZy database (6). The general function of CBMs is to direct the cognate catalytic modules to their target substrate within the plant cell wall, thereby increasing the efficiency of catalysis (7-9). CBMs, in addition to their family assignment, have also been defined as type A, B, and C modules reflecting their mode of binding (5). Type A CBMs bind to crystalline surfaces such as cellulose and chitin, type B modules recognize the internal regions of single glycan chains, whereas type C proteins typically recognize no more than two sugars and often target the end of glycan chains (5). The majority of CBMs display a β-sandwich fold with the ligand binding site located in either the concave surface presented by one of the β-sheets (a topography that facilitates the targeting of the internal regions of glycan chains (10-16)) or in the loops that connect the two sheets (17-21). This latter binding site can either target the end (17, 19, 21) or, less frequently, the internal regions of glycan chains (18, 20).
Xylan, the major hemicellulose component of most plant cell walls, is one of the most complex carbohydrates targeted by CBMs. The polymer consists of a β-1,4-linked xylopyranose backbone that can be decorated at O2 with 4-O-methyl-α-D-glucuronic acid, or α-D-glucuronic acid, and at O2 and/or O3 with acetate and α-L-arabinofuranose moieties, which can also be linked, through O5, to ferulic acid (1). The extent and nature of these decorations are species-, tissue-, and differentiation-specific (22). CBMs that target the xylan backbone in vitro belong to families 2 (2b subfamily), 4, 6, 15, 22, and 36 (13, 15, 16, 18, 20). The deep clefts presented by CBM4, 6, and 22, however, restrict the capacity of these modules to bind to xylans within plant cell walls, whereas the more open ligand binding sites of CBM2b and 15 enable these proteins to recognize their target ligands in planta (23). The bacterium Cellvibrio japonicus expresses an extensive xylan-degrading system comprising four glycoside hydrolase family (GH)10 and two GH11 endoxylanases, a GH5 enzyme predicted to be a glucuronoxylan-specific xylanase, a GH51 general-acting, and a GH62 xylan-specific, arabinofuranosidase, a GH67 α-glucuronidase and numerous CE1, CE2, and CE4 xylan acetyl esterases, one of which is appended to a GH11 xylanase (24-31). All the fully secreted xylan-degrading enzymes (not those appended to the outer membrane) contain at least one cellulose-binding CBM and an additional non-catalytic module of unknown function (NCM). Three of the NCMs were recently shown to be CBM35s that target both uronic acids that decorate xylans from rapidly growing cells, and a product released by the cleavage of pectin by pectate lyases (21). Intriguingly, none of these secreted enzymes appears to contain CBMs that target the xylan backbone, although the two GH11 xylanases, CjXyn11A and CjXyn11B, contain highly related NCMs (26, 30) (see Fig. 1).
To test the hypothesis that the NCMs in the two GH11 xylanases comprise novel CBMs that target xylan, the biochemical properties of the module from CjXyn11A were explored. The data show that the NCM is indeed a CBM that targets xylan but is also able to bind cellulose and galactan. Increased affinity for its ligands was conferred by avidity effects caused by protein dimerization through an inter-chain disulfide bond. The three-dimensional structure of a close homolog of the CjXyn11A xylan-binding CBM shows that ligand recognition is primarily conferred through the polar interactions of O2 and O3 of a single sugar with a protein-bound calcium ion. Furthermore, we show that CBM60 modules are evolutionarily related to CBM36 domains through circular permutation in the β-barrel folds. The functional and evolutionary significance for the mechanism of ligand recognition displayed by this CBM is discussed.

EXPERIMENTAL PROCEDURES
Cloning, Expression, and Purification of CjCBM60A and vCBM60-The ORFs encoding CjCBM60A (UniProtKB accession no. B3PIN1) and vCBM60 (EMBL accession no. FN908918) were amplified from C. japonicus genomic DNA and plasmid pBD7340, respectively, by PCR using forward and reverse primers listed in supplemental Table 1S. The amplified DNAs derived from the C. japonicus genome and pBD7340 were cloned into NdeI/XhoI-restricted pET16b (generates pCJCBM60A) and pET22a (generates pVCBM60), respectively. CjCBM60A and vCBM60 contain an N-terminal His8 tag and a C-terminal His6 tag, respectively. pVCBM60-VCBM60 encodes two copies of vCBM60, separated by the linker sequence KLSVSSSSSVQSSSSSSEF, and contains a C-terminal His10 tag. To generate this plasmid a polylinker sequence (encompassing the restriction sites NdeI/BamHI/KpnI/HindIII/EcoRI/SacI/SalI/XhoI) was inserted into the NdeI and XhoI sites of pET22b (Novagen) to generate pFV1. The DNA sequence encoding the serine-rich linker sequence was cloned between the HindIII and EcoRI sites of pFV1 and PCR products encoding the two copies of vCBM60 were inserted, respectively, between the NdeI and HindIII and between the EcoRI and XhoI sites. Escherichia coli BL21 (DE3) (Novagen) cells, harboring pCJCBM60A, were cultured in LB broth containing ampicillin (50 μg/ml) at 30°C to mid-exponential phase (A600 of 0.6), at which point isopropyl β-D-thiogalactopyranoside was added to a final concentration of 1 mM, and the cultures were incubated for a further 4 h. To produce vCBM60 and vCBM60-vCBM60, E. coli Origami B:pLysS cells containing pVCBM60 were cultured to mid-exponential phase at 37°C, and recombinant gene expression was induced by the addition of isopropyl β-D-thiogalactopyranoside to 0.5 mM and incubation at 30°C for 4 h.
The cells were harvested by centrifugation, and His-tagged recombinant protein was purified from cell-free extracts by immobilized metal ion affinity chromatography using a cobalt-based Talon (Clontech) column deploying standard methodology (36). For biochemical studies vCBM60 was further purified by anion-exchange chromatography using a Q12 anion-exchange column (Bio-Rad) and a 0-500 mM NaCl gradient in 10 mM Tris-HCl buffer, pH 8.0. For crystallographic studies vCBM60 was purified further by size-exclusion chromatography using an Amersham Biosciences XK16 HiLoad™ 16/60 Superdex™ Prep grade gel-filtration column. The same protocol was used to produce selenomethionine-containing vCBM60, except that the gene was expressed in E. coli B834 (Novagen) using growth conditions as described before (13). No reducing agent was included in the buffers used to purify the protein. To generate vCBM60nt, which contains no His tag (for crystallization with ligands), a stop codon was inserted in place of the Leu in the C-terminal motif LEHHHHHH (encoded by pET22a). After producing the protein in Origami B, as described above, 12.6 ml of the resultant cell-free extract was mixed with 1 g of finely sonicated insoluble oat spelt xylan and incubated overnight at 4°C. After washing the polysaccharide three times with 10 mM Tris-HCl buffer, pH 8.0, it was mixed with 5 ml of 100% (v/v) ethylene glycol for 1 h at 4°C to elute bound protein. The xylan was removed by centrifugation at 13,000 × g for 3 min. The purified protein was then dialyzed against 3 × 1,000 vol of 10 mM Tris-HCl buffer, pH 8.0, prior to anion-exchange chromatography and subsequent size-exclusion chromatography. All the purified proteins were electrophoretically pure as judged by SDS-PAGE.
Site-directed Mutagenesis-Site-directed mutagenesis was carried out employing a PCR-based QuikChange site-directed mutagenesis kit (Stratagene) according to the manufacturer's instructions, using pVCBM60 as the template and primers presented in supplemental Table 1S.
Ligand Binding Studies-The capacity of the target proteins to bind to a variety of soluble plant structural polysaccharides was evaluated by affinity gel electrophoresis. Continuous native polyacrylamide gels were prepared consisting of 7.5% (w/v) acrylamide in 25 mM Tris/250 mM glycine buffer, pH 8.3. To one of the gels, 0.1% polysaccharide was added prior to polymerization. Approximately 5 μg of target proteins and BSA (as a non-interacting negative control) was loaded onto the gels and subjected to electrophoresis at 10 mA/gel for 2 h at room temperature. Proteins were visualized by staining with Coomassie Blue. Isothermal titration calorimetry (ITC) was carried out at 25°C using a MicroCal VP-ITC titration calorimeter. Titrations were carried out in 50 mM Na-Hepes buffer, pH 7.5, containing 5 mM CaCl2, unless otherwise stated. During a titration, the protein sample (50-150 μM), stirred at 300 rpm in a 1.36-ml reaction cell, was injected with 25 successive 10-μl aliquots of a 0.5-5 mg/ml polysaccharide or 5-10 mM oligosaccharide at 200-s intervals. To investigate calcium binding the protein was treated with Chelex-100 to remove any bound divalent metal, prior to titration with 0.5-3 mM CaCl2. Raw binding data were corrected for heat of dilution of both protein and ligand. Integrated heat effects were analyzed by non-linear regression using a single-site binding model (MicroCal ORIGIN, v7.0), yielding values for K_A and ΔH°. Other thermodynamic parameters were calculated using the standard thermodynamic equation −RT ln K_A = ΔG° = ΔH° − TΔS°. The binding of proteins to tobacco stem sections was determined as described previously using immunohistochemical methods (37). Binding was visualized by fluorescence microscopy.
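As a worked illustration of the standard thermodynamic relation quoted above, the following minimal Python sketch derives ΔG° and TΔS° from a fitted K_A and ΔH° at the 25°C used for ITC. The K_A is the value reported in Results for xylohexaose; the ΔH° is a hypothetical placeholder rather than a measurement from this study, and the function name is our own.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1

def thermodynamics(k_a, delta_h, temp_k=298.15):
    """Return (dG, TdS) in J/mol from K_A (M^-1) and dH (J/mol).

    Uses -RT ln K_A = dG = dH - TdS, hence TdS = dH - dG.
    """
    delta_g = -R * temp_k * math.log(k_a)
    t_delta_s = delta_h - delta_g
    return delta_g, t_delta_s

# K_A for xylohexaose as reported in the text.
# delta_h is a HYPOTHETICAL placeholder, not a value from this study.
dg, tds = thermodynamics(k_a=2.7e4, delta_h=-40_000.0)
print(f"dG = {dg / 1000:.1f} kJ/mol, TdS = {tds / 1000:.1f} kJ/mol")
```

With an exothermic ΔH° larger in magnitude than ΔG°, TΔS° comes out negative, which corresponds to the enthalpy-driven, entropically unfavorable pattern typical of carbohydrate-protein recognition.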
Crystallization and Structure Solution-Crystals of native vCBM60 were grown at 4°C or 20°C in 2.1 M sodium malate pH 5.5-6.0 at a 1:1 or 3:2 ratio of protein to mother liquor. Selenomethionine-containing protein was crystallized under the same conditions but did not crystallize unless the reducing agents β-mercaptoethanol or DTT, normally included in selenomethionine preparations, were explicitly left out of the buffers during protein preparation and crystallization. Single crystals of all proteins were mounted, using Paratone oil as the cryoprotectant, and flash frozen in liquid N2.
The above crystals were mounted on beamline ID23-1 at the European Synchrotron Radiation Facility (ESRF), Grenoble. Data were collected to 1.6 Å using an Area Detector Systems Corporation Quantum 315 charge-coupled device detector at a wavelength of 1.0716 Å. A total of 360 images of 0.5° each were collected. The program MOSFLM (CCP4 suite (38)) was used to index and integrate the diffraction images. The resulting data were scaled and merged using SCALA (also CCP4 suite). The space group was either P4₁2₁2 or P4₃2₁2 with cell dimensions a = b = 45.4 Å, c = 102.3 Å and one molecule per asymmetric unit. Data statistics are given in Table 1. Data for the selenomethionine form of vCBM60 were collected on beamline ID23-1 at the ESRF, Grenoble, at a wavelength of 0.9796 Å. A total of 180 images of 0.5° each were collected and processed using HKL2000 (39). Selenium positions were determined using SHELXD (recently reviewed in Ref. 40), and initial phase calculations and solvent modification in SHELXE (40) indicated the space group to be P4₁2₁2. The resulting phases and the native data were used in model building using the CCP4 implementation of REFMAC (41)/ARP/wARP. Minor manual corrections were performed using the program COOT (42) with maximum likelihood refinement using REFMAC (41). Final data and structure quality statistics for both native and selenomethionine forms of vCBM60 are given in Table 1. Structural figures were prepared with PyMOL.
Crystals of vCBM60 in complex with cellotriose and galactobiose were obtained by preparing protein:ligand mixtures at 20 mg/ml vCBM60:10 mM ligand and mixing 1:1 with 2.4 M sodium malonate or 0.01 M zinc chloride, 0.1 M Tris/HCl, pH 8.0, 20% (w/v) PEG 6000, respectively, and grown at 20°C. Crystals were mounted in mother liquor supplemented with 25% glycerol and 10 mM appropriate ligand before being cryocooled in liquid N2.
Data for these crystals were collected on Diamond beamline I03 to 1.2 Å for vCBM60-cellotriose and 1.8 Å for vCBM60-galactobiose. Data for the liganded complexes of vCBM60 were integrated and scaled using MOSFLM and SCALA. The structures of vCBM60-cellotriose and vCBM60-galactobiose were determined by molecular replacement in the CCP4 version of MOLREP using the unliganded vCBM60 as the search model. The starting models for vCBM60-cellotriose and vCBM60-galactobiose were refined by rounds of manual rebuilding in COOT (42) interspersed with restrained refinement in REFMAC (41). Solvent water molecules were added using COOT (42) and checked manually.
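The statement of one molecule per asymmetric unit for the native crystals can be sanity-checked from the reported cell dimensions with a Matthews-coefficient estimate. The Python sketch below is illustrative only: the molecular mass is an assumption (110 modeled residues at roughly 110 Da each), not a value given in this paper, and the solvent estimate uses the conventional approximation 1 − 1.23/V_M.

```python
def tetragonal_cell_volume(a, c):
    """Unit-cell volume (cubic angstroms) for a tetragonal cell with a = b."""
    return a * a * c

def matthews_coefficient(volume, n_asu, mol_weight_da, copies_per_asu=1):
    """V_M in cubic angstroms per dalton."""
    return volume / (n_asu * copies_per_asu * mol_weight_da)

A_AXIS, C_AXIS = 45.4, 102.3   # cell dimensions reported in the text (angstroms)
N_ASU = 8                      # asymmetric units per cell in P4(1)2(1)2
# ASSUMPTION: ~110 Da per residue for the 110 modeled residues; the true
# molecular mass of the construct is not stated in this section.
ASSUMED_MW_DA = 110 * 110.0

volume = tetragonal_cell_volume(A_AXIS, C_AXIS)
vm = matthews_coefficient(volume, N_ASU, ASSUMED_MW_DA)
solvent_fraction = 1.0 - 1.23 / vm  # conventional approximation

print(f"cell volume = {volume:.0f} A^3")
print(f"V_M = {vm:.2f} A^3/Da, solvent ~ {100 * solvent_fraction:.0f}%")
```

Under these assumptions V_M falls within the range commonly observed for protein crystals, so a single copy per asymmetric unit is plausible; two copies would halve V_M and imply an implausibly low solvent content.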

RESULTS
Biochemical Properties of CjCBM60A-The two C. japonicus GH11 xylanases, CjXyn11A and CjXyn11B, contain a highly similar 116-residue module of unknown function (26, 30). To explore their functional significance, the CjXyn11A module was expressed in E. coli (attempts to produce the corresponding CjXyn11B module in soluble form were unsuccessful) and, after purifying the protein to electrophoretic homogeneity by IMAC, its biochemical properties were investigated. The module displayed no hydrolytic activity against plant cell wall polysaccharides, including xylans, galactans, mannans, and β-glucans (data not shown). Affinity gel electrophoresis, however, showed that the protein bound to a range of xylans, carboxymethylcellulose, hydroxymethylcellulose, β-1,3;β-1,4 mixed-linked β-glucans (barley β-glucan and lichenan) and β-1,4-galactan (example data in Fig. 1, full data set in Table 2) but did not bind to galactomannan, β-1,3-glucan (laminarin), or a range of α-linked polysaccharides. It is apparent, therefore, that the protein module is a CBM that displays particularly broad ligand specificity and, henceforth, is designated as CjCBM60A, being the founding member of this new family.
ITC was used to quantify the affinity of CjCBM60A for polysaccharide and oligosaccharide ligands. Examples of the titrations are displayed in Fig. 2, whereas the complete data set is reported in Table 3. The data show that CjCBM60A binds tightly to a range of β-1,4-galactans, xylans, and β-glucans (β-1,4-glucan backbone, xyloglucan; mixed-linked β-1,3-β-1,4-glucans, barley, and lichenan). The stoichiometry of binding, assuming a single binding site for each CBM protomer, indicated that, at saturation, each protein molecule occupied ~4-6 tandemly arrayed sugar residues for all polysaccharides. In general the affinity and the coverage, irrespective of the number and type of side chains decorating the backbone, were similar. Thus the ligand binding site of CjCBM60A appears to be able to interact with xylans, glucans, and galactans that are extensively decorated. Typical of carbohydrate-protein recognition (7, 10, 13, 16, 19-21), binding to polysaccharides and oligosaccharides was driven by favorable changes in enthalpy, with generally an unfavorable entropic contribution. It should be noted, however, that the binding of CjCBM60A to glucans is associated with a positive change in entropy, which therefore makes a favorable contribution to affinity (Table 3).
The affinity (K_A) of CjCBM60A for xylohexaose, which appears to fully occupy the ligand binding site, was 2.7 × 10⁴ M⁻¹ (data not shown), which was 20- to 30-fold lower than the K_A values obtained for the various xylans (Table 3). The targeting of multivalent polysaccharides, in preference to oligosaccharides, is likely to be mediated through avidity effects (43, 44), which requires that CjCBM60A is capable of oligomerization. This view is supported by size-exclusion studies, which showed that the protein migrated as, predominantly, a dimeric species (data not shown), the biological significance of which is discussed below.
Fig. 1, and the full data set is in Tables 2 and 3) is very similar to CjCBM60A. The protein module binds to galactan, xylans, and β-glucans (that contain β-1,4 linkages), although the actual affinities for these polysaccharides are considerably lower than observed with CjCBM60A. This likely reflects the inability of vCBM60 to oligomerize (data not shown), precluding any benefits derived from avidity effects.
FIGURE 2. Representative ITC data of the CBM60s titrated with carbohydrates and calcium. The ligand (0.5-5 mg/ml for the polysaccharide and 3 mM for calcium) in the syringe was titrated into the CBM60 (50-100 μM) in the cell. The galactan was derived from potato, and WAX signifies wheat arabinoxylan. The top half of each titration shows the raw injection heats; the bottom half, the integrated peak areas fitted using a single site model (MicroCal Origin v7.0). ITC was carried out as described under "Experimental Procedures" in 50 mM Na/Hepes, pH 7.5, containing 5 mM CaCl2 at 25°C for the carbohydrate ligands. When titrating with calcium the proteins were treated with Chelex-100 to generate the apo forms and calcium was omitted from the Hepes buffer.
Indeed, this view is supported by the observation that the artificial construct, containing two tandem copies of vCBM60 (designated vCBM60-vCBM60), binds to xylan and galactan, respectively, 20- and 40-fold more tightly than the monomeric form of the protein (Table 4). The affinity of vCBM60 for a series of oligosaccharides (Table 3) shows that the protein exhibits similar affinity for the hexasaccharide and trisaccharide of cello and xylooligosaccharides (but not xylobiose or cellobiose), displaying a preference for the xylo-configured ligands, whereas the K_A values for galactobiose and galactotriose were very similar. It would appear, therefore, that the ligand binding site is optimized to bind trisaccharides of glycans that have a 2- and 3-fold screw axis (β-1,4-containing glucans and xylans), but to disaccharides of glycans that adopt an extended helical conformation, such as β-1,4-galactan. These data suggest that the broad specificity displayed by vCBM60 and CjCBM60A may be a general feature of this family, particularly as the ligand binding residues (discussed in detail below) are highly conserved.
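To put the 20- and 40-fold affinity gains of the tandem construct on an energetic scale, a fold increase in K_A can be converted to a free-energy difference through ΔΔG° = −RT ln(fold). The short Python sketch below does this at 25°C, the temperature used for ITC; the function name is ours and the calculation assumes nothing beyond the fold values quoted above.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1

def ddg_from_fold(fold, temp_k=298.15):
    """Free-energy change (J/mol) equivalent to a fold increase in K_A."""
    if fold <= 0:
        raise ValueError("fold change must be positive")
    return -R * temp_k * math.log(fold)

# Fold enhancements reported for vCBM60-vCBM60 versus monomeric vCBM60.
for ligand, fold in (("xylan", 20), ("galactan", 40)):
    print(f"{ligand}: {fold}-fold ~ {ddg_from_fold(fold) / 1000:.1f} kJ/mol")
```

The resulting gains of several kJ/mol are modest next to the total binding free energy, which is in keeping with avidity arising from two linked sites engaging one polymer rather than from a new, high-affinity interaction.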
CBM60 Displays Absolute Requirement for Calcium-To investigate whether ligand recognition was metal-dependent,   (Table  5). These data indicate that CjCBM60A and vCBM60 are calcium-dependent CBMs.
In Vitro Labeling-The capacity of the tandem vCBM60-vCBM60 protein to bind to ligands in intact cell walls of tobacco stem sections was assessed using immunohistochemistry (Fig. 3). This analysis indicated that the protein bound extensively to secondary cell walls of the xylem elements and phloem fibers. Although only weak recognition of the primary cell walls was observed in cortical parenchyma, pith parenchyma and epidermal tissues (Fig. 3), this likely reflects the thinness of the walls. The thickened vascular secondary walls contain a large amount of xylan, as well as cellulose. In primary cell walls the cellulose microfibrils are embedded in a matrix consisting mainly of pectins that include galactans, albeit at low concentrations (45). The binding profile of vCBM60-vCBM60 to tobacco stems is broadly similar to other xylan-specific CBMs (such as CBM2b-1-2 (23)), although CBM2b-1-2 did not display any binding to primary cell walls. The lack of cellulose recognition (CBMs that bind crystalline cellulose bind extensively to both secondary and primary cell walls (46)) likely reflects the extensive inter- and intra-chain hydrogen bonds between the hydroxyl groups within the microfibrils, which are therefore unable to make polar contacts with vCBM60. The weak binding to primary cell walls also indicates that either galactan is present at low concentrations in these walls or is not accessible to vCBM60. Thus, although vCBM60 appears to display broad ligand specificity in vitro, in an ex vivo setting this module targets mainly cell walls rich in xylan, consistent with the location of family 60 CBMs predominantly in xylanases (see below).
Crystal Structure of vCBM60-The crystal structure of vCBM60 was solved using the single-wavelength anomalous dispersion method employing selenomethionine-labeled protein and refined using data extending to 1.6-Å resolution (Table 1). The vCBM60 crystals belong to a space group of P4₁2₁2 with cell dimensions a = b = 45.4 Å, c = 102.3 Å and one molecule per asymmetric unit. Overall, the electron density is of high quality throughout the entire length of the protein, and the final model contains 110 amino acid residues, 175 water molecules, and 2 ions that, based on their coordination sphere, ligands, and B-factors have been modeled as Ca2+. The structure generally has backbone torsion angles in the most favored region (90.6% of the bonds), and none in the disallowed regions of the Ramachandran plot, according to PROCHECK (47).
Oat-spelt xylan; 400 ± 40
CjCBM60A + 10 mM EDTA; Oat-spelt xylan; n.b.
CjCBM60A (apo-form) b; Oat-spelt xylan; n.b.
CjCBM60A (apo-form) + 5 mM Ca2+ c; Oat-spelt xylan; 312 ± 42
CjCBM60A (apo-form); Calcium; 173 ± 18
a n.b., no binding.
b CBM60 (apo form): the apo forms of the CBM60s were prepared by treatment with Chelex-100 to remove bound divalent metal ion. Titration was performed in 50 mM Na Hepes, pH 8.0, treated with Chelex-100, with 0.5 or 5 mg/ml oat-spelt xylan.
c CBM60 (apo form): titration of the apo-form of the CBM60s with xylan was carried out in 50 mM Na Hepes, pH 7.5, containing 5 mM CaCl2.
d For calcium, the concentration of calcium in the syringe was 0.5-3 mM.
The Ligand Binding Site in vCBM60-The ligand binding sites in CBMs that display a β-sandwich fold typically comprise the concave surface presented by one of the β-sheets, or are located at the end of the elliptical protein, within the loops connecting these two structural elements (see Ref. 5 for review). Inspection of the concave surface of vCBM60 does not reveal a cleft-like structure that would accommodate polysaccharide ligands. By contrast, the loops connecting the two β-sheets present a very broad but short cleft (Fig. 4). The floor and one wall of the cleft are formed by the loop connecting β-7 with β-8, whereas the other wall comprises the loop linking β-5 with β-6. Confirmation that the loop-derived cleft constitutes the ligand binding site is derived from mutagenesis studies. Substitution of most of the residues on the surface of this cleft with alanine either completely abrogates, or greatly reduces, the affinity of the protein for its carbohydrate ligands (Table 4). Indeed, the observation that these mutations cause a similar reduction in affinity for the gluco-, xylo-, and galacto-configured ligands demonstrates that the protein interacts with these polymers at a common binding site.
The crystal structure of vCBM60 in complex with cellotriose and galactobiose was solved to resolutions of 1.2 and 1.8 Å, respectively. Electron density for both oligosaccharides was evident, and thus the mechanism for the broad ligand specificity displayed by vCBM60 could be explored (Fig. 5). With respect to galactobiose the disaccharide is twisted into a helical type structure consistent with the conformation adopted by β-1,4-Gal polymers bound to GH53 galactanases (48). The reducing sugar, Gal1, makes parallel hydrophobic interactions with Trp85, mainly with the pyrrole component of the aromatic ring system, but does not make any direct polar contacts with the protein. Whereas Gal2 also makes hydrophobic contacts with Trp85, primarily the benzene component of the indole ring, the O2 and O3 of the sugar make extensive interactions with Ca2, and with the aspartates that form coordinate bonds with the divalent metal ion (Fig. 5). Thus, Ca2 interacts with the ligand binding site by making coordinate bonds with Arg59 O, His100 O, Asp55 Oδ1, and Asp60 Oδ2, while its octahedral coordination is completed through interactions with O2 and O3 of Gal2 and a water molecule.
FIGURE 4. Crystal structure of vCBM60. A, a protein schematic of vCBM60 in complex with cellobiose (green) color ramped from N terminus (blue) to C terminus (red). Calcium ions are shown as spheres shaded as blue slate, and cellobiose is in stick representation. The disulfide bond stabilizing the loop connecting β-7 and β-8 is shown as magenta sticks. B and C, the solvent-accessible surface of vCBM60 in complex with cellobiose and galactobiose, respectively. Bound ligand is shown in green (carbon) stick representation. Amino acids whose side chains contribute to ligand recognition are colored magenta, and the ligand-binding calcium is shown as a slate blue sphere. The figure, and other structure figures, were drawn with PyMOL.
In addition to interacting with Ca2, O2 and O3 of Gal2 make hydrogen bonds with Asp60 Oδ2 and Asp55 Oδ2, respectively, and also interact with the backbone carbonyl of His100. It is also possible that O2 and O3 make indirect contact with the CBM through the water that interacts with Ca2.
The structure of the vCBM60-cellotriose complex (Figs. 4 and 5) reveals only two of the three Glc residues. The disaccharide adopts a 2-fold screw axis in which the two Glc residues (defined as cellobiose) are orientated at 180° with respect to each other (galactobiose adopts a more helical conformation). The interactions between vCBM60 and cellobiose are very similar to the vCBM60-galactobiose complex. Trp85 makes planar hydrophobic contacts with Glc1 and Glc2; however, the poor electron density displayed by Glc1 indicates that these apolar interactions are weak. The O2 and O3 of Glc2 make polar contacts with Ca2, the side chains of Asp55 and Asp60, the backbone carbonyl of His100 and indirect interactions through the water that coordinates with Ca2. In addition, the equatorial O4 appears to make a polar contact with Asp55 Oδ2, which is not mirrored by the axial O4 of Gal2. We were unable to obtain a complex of vCBM60 with xylo-oligosaccharides. However, as there are no interactions with O6 of either galactobiose or cellobiose, it is highly likely that xylo-configured ligands will make similar, if not identical, interactions with the protein to cellobiose. It should be noted, however, that maximal binding to galactan is achieved through interactions with only two sugars, whereas the protein binds to tri- but not disaccharides of cellulose and xylan. The reason(s) for this subtle difference in specificity are not entirely clear. It is possible that the weak hydrophobic interaction between Glc1 (and by inference Xyl1) with Trp85 (compared with Gal1, which makes more extensive apolar contacts with the tryptophan) is maximized by presenting a closed pyranose ring to the protein, which occurs when Glc1 is the central sugar in a trisaccharide.
Indeed, previous studies have shown that oligosaccharides that extend beyond the ligand binding site of CBMs display higher affinity than oligosaccharides that have a degree of polymerization that matches the number of available sugar binding sites (5).
Mutagenesis studies described above (Table 4) are entirely consistent with the structure of the complexes; there is a complete loss of binding in protein variants where Asp55, Asp60, or Trp85 are replaced with alanine. Furthermore, the interaction of only the backbone carbonyl of His100 with calcium explains the lack of a significant effect on ligand binding when this residue is substituted with alanine (Table 4). Interestingly, Trp85 is the only amino acid that interacts with Glc/Gal1, comprising the second sugar binding site. Support for the importance of this second binding site in driving ligand recognition is provided by the observation that Gal, Glc and Xyl, or their β-methyl glycosides, do not bind to vCBM60 (data not shown). Generally, the mutagenesis data are consistent with the view that xylo-configured ligands bind through a similar mechanism to either galactobiose or cellotriose.
The role of calcium in the function of CBMs is restricted to those modules that display a β-sandwich fold, where the ligand binding site comprises the loops that link the two β-sheets (49). The metal ion, currently, only contributes to carbohydrate recognition in those CBMs that interact with one or, at most, two sugars, but not in those modules that bind to more extensive ligands, suggesting that calcium can mediate particularly tight binding to sugars. This feature of metal ion requirement is well illustrated in CBMs that bind to the internal regions of polysaccharides. CBMs that bind to four or more sugars in these polymers do not utilize calcium in ligand recognition; however, in modules such as CBM36 (20) and CBM60, which interact with only two or three internal sugars, the metal ion plays a dominant role in carbohydrate binding, consistent with the view that charged dipole-dipole interactions are stronger than uncharged ones (50).
The structural alignment program DaliLite (available online) revealed that the closest, functionally relevant, structural homolog of vCBM60 is CBM36 from Paenibacillus polymyxa xylanase 43A (PDB 1UX7 (20)), with a Z-score of 6.2, root mean square deviation of 1.8 Å over 68 aligned residues out of a possible 120 amino acids, and a total sequence identity of 17%. Similar to vCBM60, CBM36 contains a short broad cleft formed by the loops connecting the two β-sheets. The ligand binding apparatus is highly conserved in the two proteins (Fig. 6); however, in CBM36, xylobiose, the only ligand for which a structure is available, is orientated 180° with respect to the ligands in vCBM60, although the xylose disaccharide could be similarly "flipped" in the family 60 CBM. In CBM36 Xyl2 (non-reducing sugar of xylobiose) makes hydrophobic contacts with two tyrosine residues, while Xyl1, through O2 and O3, forms polar contacts with calcium and the two aspartates and the backbone carbonyls that coordinate the metal ion (Fig. 6). vCBM60, but not CBM36, binds to β-1,4-containing glucans; however, the structural basis for this difference in specificity is very subtle. Xyl2 sits deeper in the cleft in CBM36 than Gal1 or Glc1 in vCBM60, and it is likely that the C6-OH of Glc2 in the Paenibacillus protein module will clash with Tyr40. Given the structural conservation it seems likely that, like vCBM60, CBM36 will bind to galactan, although this ligand was not evaluated by Jamal-Talabani and colleagues (20).
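The two comparison metrics quoted above, root mean square deviation over aligned residues and percent sequence identity, can be illustrated with a minimal Python sketch. It is not the DaliLite procedure: the RMSD function assumes the paired coordinates have already been superposed, the identity function uses aligned non-gap columns as the denominator (other conventions exist), and all input data are hypothetical toy values.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between paired, already superposed (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))

def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns where neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if gap not in (a, b)]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)

# HYPOTHETICAL toy data: three atom pairs displaced by 1 angstrom along z.
toy_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
toy_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
print(f"RMSD = {rmsd(toy_a, toy_b):.2f} A")

# HYPOTHETICAL toy alignment with one gap column and one mismatch.
print(f"identity = {percent_identity('ACD-EFGH', 'ACDKEYGH'):.1f}%")
```

Because both numbers depend on which residues are counted as aligned, a low overall identity such as 17% can coexist with a well-conserved core.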
Despite the structural conservation of the ligand binding apparatus, remarkably, the locations of the carbohydrate/metal binding amino acids in the two protein sequences are not equivalent (Fig. 6). Thus, the aspartates (Asp116 and Asp121) and Trp120, which coordinate the ligand-binding calcium in CBM36, are in the C-terminal region of the protein, in the loop connecting β-8 and β-9, whereas the aromatic residue, Tyr26, which dominates the second sugar binding site, is derived from the loop joining β-2 and β-3. In sharp contrast, the constellation of amino acids that interacts with the critical ligand-binding calcium in vCBM60 is located in the central region of the protein, in the loop connecting β-5 and β-6. Trp85, which comprises the second sugar binding site, is in the C-terminal region of the protein, in the loop joining β-7 and β-8.
CBM60 and CBM36 Are Related by Circular Permutation-The observation of essentially conserved metal binding sites, derived from sequence-independent motifs, initially led us to consider the convergent evolution of a metal ion-dependent ligand binding function linking CBM36 and CBM60. Close analysis of the overlaid three-dimensional structures, however, revealed a much higher degree of structure-based sequence identity than was evident from the linear primary sequence alignments. Subsequent structural similarity searches of CBM36 and vCBM60 using SSM (51) (in which individual chains are compared but the requirement for connectivity is turned off) revealed that residues 1-42 of CBM36 display 26% identity with amino acids 70-111 of vCBM60. Similarly, residues 43-120 of CBM36 exhibit 36% identity with amino acids 1-69 of vCBM60 (Fig. 7). Furthermore, there is perfect alignment of the amino acids that play a key role in ligand recognition by coordinating the calcium or by making hydrophobic interactions with the second sugar. The structural and sequence similarity of non-contiguous regions of these two CBM families is strongly indicative of "circular permutation." In circular permutation, genetic events (or post-translational processing in the case of concanavalin A (52)) lead to the ligation of the N and C termini of the protein and subsequent cleavage at another site, generating new N and C termini (53). There are many natural examples of circular permutation, including some carbohydrate-active proteins and enzymes, transaldolases, glutathione synthetases, methyltransferases, and proteinase inhibitors (see Ref. 54 for review). In general, proteins that undergo circular permutation display either a β-barrel or β-sandwich fold and require the N and C termini to be in close proximity. Indeed, in virtually all CBMs that display a β-sandwich fold the two termini are in close proximity (5), suggesting that these proteins are candidates for circular permutation events.
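The index bookkeeping behind a circular permutation can be illustrated with a minimal Python sketch. The function names (`circularly_permute`, `map_position`, `is_circular_permutation`) and the toy sequence are hypothetical and are not taken from this study. The sketch assumes that the permuted chain is an exact rotation of the parent, which is an idealization: CBM36 and vCBM60 differ in length and share only 26-36% identity over the aligned segments.

```python
def circularly_permute(seq: str, cut: int) -> str:
    """Join the termini and reopen the chain after residue `cut` (1-based)."""
    if not 0 < cut < len(seq):
        raise ValueError("cut must fall strictly inside the sequence")
    return seq[cut:] + seq[:cut]


def map_position(pos: int, cut: int, length: int) -> int:
    """Return the 1-based position in the permuted chain of parent residue `pos`."""
    if not 0 < cut < length:
        raise ValueError("cut must fall strictly inside the sequence")
    if not 1 <= pos <= length:
        raise ValueError("pos must lie within the sequence")
    if pos > cut:
        # Residues after the cut form the new N-terminal segment.
        return pos - cut
    # Residues up to the cut are appended after the old C terminus.
    return pos + (length - cut)


def is_circular_permutation(a: str, b: str) -> bool:
    """True if `b` is an exact rotation of `a` (ignores mutations and indels)."""
    return len(a) == len(b) and b in a + a


if __name__ == "__main__":
    parent = "ABCDEFGHIJ"  # toy 10-residue chain, not a real protein sequence
    child = circularly_permute(parent, cut=4)
    print(child)                                   # EFGHIJABCD
    print(is_circular_permutation(parent, child))  # True

    # Idealized analogue of the CBM36 numbering: a 120-residue parent
    # reopened after residue 42 (an assumption made for illustration only).
    print(map_position(43, cut=42, length=120))    # 1
    print(map_position(1, cut=42, length=120))     # 79
```

In this idealized mapping, parent residues 43-120 become residues 1-78 and parent residues 1-42 become residues 79-120. The alignment reported here instead pairs residues 43-120 of CBM36 with residues 1-69 of vCBM60 and residues 1-42 with residues 70-111; the difference presumably reflects insertions and deletions between the two modules, which an exact rotation cannot capture.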
The evolutionary rationale for the circular permutation of CBM36 (or an ancestral sequence of this family) leading to the generation of family 60 CBMs (or vice versa; see "Discussion") is not readily apparent. It is remarkable that the reorganization of the β-sandwich fold did not disrupt the topology of the ligand binding site. Interestingly, similar results have been obtained through synthetic circular permutations, although the stability of these proteins was compromised (55). Intuitively, such events, which retain overall three-dimensional structure, may introduce subtle structural changes that alter the function of the protein. Indeed, several CBM60s have acquired a C-terminal extension that likely mediates oligomerization and hence increased affinity. It is possible that, in some way, the genetic events that lead to circular permutation also generate these oligomerization sequences.
CBM Family 60-Analysis of sequence-based relatives of vCBM60, using BLAST, revealed 15 proteins that contain a module with significant similarity to vCBM60 (including CjCBM60A and -B). All of the proteins were from bacteria, and all but one of the proteins contained xylanase catalytic modules from either GH11 (12 of 15) or GH10 (2 of 15), suggesting that the primary target for this CBM is xylan. Several proteins also contained a carbohydrate esterase from either family 4 (CE4, 5 of 15) or family 6 (CE6, 1 of 15), with both families known to contain xylan-specific esterases. In addition to the CBM60 module, 7 of 15 proteins contain a CBM10, a CBM family that targets crystalline cellulose (56), and 2 contain a CBM5 (one of these also has a CBM10), a family that targets chitin and cellulose. One protein, B3PKS4 from C. japonicus, has neither xylanase nor esterase catalytic modules, although, in addition to the CBM60, it does contain a CBM10 and a module of ~400 residues. It is possible that this large module displays a novel catalytic function related to xylan deconstruction. The residues that coordinate Ca2+ in vCBM60 (Asp55, Arg59, Asp60, and His100) are invariant in the other CBM60 members, whereas Trp85, which comprises the second sugar binding site, is replaced with tyrosine and is thus functionally conserved within the family (Fig. 8). These data strongly suggest that the ligand specificity is likely to be extremely similar, if not identical, across the CBM60 landscape. Affinity, however, is variable in CBM60, reflecting avidity effects in some proteins (CjCBM60A), but not in others (vCBM60). Inspection of the CjCBM60A sequence reveals a 10-residue C-terminal extension that is absent in vCBM60, which may contribute to protein dimerization. Indeed this sequence is present in 10 of 17 CBM60 members (Fig. 8), suggesting that increased affinity through avidity effects may be a common feature in this CBM family.
It should also be noted that one enzyme (Q21G61), derived from the marine bacterium Saccharophagus degradans, contains a tandem copy of CBM60 that has the C-terminal extension (57). It is possible that this protein will assemble four copies of a CBM60 into a protein dimer and thus bind particularly tightly to multivalent ligands, such as xylan and galactan. Multiple CBMs are a common feature in extracellular glycoside hydrolases expressed by S. degradans, and it has been proposed that this might reflect the marine environment. Thus, in such a dilute ecosystem, if secreted enzymes are not tethered to the plant cell wall via CBMs with high affinity, they will rapidly disperse, and their benefit to the host organism will be lost (57).

DISCUSSION
FIGURE 7. Circular permutation. A schematic of how CBM36 and vCBM60, which display low sequence identity, are evolutionarily related through a circular permutation event. In the sequence alignments of the two proteins conserved residues are indicated by an asterisk. Calcium protein ligands are shown in red (aspartates) and green (amino acids that interact with the metal via their backbone carbonyls). The aromatic residues that form the second sugar binding subsite are in cyan.
FIGURE 8. Alignment of CBM60 sequences. The sequence alignment was derived from a search of the UNIPROT dataset using vCBM60 as the query sequence and the BLAST search engine. All proteins have an E value of <e−14. Residues that are invariant within the family are indicated by an asterisk. Amino acids, whose side chains and backbone carbonyl interact with the ligand-binding calcium, are highlighted in red and cyan, respectively. The aromatic residue that dominates the second sugar binding subsite is highlighted in green, while the residues that contribute to the disulfide bond are colored magenta. SAD1 and SAD2 are the duplicated tandem CBM60 modules from the S. degradans protein Q21G61.
This study reports a new CBM family, CBM60, and the structural basis for the specificity displayed by this family. CBM60 proteins appear to display unusually broad ligand specificities, recognizing a range of β-linked polysaccharides that adopt very different conformations. In general CBMs display tight specificity for their target ligand (16,17,19), and thus it is particularly unusual for a module to bind xylan, β-1,4-galactan, and β-1,4-containing glucans. This broad specificity reflects the targeting of the equatorial O2 and O3 of pyranose sugars as the primary specificity determinant, which is a feature of many carbohydrates.
The key geometric signature of such sugars is the O3-C3-C2-O2 torsion angle of +60°. In comparison, the O3-C3-C2-O2 torsion angle of a mannoside is −60°, which explains why CBM60 is unable to bind homopolymers of mannan. Indeed, for the notable occasions where Ca2+ has been observed to bind mannosides (it is a feature of GH47 and GH92 mannosidases, for example), the Ca2+ coordination is actually used to help distort O3 and O2 away from the normal geometry of a mannoside ring, contracting the torsion to ~0° (58,59). Furthermore, in the case of CBM60 the axial O2 will not interact with the calcium in site 1 and will make a steric clash with the tryptophan in site 2. It is likely that the biologically significant ligand recognized by CBM60 modules is xylan, because they are located predominantly in xylanases (26,30). The capacity of this family to bind to galactan and β-glucan probably reflects the limited hydroxyls targeted by the proteins.
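The O3-C3-C2-O2 torsion angle discussed above can be computed from four atomic positions. The following Python sketch shows one standard way to do this with the standard library only. The function names, the 30° tolerance, and the synthetic coordinates are assumptions made for illustration; they are not values or methods from this study, and real coordinates would normally be read from a structure file.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion_angle(p0, p1, p2, p3) -> float:
    """Signed torsion p0-p1-p2-p3 in degrees, between -180 and 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    if norm == 0.0:
        raise ValueError("central atoms p1 and p2 coincide")
    b1 = tuple(c / norm for c in b1)
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    d0 = _dot(b0, b1)
    d2 = _dot(b2, b1)
    v = tuple(b0[i] - d0 * b1[i] for i in range(3))
    w = tuple(b2[i] - d2 * b1[i] for i in range(3))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def has_equatorial_o2_o3_signature(angle_deg: float, tolerance: float = 30.0) -> bool:
    """Hypothetical check: is the O3-C3-C2-O2 torsion close to +60 degrees?"""
    return abs(angle_deg - 60.0) <= tolerance


def toy_atoms(theta_deg: float):
    """Synthetic O3, C3, C2, O2 positions with a chosen torsion (not real sugar geometry)."""
    t = math.radians(theta_deg)
    return ((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0),
            (math.cos(t), math.sin(t), 1.0))


if __name__ == "__main__":
    for label, theta in (("equatorial O2/O3 (toy)", 60.0), ("manno-like (toy)", -60.0)):
        angle = torsion_angle(*toy_atoms(theta))
        print(label, round(angle, 1), has_equatorial_o2_o3_signature(angle))
```

With the toy coordinates, a torsion near +60° passes the check and a torsion near −60° does not, mirroring the distinction drawn above between sugars with equatorial O2 and O3 and mannosides. The tolerance is arbitrary and would need to be chosen against real ring geometries.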
Although this lack of specificity may appear counterintuitive, xylans are complex carbohydrates in which the backbone β-1,4-polymer is decorated by an array of structures at the O2 and/or O3 positions (1). Thus, by targeting the polar groups of only a single xylose, coupled to the approximate location of the adjacent moiety, the protein can bind to myriad decorated xylans, because side chains are not appended to every backbone sugar in these polymers (1). Indeed, the observation that xylan esterases are often located in CBM60-containing xylanases suggests that these multimodular enzymes target acetylated xylans, reinforcing the importance of CBM60 binding to only a very limited region of the polymer. It has previously been shown that other xylan-specific CBMs have evolved to accommodate decorated xylans by targeting the pyranose rings, exposing the hydroxyls to solvent (15,16). Nevertheless, because the ligand binding sites of these CBMs often accommodate six tandem xylose units, a significant proportion of the hydroxyls is not accessible to the solvent, and thus these modules could not accommodate ligands with a high degree of decoration (15,16).
The discovery of the CBM60 family completes the modular assignment of the enzymes that comprise the complex xylan-degrading system of C. japonicus. Thus, a subset of three enzymes (xylanase CjXyn10B, esterase CjCE1, and the xylan-specific arabinofuranosidase, CjAbf62A) contains a cellulose-specific CBM2a and a CBM35, which targets uronic acids that decorate xylans from rapidly growing cells and pectins, which are attacked by pectate lyases (21). The bacterium also synthesizes a GH10 (CjXyn10A) and a GH5 (CjXyn5A) xylanase that contain only cellulose-specific CBMs (CBM2a and CBM10 in the GH10; only CBM2a in the GH5), whereas the two GH11 xylanases both contain a CBM60 (CjXyn11A also contains a cellulose-specific CBM10 and a CE4 xylan-specific esterase) (25). Thus, cellulose appears to function as a primary receptor for the xylan-degrading apparatus of C. japonicus, as first proposed by Kellett and colleagues (29). Furthermore, CBM2as, and likely other type A CBMs, are able to slide over cellulose surfaces (60). An attractive hypothesis is that the xylan-degrading enzymes are initially locked onto the plant cell wall through their cellulose-binding CBMs and are then able to locate regions rich in xylans. It is possible that CjXyn5A and CjXyn10A target xylans that are in intimate contact with cellulose microfibrils, and that CjXyn10B, and its associated accessory enzymes, attack regions of xylan that are in close proximity to pectins (or are present in rapidly growing cells), while the two CBM60 modules lock the two GH11 xylanases, CjXyn11A and CjXyn11B, onto xylans, which, because of their extensive acetyl decorations, are not in intimate contact with cellulose or other polysaccharides. The removal of the acetate groups by the xylan esterase in CjXyn11A would enable the two CBM60-containing enzymes to subsequently hydrolyze the exposed xylan backbone.
To conclude, this report describes a new family of xylan-binding CBMs that, by targeting primarily a single backbone xylose residue, likely directs the cognate enzymes to highly decorated regions of the polysaccharide. The ligand-binding apparatuses of CBM60 and CBM36 display remarkable similarity; however, the binding residues are located in very different regions of the two protein families. It is evident that families CBM60 and CBM36 arose through a circular permutation event and, remarkably, the reorganization of the jelly roll fold did not disrupt the integrity of the ligand binding site. It should be noted that the strand order in CBM36 is conserved in the other β-sandwich CBM families. It is likely, therefore, that CBM60 arose through the circular permutation of CBM36 rather than vice versa. The flexibility in xylan recognition displayed by CBM60 underscores the value of this CBM family in directing enzymes to different substructures of this major hemicellulosic polysaccharide, illustrating its utility in the toolbox of biocatalysts required to deconstruct the plant cell wall.