A Novel, Noncatalytic Carbohydrate-binding Module Displays Specificity for Galactose-containing Polysaccharides through Calcium-mediated Oligomerization

The enzymic degradation of plant cell walls plays a central role in the carbon cycle and is of increasing environmental and industrial significance. The catalytic modules of enzymes that catalyze this process are generally appended to noncatalytic carbohydrate-binding modules (CBMs). CBMs potentiate the rate of catalysis by bringing their cognate enzymes into intimate contact with the target substrate. A powerful plant cell wall-degrading system is the Clostridium thermocellum multienzyme complex, termed the “cellulosome.” Here, we identify a novel CBM (CtCBM62) within the large C. thermocellum cellulosomal protein Cthe_2193 (defined as CtXyl5A), which establishes a new CBM family. Phylogenetic analysis of CBM62 members indicates that a circular permutation occurred within the family. CtCBM62 binds to D-galactose and L-arabinopyranose in either anomeric configuration. The crystal structures of CtCBM62 in complex with oligosaccharides containing α- and β-galactose residues show that the ligand-binding site of this β-sandwich protein is located in the loops that connect the two β-sheets. Specificity is conferred through numerous interactions with the axial O4 of the target sugars, a feature that distinguishes galactose and arabinose from the other major sugars in plant cell walls. CtCBM62 displays higher affinity for multivalent ligands than for molecules containing a single galactose residue, and binding to these complex carbohydrates is associated with their precipitation. These avidity effects, which target the CBM to polysaccharides, are mediated by calcium-dependent oligomerization of the CBM.

Carbohydrate-protein recognition plays a central role in biology, exemplified by microbe-mediated plant cell wall degradation. The release of sugars from plant cell walls is not only critical for the maintenance of the carbon cycle but is of increasing industrial and environmental significance through the development of second-generation lignocellulose-based biofuels (1). The chemical and physical complexity of plant cell walls restricts their accessibility to enzyme attack, and thus the recycling of photosynthetically fixed carbon is a relatively slow biological process.
Representative three-dimensional structures of around half the CBM families are available. The majority of these proteins display a β-sandwich fold in which the ligand-binding site for extended glycan chains is generally located on the β-sheet that includes the concave surface of the protein (for review see Ref. 2). For CBMs that recognize terminal mono- or disaccharides, the ligand-binding site is often located in the loops that connect the two β-sheets. Although there are examples of CBMs where the orientation of aromatic residues confers ligand specificity (12,13), it is increasingly evident that polar interactions play a more important role in carbohydrate recognition (8,9,14). Recent reports have also revealed examples of CBMs that harness calcium in ligand recognition (8,15,16). CBMs are generally located in monomeric enzymes; thus cooperative (or avidity) effects occur rarely. Exceptions to this general rule are evident, however, in enzymes that contain multiple copies of CBMs that display the same specificity, where avidity effects between these modules have led to increased affinity for polysaccharides such as xylan (17-19). Given that oligomerization is a general feature of lectins (carbohydrate-binding domains that are not components of enzymes), where avidity effects are common (20), it is surprising that similar macromolecular associations have not been observed in the CBM literature.
The multienzyme complex expressed by Clostridium thermocellum, referred to as the "cellulosome," is one of the most powerful and intricate plant cell wall-degrading systems described to date (21,22). The enzyme complex is anchored onto the plant cell wall by the noncatalytic scaffoldin protein, which contains a family 3 CBM that binds to crystalline cellulose (23). The catalytic subunits of the cellulosome also contain CBMs, which target β-glucans, xylans, uronic acids, and chitin (7, 16, 24-27) and thereby direct the complex toward the substrates of these enzymes. The extent to which the cellulosome can be directed toward the myriad carbohydrate structures in plant cell walls, however, remains relatively unexplored; indeed, numerous cellulosomal proteins contain modules of unknown function that could potentially be CBMs with novel specificities. Furthermore, the quaternary structure of the cellulosome may provide a framework for cooperativity between the CBMs within this protein complex, although such interactions have not been observed experimentally.
Here, we report the structure and biochemistry of a module, defined as CtCBM62, of the C. thermocellum cellulosomal protein Cthe_2193 (GenBank™ protein accession ABN53395.1; defined hereafter as CtXyl5A) that exhibits no significant sequence similarity to CBMs in the CAZy database. CtCBM62 displays a β-sandwich fold that binds to terminal α- and β-D-galactopyranose or L-arabinopyranose residues of complex polysaccharides. Ligand specificity is conferred primarily through extensive interactions with the axial O4 of galactose and arabinose, a distinctive feature of these two sugars. CtCBM62 also undergoes calcium-mediated oligomerization, resulting in avidity effects that confer selectivity for polysaccharides over monovalent oligosaccharides.

EXPERIMENTAL PROCEDURES
Cloning, Expression, and Purification of Components of Cthe_2193-DNA encoding two regions of CtXyl5A, GH5-CBM6-CBM13-Fn3-CtCBM62-Doc1 (mature CtXyl5A; see Fig. 1) and CtCBM62, was amplified using primers containing NheI and XhoI restriction sites (listed in supplemental Table S1). The amplified DNAs were cloned into NheI/XhoI-restricted pET21a, such that the encoded recombinant proteins contain a C-terminal His6 tag. To express the two C. thermocellum proteins, Escherichia coli strain BL21(DE3) harboring the appropriate recombinant plasmid was cultured to mid-exponential phase in Luria broth at 37°C; isopropyl β-D-1-thiogalactopyranoside was then added at 1 mM to induce recombinant gene expression, and the cultures were incubated for a further 5 h at 37°C. The recombinant proteins were purified to >90% electrophoretic purity by immobilized metal ion affinity chromatography using Talon™, a cobalt-based matrix, and eluted with 100 mM imidazole, as described previously (19). To prepare the selenomethionine derivative of CtCBM62 for crystallography, the protein was expressed in the methionine auxotroph E. coli B834(DE3), cultured in medium comprising 1 liter of SelenoMet Medium Base™, 50 ml of SelenoMet Nutrient Mix™, and 4 ml of selenomethionine solution (10 mg/ml). Recombinant gene expression and protein purification were as described above, except that all buffers were supplemented with 10 mM β-mercaptoethanol.
Mutagenesis-Site-directed mutagenesis was carried out using the PCR-based QuikChange method (Stratagene) deploying the primers listed in supplemental Table S1.
Binding Assays-Affinity gel electrophoresis was used to screen for the binding of CtCBM62 to polysaccharides, following the method of Ref. 28. The proteins were subjected to nondenaturing PAGE in the presence of 5 mM CaCl2, using parallel gels that contained either no ligand or the target polysaccharide at 100 µg/ml. BSA was also loaded onto the gels as a nonbinding negative control. After electrophoresis, the gels were stained with Coomassie Blue; proteins that bound to the polysaccharide displayed reduced electrophoretic migration in the presence of the complex carbohydrate. The binding of CtCBM62 to its ligands was quantified by isothermal titration calorimetry (ITC), as described previously (25). Titrations were carried out in 50 mM Na-HEPES buffer, pH 7.5, containing 5 mM CaCl2 (or 5 mM EDTA) at 25°C. The reaction cell contained protein at 145 µM, and the syringe contained either oligosaccharide at 10 mM or polysaccharide at 3-5 mg/ml. The titrations were analyzed using Microcal Origin version 7.0 software to derive n, Ka, and ΔH, whereas ΔS was calculated using the standard thermodynamic relationship $-RT\ln K_a = \Delta G = \Delta H - T\Delta S$.
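The final step of the ITC analysis, deriving ΔG and TΔS from the fitted Ka and ΔH, can be sketched in a few lines. The Python function below is a minimal illustration, not the Origin workflow; its name and the example Ka and ΔH inputs are hypothetical and are not values from Table 3. It applies ΔG = −RT ln Ka and then TΔS = ΔH − ΔG at 25°C (298.15 K), the temperature of the titrations.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1

def binding_thermodynamics(ka_per_molar, delta_h_kj, temp_k=298.15):
    """Return (dG, TdS) in kJ/mol from a fitted Ka (M^-1) and dH (kJ/mol).

    Hypothetical helper: dG = -RT ln Ka, then TdS = dH - dG.
    """
    if ka_per_molar <= 0:
        raise ValueError("Ka must be positive")
    delta_g_kj = -R * temp_k * math.log(ka_per_molar) / 1000.0
    t_delta_s_kj = delta_h_kj - delta_g_kj
    return delta_g_kj, t_delta_s_kj

# Hypothetical inputs chosen only to illustrate the arithmetic.
dg, tds = binding_thermodynamics(1.0e4, -40.0)
print(f"dG = {dg:.1f} kJ/mol, TdS = {tds:.1f} kJ/mol")
```

With these illustrative inputs ΔH is more negative than ΔG, so TΔS comes out negative, which is the pattern the text describes as enthalpically driven binding with an unfavorable entropy term.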
Analytical Ultracentrifugation-Sedimentation velocity experiments were performed using an Optima XLA analytical ultracentrifuge (Beckman Coulter) with an An60 Ti four-hole rotor. Purified CtCBM62 (see above) was dialyzed into either CaCl2 buffer (50 mM Na-HEPES, pH 7.5, 150 mM NaCl, 10 mM CaCl2) or EDTA buffer (50 mM Na-HEPES, pH 7.5, 150 mM NaCl, 10 mM EDTA) using a Microdialyzer System 100 (Pierce). In each experiment, 400 µl of 1 mg/ml CtCBM62 and 410 µl of reference buffer were loaded into 12-mm double-sector Epon centerpieces equipped with quartz windows and equilibrated for ~1 h at 20°C. Experiments were performed at a rotor speed of 50,000 rpm with detection at 280 nm. Data were collected over 8.5-h periods using a radial step size of 0.003 cm. A partial specific volume (v̄) of 0.7211 ml/g was calculated from the amino acid sequence. The buffer viscosity (0.00892 poise) and density (1.0042 g/ml for the CaCl2 buffer and 1.005 g/ml for the EDTA buffer) were calculated using the program SEDNTERP. All sedimentation velocity data were analyzed using the program SEDFIT, version 12.1b. Continuous sedimentation coefficient distribution c(s) analyses were restrained by Marquardt-Levenberg regularization at a confidence level of p = 0.68 with uniform prior knowledge (29). The baseline, meniscus, frictional coefficient, and systematic time-invariant and radial-invariant noise were fitted. The final fit, c(Pδ)(s), implemented prior knowledge of discrete species for determining the size distribution (30). The r.m.s.d. values for all reported experiments were 0.008 or less.
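To relate a sedimentation coefficient to molar mass, the Svedberg relation can be combined with a friction model in which f = 6πηR0(f/f0), where R0 is the radius of a compact sphere of the same mass and v̄; c(s)-type analyses rely on a relation of this kind when a frictional ratio is fitted. The sketch below is an illustration under assumptions: the function names are hypothetical, the frictional ratio f/f0 = 1.25 is an assumed value rather than the fitted one, and the viscosity is read as 0.00892 poise (0.892 cP), the only physically plausible reading for an aqueous buffer. CGS units are used throughout.

```python
import math

AVOGADRO = 6.02214076e23  # mol^-1
SVEDBERG = 1e-13          # 1 S expressed in seconds

def _sphere_radius_per_cuberoot_mass(vbar):
    # R0 = (3 * M * vbar / (4 * pi * NA))^(1/3); returns the factor multiplying M^(1/3), in cm
    return (3.0 * vbar / (4.0 * math.pi * AVOGADRO)) ** (1.0 / 3.0)

def s_from_mass(mass_g_per_mol, vbar, rho, eta_poise, f_ratio):
    """Sedimentation coefficient (s) for a particle of given molar mass and frictional ratio."""
    r0 = _sphere_radius_per_cuberoot_mass(vbar) * mass_g_per_mol ** (1.0 / 3.0)
    friction = 6.0 * math.pi * eta_poise * r0 * f_ratio  # g/s per molecule
    return mass_g_per_mol * (1.0 - vbar * rho) / (AVOGADRO * friction)

def mass_from_s(s_seconds, vbar, rho, eta_poise, f_ratio):
    """Invert s_from_mass: M^(2/3) = s * NA * 6*pi*eta * (f/f0) * R0_factor / (1 - vbar*rho)."""
    m_two_thirds = (s_seconds * AVOGADRO * 6.0 * math.pi * eta_poise * f_ratio
                    * _sphere_radius_per_cuberoot_mass(vbar)) / (1.0 - vbar * rho)
    return m_two_thirds ** 1.5

# 2.0 S species in EDTA buffer; f/f0 = 1.25 is an assumption, not a fitted value.
m = mass_from_s(2.0 * SVEDBERG, vbar=0.7211, rho=1.005, eta_poise=0.00892, f_ratio=1.25)
print(f"apparent molar mass ~ {m / 1000:.1f} kDa")
```

Because M scales as (s·f/f0)^(3/2), the estimate is sensitive to the assumed frictional ratio and should be read as a consistency check rather than a measurement.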
Crystallography-CtCBM62 was crystallized using the hanging drop vapor diffusion technique at 20°C with equal volumes (1 µl) of protein and reservoir solution. The native and selenomethionine (in the presence of 5 mM DTT) forms of apo-CtCBM62, at 150 mg/ml, were crystallized in 20% (w/v) PEG 3350, 0.2 M trisodium citrate, pH 8.3 (PEG/Ion™ 1 screen condition 46; Hampton Research). CtCBM62 in complex with 10 mM xyloglucan oligosaccharides was crystallized in the same condition at a protein concentration of 191 mg/ml. CtCBM62 in complex with 10 mM 6¹-α-D-GalMan3 was crystallized, also with protein at 191 mg/ml, in 0.5 M ammonium sulfate, 0.1 M Na-HEPES, pH 7.4, containing 30% 2-methyl-2,4-pentanediol (condition 59 of the PACT screen). The CtCBM62 crystals grew over 4-5 days, after which 25 µl of mother liquor containing 30% (v/v) glycerol was added as a cryoprotectant before the crystals were flash-frozen in liquid nitrogen.
All the crystals of CtCBM62 were in space group F432 with unit cell dimensions of a = b = c = 192.2 Å, with one protein molecule in the asymmetric unit. The structure of apo-CtCBM62 was solved by SIRAS, exploiting both the anomalous scattering from the seleniums and the isomorphous differences between the selenomethionine derivative and the native diffraction data. After processing the diffraction data in XDS (31) and SCALA (32), the unmerged intensities were input to the HKL2MAP (33) interface to SHELX (34), and the heavy atom substructure was solved in SHELXD. The SHELXE solvent-flattened map was of sufficient quality to build the CtCBM62 molecule manually in COOT (35). The ligand-bound structures of CtCBM62 were determined by molecular replacement with AMORE (36) using the apo structure as the search model. All structures were refined to convergence using REFMAC5 (37) with manual corrections applied in COOT (35). The data collection, phasing, and refinement statistics are displayed in Table 1, and the PDB codes for the protein structures are 2YFU, 2YFZ, and 2YG0.
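The crystal parameters above allow a quick estimate of solvent content via the Matthews coefficient, VM = Vcell/(Z·M). Space group F432 has 96 asymmetric units per cell (24 rotational operators × 4 face-centering translations), so with one molecule per asymmetric unit Z = 96. The molecular mass used below, 16 kDa, is an assumption (roughly 140 residues plus the His tag at ~110 Da per residue), not a measured value, and the helper name is hypothetical; the solvent fraction uses the common approximation 1 − 1.23/VM.

```python
def matthews(cell_volume_a3, mass_da, molecules_per_cell):
    """Hypothetical helper: Matthews coefficient VM (A^3/Da) and solvent fraction 1 - 1.23/VM."""
    vm = cell_volume_a3 / (mass_da * molecules_per_cell)
    solvent = 1.0 - 1.23 / vm
    return vm, solvent

a = 192.2              # cubic cell edge, A
cell_volume = a ** 3   # A^3
Z = 96                 # F432: 24 operators x 4 centering translations, one molecule per ASU
vm, solvent = matthews(cell_volume, 16_000, Z)  # 16 kDa is an assumed mass
print(f"VM = {vm:.2f} A^3/Da, solvent fraction = {solvent:.2f}")
```

Under these assumptions VM is about 4.6 Å³/Da and the solvent fraction about 73%, higher than is typical for protein crystals.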

RESULTS
CtCBM62 Is a Component of the C. thermocellum Cellulosomal Protein CtXyl5A-An initial analysis of the C. thermocellum genome was performed as described in Cantarel et al. (6). The analysis identified 72 genes encoding modular proteins with elements corresponding to CAZy families of catalytic modules and CBMs which, based on the presence of type I dockerins, are components of the cellulosome. The functions of many of these proteins have been demonstrated biochemically or inferred from sequence similarity to enzymes with known activities. There remain, however, several cellulosomal subunits that contain modules with no sequence similarity to proteins of known function. The protein encoded by the open reading frame at locus Cthe_2193 (defined hereafter as CtXyl5A) is one such protein. The protein is likely to be secreted because it contains an N-terminal 20-residue signal peptide. The mature polypeptide comprises 928 amino acids and six modules, which are displayed in Fig. 1. In addition to the type I dockerin, the protein contains two CBMs from families 6 and 13, a fibronectin domain, and a GH5 module, which is likely to be the catalytic component of the protein (see the accompanying paper, which describes the biochemical properties of the full-length enzyme (38)). A particularly noteworthy feature of the open reading frame is the sequence extending from residue 740 to 878, defined hereafter as CtCBM62, which displays no significant sequence similarity to proteins in the CAZy database. To investigate the function of CtCBM62, a recombinant form of this module was expressed in soluble form in E. coli and purified to electrophoretic homogeneity.
Identification of a Novel CBM in CtXyl5A-Biochemical analysis of CtCBM62 failed to reveal catalytic activity against an extensive range of plant structural polysaccharides, including cellulose and various hemicellulose and pectin polymers (data not shown). To explore whether CtCBM62 fulfills a noncatalytic carbohydrate-binding function, the capacity of the protein to bind to a range of polysaccharides was assessed by affinity gel electrophoresis. The data, reported in Table 2, show that the module binds to xyloglucan, arabinogalactan, and galactomannan but does not recognize the other polymers evaluated, including the arabinoxylans that are hydrolyzed by CtXyl5A (38). The module therefore constitutes a functional CBM and is the founding member of a new family, designated CBM62. To investigate the specificity of CtCBM62 in more detail, its capacity to bind to galactomannan- and xyloglucan-derived oligosaccharides and to monosaccharides was assessed by ITC. The data, reported in Table 3, with example titrations displayed in Fig. 2, show that CtCBM62 does not bind to mannooligosaccharides or cellooligosaccharides (the backbone structures of galactomannan and xyloglucan, respectively) or to the repeating XXXG motif of xyloglucan (39), where X is a backbone glucose decorated α1,6 with Xyl and G is an unsubstituted glucose. The protein does, however, bind to 6¹-α-D-GalMan3, 6³,6⁴-α-D-Gal2Man5, and a mixture of XLXG, XLLG, and XXLG, where L is X in which the xylose is decorated with a β1,2-linked D-galactose residue. The stoichiometry of binding was unity with respect to galactose residues in the oligosaccharides, indicating that CtCBM62 has a single ligand-binding site. The thermodynamic data show that the binding of CtCBM62 to all its ligands is enthalpically driven, whereas the change in entropy makes a negative contribution to overall affinity, as observed for the majority of CBMs studied to date (14,16,27).
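Under the single-site model used to fit the titrations, the concentration of complex at any point follows from a quadratic in [PL]. The sketch below is a hypothetical helper, not the Origin implementation; it treats the cell as containing n × Pt identical, independent sites, and the example Ka of 10⁴ M⁻¹ is illustrative only, paired with the 145 µM protein concentration used in the ITC cell.

```python
import math

def complex_concentration(protein_total, ligand_total, ka, n=1.0):
    """Equilibrium complex concentration [PL] (M) for n identical, independent sites per protein.

    Solves Ka = [PL] / ((St - [PL]) * (Lt - [PL])) with St = n * Pt and returns the
    physically meaningful (smaller) root. The discriminant is >= (St - Lt)^2, so never negative.
    """
    sites = n * protein_total
    b = sites + ligand_total + 1.0 / ka
    discriminant = b * b - 4.0 * sites * ligand_total
    return (b - math.sqrt(discriminant)) / 2.0

protein = 145e-6  # M, protein concentration in the ITC cell
# Hypothetical titration point: ligand equimolar with protein, Ka = 1e4 M^-1.
pl = complex_concentration(protein, protein, 1.0e4)
print(f"fraction of sites occupied: {pl / protein:.2f}")
```

A fitted stoichiometry of one per galactose residue corresponds to this model with n = 1 when the ligand concentration is expressed in galactose residues.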
These data indicate that CtCBM62 binds to galactomannan, arabinogalactan, and xyloglucan, which are all decorated with galactose residues, through recognition of this hexose sugar. This view was confirmed by the observation that the protein binds to galactose and arabinose, in which O4 is axial, but not to sugars such as xylose, mannose, and glucose, in which O4 is equatorial. Indeed, the similar affinity of CtCBM62 for 6¹-α-D-GalMan3 and galactose, and the lack of significant binding to XXXG, suggest that the backbone components of mannan and xyloglucan do not make significant interactions with the protein. It should be noted, however, that the affinity of the protein for Gal and 6¹-α-D-GalMan3 is 2 orders of magnitude lower than for galactomannan, and the Ka values for the xyloglucan oligosaccharides are also significantly lower than for the corresponding polysaccharide (Table 3). Because mature CtXyl5A (which includes all the GH5-CBM6-CBM13-Fn3-CBM62-Doc1 modules) also exhibits much higher affinity for xyloglucan and carob galactomannan than for galactose (Table 3), CtCBM62 also displays its polysaccharide-targeting function in the full-length enzyme. In summary, the capacity of CtCBM62 to bind to xyloglucan and galactomannan is conferred by its recognition of either α- or β-D-galactose.
CtCBM62 Displays Avidity Effects-The tighter binding of CtCBM62 to polysaccharides, compared with monovalent oligosaccharides and monosaccharides, is likely mediated through avidity effects, which occur when a protein with multiple binding sites interacts with a multivalent ligand such as a polysaccharide. Indeed, avidity effects are invariably associated with cross-linking and thus precipitation of the ligand-protein complex (18), which occurs when CtCBM62 binds to xyloglucan, galactomannan, and arabinogalactan (data not shown).
To explore the potential role of calcium (which plays a key role in cellulosome function (40)) in the avidity effects displayed by CtCBM62, the capacity of the protein to bind to xyloglucan and galactose was assessed in the presence of either calcium or EDTA. The data, displayed in Table 4, show that for the simple monovalent ligand galactose, neither the addition of calcium nor that of the chelating agent EDTA influenced affinity. By contrast, CtCBM62 binds to the galactose-containing polysaccharide xyloglucan >100-fold more tightly in the presence of calcium than with EDTA. These data indicate that calcium enhances affinity for multivalent ligands but does not influence the recognition of carbohydrates containing a single galactose residue. The most likely explanation is that calcium mediates oligomerization of CtCBM62, which would enhance affinity for multivalent ligands through avidity effects. To explore this possibility, CtCBM62 was subjected to size exclusion chromatography using a Sephadex S100 column. In the presence of EDTA, CtCBM62 eluted at a volume of 89 ml; replacing the chelating agent with calcium reduced the elution volume to 74 ml (data not shown).
Based on calibration of the column with proteins of known molecular mass, CtCBM62 eluted as a monomer in the presence of EDTA and as a dimer when calcium was included in the elution buffer. Calcium-mediated oligomerization of CtCBM62 was confirmed by analytical ultracentrifugation. In the presence of EDTA, CtCBM62 sediments as a monomer with a sedimentation coefficient of 2.0 S. Calcium induces CtCBM62 to form a dimer at 4.2 S that is in dynamic equilibrium with the monomer, causing a shift to 2.8 S (Fig. 3). These data indicate that the capacity of calcium to enhance the affinity of CtCBM62 for multivalent ligands is mediated by metal ion-dependent oligomerization of the protein, which leads to avidity effects.
Table footnotes: (a) The ITC data were fitted to a single-site binding model for all ligands. For polysaccharide ligands, for which the molar concentration of binding sites is unknown, n was fitted iteratively to as close to one as possible by adjusting the molar concentration of the ligand. (b) CtCBM62 was dialyzed in the presence of 2 mM CaCl2, and the metal was added to the ITC experiment. (c) CtCBM62 was dialyzed in buffer lacking CaCl2, and ITC was carried out in the absence of calcium. (d) CtCBM62 was dialyzed into buffer lacking CaCl2, and ITC was performed in the presence of 10 mM EDTA.
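If the 2.8 S peak is treated, as a simplification, as the weight-average s of a rapidly exchanging monomer-dimer mixture, s_w = (1 − f)·s₁ + f·s₂, the lever rule gives a rough dimer weight fraction f. For a reacting system the c(s) peak position is not strictly the weight-average s, so this is only a back-of-envelope sketch. The helper names are hypothetical, and the 62.5 µM total concentration assumes 1 mg/ml of a ~16 kDa monomer, an assumed mass.

```python
def dimer_weight_fraction(s_observed, s_monomer, s_dimer):
    """Lever rule on weight-average s: s_obs = (1 - f) * s_monomer + f * s_dimer."""
    if not s_monomer < s_dimer:
        raise ValueError("expected s_monomer < s_dimer")
    f = (s_observed - s_monomer) / (s_dimer - s_monomer)
    return min(max(f, 0.0), 1.0)  # clamp to the physical range

def dimer_kd(weight_fraction_dimer, total_monomer_molar):
    """Kd = [M]^2 / [D], with [M] = (1 - f) * c and [D] = f * c / 2 (c in monomer units)."""
    if weight_fraction_dimer <= 0.0:
        raise ValueError("no dimer present")
    monomer = (1.0 - weight_fraction_dimer) * total_monomer_molar
    dimer = weight_fraction_dimer * total_monomer_molar / 2.0
    return monomer ** 2 / dimer

f = dimer_weight_fraction(2.8, 2.0, 4.2)  # s values from the text, in S
kd = dimer_kd(f, 62.5e-6)                 # assumes 1 mg/ml of a ~16 kDa monomer
print(f"dimer weight fraction ~ {f:.2f}, apparent Kd ~ {kd * 1e6:.0f} uM")
```

The resulting numbers (roughly a third of the protein dimeric, with an apparent Kd in the 100 µM range) are only as good as the weight-average simplification and the assumed mass.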
Crystal Structure of CtCBM62-To explore the mechanism by which CtCBM62 binds to both α- and β-D-galactose, the crystal structure of the protein module was determined. No calcium was added in the conditions used to obtain crystals of the protein, and thus the CBM crystallized in its monomeric form. The final structure corresponds to residues 739-878 of full-length CtXyl5A. CtCBM62 displays a canonical β-sandwich fold (2) comprising two β-sheets, with five antiparallel β-strands on the concave face (β-strands 1, 2, 4, 5, and 7) and three antiparallel β-strands on the convex face (β-strands 3, 6, and 8) (Fig. 4). The protein contains a bound metal located at the beginning of the loop connecting the α-1 helix to the β-2 strand, which has been modeled as calcium based on its B-factor in comparison with neighboring atoms, its octahedral coordination, the bond distances, its satisfactory refinement with no residual positive or negative difference density, and its interaction with only oxygen ligands. This structural calcium is conserved in many CBM families that display a β-sandwich fold (2). Inspection of the surface of the protein reveals a shallow pocket formed by the loops connecting β-1 and α-1, α-1 and β-2, and β-2 and α-2, with α-2 itself also contributing to the pocket (Fig. 5), a topology consistent with the specificity of the protein for the terminal sugars of complex polysaccharides. The crystal structures of CtCBM62 in complex with 6¹-α-D-GalMan3 or XLXG confirm that the pocket contains the ligand-binding site and provide insight into how the protein recognizes the galactosyl residue, which makes identical interactions with the protein in both ligand complexes. Thus, the α-face of the pyranose ring of the bound galactose makes extensive hydrophobic contacts with Trp-754, which is aligned parallel to the sugar ring. Such hydrophobic interactions are a generic feature of CBM ligand recognition (12-14, 19, 44).
The bound galactose also makes several direct hydrogen bonds with the protein: O1 and O2 make polar contacts with the OH of Tyr-806, whereas O3 forms hydrogen bonds with N1 and N2 of Arg-803 and Oδ2 of Asp-774. The O4 of galactose makes numerous potential polar interactions with CtCBM62; the hydroxyl is within hydrogen-bonding distance of Oδ1 and Oδ2 of Asp-774 and N1 of Arg-803 and Arg-809. Finally, the endocyclic oxygen of the galactose makes a polar contact with N2 of Arg-809. The extensive interactions made by CtCBM62 with the axial O4 of galactose explain why the protein displays tight specificity for this sugar; an equatorial O4, as in glucose, mannose, and xylose, would not make polar contacts with the protein and is, indeed, predicted to clash sterically with the side chain of Asp-774. Affinity gel electrophoresis (Fig. 6) and ITC (data not shown) showed that replacement of these residues with alanine resulted in complete loss of ligand binding, confirming the role of these amino acids in galactose recognition. By contrast, mutation of residues on the concave surface of the protein (W782A, D786A, E789A, Q816A, E821A, F823A, and R856A; Fig. 6) did not appear to reduce affinity for xyloglucan. Similar observations were made with galactomannan and arabinogalactan (data not shown). These data confirm that the sugar-binding site in CtCBM62 is located in the loops connecting the two β-sheets and not on the concave surface of the protein.
Significantly, CtCBM62 does not make polar contacts with O6 of galactose, suggesting that the protein may also interact with L-arabinopyranose, which lacks the C6 hydroxymethyl group but otherwise has the same ring configuration as D-galactopyranose, including an axial O4. Data presented in Table 3 confirm that CtCBM62 does bind to arabinopyranose, albeit with lower affinity than galactose. This may reflect the strain imposed by forcing the sugar ring into its pyranose conformation.
In both ligand complexes, components of the oligosaccharides in addition to the terminal galactose are evident. Indeed, the successive Gal-β-Xyl and Xyl-α-Glc linkages in XLXG curve the xyloglucan oligosaccharide into a shape that is perfectly accommodated by the surface of CtCBM62. These topological features may contribute to the slightly higher affinity of the protein for XLXG compared with galactose; the increase in affinity may reflect polar interactions between the Glc backbone and residues on the surface of the protein (Gly-805 and Gly-862; Fig. 5). To accommodate the Glc or Man backbone of XLXG and 6¹-α-D-GalMan3, respectively, the side chain of Gln-809 is rotated by 114°, which, although imposing an energetic cost, enables both Oε1 and Nε2 of this residue to make polar contacts with the backbone of the xyloglucan ligand. By contrast, the Man backbone of 6¹-α-D-GalMan3 makes no direct polar interactions with CtCBM62.
As discussed above, CtCBM62 appears to undergo calcium-mediated oligomerization, leading to increased affinity for polysaccharides through avidity effects. Inspection of the structure of CtCBM62 reveals a calcium-binding site that is conserved in several other CBM families that display a jelly roll fold (CBM4, CBM6, CBM22, and CBM29 (2)). The calcium ion makes coordinate bonds with the main-chain O of Lys-763, Oδ1 and the main-chain O of Asp-766, the main-chain O and Oγ of Thr-771, the main-chain O of Ala-868, and Oε1 of Glu-869 (Fig. 4), and as such it fulfills the octahedral geometry typical of many calcium-binding sites. The calcium-binding loop contributes to a crystal-packing interface with a symmetry-equivalent molecule in an adjacent asymmetric unit. The physicochemical characteristics of this particular interface, however, are not greatly different from those of other potential dyad interactions in the crystal. The D766A, T771A, E869A, D766A/E869A, and D766A/T771A/E869A mutants of CtCBM62, in which the calcium-binding site has been disrupted by mutating, in various combinations, the three residues whose side chains make polar contacts with the metal, retained calcium-dependent high affinity for xyloglucan (supplemental Table S2). Thus, the calcium-binding loop does not appear to contribute to the oligomerization of the protein in solution. Attempts to map the dimerization interface by replacing all the surface Asp and Glu residues with alanine (mutations listed in supplemental Table S2) did not influence the avidity effects displayed by the protein. Thus, the site of CtCBM62 oligomerization currently remains unclear.
CtCBM62 Defines a New CBM Family-CtCBM62 was subjected to BLAST analysis to identify other proteins related to the CBM. The analysis revealed that CtCBM62 displays very distant similarity to a CBM32 module comprising residues 2086-2229 of the Caldicellulosiruptor kronotskyensis protein Calkro_0121 (BLAST E-value of 7 × 10⁻⁴), with significant sequence identity restricted to the N-terminal 17 residues of the two modules. No sequence similarity was identified between CtCBM62 and any other CAZy protein, including the remaining 1000 CBM32 modules. Thus, CtCBM62 represents the founding member of a new CBM family, defined as CBM62. The BLAST search, however, identified five non-CAZy protein modules that display >42% sequence identity with CtCBM62 with E-values <10⁻²³ (supplemental Fig. S2). These proteins are therefore likely to be members of family CBM62. Two of these CBM62 modules are components of large extracytoplasmic proteins of unknown function, whereas the other three are found in proteins that also contain glycoside hydrolase modules. The residues that constitute the ligand-binding site in CtCBM62 are highly conserved in the five proteins, suggesting that they may also bind to galactose (and arabinopyranose). Indeed, one of these proteins, C7IBP5, is a member of family GH98, for which endo-β-galactosidase is the only reported activity. In addition to the five close homologs, 40 proteins show sequence similarity to CtCBM62 with E-values <10⁻⁵. This limited sequence similarity is generally restricted to the C-terminal portion of CtCBM62. Closer analysis, however, showed that noncontiguous regions of CtCBM62 display extensive sequence similarity with these proteins.
When a rearranged linear sequence of one of these proteins, a conserved hypothetical protein from Parabacteroides distasonis (GenBank™ accession number ABR45148), in which the N- and C-terminal regions are permuted at position 69 (the N-terminal Phe-466 and C-terminal Asn-600 residues are ligated together and the chain is cleaved between Asn-535 and Asp-536), is compared with CtCBM62, sequence similarity with the apparent "noncontiguous homologs" extends across the whole sequence of the CBM. Thus, the native proteins display noncontiguous stretches of homology with an E-value of 3 × 10⁻⁷, whereas comparison of CtCBM62 with the rearranged Parabacteroides protein reveals contiguous sequence similarity, with ~40% amino acid identity and an E-value of 10⁻¹⁷ (Fig. 7 and supplemental Fig. S2). The sequence similarity of noncontiguous regions of these proteins is strongly indicative of a "circular permutation." In this process, genetic events result in the ligation of the terminal residues of the protein, which is subsequently cleaved to yield new N and C termini (45,46). The cleavage or ligation event likely occurred in CBM62 members at a position equivalent to the peptide bond linking Gly-797 and Tyr-798 in CtCBM62.
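The idea of a circular permutation can be illustrated with a toy sequence: joining the termini and reopening the chain elsewhere rotates the sequence, and a scan over all rotations recovers the cut point. This is a hypothetical, simplified sketch (equal-length sequences, no gaps, identity scoring only); the comparisons described in the text were made with BLAST, and the sequence used here is invented.

```python
def circular_permute(seq, cut):
    """Ligate the original termini and reopen the chain so that it starts at index `cut`."""
    if not 0 <= cut <= len(seq):
        raise ValueError("cut must lie within the sequence")
    return seq[cut:] + seq[:cut]

def best_rotation(query, reference):
    """Ungapped scan: return (k, identities) for the rotation of query best matching reference."""
    best_k, best_score = 0, -1
    for k in range(len(query)):
        rotated = circular_permute(query, k)
        score = sum(a == b for a, b in zip(rotated, reference))
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score

reference = "ACDEFGHIKLMNPQRSTVWY"        # invented 20-residue toy sequence
query = circular_permute(reference, 8)    # permuted "homolog": new N terminus at residue 9
k, identities = best_rotation(query, reference)
print(k, identities)  # prints: 12 20
```

With real homologs each rotation would need a gapped alignment, but the principle is the same: the rotation that turns noncontiguous matches into a single contiguous one marks the likely ligation and cleavage positions.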
The vast majority of CBM families display a β-sandwich fold in which the strand order and a structural calcium are highly conserved features (2). It is therefore likely that these families arose from a common progenitor sequence. As the β-strand order in CtCBM62 is similar to that of other CBM families (e.g. CBM4, -6, -15, -22, -29, -35, and -36), it is unlikely that the protein module itself arose through a circular permutation event. It would therefore appear that a cleavage and ligation event occurred in an ancestral member of CBM62, giving rise to the large number of proteins that display noncontiguous sequence similarity to CtCBM62. It should be noted that alignment of these noncontiguous sequences with (rearranged) CtCBM62 shows that the key ligand-binding residues are highly conserved, suggesting that these proteins may also target terminal galactose residues. In general, proteins that have been subjected to circular permutation adopt a β-barrel or β-sandwich fold in which the N and C termini are in close proximity. Circular permutation events have been observed in a range of enzymes, including those that modify carbohydrates (45,46). As CBMs generally display a β-sandwich fold in which the two termini are in close proximity (2), these proteins are candidates for permutation events. Indeed, a recent report has shown that CBM36 and CBM60 display highly conserved ligand-binding sites derived from different regions of the respective proteins, which likely reflects a circular permutation event (22). The evolutionary rationale (if any) for circular permutation events is not entirely clear. Intuitively, such events, which retain the overall three-dimensional structure, may introduce subtle structural changes that alter the function of the protein. In view of the circular permutation event within CBM62, we suggest dividing the family into two subfamilies: one whose members display contiguous sequence similarity to CtCBM62, and one whose members are related to the protein by a circular permutation event.

DISCUSSION
This study reveals a novel CBM family that recognizes galactose-containing polysaccharides. Its founding member is located in an enzyme that displays arabinoxylanase activity (the catalytic module is described in the accompanying paper (38)) and that is a component of the C. thermocellum cellulosome. The β-sandwich fold displayed by CtCBM62 is typical of the majority of CBM families (2). Although the ligand-binding site of CBMs has historically been associated with the concave surface of these proteins, this appears to apply only to proteins that recognize the internal regions of polysaccharides. The location of the CtCBM62 ligand-binding site in the loops that connect the β-sheets is an example of an increasingly common feature of CBMs that recognize terminal sugars (8,16,22,47).
An intriguing feature of CtXyl5A is that the enzyme contains a CBM that binds terminal D-galactose or L-arabinopyranose residues of polysaccharides. This specificity is mediated by numerous interactions with the axial O4, which distinguishes these two sugars from other neutral carbohydrates, notably mannose, xylose, and glucose, that are widely distributed in plant structural polysaccharides (48). In general, plant cell wall hydrolases contain CBMs that target the substrate recognized by the enzyme and/or crystalline cellulose (2). Although most arabinoxylans contain arabinofuranose and 4-O-methyl-D-glucuronic acid side chains, these polysaccharides are very heterogeneous molecules, and eucalyptus contains arabinoxylans that are also decorated with D-galactose (49). It is therefore possible that the primary substrate for CtXyl5A is an arabinoxylan that also contains sugar side chains (D-galactose) recognized by CtCBM62. Such a targeting strategy has resonance with the observation that a CBM32 targets LacNAc (Gal-β1,4-GlcNAc), but the appended enzyme functions as an N-acetyl-β-hexosaminidase (42). Recent studies, however, have shown that CBMs may direct their appended enzymes to their target substrates by binding to a polysaccharide that is in close proximity with the substrate. Examples include xylanases and pectinases appended to cellulose-specific CBMs (50).
It is therefore possible that arabinoxylans are in close association with xyloglucans, galactomannans, and/or arabinogalactans in certain cell walls, and that by targeting β- and α-D-galactose or L-arabinopyranose residues, CtCBM62 brings the catalytic module of CtXyl5A into close proximity with its substrate.
[Figure (caption fragment): carbohydrate-binding site of CtCBM62 in complex with XLXG and GM3, with the protein backbone shown for reference. In the XLXG complex, the electron density is contoured at 3σ (0.14 e−/Å3); for GM3, the map is also contoured at 3σ. B and D display the same views as A and C, respectively, with hydrogen bonds drawn as dashed lines and key amino acids labeled; only the amino acids that make hydrogen bonds with the ligand are shown.]
The observations that CtCBM62 binds more tightly to complex polysaccharides than to monovalent ligands, and that binding is associated with the formation of an insoluble polysaccharide lattice, are classic features of avidity effects (18,20). The increase in Ka reflects the reduced frequency with which the protein completely dissociates from its complex ligand, compared with monovalent carbohydrates. Such avidity effects have been observed previously in enzymes containing multiple CBMs that recognize the same complex ligand (17,18). By contrast, avidity effects mediated by the oligomerization of a single CBM located within an enzyme, as observed here for CtCBM62, have not been reported previously, although such events are a very common feature of plant and animal lectins (20).
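The avidity argument can be made concrete with a deliberately simple model. This is an illustrative assumption, not the analysis used in this study. Each of n contacts between an oligomeric CBM and a multivalent polysaccharide is treated as an independent site with Langmuir occupancy θ = Ka·c/(1 + Ka·c), where c is an assumed effective local ligand concentration. The complex is released only when all n contacts are free at the same moment, which occurs with probability (1 − θ)^n. The function names and the values of Ka and c below are hypothetical.

```python
def site_occupancy(k_a: float, conc: float) -> float:
    """Fractional occupancy of one site (Langmuir isotherm):
    theta = Ka*c / (1 + Ka*c), with Ka in M^-1 and c in M."""
    return k_a * conc / (1.0 + k_a * conc)


def fraction_fully_released(k_a: float, conc: float, valency: int) -> float:
    """Toy avidity model: probability that all `valency` contacts are
    unbound simultaneously, assuming independent contacts (no cooperativity,
    strain, rebinding kinetics, or lattice formation)."""
    if valency < 1:
        raise ValueError("valency must be at least 1")
    theta = site_occupancy(k_a, conc)
    return (1.0 - theta) ** valency


# Hypothetical parameters: intrinsic Ka of 1e4 M^-1 for one galactose contact
# and an assumed effective local galactose concentration of 0.1 mM.
K_A = 1e4
LOCAL_CONC = 1e-4

for n in (1, 2, 4):
    released = fraction_fully_released(K_A, LOCAL_CONC, n)
    print(f"valency {n}: fraction of time fully released = {released:.4f}")
```

Under these assumptions, doubling the number of engaged contacts squares the fully released fraction. This shows qualitatively why an oligomerized CBM rarely dissociates completely from a polysaccharide, whereas a monovalent complex does so often. Because the model ignores rebinding and geometry, it should not be used to estimate the measured Ka values.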
Analysis of the structure of derivatives of CtXyl5A showed that only CtCBM62 was capable of dimerization (data not shown). The biological significance of calcium-induced dimerization of CtCBM62, within the context of full-length CtXyl5A, is particularly intriguing when one considers that the module is derived from a protein that is a component of the cellulosome. The cellulosome assembles through the interaction of the dockerin module, present on the enzymes, with one of the nine cohesins located on the noncatalytic scaffoldin protein, CipA (for review see Refs. 21 and 40). It is possible that calcium-induced dimerization of CtCBM62 recruits two molecules of CtXyl5A onto the same cellulosome molecule. The oligomerization, however, may also lead to cellulosome cross-linking (if the dockerins in the dimer bind to different molecules of CipA), and thus contribute to the formation of polycellulosomes, which are formed on the surface of the bacterium (11).
In conclusion, this report reveals a novel CBM that represents the founding member of family 62. CtCBM62 binds to the terminal D-galactose and L-arabinopyranose residues in polysaccharides, specificities not previously observed in CBMs that are components of the C. thermocellum cellulosome. The CBM uniquely mediates protein dimerization, which confers polysaccharide, rather than oligosaccharide, targeting through avidity effects. The catalytic module appended to CtCBM62 is an arabinoxylanase (described in the accompanying paper (38)), suggesting that the enzyme targets xylans that contain a highly complex repertoire of side chains.