The Interaction of a Carbohydrate-binding Module from a Clostridium perfringens N-Acetyl-β-hexosaminidase with Its Carbohydrate Receptor*

Clostridium perfringens is a notable colonizer of the human gastrointestinal tract. This bacterium is quite remarkable for a human pathogen by the number of glycoside hydrolases found in its genome. The modularity of these enzymes is striking as is the frequent occurrence of modules having amino acid sequence identity with family 32 carbohydrate-binding modules (CBMs), often referred to as F5/8 domains. Here we report the properties of family 32 CBMs from a C. perfringens N-acetyl-β-hexosaminidase. Macroarray, UV difference, and isothermal titration calorimetry binding studies indicate a preference for the disaccharide LacNAc (β-d-galactosyl-1,4-β-d-N-acetylglucosamine). The molecular details of the interaction of this CBM with galactose, LacNAc, and the type II blood group H-trisaccharide are revealed by x-ray crystallographic studies at resolutions of 1.49, 2.4, and 2.3 Å, respectively.

secreted glycoside hydrolases have predicted substrate specificities consistent with metabolism of dietary polysaccharides in the human gut making gastric mucins, highly hydrated glycoproteins comprising up to 80% carbohydrate, the most likely target of the secreted C. perfringens enzymes. Indeed, the majority of these enzymes are predicted to have specificities appropriate for the degradation of complex glycans, suggesting that this bacterium is well equipped to attack the diverse sugar structures of the mucins in this environment. Consistent with this is the mucosal necrosis associated with severe enteritis caused by C. perfringens (6), which may be in part due to the arsenal of C. perfringens glycoside hydrolases. In turn, breaking down the mucosal barrier could improve access of other toxins, such as the pore-forming cpe (C. perfringens enterotoxin), to the epithelial layer.
Thirteen of the predicted C. perfringens (strain 13) glycoside hydrolases (and notably 13 glycoside hydrolases for each of the sequenced Bacteroides sp. genomes (thetaiotaomicron, fragilis YCH46, and fragilis 25285)) are highly modular and have, in addition to catalytic domains, modules with amino acid sequence identity to family 32 carbohydrate-binding modules (CBMs) (7). CBMs are generally considered to be modules with carbohydrate-binding function, but no catalytic activity, that are found within the modular architectures of glycoside hydrolases (8). They are currently classified into 45 families based on amino acid sequence identity (see afmb.cnrs-mrs.fr/~cazy/CAZY/index.html) and loosely grouped into three types, A (crystalline polysaccharide binding), B (polysaccharide chain binding), and C (small sugar binding or "lectin-like"), established by functional properties (8). Based on limited biochemical evidence, the family 32 CBMs appear to be type C CBMs. The x-ray crystal structures of the Cladobotryum dendroides galactose oxidase and the Micromonospora viridifaciens sialidase (MvGH33) revealed the β-sandwich lectin-like folds of their cognate CBM32 modules (9,10). Co-crystallizations of MvGH33 with galactose showed the potential of its CBM32 module, here called MvCBM32, to bind galactose (10,11), which was subsequently verified by functional studies (7). Thus, the family 32 CBMs, which are often referred to as F5/8 domains, have been generally considered as galactose binding domains.
The family 32 CBMs stand out among the CBM families, because they are frequently found appended to enzymes with "exotic" specificities (e.g. sialidases, β-hexosaminidases, mannosidases, and fucosidases) and are found in bacteria capable of causing disease in humans. In contrast, the vast majority of CBMs in other families are found appended to enzymes that are active on plant cell wall polysaccharides. In the context of the plant cell wall hydrolases, the function of CBMs has been repeatedly shown to be to localize the enzyme to an appropriate substrate (8). By analogy to the plant cell wall hydrolases, the role of family 32 CBMs is likely to target their parent enzymes to carbohydrate substrates; however, with these CBMs the substrates are likely more complex glycans, such as gastric mucins in the case of C. perfringens and Bacteroides sp. Previous studies of family 32 CBMs have not addressed the possibility that the specificity of these CBMs extends beyond a preference simply for galactose and may actually include specificity for complex glycan chains. Thus, studies of family 32 CBMs from bacterial pathogens enter a new area of carbohydrate-binding module-mediated host-pathogen interactions and will extend our knowledge of this potentially complex family of carbohydrate-binding proteins.
To better understand CBM32 structure and function we initiated studies of CpGH84C from C. perfringens (strain ATCC 13124). This enzyme, which comprises four modules defined on the basis of primary structure comparisons (see Fig. 1), was chosen as a model system, because, relative to other family 32 CBM-containing enzymes, it is reasonably small and has a simple modular architecture that is amenable to accurate definition of the modular boundaries. To facilitate structure-function studies, we dissected this protein at the genetic level to recombinantly produce isolated CpCBM32. The experimental results reveal the ability of the CBM to bind to terminal glycotopes commonly found in elaborated O- and complex N-glycans (12-14), whereas the x-ray crystal structures of CpCBM32 in complex with sugar help uncover the molecular details that confer this binding ability. Structural comparisons with known CBM32s and other C. perfringens CBM32s suggest variations in glycan specificities, but all are based on a key terminal galactose residue. This work provides the first detailed structure-function analysis of a family 32 CBM and will provide a foundation for further studies of CBMs within this family.

EXPERIMENTAL PROCEDURES
Materials-Unless otherwise stated, chemicals, carbohydrates, glycoproteins, and polysaccharides were purchased from Sigma.
Protein Production and Purification-pCBM was transformed into the BL21star (DE3) Escherichia coli expression strain (Invitrogen). A 1.5-liter culture was grown in Luria-Bertani (LB) media, supplemented with ampicillin (100 μg/ml), to an optical density of ~1 and induced with 1 mM isopropyl 1-thio-β-D-galactopyranoside then grown overnight at 37°C. The cells were harvested at 4,000 × g and resuspended in 20 ml of binding buffer containing 20 mM Tris, pH 8, and 0.5 M NaCl. Cells were lysed using a French pressure cell. Cell debris was removed by centrifugation for 1 h at 27,000 × g. The supernatant was applied to His-Select resin followed by step elution with binding buffer containing imidazole concentrations between 5 and 500 mM. Samples were run on a 15% SDS polyacrylamide gel, and fractions containing the polypeptide of interest were pooled. Proteins were concentrated, and buffer was exchanged in a stirred ultrafiltration unit (Amicon, Beverly, MA) using a 5,000 molecular weight cut-off membrane (Filtron, Northborough, MA). Purity, assessed by SDS-PAGE, was >95%.
Determination of Protein Concentration-The concentrations of purified proteins were determined by UV absorbance (280 nm) using calculated molar extinction coefficients (16).
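The absorbance-based concentration estimate can be sketched as follows. This is a minimal illustration, assuming the widely used per-residue approximation for a calculated extinction coefficient at 280 nm (5500 per Trp, 1490 per Tyr, 125 per cystine, in M^-1 cm^-1); it is not necessarily the exact procedure of reference 16, and the residue counts and absorbance reading below are hypothetical placeholders rather than values for CpCBM32.

```python
def extinction_coefficient(n_trp, n_tyr, n_cystine=0):
    """Approximate molar extinction coefficient at 280 nm (M^-1 cm^-1).

    Uses a common per-residue approximation; an assumption of this
    sketch, not a value taken from the paper.
    """
    return 5500 * n_trp + 1490 * n_tyr + 125 * n_cystine


def molar_concentration(a280, epsilon, path_cm=1.0):
    """Beer-Lambert law, A = epsilon * c * l, solved for c (mol/l)."""
    return a280 / (epsilon * path_cm)


# Hypothetical composition and reading, for illustration only.
eps = extinction_coefficient(n_trp=3, n_tyr=8)
conc = molar_concentration(a280=0.568, epsilon=eps)
print(f"epsilon = {eps} M^-1 cm^-1, concentration = {conc * 1e6:.1f} uM")
```

Substituting the real tryptophan, tyrosine, and cystine counts of the construct gives the coefficient to use with a measured absorbance.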
Macroarray Binding Experiments-CpCBM32 was labeled with Alexa Fluor 680 carboxylic acid, succinimidyl ester (Molecular Probes, Eugene, OR). Alexa Fluor 680 was resuspended in Me2SO to a concentration of 10 μg/μl. Negative and positive controls (bovine serum albumin and Clostridium cellulovorans CcCBM17, respectively) and CpCBM32 were buffer-exchanged into 0.2 M NaHCO3, pH 8.3, by de-salting using a PD-10 column (General Electric). 50 μg of Alexa Fluor 680 was incubated with 1 mg of protein, and the reaction was allowed to proceed for 1 h at room temperature in the dark. Buffer exchange was done into phosphate-buffered saline, pH 7.4, by gel-filtration using a PD-10 column to remove any unreacted dye molecules.
1 μl of 0.1 or 1% solutions of various plant polysaccharides, glycosaminoglycans, and glycoproteins were spotted onto a nitrocellulose membrane. Membranes were allowed to dry completely then blocked for 2 h with 10 ml of 1% bovine serum albumin, 0.05% Tween 20 in phosphate-buffered saline, pH 7.4. Labeled protein (0.4 mg) was allowed to incubate overnight at 13°C in 10 ml of 1% Tween 20 in phosphate-buffered saline, pH 7.4. Blots were washed once with 10 ml of 1% Tween 20 in phosphate-buffered saline, pH 7.4, for 40 min. Emission of fluorescence was detected at 700 nm using the Odyssey Infrared Imaging System from LI-COR Biosciences. Detection was scored based on levels of signal emission (supplemental Fig. S1).
UV Difference-Automated UV difference titrations of CpCBM32 were performed as described previously (17). Difference spectra were examined for peak and trough wavelengths, and values at the appropriate wavelengths were extracted for further analysis. The wavelengths for the maximum peak-to-trough differences were determined individually for each sugar solution. The peak-to-trough heights at three wavelength pairs were calculated by subtraction of the trough values from the peak values, and the dilution-corrected data were plotted against total carbohydrate concentration. Data for the three wavelength pairs were analyzed simultaneously with MicroCal Origin (version 7.0) using a one-site binding model accounting for ligand depletion. Experiments were performed at 20°C in 50 mM Tris, pH 7.5. The data reported are the averages and standard deviations of three independent titrations.
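A one-site binding model that accounts for ligand depletion can be written in closed form, and the sketch below shows one common way to do so. It is an illustration under stated assumptions, not the fitting routine used in MicroCal Origin: the function names are hypothetical, the protein concentration and maximal signal are placeholders, and only the association constant of about 1 × 10^3 M^-1 is taken from the galactose result reported in the text.

```python
import math


def bound_complex(p_total, l_total, ka):
    """Concentration of the 1:1 protein-ligand complex (M).

    Solves Ka = [PL] / ([P][L]) with mass balance on both protein and
    ligand, so free ligand is not assumed to equal total ligand.
    """
    kd = 1.0 / ka
    b = p_total + l_total + kd
    return (b - math.sqrt(b * b - 4.0 * p_total * l_total)) / 2.0


def difference_signal(p_total, l_total, ka, signal_max):
    """Peak-to-trough signal, taken as proportional to fraction bound."""
    return signal_max * bound_complex(p_total, l_total, ka) / p_total


# Illustrative values only: p_total and signal_max are hypothetical.
ka = 1.0e3        # M^-1, order of magnitude reported for galactose
p_total = 50e-6   # M
for l_total in (0.0, 0.5e-3, 1e-3, 5e-3, 20e-3):
    s = difference_signal(p_total, l_total, ka, signal_max=0.05)
    print(f"[L]total = {l_total * 1e3:5.1f} mM  signal = {s:.4f}")
```

In a global analysis of several wavelength pairs, each pair would typically have its own maximal signal while sharing a single association constant.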
Isothermal Titration Calorimetry-Isothermal titration calorimetry (ITC) was performed as described previously (18,19) using a VP-ITC (MicroCal, Northampton, MA). Protein samples were extensively dialyzed against buffer (50 mM Tris, pH 7.5). Sugar solutions were prepared by mass in buffer saved from the final protein dialysis step. Both protein and sugar solutions were filtered and degassed immediately prior to use. Protein concentrations were determined by UV absorbance as described above. Aliquots of 10 mM LacNAc were titrated into CpCBM32 (350 μM), which gave C values >5 (20). Aliquots of 5.0 mM type II H-trisaccharide were titrated into CpCBM32 (245 μM). In this case, the C value was <1 (~0.3) due to the low binding affinity. Based on the 1:1 binding observed in the crystal structure, the stoichiometry was fixed at 1 in the analysis of these data. Data were fit with a single binding site model.
Crystallization and Data Collection-All crystallization experiments were performed using the hanging-drop vapor-diffusion method. Prior to crystallization, the H6 tag was removed from CpCBM32 by treatment with enterokinase over a 4-day period. The digested sample was run through a Novagen His-bind Quick 900 cartridge to remove the His tag and any undigested protein from the solution. Samples were concentrated and exchanged as above into 20 mM Tris, pH 8. Cocrystals of CpCBM32 (10.5 mg/ml) with galactose (~10 mM) were obtained with 0.2 M MgCl2, 25% polyethylene glycol 2000 monomethylether, and 0.1 M Tris, pH 7.5. These crystals were cryoprotected with 15% glycerol in mother liquor. 1.5 M sodium/potassium phosphate was used to co-crystallize CpCBM32 (20 mg/ml) with LacNAc and the type II blood group H-trisaccharide (α-L-fucosyl-1,2-β-D-galactosyl-1,4-β-D-N-acetylglucosamine). Optimization of this condition determined the ideal NaH2PO4/K2HPO4 concentration to be 1.5 M (at a 1:100 ratio) and the optimal protein concentration to be 5 mg/ml. The cryoprotectant used was 1.45 M Na/K2HPO4 with 27% ethylene glycol.
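For the calorimetry titrations described above, the C value is conventionally defined as c = n × Ka × [M], where n is the stoichiometry, Ka the association constant, and [M] the macromolecule concentration in the cell. The sketch below applies that standard definition to the concentrations and C values quoted in the text; the association constants it prints are back-calculated illustrations under this definition, not values taken from the paper's tables, and the function names are hypothetical.

```python
def c_value(n_sites, ka, macromolecule_conc):
    """Unitless c parameter for an ITC titration: c = n * Ka * [M]."""
    return n_sites * ka * macromolecule_conc


def ka_from_c(c, n_sites, macromolecule_conc):
    """Association constant (M^-1) implied by a given c value."""
    return c / (n_sites * macromolecule_conc)


# H-trisaccharide titration: c of about 0.3 at 245 uM protein, n fixed at 1.
ka_h_tri = ka_from_c(c=0.3, n_sites=1, macromolecule_conc=245e-6)

# LacNAc titration: c > 5 at 350 uM protein gives a lower bound on Ka.
ka_lacnac_min = ka_from_c(c=5.0, n_sites=1, macromolecule_conc=350e-6)

print(f"Ka implied for H-trisaccharide: about {ka_h_tri:.0f} M^-1")
print(f"Lower bound on Ka for LacNAc:   about {ka_lacnac_min:.0f} M^-1")
```

At low c the binding isotherm loses its sigmoidal shape, which is why the stoichiometry is usually fixed from independent evidence in such experiments.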
Diffraction data were collected with a Rigaku R-AXIS IV++ area detector coupled to an MM-002 x-ray generator with Osmic "blue" optics and an Oxford Cryostream 700. Data were processed with Crystal Clear/d*trek (21). All data collection statistics are given in Table 1.
Structure Determination-CpCBM32 was solved by molecular replacement using the family 32 galactose-binding module from the M. viridifaciens sialidase (PDB code 1EUT) (10) as a search model. The program molrep (22) was able to find one clear rotation/translation solution corresponding to the single molecule in the asymmetric unit. This initial model was corrected, and ligand was added by successive rounds of building using COOT (23). Refinement was done using REFMAC (24). Water molecules were added using the REFMAC implementation of ARP/wARP and inspected visually prior to deposition. The final model lacking waters was used as a starting model to solve the structures of the other CpCBM32 sugar complexes. Initial models were corrected, one or more ligands were added, and waters were added as above. Residue numbering conforms to the numbering in the complete CpGH84C enzyme. Final model statistics are given in Table 1.

Carbohydrate Binding Properties of CpCBM32 from CpGH84C-Based on its amino acid sequence identity (~30%) with the family 32 galactose-binding module from the M. viridifaciens sialidase we postulated that the CpCBM32 module in CpGH84C is indeed a carbohydrate-binding protein. This was initially investigated by macroarray binding experiments using arrayed glycoproteins and polysaccharides, which revealed significant binding to asialofetuin, type III porcine gastric mucin, and fetuin with relative binding of asialofetuin > porcine gastric mucin > fetuin (supplemental Fig. S1). Using knowledge of the glycan structures commonly found on these glycoproteins (14) as a guide, we assessed the binding of various monosaccharides, disaccharides, and trisaccharides to CpCBM32 by qualitative and quantitative UV difference experiments. The addition of D-galactose and GalNAc (N-acetyl-D-galactosamine) to CpCBM32 resulted in large perturbations of the UV difference spectra indicative of the involvement of tryptophan in sugar binding (17) (Fig. 2A). L-Fucose, D-glucose, D-mannose, and GlcNAc (N-acetyl-D-glucosamine) were also tested but did not influence the UV absorption of CpCBM32 and thus are unlikely primary ligands of CpCBM32. Quantitative studies by UV difference titrations showed an affinity of roughly 1 × 10³ M⁻¹ for galactose-based monosaccharides and a preference for β- rather than α-configured O-methylgalactose (Fig. 2B and Table 2). The presence of the acetamido group of GalNAc did not appear to confer any advantage to binding. CpCBM32 preferred lactose and LacNAc over galactose by factors of ~2.5- and 10-fold, respectively. The increased affinity of LacNAc (cf. lactose) suggested the specific involvement of the 2′-acetamido group of the GlcNAc moiety in binding. Along similar lines, the α-1,2-linked L-fucose of fucosyllactose (α-L-fucosyl-1,2-β-D-galactosyl-1,4-β-D-glucose) substantially reduced the binding affinity to levels below those we could quantify due to limiting quantities of sugar but did not entirely preclude binding. The binding to LacNAc and the type II H-trisaccharide was further investigated by ITC (Fig. 2, C and D, and Table 3). The values for LacNAc revealed the enthalpically driven binding process common to protein-carbohydrate interactions (25-27). The ΔH of binding was temperature-dependent, the analysis of which allowed the approximation of the change in heat capacity (ΔCp) to be −105 (±5) cal/mol/K. Again, this small negative ΔCp is consistent with the majority of protein-carbohydrate interactions. The affinity of CpCBM32 for the type II H-trisaccharide was too low to accurately deconvolute the stoichiometry of binding, so, on the basis of the LacNAc binding data and x-ray crystallography data (see below), this value was fixed at 1 for the analysis (28). As with fucosyllactose, the fucose residue of the type II H-trisaccharide is detrimental to binding relative to LacNAc or lactose, although it does not destroy binding. The roughly 1.3 kcal/mol increase in free energy due to the fucose moiety on the H-trisaccharide appears to come by virtue of a substantial enthalpic penalty (~+9.0 kcal/mol), which is partially offset by a favorable contribution to entropy (~+25 cal/mol/K) (note: assumption of up to a 25% error in the estimate of stoichiometry due to errors in either the sugar or protein concentration does change the magnitudes of the calculated thermodynamic penalties but does not change their sign and, thus, qualitative interpretation of the data is unaffected).
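The thermodynamic quantities discussed in this section are linked by the standard relations ΔG = −RT ln Ka and ΔG = ΔH − TΔS. The sketch below works through them with the approximate values quoted in the text; the temperature of 25°C is an assumption made only for illustration (the ITC temperature is not stated in this section), and the function names are hypothetical.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol*K)


def delta_g(ka, temp_k):
    """Binding free energy (kcal/mol) from an association constant (M^-1)."""
    return -R * temp_k * math.log(ka)


def affinity_ratio(ddg, temp_k):
    """Fold decrease in Ka caused by a free energy penalty ddg (kcal/mol)."""
    return math.exp(ddg / (R * temp_k))


def ddg_from_components(ddh, dds_cal, temp_k):
    """ddG = ddH - T * ddS, with ddS supplied in cal/(mol*K)."""
    return ddh - temp_k * dds_cal / 1000.0


T = 298.15  # K; assumed, not taken from the paper

# Ka of roughly 1e3 M^-1 (galactose-based monosaccharides).
print(f"dG for Ka = 1e3 M^-1: {delta_g(1.0e3, T):.2f} kcal/mol")

# A penalty of roughly 1.3 kcal/mol expressed as a fold change in Ka.
print(f"Fold change for 1.3 kcal/mol: {affinity_ratio(1.3, T):.1f}")

# Rounded enthalpic and entropic terms quoted for the fucose residue.
print(f"ddG from +9.0 kcal/mol and +25 cal/mol/K: "
      f"{ddg_from_components(9.0, 25.0, T):.2f} kcal/mol")
```

Because the enthalpic and entropic terms nearly cancel, the net penalty computed from the rounded values is sensitive to both the rounding and the assumed temperature, so it should be expected to agree with the quoted 1.3 kcal/mol only approximately.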
The Fold of CpCBM32-CpCBM32 was crystallized in the presence of galactose, and its x-ray crystal structure was solved by molecular replacement at high resolution (1.49 Å). The fold is that of a β-sandwich comprising a β-sheet of three antiparallel β-strands opposing a β-sheet of five antiparallel β-strands (Fig. 3A). The closest structural neighbors are the family 32 CBMs from the fungal Cladobotryum dendroides galactose oxidase (PDB code 1GOF (9)) (here the CBM is referred to as CdCBM32) and the bacterial M. viridifaciens sialidase (PDB code 1EUT (10)) (here the CBM is referred to as MvCBM32; see below for a more detailed comparison). More distantly related are the family 6 and 36 CBMs as well as the Anguilla sp. fucolectin.
CpCBM32 heptahedrally coordinates one metal ion through the side chains of Thr-655, Asp-650, and Glu-672 and the backbone carbonyls of Phe-647, Asp-652, Thr-655, and Ala-761 (Fig. 3C). This ion was judged most likely to be calcium on the basis of three observations. Firstly, the atom possessed significant anomalous scattering properties, because it could be easily located in anomalous difference maps, indicating that the ion was not sodium or magnesium. Secondly, when considering ions commonly found associated with carbohydrate-binding proteins, heptahedral coordination mediated entirely by oxygen atoms is most consistent with potassium, calcium, or manganese. Lastly, when modeled as calcium, the B factor of this    from the carbohydrate binding site. Studies on other CBMs where similar "structurally" relevant ions have been removed have suggested only a stabilizing role for such ions, because apo-CBMs are still competent to bind carbohydrate (29,30). It is notable that the structural position of this ion is also conserved in CBMs from family 6 (7, 31, 32), 36 (33), and the Anguilla sp. fucolectins (34).
CpCBM32 in Complex with Carbohydrates-The electron density map of the galactose complex clearly revealed a single bound molecule of galactose. Trp-661 and Phe-757 create a relatively hydrophobic pocket, which cradles the C6-hydroxymethyl group. Trp-661 "stacks" against the flat, apolar surface created by carbons 3-6 on the B-face of D-galactose with the O4 pointing away from the aromatic residue (Fig. 4). Specificity for the non-reducing end of D-galactose (and presumably GalNAc) is conferred by three potential hydrogen bonds to the axial O4 of the sugar from the terminal δO of Asn-695, a terminal guanido nitrogen of Arg-690, and the εN from the imidazole ring of His-658 (Fig. 5). Additional hydrogen bonds are made between the O3 of D-galactose and Arg-690 and Glu-641; the endocyclic oxygen of galactose makes a hydrogen bond with the amide nitrogen of Asn-695. The burial of the galactose O3 would prevent binding to the AB blood group antigens.
Examination of the CBM32-galactose complex revealed that the O1, O2, and O6 groups of the bound galactose were solvent-exposed, thus hinting at how this protein might accommodate the additional sugar groups of LacNAc and the type II H-trisaccharide. We probed the presence of additional subsites by cocrystallizing CpCBM32 with these possible biological ligands of CpCBM32 (Fig. 4). The binding of LacNAc appears to induce some changes relative to the galactose structure. In the galactose structure the loop comprising residues 750-753 was very disordered, and, in fact, residues 751 and 752 could not be modeled. In the LacNAc structure, this loop becomes ordered, and the side chains of Asp-749 and Gln-750 close in on the sugar to make additional hydrogen bonds with both the galactose and GlcNAc residues (Figs. 4 and 5). One terminal δO of Asp-749 is situated to hydrogen bond with the O3 of GlcNAc. The other terminal δO of Asp-749 along with the δO of Ser-693 positions an ordered water for a water-mediated hydrogen bond to the oxygen of the β1,4-linked glycosidic bond. The terminal amide nitrogen of Gln-750 is positioned to hydrogen bond with the O6 of galactose and the O3 of GlcNAc. The 2′-acetamido group of the GlcNAc does not appear to be positioned to make any additional hydrogen bonds. However, this chemical group does sit above a relatively apolar platform made by the planar conformations of the Asp-749 and Gln-750 side chains and makes a number of van der Waals contacts. A water-mediated hydrogen bond occurs between the carbonyl group oxygen of GlcNAc and the main-chain nitrogen of Ala-751. These additional interactions are the likely source of the increased affinity of CpCBM32 for LacNAc versus lactose.
The crystal structure of CpCBM32 in complex with the type II blood group H-trisaccharide revealed the interactions between the LacNAc core of this sugar and the protein to be identical to the LacNAc-CpCBM32 interactions (Fig. 4). The fucose residue of the type II H-trisaccharide occupied a subsite located above Arg-690, and hydrogen bonding interactions with the protein were limited to two water-mediated hydrogen bonds: the first provided by the δO of Asp-749 and the δO of Ser-693 (Fig. 5). The same water is also involved in a water-mediated hydrogen bond to the glycosidic bond between galactose and GlcNAc. The second water is positioned for potential hydrogen bonding with the O2 of fucose by the εO of Glu-641.

Molecular Determinants of CpCBM32 Specificity-The CBM32 module from CpGH84C appears to have three, possibly four, subsites that accommodate the monosaccharide units of oligosaccharide ligands. The primary subsite binds galactose but is also able to accommodate GalNAc, and it appears to provide the bulk of the binding free energy and thus provide the "anchor" for the interaction. β-Linked substituents on the O1 of this galactose result in improved binding affinity. In the case of LacNAc, where the GlcNAc is β-1,4-linked to the galactose, the affinity improves by an order of magnitude relative to galactose and is influenced positively by the presence of the acetamido group of GlcNAc. Modeling of lacto-N-biose (β-D-galactosyl-1,3-β-D-N-acetylglucosamine; a component of the type I blood group H-trisaccharide antigen) into the CpCBM32 binding site, using LacNAc as a guide, suggested that CpCBM32 may also accommodate this sugar with the β-1,3-linked GlcNAc bound in the same secondary subsite as for the GlcNAc of LacNAc (not shown). In the case of lacto-N-biose, the C6-hydroxymethyl group of the GlcNAc residue in this disaccharide would be positioned similarly to the acetamido group of LacNAc. Although structurally there is apparently nothing to hinder the binding of lacto-N-biose, it is unknown if the C6-hydroxymethyl interaction of this sugar would be energetically equivalent to those of the LacNAc acetamido group. Along an identical line of argument, the core 1 O-glycan (β-D-galactosyl-1,3-β-D-N-acetylgalactosamine) would be accommodated in a similar manner to lacto-N-biose.
FIGURE 6. Comparison of CpCBM32 with other family 32 CBMs. Panel A shows an amino acid sequence alignment of the CpGH84C CBM32 (labeled GH84C) with CBM32 modules found in other C. perfringens glycoside hydrolases using a cut-off of 25% amino acid sequence identity. Labels refer to the enzyme family from which the CBMs originate (e.g. GH20 refers to the CBM32 from a C. perfringens family 20 glycoside hydrolase). A number appended to the label indicates that more than one CBM32 is found in that enzyme, and the number indicates the position of the CBM from the N terminus relative to the other CBMs. CdCBM32 refers to the CBM from the C. dendroides galactose oxidase, and MvCBM32 refers to the CBM from the M. viridifaciens sialidase. Above the alignment is the secondary structure of CpCBM32. Residues involved in galactose binding are indicated by a triangle; a closed circle indicates residues involved in binding both galactose and GlcNAc (of LacNAc); the open circle indicates the residue involved only in GlcNAc (of LacNAc) binding. Yellow bars above the sequence show regions potentially involved in the formation of a subsite for accommodated sugars linked to the O6 of galactose. Likewise, the green bar shows the O4 subsite, and the purple bar the O2 subsite. The secondary structure of MvCBM32 is shown beneath the alignment. B, structural overlap of CpCBM32 (green), MvCBM32 (gray; PDB code 1BZD (11)), and CdCBM32 (blue; 1GOF (9)). Bound metal ions are shown as spheres and bound ligands as stick models. C, structural overlap of CpCBM32 with MvCBM32. CpCBM32 is depicted in lime with its structural calcium in light orange. Galactose and amino acids involved in binding are in blue stick representation. MvCBM32 is depicted in white, and its structural sodium is colored pale yellow. Its amino acids and bound galactose are shown in pink stick representation. Arrows in B and C indicate the loop discussed in the text.
The structure of CpCBM32 with the type II H-trisaccharide revealed an additional secondary subsite that accommodates the α-1,2-linked fucosyl residue. However, this appeared to incur a substantial energetic penalty. The thermodynamic values indicated an enthalpically unfavorable but partially offsetting entropically favorable contribution from the occupation of this subsite. This signature is consistent with the removal of ordered waters from the protein surface. Indeed, when comparing the binding site water networks in the D-galactose, LacNAc, and type II H-trisaccharide structures, the latter sugar results in the displacement of at least one, possibly two, waters that would otherwise be ordered on the protein surface in the absence of the fucosyl residue. Overall, although the fucosyl residue of the blood group H-antigen is accommodated by the CBM, it is not a preferred feature.
CpCBM32 may have a fourth subsite that accommodates substitutions linked to the O6 of the anchoring galactose. The O6 of galactose is solvent-exposed, which may permit the linkage of another sugar to that atom. Sialic acid is found linked 2-6 to LacNAc, typically at the terminus of a glycan, and to GalNAc, often as an elaboration of O-glycan cores (14). It is possible that CpCBM32 may accommodate a sialic acid residue within this putative fourth binding subsite.
Comparison with Other Family 32 CBMs-The closest structural neighbors of CpCBM32 are MvCBM32 and CdCBM32 with amino acid sequence identities of 35 and 30%, respectively, and r.m.s.d. values for both of 1.2 Å over 135 and 130 matched Cα atoms, respectively (Fig. 6, A and B). Clearly, the backbones of CpCBM32, MvCBM32, and CdCBM32 match very closely with the exception of a single loop in CpCBM32 (Fig. 6, A and B). Recently, the high resolution crystal structure of MvCBM32 in complex with galactose was solved (11). Although a similar complex for CdCBM32 is not available, a comparison of the MvCBM32 and CdCBM32 structures (not shown) and the residues involved in galactose binding show excellent conservation of functional residues (Fig. 6A). However, in the absence of a CdCBM32 complex, further comparison is uninformative. A comparison of MvCBM32 and CpCBM32 reveals that the architecture of the carbohydrate-binding sites and the constellation of protein-carbohydrate interactions are very similar. The majority of the galactose binding machinery comprising Glu-641, His-658, Trp-661, Arg-690, and Phe-757 of CpCBM32 is extremely well conserved between the two proteins (Fig. 6, A and C). Additionally, in binding galactose CpCBM32 employs Asn-695 whose structural analog in MvCBM32 is not positioned to interact with the galactose. The loop comprising residues 747-752 in CpCBM32 (indicated by the green bar in Fig. 6A and the arrows in Fig. 6, B and C) harbors the additional residues Asp-749 and Gln-750, which appear key in binding the GlcNAc residue of LacNAc. That MvCBM32 (and CdCBM32) lacks this loop suggests that LacNAc is not the primary ligand for these CBMs.
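The r.m.s.d. quoted in this comparison is a distance: the square root of the mean squared separation between matched Cα atoms after the two structures have been superposed. The sketch below shows only that final calculation, assuming the coordinates are already superposed and already paired; the superposition and atom matching steps performed by structure-comparison software are not reproduced, and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation (same length unit as the input, e.g. Å).

    coords_a and coords_b are equal-length sequences of (x, y, z) tuples
    for matched atoms in two pre-superposed structures.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example: every atom displaced by exactly 1 Å along z.
model_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
model_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
print(f"r.m.s.d. = {rmsd(model_a, model_b):.2f} Å")
```

The value depends on how many atom pairs are matched, which is why the number of matched Cα atoms is reported alongside it.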
This galactose recognition machinery is well conserved among several other clostridial CBM32s despite having amino acid sequence identity as low as 25% (Fig. 6A). These enzymes span specificities for α- or β-linked glucose, GlcNAc, galactose, GalNAc, and sialic acid sugars. The highest degree of sequence variability occurs in the regions of the polypeptides that contribute to the proposed additional binding subsites (Fig. 6A). This suggests that, among these reasonably closely related CBM32s, galactose is the primary ligand but that different substitutions on the galactose may be optimally accommodated by the different CBMs. This added diversity could complement the varied specificities of the enzymes to which these CBM32 modules are appended. However, precisely how CBM32 ligand preference might complement the enzyme specificity is unclear. For example, the receptor specificity of the CpCBM32 from CpGH84C for terminal LacNAc motifs does not match the specificity of CpGH84C, which evidence indicates is an exo-β-D-N-acetylglucosaminidase (15,35,36).
Using amino acid sequence identity criteria less stringent than a 25% cut-off allows the identification of numerous additional family 32 CBMs in C. perfringens and other bacteria. However, at these lower levels of sequence identity the conservation of functional residues, including those involved in galactose binding, is lost, and only amino acids contributing to structure remain conserved (not shown). This suggests that the specificity of family 32 CBMs is not necessarily centered on galactose.
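A sequence identity cut-off of this kind can be applied to a pairwise alignment as sketched below. This is an illustration only: the convention used here (identical residues divided by the number of alignment columns in which neither sequence has a gap) is one of several in common use and is an assumption, since the text does not state which was applied, and the short aligned strings are hypothetical rather than real CBM32 sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns without a gap in either row."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def passes_cutoff(aligned_a, aligned_b, cutoff=25.0):
    """True when the pair meets or exceeds the identity cut-off (percent)."""
    return percent_identity(aligned_a, aligned_b) >= cutoff


# Hypothetical aligned fragments, for illustration only.
seq_a = "ACDE-FGH"
seq_b = "ACNEQFG-"
print(f"identity = {percent_identity(seq_a, seq_b):.1f}%")
print("passes 25% cut-off:", passes_cutoff(seq_a, seq_b))
```

Because different denominators give different percentages for the same alignment, a stated cut-off is only comparable between studies that use the same convention.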

CONCLUSION
Roughly one-third of the bacterial enzymes containing family 32 CBMs belong to a varied range of glycoside hydrolases produced by the bacterial gut colonizers from species of Clostridium and Bacteroides. The role of these enzymes is likely in scavenging carbohydrates from dietary sugars or muco-oligosaccharides. By analogy to plant cell wall hydrolases, the role of the CBM32s in these enzymes is probably to attach the enzyme to a carbohydrate-bearing surface, which in the case of CpGH84C is likely the terminal LacNAc motif common to the O-linked glycans of mucin. This evokes an image of the enzyme being attached to a carbohydrate-bearing surface via the CBM while allowing the catalytic module to "graze" on that substrate. Given the diversity of enzymes that contain family 32 carbohydrate-binding modules, one might expect, as with plant cell wall hydrolases and their CBMs, that the diversity of catalytic specificities is mirrored by the diversity of CBM32 specificities. Thus, CBM family 32 may be a resource to be mined for carbohydrate-binding proteins with a variety of specificity profiles.