Multidomain Carbohydrate-binding Proteins Involved in Bacteroides thetaiotaomicron Starch Metabolism

Background: Bacteroides thetaiotaomicron is a prototype for understanding carbohydrate metabolism by colonic bacteria.
Results: Two nonenzymatic membrane proteins involved in starch metabolism are composed of tandem carbohydrate-binding modules that each bind starch differently.
Conclusion: B. thetaiotaomicron has evolved multiple starch-binding modules to compete for different forms of starch.
Significance: Learning how gut bacteria degrade carbohydrates is crucial for understanding their role in nutrition.
Human colonic bacteria are necessary for the digestion of many dietary polysaccharides. The intestinal symbiont Bacteroides thetaiotaomicron uses five outer membrane proteins to bind and degrade starch. Here, we report the x-ray crystallographic structures of SusE and SusF, two outer membrane proteins composed of tandem starch-specific carbohydrate-binding modules (CBMs) with no enzymatic activity. Examination of the two CBMs in SusE and three CBMs in SusF reveals subtle differences in the way each binds starch, and these differences are reflected in their Kd values for both high molecular weight starch and small maltooligosaccharides. Thus, each site seems to have a unique starch preference that may enable these proteins to interact with different regions of starch or its breakdown products. Proteins similar to SusE and SusF are encoded in many other polysaccharide utilization loci that are possessed by human gut bacteria in the phylum Bacteroidetes. Thus, these proteins are likely to play an important role in carbohydrate metabolism in these abundant symbiotic species. Understanding structural changes that diversify and adapt related proteins in the human gut microbial community will be critical to understanding the detailed mechanistic roles that they perform in the complex digestive ecosystem.

phenotypic analyses of mutants lacking expression of the susE and susF genes reported that they were dispensable for growth on starch in vitro (13), although they contribute substantially to starch binding by whole cells (11). Neither SusE nor SusF appears to possess enzymatic activity toward starch, as disruption of the only validated amylase (SusG) is not compensated for by the presence of these proteins. Additional support for the importance of SusE and SusF comes from the presence of similar lipoproteins in most other Sus-like systems with specificity for glycans other than starch (8,9). However, only close relatives of these proteins that are involved in binding starch or similar glycans are currently grouped into the same protein families in the Pfam database: SusE (PF14292, currently 236 members) and PB002941 (currently 88 sequences) (15). Of note, the former family only corresponds to the first ~125 residues of SusE and does not include SusF; the latter family includes the C-terminal domains of both proteins. Very little sequence-level homology exists between these proteins, but some are predicted to adopt carbohydrate-binding module (CBM) folds (16,17), and at least one of these proteins, with specificity for β2,6-linked fructan, binds polysaccharide in its pure form (9). Finally, a recent bioinformatics study comparing human gut metagenomic samples to those from non-gut environments found that one of the most abundant human gut-specific microbial protein families includes SusE and SusF (18).
To effectively degrade insoluble glycan structures, many microbial glycoside hydrolases are appended with noncatalytic CBMs. These small β-sheet-rich domains, ~100 amino acids, often enhance glycan degradation by tethering the enzyme to the substrate, or by disrupting the secondary or tertiary structure of the glycan (19-21). A great number of bacterial amylases contain one or more CBMs, and the removal or mutation of these domains decreases the ability of the enzyme to process insoluble starch (14, 22-24). In some instances, the addition of a starch CBM can impart the ability to degrade raw starch to an amylase that does not otherwise have this capability (25,26). To date, the carbohydrate active enzymes (CAZy) database recognizes 10 CBM families that bind starch, all of which describe protein domains that are components of amylases. Although nonenzymatic CBM-containing proteins have been described as part of cellulosomal complexes (1), nonenzymatic proteins composed of starch-binding CBMs have not been reported.
In this study we investigate the interactions of purified Bt SusE and SusF proteins with starch or its oligosaccharides using x-ray crystallographic and biochemical approaches. Structural analyses of SusE and SusF demonstrate that each protein functions as a multivalent starch-binding protein: SusE contains two binding sites and SusF contains three. The C-terminal regions of both proteins encompass two CBMs that are structurally very similar. The extra binding site in SusF is due to the insertion of an additional CBM into the middle of a sequence with otherwise similar topology to SusE. We constructed single and double binding site mutants in SusE and SusF to evaluate the individual contributions of each site to binding starch and various oligosaccharides. Each site displays subtle differences in its starch-binding architecture and binding preference, suggesting that each site is adapted to slightly different starch substrates. Including SusD and SusG, there are a total of eight distinct noncatalytic sites at which Sus proteins bind their substrate. Based on these observations, we speculate that SusE and SusF have evolved to help Bt compete for starch in the human intestinal tract, by sequestering starch at the bacterial surface and away from competitors. In addition, the occurrence of CBMs in nonenzymatic polypeptides, which is rarely reported, may serve to assist the catalytic function of SusG in this multiprotein system that is present on the cell surface.
SusE and SusF Lipid Attachment Site Mutation-The susE and susF genes plus ~700 bp of sequence flanking each gene were amplified from Bt strain ATCC 29148 using the primers listed in supplemental Table S3 and cloned into the suicide vector pExchange-tdk (12). Mutation of the SusE C21 and SusF C20 codons to alanine was carried out using the QuikChange site-directed mutagenesis kit (Stratagene). The mutated alleles were confirmed by sequencing and introduced into Bt by conjugation and counterselection on 5-fluoro-2′-deoxyuridine. Surface expression of SusE and SusF was probed by antibody staining of nonpermeabilized formaldehyde-fixed Bt cells grown on minimal media/maltose with rabbit polyclonal antibodies (Cocalico Biologicals) and detected with an Alexa Fluor 488-conjugated goat anti-rabbit IgG secondary antibody (Molecular Probes). SusE and SusF were detected in Bt whole cell lysates by Western blot using the rabbit polyclonal primary antibodies mentioned above together with an alkaline phosphatase-conjugated goat anti-rabbit IgG secondary antibody (Sigma).
Expression of SusE and SusF-To clone and express the SusE and SusF proteins, the gene fragments corresponding to the soluble domains of SusE (residues 35-387 for full-length and 172-387 for C-terminal domain) and SusF (residues 21-485) were amplified from Bt genomic DNA to include NdeI (SusE) or NheI (SusF) and XhoI sites at the 5′ and 3′ ends of the PCR products, respectively. The gene products were ligated into a modified version of pET-28a (EMD Biosciences) containing a recombinant tobacco etch virus (rTEV) protease recognition site. Site-directed mutagenesis of the cloned susE and susF genes was performed using the QuikChange multisite-directed mutagenesis kit with the susE-pET28rTEV or susF-pET28rTEV plasmid as the template. Starch-binding residues mutated to alanine in specific CBMs of SusE and SusF are listed in Table 1.
The pET28rTEV plasmids containing the allele of interest were transformed into Rosetta (DE3) pLysS cells (EMD Biosciences). Transformed cells were grown at 37°C for 20 h, and then the plates were scraped to inoculate culture media for protein expression. For native protein expression, the cells were grown in 1 liter of TB plus kanamycin (50 µg/ml) and chloramphenicol (20 µg/ml) in 2-liter baffled flasks at 37°C until they reached an OD of ~0.4, and the temperature was then lowered to 22°C. Approximately 30 min after lowering the temperature, isopropyl 1-thio-β-D-galactopyranoside was added to a final concentration of 0.5 mM, and the cells continued to grow overnight (16-20 h). Cells were harvested by centrifugation at 6,000 × g, and the cell pellets were stored at −80°C until protein purification. Selenomethionine (SeMet)-substituted protein was produced via the methionine inhibitory pathway (29), as previously described (30).
Purification of Native and SeMet-substituted SusE and SusF-All SusE and SusF proteins were purified using a 5-ml Hi-Trap metal affinity cartridge (GE Healthcare) according to the manufacturer's instructions. The cell lysate was applied to the column in His buffer (25 mM NaH2PO4, 500 mM NaCl, 20 mM imidazole, pH 7.4). After sample loading, the column was washed with 40 ml of His buffer, then proteins were eluted with an imidazole (20-300 mM) gradient. The His tag was removed by incubation with rTEV (1:100 molar ratio of rTEV to protein) at room temperature for 2 h, followed by overnight incubation at 4°C while dialyzing against His buffer. The cleaved protein was then repurified on the 5-ml nickel column to remove undigested target protein, the cleaved His tag, and His-tagged rTEV. Purified proteins were dialyzed against 20 mM HEPES, 100 mM NaCl (pH 7.0) prior to crystallization, and concentrated using Vivaspin 15 (10,000 MWCO) centrifugal concentrators (Vivaproducts, Inc.).
Crystallization and Data Collection-Crystallization conditions were screened via the hanging drop method of vapor diffusion in 96-well plates using Hampton Screen kits (Hampton Research). Crystals were obtained for the native and SeMet-substituted full-length SusE protein at room temperature as hanging drop experiments using 16.5 mg/ml of protein and 2 mM α-cyclodextrin (αCD) against a well solution of 16-20% PEG 6000, 2 M NaCl, 100 mM malonate (pH 5.0). The SusE-αCD crystals were then serially transferred into a cryoprotectant of 22% PEG 6000, 2.3 M NaCl, 50 mM malonate, 2 mM αCD, and 19% ethylene glycol and flash-frozen in liquid nitrogen prior to data collection.
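As a rough check on the protein:ligand stoichiometry in these drops, the mass concentration can be converted to a molar one. The sketch below assumes a molecular mass of about 40 kDa for full-length SusE, which is only the approximate value quoted for the full-length protein in this paper; the function and variable names are illustrative.

```python
def mg_per_ml_to_micromolar(conc_mg_ml, mass_kda):
    """Convert a protein concentration in mg/ml to micromolar."""
    # mg/ml equals g/l; dividing by g/mol gives mol/l, then scale to µM.
    return conc_mg_ml / (mass_kda * 1000.0) * 1e6


# Values from the text: 16.5 mg/ml SusE crystallized with 2 mM alpha-cyclodextrin.
# ASSUMPTION: ~40 kDa for full-length SusE (approximate, not an exact mass).
sus_e_um = mg_per_ml_to_micromolar(16.5, 40.0)
ligand_excess = 2000.0 / sus_e_um  # 2 mM ligand expressed in µM

print(f"SusE: {sus_e_um:.0f} µM, ligand excess: {ligand_excess:.1f}-fold")
```

Under this assumption the ligand is present in several-fold molar excess over the protein in the SusE-αCD condition.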
Crystals of the SusE C-terminal domain (18 mg/ml) plus 0.5 mM maltoheptaose (M7) were grown at room temperature from hanging drops against a well solution of 2.5 M ammonium sulfate, 100 mM Bistris propane (pH 7.0). These crystals were flash-frozen in a cryoprotectant containing 2.0 M ammonium sulfate, 80 mM Bistris propane (pH 7.0), 1 mM maltoheptaose, and 20% ethylene glycol.
Crystals of the native and SeMet-substituted full-length SusF were grown via hanging drop at room temperature using 29.8 mg/ml of protein and 2 mM M7 against a well solution of 6-12% glycerol, 1.5-2 M Na/KPO4 (pH 6.3). The SusF-M7 crystals were then serially transferred into a cryoprotectant of 6-12% glycerol, 1.75-2 M Na/KPO4 (pH 6.3), 300 mM NaCl, 2 mM M7, and 16% ethylene glycol and flash-frozen in liquid nitrogen prior to data collection.
SAD x-ray data sets for all SeMet-substituted crystals were collected at the Life Sciences Collaborative Access Team (LS-CAT) beamline ID-D at the Advanced Photon Source at Argonne National Labs, Argonne, IL. Native data sets for SusF as well as full-length SusE crystals were also collected at LS-CAT ID-D, whereas the SusE C-terminal x-ray data were collected at LS-CAT beamline ID-G. X-ray data were processed with HKL3000 and scaled with SCALEPACK (31). The structures of SusE and SusF were determined from the SAD data using the AutoSol subroutine within the Phenix software package (32,33). These initial models of SusE and SusF proteins were then utilized for molecular replacement in Phaser (34) against the native x-ray data sets. Data collection statistics are reported in supplemental Tables S1 and S2. The Ramachandran plots for all three structures were generated using the MolProbity structure validation server (35). The structure of the SusE C-terminal domain with M7 had no outliers, with 97.4% of residues in favored regions and the rest within the allowed regions of the Ramachandran plot. The SusE model with α-cyclodextrin also had no outliers, and displayed 94.3% of residues within the favored regions and the rest in allowed regions. The SusF structure with maltoheptaose had two residues, Glu-89 (55.7, −23.4) and Ser-341 (−29.1, 143.3), that fell just outside the generously allowed region of the Ramachandran plot. Glu-89 is part of a left-handed helical turn. A hydrogen bond between the peptidyl O of Leu-87 and the side chain imidazole N of His-91 distorts the geometry of this turn. Ser-341 is at the beginning of an α-helix, and a hydrogen bond between the side chain hydroxyl of Ser-341 and the nearby side chain of Glu-379 may play a role in pulling this residue out of an ideal alignment. For the rest of the SusF model, 96.9% of residues are in the favored regions and the remaining residues are in the allowed regions of the Ramachandran plot.
Isothermal Titration Calorimetry-ITC measurements were carried out using a MicroCal VP-ITC titration calorimeter. Proteins were dialyzed into 50 mM HEPES, pH 8.0, and oligosaccharides were prepared using the dialysis buffer. Protein (250 µM) was placed in the sample cell and the reference cell was filled with dialysis buffer. After the temperature was equilibrated to 25°C, a first injection of 2 µl was performed, followed by 29 subsequent injections of 10 µl of 20 mM αCD, M7, or glucosyl maltotriosyl maltotriose (GM3M3). The solution was stirred at 305 rpm and the resulting heat of reaction was measured. Data were analyzed using the Origin software package, fixing N to the known number of starch-binding sites in the protein of interest. The isotherm for SusE-C only with GM3M3 was indicative of two binding events, one being very weak. This weak second binding event is unlikely to be relevant at biological concentrations of starch; therefore, we included only the first 15 injections in our curve fit to approximate the affinity of the major binding event. Isotherms are displayed in supplemental Figs. S3-S11.
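The injection schedule above fixes how far each titration proceeds toward saturation. The following minimal sketch tabulates the ligand:protein molar ratio after each injection. The sample cell volume is not stated in the text, so the 1.4 ml used here is an assumption, and the calculation ignores displacement of cell contents during injection; all names are illustrative.

```python
# Concentrations and injection volumes are taken from the text.
PROTEIN_UM = 250.0        # protein in the sample cell, µM
LIGAND_MM = 20.0          # ligand in the syringe, mM
CELL_VOLUME_UL = 1400.0   # ASSUMPTION: cell volume in µl, not given in the text


def injection_volumes_ul():
    """One 2 µl injection followed by 29 injections of 10 µl."""
    return [2.0] + [10.0] * 29


def molar_ratios(cell_volume_ul=CELL_VOLUME_UL):
    """Cumulative ligand:protein molar ratio after each injection.

    Simplification: the protein amount in the cell is treated as constant
    and overflow of cell contents during injection is ignored.
    """
    protein_nmol = PROTEIN_UM * cell_volume_ul / 1000.0  # µM x µl = pmol
    ratios = []
    ligand_nmol = 0.0
    for volume_ul in injection_volumes_ul():
        ligand_nmol += LIGAND_MM * volume_ul  # mM x µl = nmol
        ratios.append(ligand_nmol / protein_nmol)
    return ratios


ratios = molar_ratios()
print(f"final ligand:protein ratio = {ratios[-1]:.1f}")
```

With these assumptions the titration ends at a large molar excess of ligand, which is consistent with fitting weak (micromolar) binding events.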
Adsorption Depletion Assay-The affinity of purified SusE and SusF for insoluble cornstarch was determined via adsorption depletion. Cornstarch (Sigma, S4126) was washed twice in an excess of ddH2O, then once with an excess of PBS. Starch was pelleted and suspended in PBS to make a 100 mg/ml slurry. 20 mg of starch was pipetted into each well of a microtiter plate, pelleted, and the supernatant discarded. Starch pellets were suspended in 200 µl of protein solution ranging from 1.5 to 0.1 mg/ml in PBS. Plates were incubated for 2 h at room temperature with agitation. Starch and bound protein were pelleted by centrifugation and the supernatant was collected. Unbound protein concentration was determined with the Pierce Microplate BCA Protein Assay Kit. Bound protein per gram of starch was plotted as a function of free protein from three replicates and fit by nonlinear regression using the one-site total binding equation (GraphPad Prism).
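The depletion calculation and the binding model can be sketched as follows. The first function converts the drop in supernatant concentration into bound protein per gram of starch, using the 200 µl volume and 20 mg of starch given above. The second evaluates a one-site total binding curve in the commonly used form Y = Bmax·X/(Kd + X) + NS·X + background; the parameter names are illustrative, the numerical inputs are invented for demonstration, and no fitting is performed here.

```python
def bound_per_gram(initial_mg_ml, free_mg_ml, volume_ul=200.0, starch_mg=20.0):
    """Protein bound (mg) per gram of starch, from supernatant depletion."""
    bound_mg = (initial_mg_ml - free_mg_ml) * volume_ul / 1000.0  # mg/ml x ml
    return bound_mg / (starch_mg / 1000.0)                        # per gram


def one_site_total(free, bmax, kd, ns=0.0, background=0.0):
    """One-site total binding: specific + linear nonspecific + background."""
    return bmax * free / (kd + free) + ns * free + background


# Hypothetical example values (not data from this study):
# 1.5 mg/ml added, 1.0 mg/ml left unbound in the supernatant.
example_bound = bound_per_gram(1.5, 1.0)
print(f"bound: {example_bound:.1f} mg protein per g starch")
```

When the nonspecific and background terms are zero, the curve reaches half of Bmax at a free protein concentration equal to Kd, which is how the fitted Kd should be read.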

RESULTS AND DISCUSSION
SusE and SusF Are Surface-exposed Lipoproteins-Both SusE and SusF are predicted to contain an N-terminal signal sequence followed by a Cys that should be lipidated after secretion and processing by signal peptidase II. Because a pathway for secreting lipoproteins to the external leaflet of the Gram-negative outer membrane has yet to be defined (36), we examined the cellular location of SusE and SusF by changing the predicted lipidated Cys of each protein to an Ala. This mutation should allow secretion and signal peptide cleavage by signal peptidase I, resulting in a soluble periplasmic form of each protein. Consistent with their predicted location, wild-type (WT) SusE and SusF were detected on the Bt cell surface when probed with SusE- or SusF-specific antibodies (Fig. 1, A and B). In contrast, SusE or SusF was not detected on the cell surface of mutant strains producing periplasmic SusE or SusF, although these proteins, in amounts similar to WT, were observed in cell lysates by Western blot. Consistent with earlier reports, growth of Bt lacking surface expression of SusE and SusF did not result in a significant growth rate defect on maize amylopectin and glycogen (data not shown).
SusE and SusF Have Multiple Starch-binding Domains-SusE and SusF were expressed in Escherichia coli from constructs that eliminated the N-terminal secretion and lipidation features. Structure determination of both proteins was performed using SAD phasing from crystals obtained from SeMet-substituted protein. The initial protein models were built from the SeMet data sets, and then used as models for molecular replacement with the native protein data sets (supplemental Tables S1 and S2).
The 2.0-Å crystal structure of SusF, the larger of the two proteins, included maltoheptaose (M7) (Rwork = 19.6%, Rfree = 24.8%) and encompassed residues 40-485. The first 19 residues at the N terminus of the recombinant SusF were not resolved in electron density, suggesting a flexible linker to the lipidation site. The topology of SusF can be described as three tandem domains (N-terminal, middle, and C-terminal) that assume an S-shaped conformation in the crystal structure (Fig. 2A). These domains are packed against each other, although the buried surface area between the N-terminal and middle domain (364 Å²), and middle domain and C-terminal domain (345 Å²) is quite small and includes just a few hydrogen-bonding contacts.
The N-terminal domain (residues 40-160) of SusF consists of a β-barrel that is similar in overall fold and topology to several immunoglobulin superfamily (IgSF) domains found in cell adhesion proteins, including CD28 (1YCD-chain C; r.m.s. deviation 3.1 Å, 8% sequence identity) and CD47 (2JJS-chain A; r.m.s. deviation 2.7 Å, 12% sequence identity). Beyond this N-terminal domain, SusF consists of three β-sandwich CBMs of ~100 amino acids each. We will refer to these as CBMs Fa, Fb, and Fc, using "F" to denote that they are from SusF, and labeling them alphabetically from the N to C terminus. The middle domain of SusF (residues 161-274) is composed of CBM Fa, whereas the C-terminal domain is composed of two distinct CBMs (residues 275-383 as Fb and residues 384-485 as Fc) that are closely packed together via hydrophobic interactions. Although each CBM displays unique binding-site features, the overall architecture of each is quite similar and reminiscent of many starch-binding CBMs (21). Submission of the three individual CBMs of SusF to the DALI server (37) revealed that all share the most structural homology with the X25 domain of the Bacillus acidopullulyticus glycoside hydrolase (GH) family 13 pullulanase (PDB 2WAN), with Z-scores of 7.8, 7.3, and 4.9 for the Fa, Fb, and Fc CBMs, respectively. Although the core β-sandwich structure of the SusE and SusF CBMs is similar to that of described starch-binding CBMs, the β-strand topology is different, which prevented an amino acid sequence-based prediction of SusE and SusF as starch-binding CBMs. Therefore, we propose that the five CBMs between SusE and SusF should be added as a novel class of CBMs in the CAZy database (17).
The asymmetric unit of the SusF crystals (C2) contained one molecule of SusF and two molecules of M7, one at CBM Fb and one that adopts a nearly circular conformation and is shared between Fa and Fc of a symmetry-related molecule. This packing arrangement does not suggest a dimeric interface, and both size exclusion chromatography and native PAGE suggest that SusF is a monomer (data not shown). The starch-binding sites of Fb and Fc are oriented nearly 180° away from each other, an arrangement that mimics the orientation of the tandem CBM41 domains of Streptococcus pneumoniae SpuA (38). However, in SusF the additional CBM Fa creates a triangle of binding sites, with each starch-binding site oriented ~120° apart (Fig. 2A).
FIGURE 1. SusE and SusF are exposed on the surface of B. thetaiotaomicron. Alleles of susE and susF were created in which the N-terminal cysteine, which is lipidated to tether the proteins to the outer membrane, was mutated to alanine (SusE C21A and SusF C20A). These alleles were recombined into the native sus locus. Cells were grown to mid-exponential phase in minimal media/maltose to induce expression. A, Bt staining for SusE and SusF surface expression. Nonpermeabilized cells were fixed and probed for SusE and SusF surface expression using polyclonal antisera. Fluorescent images are shown with the corresponding bright field (BF) images. All images are shown on the same scale; bar = 10 µm. B, Western blot of lysates from whole cells expressing the wild-type and mutant alleles probed in A. Wild-type (1), SusE C21A (2), SusF C20A (3), and SusE C21A SusF C20A (4) Bt whole cell lysates were probed for SusE and SusF protein using polyclonal antibodies. Size difference between the wild-type and lipidation signal mutant proteins corresponds to loss of the lipid tail.
The structure of SusE (residues 35-387) complexed with αCD was solved to a resolution of 2.5 Å (Rwork = 20.4%, Rfree = 24.2%). The final model includes residues 174-387, as the predicted N-terminal domain (residues 38-167) was not observed in the electron density (Fig. 2B). Sufficient space exists in the asymmetric unit for this domain, and both mass spectrometry analysis of SusE prior to crystallization and SDS-PAGE analysis of extensively washed crystals indicated the prominent presence of the full-length (~40 kDa) protein (data not shown). Therefore, we conclude that there is a flexible linker between the N- and C-terminal domains, causing the former to be disordered in the crystal lattice. In the structure, two symmetry-related molecules of SusE are clustered around a single molecule of αCD. There is very little (285 Å²) buried surface area between the proteins, and both size exclusion chromatography and native PAGE indicate that SusE is a monomer (data not shown).
The most striking difference between SusE and SusF is that SusE is ~10 kDa smaller, due to the absence of a middle domain corresponding to Fa in SusF. Although the N-terminal domain of SusE was not resolved in the crystal structure, the predicted structure of residues 38-167 generated using I-TASSER (39,40) suggests a similar IgSF-type fold (supplemental Fig. S1). The C-terminal domain of SusE is strikingly similar to the C-terminal domain of SusF and is also composed of two CBMs (residues 174-283 as CBM Eb and residues 284-387 as CBM Ec) packed tightly together. The C-terminal domains of SusE and SusF superimpose with an r.m.s. deviation of 1.3 Å over 189 Cα atoms and share 38.6% sequence identity (Fig. 2C).
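For readers less familiar with the r.m.s. deviation values quoted for these superpositions, the quantity itself is simple to compute once two sets of equivalent Cα atoms have been placed in the same frame. The sketch below assumes the coordinates are already superimposed and paired one to one; it does not perform the structural alignment, and the coordinates in the example are invented.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation (Å) between paired, pre-superimposed atoms.

    Each argument is a sequence of (x, y, z) tuples; atom i in coords_a is
    assumed to be equivalent to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical coordinates: every atom displaced by exactly 1 Å along z.
model_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
model_b = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0)]
print(rmsd(model_a, model_b))
```

Because the deviations are squared before averaging, a few poorly matching atoms (for example in a flexible loop) can dominate the value, which is why the number of atoms compared is reported alongside it.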
The SusF Starch-binding Sites Coordinate Oligosaccharides Differently-Each of the three CBMs in SusF displays bound M7 in the crystal structure, allowing a comparison of the molecular details of binding at each site. Each site has features universal to many starch-binding proteins: an arc of aromatic amino acids for hydrophobic stacking with glucose, and hydrogen-bonding acceptors and donors for interacting with the O-2 and O-3 of glucose. However, each site also displays differences in ligand binding that may impart some specificity regarding which part of a starch molecule is preferred or how tightly it is bound.
A molecule of M7 is shared between the CBMs Fa and Fc of symmetry-related proteins, imposing a circular shape on the linear maltooligosaccharide (Fig. 3A). The ring-like appearance of M7 suggests that the ends of the ligand occur in different places in different molecules and thus an average of these orientations is manifest in the electron density. The Fa binding site displays a characteristic aromatic arc (Trp-177 and Trp-222) that stacks against Glc3 and Glc4; however, hydrogen bonding occurs at Glc3 and Glc2. It is more typical in starch-binding sites to observe the same glucose residue anchored in place by both hydrophobic stacking and hydrogen-bonding interactions (14, 41-43). At the Fb site, four of the seven glucose residues of M7 are resolved in the electron density (Fig. 3B). This site, unlike Fa, recognizes only two rather than three glucose moieties, although both monosaccharides at Fb are stabilized via hydrophobic stacking as well as hydrogen-bonding interactions. The Fc binding site is somewhat more extensive than the Fb site. The residues that create the aromatic platform for hydrophobic stacking, Trp-442 and Trp-396, are further apart than those within Fa and Fb, with Trp-441 wedged between these residues and providing an additional hydrogen-bonding donor to the O-6 of Glc6 (Fig. 3C).
Although each of the SusF CBMs displays subtle molecular differences in the binding sites, the orientation of each curved M7 at these surface sites suggests that a long helix of starch could be accommodated with the pitch of the helix lying parallel to the plane of the protein surface. This might allow the protein to recognize and bind the double helical starch structures present in more resistant and insoluble forms of starch (amylose) that transit to the distal intestinal environment.
... is shown for the ligands. Note that the M7 observed at Fa and Fc is shared across a crystallographic symmetry axis, and therefore the electron density is the same. B, schematic representation of SusE (residues 174-387), with CBM Eb (residues 174-283) colored aqua and CBM Ec (residues 284-385) colored pink. Bound αCD is displayed as red and gray sticks. Electron density for αCD from an omit map is displayed and contoured at 2σ. The ligand observed at Eb and Ec is shared across a crystallographic symmetry axis, and therefore the electron density is the same. C, overlay of the SusE CBM Eb and Ec domains (blue) with the SusF CBM Fb and Fc domains (red). The r.m.s. deviation of the models is 1.3 Å for 189 Cα atoms. The ligand αCD bound to SusE is shown as light blue sticks, and the maltotetraose and M7 bound to SusF are shown as pink sticks.

CBM Ec Has an Additional Loop That May Mediate Interactions with Single Helical Starch-Noting the absence of the N-terminal domain of SusE in our structure of the near full-length protein, we decided to pursue a higher resolution structure of the SusE C-terminal domain (residues 172-387). A structure of this domain with M7 was solved to a resolution of 1.3 Å (Rwork = 16.5%, Rfree = 17.8%). The space group of this structure was P2₁2₁2₁ with two SusE molecules per asymmetric unit. These monomers overlay with an r.m.s. deviation of 0.3 Å for all atoms, except one loop (residues 360-365) with a maximum Cα deviation of 2.7 Å, likely due to crystal contacts. The C-terminal domains from the SusE structures with αCD and M7 overlay with an r.m.s. deviation of 0.4 Å, with no Cα deviations in either starch-binding site.
CBM Eb overlays with CBM Fb with an r.m.s. deviation of 1.4 Å over 93 Cα atoms (33.3% sequence identity). The binding of αCD at Eb is similar to M7 binding at Fb, with adjacent glucose residues bound via both hydrophobic stacking and hydrogen-bonding interactions (Fig. 4A). In the SusE structure with M7, no oligosaccharide is bound at Eb; rather, a protein-protein crystal contact is made between SusE molecules of adjacent asymmetric units. These crystals were generated using a 2:1 molar ratio of protein to M7, so it is not surprising that one of the starch-binding sites was empty. This observation and additional data discussed below suggest that Eb has a weaker binding site relative to Ec.
The second CBM of SusE (Ec) has the most extensive set of protein-ligand interactions among all five CBM domains contained in SusE and SusF. In the αCD structure Ec contacts 5 of 6 possible glucose residues, but a different mode of binding was observed in the M7 structure, highlighting the potential for Ec to bind single helical regions of starch (Fig. 4). Tryptophans Trp-336 and Trp-296 of Ec create a hydrophobic arc with Trp-335 wedged between them, but not participating in glycan binding. A unique feature of the Ec site is the loop created by residues 353-357 that caps one end of the binding site, with the side chain of Ile-355 centered in front of the αCD ring. This loop provides multiple hydrogen-bonding partners to Glc1, Glc2, Glc3, and Glc6 of αCD, via specific interactions with Asn-353, Leu-354, Ile-355, and Asp-356 (Fig. 4B). This starch-binding loop is unlikely to be flexible; rather, it is anchored in place by a network of hydrogen bonds with an adjacent loop defined by residues 359-362. The topology of this binding site, in particular the centering of the Ile-355 side chain at the ligand, is strikingly similar to the binding of βCD to the glycogen-binding domain of AMP-activated protein kinase (44).
In the structure of SusE with M7, the ligand is shared across a symmetry axis at the CBM Ec, between chain A of one asymmetric unit and chain B of another (Fig. 4, C and D). An overlay of these two ligands at chains A and B simulates a model of a 10-glucose long maltooligosaccharide interacting with this extensive binding site (Fig. 4E). In both chains A and B, M7 is anchored to the protein by the same set of hydrophobic stacking interactions with Trp-336 and Trp-296, as well as hydrogen bonding through Arg-326 and Arg-350. At chain A, the maltooligosaccharide helix, from the nonreducing to reducing end, projects toward the protein against the capping loop (Fig. 4, C and D). The peptidyl oxygen atoms of Leu-354 and Ile-355 participate in hydrogen bonding with hydroxyl groups from adjacent glucose residues as seen in the structure with αCD, but due to the pitch of the oligosaccharide helix, Asp-356 is now 5.4 Å away. However, the same M7 bound by chain B is instead "draped" over this loop, with the maltooligosaccharide from the nonreducing to reducing end extending from the hydrophobic cradle of binding residues up and over the capping loop. Thus, in chain B the nonreducing end of the ligand is nestled closer to the capping loop, such that the glucose at the terminal nonreducing end interacts with Asp-356. In this ligand orientation, Ile-355 intercalates directly into the groove of the M7 helix. As mentioned earlier, the overall atomic structures of chains A and B are nearly identical, with the exception of a helical turn (residues 361-365) that is about 15 Å from the starch-binding site and therefore unlikely to influence binding. The orientation of the starch-binding loop is identical in the structures with M7 and αCD.
OCTOBER 5, 2012 • VOLUME 287 • NUMBER 41

The presence of the starch-binding loop in Ec could govern the forms of starch that bind at this site. A long helix of starch could bind at Eb with the pitch of the helix parallel to the protein surface, similarly to how starch may bind to SusF. At these sites, it is the outer shape of the starch helix that is recognized, and thus single or double helical forms of starch could bind. However, the loop containing Ile-355 that intercalates into one of the grooves of the starch helix at Ec makes interactions with double helical starch unlikely, suggesting this site could be specific for partially unwound single helical forms or small starch breakdown products.
... Electron density for maltoheptaose was generated from an omit map, contoured at 3σ. Note that due to crystallographic symmetry the ligand in panels C and D is the same molecule, and thus electron density is only displayed in one panel. E, overlay of M7 bound by chains A (purple) and B (pink) at CBM-Ec, demonstrating the manner in which this site may accommodate a longer molecule of starch.
SusE and SusF Display Differences in Their Affinity for Starch Oligosaccharides-The chemical and physical structures of starches and related molecules that reach the human colon vary due to a number of features: molecular weight, the pattern and density of α1,6-branches, the degree to which they have already been degraded by human enzymes, and even cooking methods. Bt requires the Sus to degrade a variety of different molecules, including amylose, amylopectin, and pullulan (45). Although the Sus outer membrane amylase (SusG) will only hydrolyze α1,4-linkages (14), at least one of the periplasmic amylases (SusB) is promiscuous toward a variety of α-glucosidic linkages (46). Thus, it is possible that SusE and SusF interact with oligosaccharides that contain α1,6-branches prior to transport across the outer membrane. Moreover, the cyclic maltooligosaccharide αCD mimics the rigid, geometrically constrained curvature of larger amylose molecules, making it possible to probe starch-binding proteins for affinity toward starch secondary structures as opposed to linear oligosaccharides with more flexible helical geometry.
To test the affinity of the various SusE and SusF binding sites for different structures, we performed isothermal titration calorimetry (ITC) using three different starch oligosaccharides: αCD, M7, and glucosyl-maltotriosyl-maltotriose (GM3M3), an oligosaccharide of seven glucose units containing two α1,6-linkages (Table 1). In addition to examining the overall binding affinities of the two WT proteins, we created a series of site-directed mutants of each protein in which only one ligand-binding site remains active; these proteins are labeled to designate the active CBM remaining (e.g. SusF-A only indicates that the Fa domain is still active, whereas the others have been mutated). For both SusE and SusF, we also created negative controls in which all CBMs were mutated, referred to as SusE-no binding and SusF-no binding. We did not detect any binding with these negative control proteins, confirming that the site-directed mutations abolished starch binding. As observed in the crystal structures, it is possible for both proteins to cluster around a single molecule of αCD or M7, and thus it is possible that during the course of the ITC experiment both 1:1 and 2:1 protein:ligand binding events are occurring. Therefore, because we knew the number of binding events to expect approaching saturation, we chose to fit the data to a one-site model and fix N to the number of binding sites in each protein. Thus, our Kd values reflect the relative affinity of each protein for each ligand.
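The one-site model with N held fixed can be illustrated with a minimal sketch, assuming N identical, independent sites per protein that obey simple mass action. The function name and the example concentrations are hypothetical and are not taken from the experiments; this is not the fitting routine used for Table 1, only the binding relation that such a fit would rest on.

```python
import math


def bound_ligand(p_total, l_total, kd, n_sites):
    """Bound-ligand concentration for n_sites identical, independent sites.

    All concentrations share one unit (e.g. micromolar). Solves the
    mass-action relation (S - B) * (L - B) = Kd * B for B, where
    S = n_sites * p_total is the total site concentration.
    """
    s_total = n_sites * p_total
    b = s_total + l_total + kd
    # Take the smaller quadratic root; it is the physically meaningful one
    # because it never exceeds either the site total or the ligand total.
    return (b - math.sqrt(b * b - 4.0 * s_total * l_total)) / 2.0


# Hypothetical example: 50 uM protein with N fixed to 2, 100 uM ligand.
example = bound_ligand(p_total=50.0, l_total=100.0, kd=17.04, n_sites=2)
```

Fixing N removes one free parameter from the fit, so the fitted Kd absorbs any deviation from ideal stoichiometry; this is consistent with treating the reported values as relative rather than absolute affinities.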
Overall, SusE has a higher affinity for the three ligands compared with SusF. The Eb site displays tighter binding for αCD compared with M7 and GM3M3, likely due to the reduced entropic penalty of binding the geometrically constrained ligand. Many starch-binding sites only recognize 2 or 3 glucose residues and thus the lack of a true helical shape in αCD, which is a ring, is compensated for by the fixed geometry of the cyclodextrin (47). This is not true for ligand binding at CBM Ec. At Ec, helical M7 was bound with higher affinity (Kd 17.04 μM) compared with αCD (Kd 97.09 μM); the unique binding site loop in Ec allows the protein to recognize much more of the starch ligand, and thus the pitch of the helix, as seen in M7 in the crystal structure, is required to maximize interactions with the protein. Unexpectedly, all three CBMs of SusF bound M7 with slightly better affinity than αCD, despite our observations from the crystal structure that these sites only recognize 2 or 3 glucose residues. This may suggest that they are more adept at recognizing a flexible helical segment of starch. This preference for partially "unwound" segments of starch may aid in docking the Sus complex to portions of a starch molecule that will be more accessible to the SusG amylase. SusE and SusF bind GM3M3 the most weakly of the three ligands, suggesting that whereas α1,6-linkages are tolerated, there is unlikely to be a preference for these structures over α1,4-linked glucose.
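To put the Ec preference for M7 over αCD on an energy scale, the two Kd values quoted above can be converted to a free-energy difference with the standard relation ΔΔG = RT ln(Kd,weak / Kd,tight). The sketch below is illustrative only: the function name is hypothetical, and the temperature of 25 °C is an assumption, since the ITC temperature is not stated in this section.

```python
import math

R = 8.314462618  # gas constant, J / (mol K)


def ddg_from_kd(kd_weak, kd_tight, temp_k=298.15):
    """Free-energy difference (kJ/mol) between two dissociation constants.

    Both Kd values must be in the same unit; only their ratio matters.
    temp_k defaults to 25 C, which is an assumption, not a reported value.
    """
    return R * temp_k * math.log(kd_weak / kd_tight) / 1000.0


# Ec site: Kd for alpha-CD (97.09) versus M7 (17.04), as quoted in the text.
ddg_ec = ddg_from_kd(97.09, 17.04)
```

At the assumed temperature, the roughly 5.7-fold difference in Kd corresponds to about 4.3 kJ/mol, a modest energetic preference.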
SusE and SusF CBMs Contribute Differently to Binding of Insoluble Starch-The presence of multiple starch-binding sites on a single protein introduces the possibility that SusE and SusF bind longer polymers better than small oligosaccharides due to an avidity effect, in which binding at more than one site occurs simultaneously, resulting in increased apparent affinity. We performed adsorption depletion experiments to determine the binding affinity of WT SusE and SusF, as well as binding site mutants of SusE and SusF, to insoluble cornstarch. The error of some of the curve fits is elevated; we attribute this to errors in using the BCA assay near the high and low limits of protein detection, as well as potential differences in nonspecific binding between replicates. We performed this assay many times while refining our final assay conditions (also performed in triplicate) ... (Fig. 5A and Table 2). In experiments utilizing single CBM mutants of SusF (Fig. 5B), with the mutated CBM designated by an asterisk, there is a decrease in the overall affinity for starch when either CBM Fb (SusF B*) or Fc (SusF C*) is mutated, but no defect when Fa (SusF A*) alone is mutated. Reciprocally, when Fa is left as the only remaining functional starch-binding site (SusF-A only; Fig. 5C), the protein has greatly reduced starch binding and displays an isotherm similar to that of the SusF no binding mutant. Therefore Fa, the CBM that is unique to SusF, does not contribute to insoluble starch binding, despite its ability to bind smaller maltooligosaccharides. When the CBMs Fb or Fc alone were mutated, the Kd increased by an order of magnitude over WT SusF, suggesting that these sites may work together to bind starch (Fig. 5B).

Prospectus-In this report, we investigated the biochemical and structural features of SusE and SusF, two cell surface lipoproteins within the Bt Sus complex.
These proteins are extremely similar in structure, composed of an observed (SusF) or predicted (SusE) N-terminal IgSF domain, followed by two or three tandem starch-binding CBMs. The N-terminal domain of SusE could not be resolved in the crystal structure, suggesting inherent flexibility in this domain. This flexibility may allow the predicted N-terminal IgSF domain to dock to SusF or another Sus protein and still permit mobility of the SusE starch-binding domains to capture starch. Earlier literature indicates that SusE is more susceptible to proteolytic cleavage in a strain lacking SusF, suggesting these proteins may interact (7). A striking difference between these two proteins is the presence of the additional CBM Fa in SusF, which may impart extra rigidity to the protein because of increased contacts with the flanking domains. Although the Fa binding site has moderate affinity for maltooligosaccharides, it shows almost no binding to insoluble starch.
CBMs are typically contained within a single glycoside hydrolase polypeptide or associated enzyme complex (i.e. cellulosomes) and enhance accessibility to an insoluble substrate (19). Tandem CBMs in glycoside hydrolases have been shown to display an avidity effect in binding carbohydrate, whereby relatively low affinity of the individual domains is augmented severalfold due to the multivalent interactions of the protein with the substrate (20). For SusF, there is no apparent avidity advantage from the presence of tandem CBMs. Rather, it seems that each CBM has different starch-binding characteristics, reflected in both the architecture of the starch-binding sites and the observed affinities for the ligands tested. In contrast to SusF, both domains of SusE are required for tight binding to insoluble starch, suggesting an avidity effect. The CBM Ec binding site has an additional loop that is likely responsible for its enhanced binding affinity. The structure of SusE with maltoheptaose demonstrates how a longer, single helix of amylose could interact with the Ec site, suggesting that this site, even more so than CBM Fc, may bind relaxed or denatured α1,4-glucans.
The precise mechanistic role of SusE and SusF in starch metabolism remains unclear, although the data presented here provide a valuable structural and biophysical perspective (Fig. 6). As mentioned above, current protein classification schemes such as Pfam include a narrow range of lipoproteins that are associated with Sus-like systems within the same families as SusE and SusF. Thus, these groups may exclude many functional or structural homologs that target other glycans, but are missed by primary sequence analysis. Consistent with this idea, one such Bt lipoprotein (BT1761) has been shown to bind specifically to β2,6-linked fructan (9). Moreover, we have purified two additional proteins (Bacova_04391 and Bacova_02094) from another human gut symbiont, Bacteroides ovatus, that have been implicated in metabolism of xylan and β-mannan, respectively. Each of these proteins binds to its predicted target glycan in a gel-retardation assay (data not shown) and the ligand-free crystal structure of Bacova_04391 has been determined by the Joint Center for Structural Genomics (PDB 3ORJ), revealing that it has an N-terminal Ig-like domain followed by two β-sandwich domains resembling CBMs. More work will be needed to establish how these and similar proteins interact with their target glycans, but it is probable that they are part of a diverse group of relatively unexplored glycan-binding proteins that are associated with Bacteroidetes Sus-like systems.
Blocking these two proteins from trafficking to the bacterial surface does not eliminate growth on starch, despite the fact that they contain a total of five starch-binding sites. In contrast, SusD has a lower affinity for oligosaccharides and loss of this protein results in a complete inability to grow on oligosaccharides greater than 5 glucose units (12). Thus, different proteins in Sus-like systems are likely to play different functional roles that are not necessarily dependent on how tightly they bind substrates. Given that two other starch-binding sites are present in SusG, including a CBM58 domain (14), it is possible that loss of SusE and SusF is compensated by these additional sites. With structural data in hand for all four of the Sus proteins, we are now in a position to perform this more precise level of mutagenesis and further probe the mechanism of this system. In addition, it is possible that SusE and SusF scavenge starch when it is at low concentrations or sequester it at the cell surface during hydrolysis. Either of these mechanisms would be valuable to a gut bacterium during competition in the densely populated colonic ecosystem. Regardless of their precise functional role(s), the abundance of proteins related to SusE and SusF in Bacteroidetes Sus-like systems suggests that they are fundamentally important to the fitness and survival of these symbiotic organisms. Our results provide structural insight into the role of these proteins and a basis for future mechanistic studies in live bacteria.

In total, the four outer membrane lipoproteins in the Bt starch-utilization system contain at least nine sites that interact with starch or cleaved maltooligosaccharides. Only one of these sites (Gcat) is catalytic and present in the endo-acting amylase, SusG. The remaining eight binding sites are spread across all four lipoproteins. In the model shown, these eight sites make interactions with different regions of a single starch polymer. The nature of potential interactions between individual Sus lipoproteins has not been explored, nor has the stoichiometry of these proteins on the cell surface.