Crucial Roles of Single Residues in Binding Affinity, Specificity, and Promiscuity in the Cellulosomal Cohesin-Dockerin Interface

Background: Cellulosomal cohesin-dockerin interactions show intraspecies promiscuity but interspecies specificity.
Results: A combination of computations and experiments reveals single cohesin residue mutations with dramatic effects not only on binding affinity but also on specificity and promiscuity.
Conclusion: Natural interspecies specificity barriers in the cohesin-dockerin interaction are easily overcome by single mutations, indicating considerable plasticity.
Significance: This study sheds light on the malleability and evolvability of a high affinity interaction.

Interactions between cohesin and dockerin modules play a crucial role in the assembly of multienzyme cellulosome complexes. Although cohesin and dockerin modules from the same species generally bind with high affinity but indiscriminately, cross-species binding is rare. Here, we combined ELISA-based experiments with Rosetta-based computational design to evaluate the contribution of distinct residues at the Clostridium thermocellum cohesin-dockerin interface to binding affinity, specificity, and promiscuity. We found that single mutations can show distinct and significant effects on binding affinity and specificity. In particular, mutations at cohesin position Asn37 show dramatic variability in their effect on dockerin binding affinity and specificity: the N37A mutant binds promiscuously both to cognate (C. thermocellum) and to non-cognate Clostridium cellulolyticum dockerin. N37L in turn switches binding specificity: compared with the wild-type C. thermocellum cohesin, this mutant shows significantly increased preference for C. cellulolyticum dockerin combined with strongly reduced binding to its cognate C. thermocellum dockerin. The observation that a single mutation can overcome the naturally observed specificity barrier provides insights into the evolutionary dynamics of this system, which allows rapid modulation of binding specificity within a high affinity background.

What determines binding specificity, and how easily can this specificity be changed during evolution? These are fundamental questions regarding molecular interactions in biological systems. Nature manipulates binding specificity and affinity using different strategies (1) ranging from subtle changes at the level of residue point mutations in distinct regions of the interface (2) to more dramatic changes such as loop insertion or deletion (3,4). Protein binding interfaces consist in general of independent patches of networks of interacting residues (5). Such patches can perform different tasks in the interaction (6).
Anaerobic, cellulose-degrading bacteria produce a sophisticated multiprotein complex called the cellulosome that is optimized for efficient degradation of cellulose (7). The different components of the cellulosome are held together by interactions between two distinct modules: cohesin modules that occur in repeats on the scaffoldin protein and dockerin modules that are attached to hydrolases or to additional scaffoldins (Fig. 1A). The cohesin-dockerin interaction (see Fig. 1B) is characterized by very high affinity (8,9). The small ~70-residue dockerin module contains two F-hand motif repeats (10,11) that allow binding of its cohesin partner in two symmetric orientations (see Fig. 1, C and D) (12,13). The cohesin module folds into a β sandwich and binds to dockerin using one of its β sheets; binding specificity is achieved by the loops at the periphery of this sheet (12-16). Cohesin-dockerin interactions can be classified into three general types according to their sequence similarity (7). In C. thermocellum and C. cellulolyticum, enzyme-associated dockerins are in general grouped as type I, whereas dockerins that participate in attachment of scaffoldins to the cell belong to the type II family (represented in Fig. 1A as CohI-DocI and CohII-DocII, respectively). Within a given species, dockerins and cohesins of the same family type may bind interchangeably, thereby increasing cellulosomal heterogeneity (notably its enzyme content). In contrast to this binding promiscuity for different cohesin modules of the same species, interactions between dockerin and cohesin modules from different species have rarely been observed (Ref. 17; for an exception, see e.g. Sakka et al. (18)).
The cohesin-dockerin interaction is an ideal model system for the assessment of minimal changes needed for modulation of binding promiscuity and specificity. How many mutations are needed and where are they located? Rational manipulation of the cohesin-dockerin interaction is a precondition toward more elaborate designs of multimolecular cellulosomes of specific architecture and composition in the future (19-21).
Previous studies identified specificity-determining residues in the dockerin module based on sequence conservation within and across different species (8,22). In a gene-swapping experiment, Nakar et al. (23) were able to switch the binding specificity of a cohesin from C. cellulolyticum to dockerin of C. thermocellum by the replacement of only three residues. In their large scale assessment of correlated mutations, Halperin et al. (24) harnessed the ample information about intraspecies binding promiscuity and interspecies binding specificity in the cohesin-dockerin interaction to identify interface residues of this interaction in a precise manner.
Here we used a structure-based approach to determine which residues contribute to the high affinity observed in this interaction and which in turn are important for binding specificity. We applied a combination of computational design and binding experiments to identify crucial features in the C. thermocellum cohesin that determine the structural and physical bases for binding affinity and specificity in the type I cohesin-dockerin interface. We characterized residues at the cohesin-dockerin interface by computational modeling using the Rosetta molecular modeling suite (25) and validated these predictions using indirect ELISA (iELISA), which we developed specifically to measure effects on binding of high affinity interactions (26,27). Evaluation of the effect of these mutations on binding by a number of additional computational as well as experimental approaches allowed us to provide a robust picture of the main determinants of binding affinity and specificity of this interaction. Our study identified two types of hot spot residues in C. thermocellum cohesin: affinity hot spots such as Leu83 in the conserved hydrophobic patch of the cohesin-dockerin interface contribute significantly to binding affinity, whereas the specificity hot spot Asn37 in the hydrogen bond network in the C. thermocellum cohesin-dockerin interface plays a crucial role in determining binding specificity.

Computational Protocols
Cohesin and Dockerin Modules and Structures-Throughout this study, we refer to the C. thermocellum complex between the second cohesin module in scaffoldin A (residues 183-322) and the dockerin connected to endo-1,4-β-xylanase Y (Xyn10B; residues 733-788) (Protein Data Bank code 1OHZ (12)) and to the C. cellulolyticum complex between the first cohesin module on scaffoldin C (residues 277-427) and the dockerin connected to endoglucanase A (Cel5A; residues 410-472, A16S/L17T mutant) (Protein Data Bank code 2VN6 (16)). Residue numbering refers to these structures. In this study, we report modeling studies based on the dockerin orientation that positions recognition residues Ser45/Thr46 and Ala47/Phe48 for C. thermocellum and C. cellulolyticum, respectively, at the interface. Structures of the inverse, symmetrical binding mode show a very similar interface (C. thermocellum, Protein Data Bank code 2CCL (13); C. cellulolyticum, Protein Data Bank code 2VN5 (16)) and provide similar results (data not shown).
Correction of an Incorrectly Fitted Side Chain at Cohesin Position Asn37 in the Crystal Structure of the C. thermocellum Cohesin-Dockerin Complex-Electron density maps do not allow distinction between the two different possible planar side chain orientations. The positions of asparagine side chain Oδ1 and Nδ2 atoms (as well as glutamine side chain atoms Oε1 and Nε2) are therefore usually defined based on optimization of the surrounding polar environment and the optimal satisfaction of hydrogen bonds. The NQ-Flipper protocol (28) suggests that Asn37 in structure 1OHZ is misfitted. Therefore, we used in all computations a starting structure of the C. thermocellum cohesin-dockerin interaction in which the side chain of Asn37 had been flipped for optimal positioning of its side chain hydrogen bond donor and acceptor as also suggested previously (29).
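The side chain flip described above amounts to exchanging the positions assigned to the amide oxygen and nitrogen. The following Python sketch illustrates this on PDB-format ATOM records by swapping the OD1 and ND2 labels (and their element symbols) of one asparagine while leaving the coordinates untouched. It is an illustration only and is not the NQ-Flipper procedure or the authors' script; the function names, the demo records, and their coordinates are hypothetical.

```python
# Illustrative sketch: flip an asparagine amide by relabeling OD1 <-> ND2.
# Column positions follow the fixed-width PDB ATOM record layout.
NAME_SWAP = {" OD1": " ND2", " ND2": " OD1"}
ELEMENT_SWAP = {" O": " N", " N": " O"}


def atom_line(serial, name, resname, chain, resseq, x, y, z, element):
    """Build an 80-column ATOM record (helper for the demo data below)."""
    return (f"ATOM  {serial:5d} {name:<4} {resname:>3} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{20.0:6.2f}          {element:>2}  ")


def flip_asn_amide(pdb_lines, chain, resseq):
    """Return a copy of pdb_lines with OD1/ND2 of one Asn exchanged."""
    flipped = []
    for line in pdb_lines:
        if (line.startswith("ATOM") and line[17:20] == "ASN"
                and line[21] == chain and int(line[22:26]) == resseq
                and line[12:16] in NAME_SWAP):
            padded = line.ljust(80)
            element = ELEMENT_SWAP.get(padded[76:78], padded[76:78])
            line = (padded[:12] + NAME_SWAP[padded[12:16]]
                    + padded[16:76] + element + padded[78:])
        flipped.append(line)
    return flipped


# Hypothetical demo records (coordinates invented for illustration).
demo = [
    atom_line(1, " OD1", "ASN", "A", 37, 11.0, 12.0, 13.0, "O"),
    atom_line(2, " ND2", "ASN", "A", 37, 12.5, 13.5, 14.5, "N"),
    atom_line(3, " OD1", "ASN", "A", 38, 20.0, 21.0, 22.0, "O"),
]
flipped = flip_asn_amide(demo, "A", 37)
for record in flipped:
    print(record)
```

Relabeling the two atoms is equivalent to a 180° rotation of the planar amide group only to the extent that the C-O and C-N bond geometries are similar, so a structure treated this way would normally be minimized afterward.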
Computational Alanine Scanning and Interface Design-The structure-based computational analysis of the cohesin-dockerin interaction was performed using the Rosetta modeling framework in which structure optimization and sequence design are performed using a stochastic search based on Monte Carlo with minimization and an energy function dominated by tight and clash-free packing, burial of hydrophobic residues, and satisfying hydrogen bonds (25, 30-32). We used Rosetta version 2.3.0 throughout this study, i.e. Rosetta revision 12795 and Rosetta database revision 21964, unless mentioned otherwise. Rosetta is available free of charge to academic users.
We applied the Rosetta interface mode for alanine scanning to identify interface residue hot spots as well as for the design of sequences with modified affinity (33). The contribution of different interface residues to binding was evaluated by mutation to alanine and measuring the effect on binding: $\Delta\Delta G_{\mathrm{bind}} = \Delta G_{\mathrm{bind}}^{\mathrm{mutant}} - \Delta G_{\mathrm{bind}}^{\mathrm{WT}}$, where $\Delta G_{\mathrm{bind}} = G_{\mathrm{complex}} - G_{\mathrm{free\ protein\ A}} - G_{\mathrm{free\ protein\ B}}$. A threshold of $\Delta\Delta G_{\mathrm{bind}} \geq 1.0$ kcal/mol was used to select putative hot spot residues (as in e.g. Refs. 31 and 34). The ΔΔG predictions were calculated by introducing minimal changes in the structure as we would not expect major backbone changes (the evaluated residues and their neighbors are restricted: they lie within the β-sheet of the cohesin or within the helix or calcium binding loop of the dockerin). The starting structure was first optimized by minimizing the complex structure. Upon mutation, neighboring side chain atoms were minimized, but side chains were not repacked and therefore not allowed to rearrange significantly.
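The two definitions above and the hot spot threshold can be written out as a short calculation. The Python sketch below is a minimal illustration and is not part of Rosetta; the class and function names are hypothetical, and the energy values in the example are invented solely to exercise the arithmetic.

```python
from dataclasses import dataclass

HOT_SPOT_THRESHOLD = 1.0  # kcal/mol, the threshold quoted in the text


@dataclass
class Energies:
    """Energies of the complex and of the two free partners."""
    complex: float
    free_a: float
    free_b: float


def dg_bind(e: Energies) -> float:
    """dG_bind = G_complex - G_free_protein_A - G_free_protein_B."""
    return e.complex - e.free_a - e.free_b


def ddg_bind(mutant: Energies, wild_type: Energies) -> float:
    """ddG_bind = dG_bind(mutant) - dG_bind(WT); positive means weaker binding."""
    return dg_bind(mutant) - dg_bind(wild_type)


def is_hot_spot(ddg: float) -> bool:
    """A residue counts as a putative hot spot if ddG_bind >= 1.0 kcal/mol."""
    return ddg >= HOT_SPOT_THRESHOLD


# Invented numbers for illustration only (not values from this study).
wild_type = Energies(complex=-120.0, free_a=-60.0, free_b=-40.0)
mutant = Energies(complex=-117.5, free_a=-59.5, free_b=-40.0)
ddg = ddg_bind(mutant, wild_type)
print(ddg, is_hot_spot(ddg))
```

With this sign convention, a destabilizing alanine substitution gives a positive ΔΔG, so the comparison against the threshold is one-sided.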
The command line consists of: where <INPUT_PDB> can be an experimentally solved structure or a model obtained by docking (see below), -interface invokes the interface mode, the mutations to model (e.g. N37D and N37A/D39A) are specified in the file defined by -mutlist, -output_structure writes the structures modeled during the simulation into Protein Data Bank format files, and the results specifying ΔG and energy values of the wild type and mutants are written to the output file specified by -intout. Torsion angles (both backbone and side chain) and rigid body orientation were minimized prior to the modeling of the mutation.
We compared this protocol (Rosetta 2.3) with a parameterized protocol for modeling effects of point mutations (termed here Rosetta 3.0* and described in detail in Kellogg et al. (34) as protocol16). For these calculations, we used Rosetta revision 34507 and Rosetta database version 40221. The main difference between these two protocols is that the latter allows the repacking of all interface side chain residues together with optimized weights of the scoring function and a combination of soft repulsive repacking followed by standard, hard repulsive minimization of both backbone and side chain atoms under constraints that tether the structure to the starting conformation.
Protocol16 consists of the following command line: The resfile supplied allows repacking of all residues, including the input side chain conformation, choosing from a rotamer library with extended χ1 and χ2 angle sampling. As an example, the resfile for mutation N37A would contain the following: In short, this protocol first repacks the input structure and then performs the mutation of interest. Starting both from the wild-type and the mutated structure, 50 independent runs of minimization of all torsion and rigid body degrees of freedom are performed for each. ΔΔG is then calculated as the difference in energy between the minimal energy conformation of the mutant and the wild type (among the 50 simulations of each). In addition, we compared the Rosetta-based protocols with the following other approaches: 1) FoldX (35) version 3.0 Beta3 run locally using default values; 2) the Orbit interface energy function EPPI-2 (36), for which two different implementations were used: EPPI-2 values were transformed to optimize correlation to a large set of 404 experimental ΔΔG bind values (Orbit1) or a restricted set of 53 designed mutations (Orbit2) (corresponding to Fig. 3, C and F, in Ref. 36, respectively); 3) Hunter (37); and 4) Concoord/PBSA (38).
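The final step of protocol16 as described above (taking the lowest-energy model among the independent runs of each state and subtracting) can be sketched in a few lines of Python. This is a minimal illustration under the assumption that the scores of the runs have already been collected into two lists; the function name is hypothetical and the scores in the example are invented.

```python
def ddg_from_runs(mutant_scores, wild_type_scores, expected_runs=None):
    """ddG = min(mutant energies) - min(wild-type energies).

    expected_runs is an optional sanity check (e.g. 50 for the setup in the text).
    """
    if not mutant_scores or not wild_type_scores:
        raise ValueError("each state needs at least one scored run")
    if expected_runs is not None:
        if len(mutant_scores) != expected_runs or len(wild_type_scores) != expected_runs:
            raise ValueError("unexpected number of runs")
    return min(mutant_scores) - min(wild_type_scores)


# Invented scores for illustration only; a real calculation would use 50 per state.
wild_type_scores = [-250.5, -252.0, -251.25]
mutant_scores = [-249.0, -250.5, -248.75]
print(ddg_from_runs(mutant_scores, wild_type_scores))
```

Using the minimum rather than the mean makes the estimate depend on the single best model of each state, which is why a sufficient number of independent runs matters.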
Modeling of Non-cognate Cohesin-Dockerin Interactions Using RosettaDock-A model of the non-cognate interaction was generated using as partners the C. thermocellum cohesin structure from Protein Data Bank structure 1OHZ (12) where the residue Asn37 was truncated to alanine (N37A) and the C. cellulolyticum dockerin structure from Protein Data Bank codes 2VN5 and 2VN6 (16).
We used RosettaDock (with off-rotamer minimization of the side chain conformations (30,39,40)) to model the structure of non-cognate cohesin-dockerin pairs based on the monomer structures taken from the cognate C. thermocellum and C. cellulolyticum complex structures (see above). We considered two possible orientations for modeling the complex between C. thermocellum cohesin mutant N37A and the C. cellulolyticum dockerin because, owing to the dual mode of cohesin-dockerin binding, either the first (Ala16-Leu17) or second (Ala47-Phe48) dockerin recognition motif can interact with the cohesin module (Protein Data Bank codes 2VN5 and 2VN6, respectively (16)). In our calculations, the second orientation predicted better ΔG values (-20.5 versus -17.0 Rosetta energy units) and was therefore chosen for subsequent analysis.
Docking was performed as described previously (40). The docking command line consists of the following: -dock_pert 3 8 8 will start the refinement protocol from a structure that has been slightly perturbed to sample the local energy landscape: random moves according to Gaussian distributions with 3-Å (8-Å) standard deviation (S.D.) are performed on translations along the axis that connects the centers of mass (and the two perpendicular axes), and according to a Gaussian distribution with 8° S.D. on rotations around the three Euler angles (30); -dock_mcm indicates the docking protocol that involves Monte Carlo sampling with minimization of rigid body orientation; -dock_rtmin allows sampling of off-rotamer side chain conformations; -unbound_rot adds for each position the side chain conformation encountered in the free (i.e. starting) monomer conformation to the rotamer library and assigns it minimal energy to bias toward this conformation; -ex1 and -ex2aro allow the inclusion of additional rotamers (±S.D. for χ1 for all amino acids and for χ2 of aromatic residues); -dock_score_norepack will just rescore the starting structure rather than repacking and minimizing it; -nstruct 1000 will generate 1000 independent models; and -scorefile XXXX.score indicates the name of the score output file.
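The perturbation described for -dock_pert 3 8 8 can be pictured as drawing six random numbers per starting structure. The Python sketch below samples such perturbations with the standard deviations quoted above (3 Å along the axis connecting the centers of mass, 8 Å along the two perpendicular axes, 8° for each rotation). It only illustrates the sampling statistics and is not RosettaDock code; the function name, the dictionary layout, and the random seed are assumptions made for this example.

```python
import random
import statistics


def sample_perturbation(rng, along_sd=3.0, perp_sd=8.0, rot_sd_deg=8.0):
    """Draw one rigid body perturbation (translations in angstroms, rotations in degrees)."""
    return {
        # First component: along the axis connecting the centers of mass.
        "translation": (
            rng.gauss(0.0, along_sd),
            rng.gauss(0.0, perp_sd),
            rng.gauss(0.0, perp_sd),
        ),
        # Three rotation angles, each with the same standard deviation.
        "rotation_deg": tuple(rng.gauss(0.0, rot_sd_deg) for _ in range(3)),
    }


rng = random.Random(0)  # fixed seed so the example is repeatable
perturbations = [sample_perturbation(rng) for _ in range(1000)]

along = [p["translation"][0] for p in perturbations]
perp = [p["translation"][1] for p in perturbations]
print(f"S.D. along axis: {statistics.stdev(along):.2f} A")
print(f"S.D. perpendicular: {statistics.stdev(perp):.2f} A")
```

Generating 1000 samples mirrors the -nstruct 1000 setting: each model starts from a different small displacement around the input orientation, so the set of refined models probes the local energy landscape rather than a single starting point.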
In a preceding step, the free monomers were prepacked to remove any internal clashes. In this step, the same sampling of side chain conformations was allowed. The command line consists of the following: where -prepack_rtmin allows sampling of off-rotamer side chain conformations, and the other parameters are as described above.

Experimental Protocols
Expression Plasmids-Plasmid cassettes of the fusion proteins xylanase-fused dockerin (XynDoc) and carbohydrate-binding module-borne cohesin (CBM-Coh) were produced as described earlier (29,41,42). We used the cohesin constructs coh2-CBM of C. thermocellum scaffoldin CipA (the second cohesin module and following cellulose-binding module) and miniCipC of C. cellulolyticum scaffoldin C (CBM, hydrophilic domain, and the first cohesin module) and the dockerin constructs C. thermocellum XynDocS (C. thermocellum Cel48S dockerin) and C. cellulolyticum XynDocA (C. cellulolyticum Cel5A dockerin). Mutations of cohesin residues were performed by site-directed mutagenesis using the QuikChange kit (Stratagene). The mutations were verified by DNA sequencing. Mutant cohesin fragments were restricted with BamHI and XhoI enzymes and ligated into pET28a, which was also digested with the same enzymes. The final construct was verified by sequencing.
Expression and Purification of Proteins-Wild-type and mutant CBM-Coh constructs were expressed and purified by affinity chromatography on a cellulose resin as described by Barak et al. (41). XynDocs were expressed and purified as described previously (41) with the following changes. The proteins were expressed in BL21 in LB medium supplemented with 50 μg/ml kanamycin. The culture was grown at 37°C until it reached an A600 of ~0.6-0.8 and then induced with 1 mM isopropyl 1-thio-β-D-galactopyranoside for 3 h at 37°C. The dockerins were purified on a nickel-nitrilotriacetic acid column as described, without heat treatment. Purity of the CBM-Coh and XynDoc proteins was estimated by analytical gel filtration using a Superdex 200 column. Protein concentration was determined by absorption.
iELISA-The relative binding of the cohesin mutants was estimated using the iELISA-based method described previously in detail (26) and briefly below. Different concentrations (1 pM-1 μM) of cohesin mutants were incubated with 100 pM wild-type Xyn-dockerin for 1 h at 37°C in binding buffer (TBS, 10 mM CaCl2, 0.05% Tween 20, 2% BSA). Next, 100 μl of the mixture was transferred to the 96-well MaxiSorp (Nunc A/S, Roskilde, Denmark) plate coated with wild-type cohesin for incubation for 15-30 min at 37°C. Preformed complexes were then washed with washing buffer (TBS, 10 mM CaCl2, 0.05% Tween 20) followed by 1-h incubation with rabbit anti-Xyn, washing, and another 1-h incubation with HRP-conjugated goat anti-rabbit antibodies.
The relative binding of the cohesin mutants to C. cellulolyticum dockerin was measured as in previous studies except for a few modifications. 300 pM C. cellulolyticum dockerin was incubated with different concentrations (1 pM-1 μM) of cohesin mutants, and plates coated with the C. thermocellum cohesin N37A mutant were used. This allows presentation of differences in the binding affinity of C. cellulolyticum dockerin to WT C. cellulolyticum, WT C. thermocellum, and mutant C. thermocellum cohesin in one plot.
The obtained binding data were analyzed using GraphPad Prism (version 5.00 for Windows; GraphPad Software, San Diego, CA). Wild-type cohesin binding was used to normalize the experimental scale, and all mutants were standardized according to the wild type. The results were fitted to a sigmoidal dose-response curve (43), and changes in free energy of binding (ΔΔG bind_exp) were calculated relative to the wild type according to the following equation: Standard errors of the mean of logarithmic values of IC50 did not exceed the half-log value (as suggested by Ref. 43) with the exception of measures for the L83S mutant (see Table 1).
Protein Cellulose Microarray-Protein cellulose microarray experiments were performed as described previously (17). In short, WT C. thermocellum, WT C. cellulolyticum, and mutant N37A C. thermocellum cohesin modules were printed in rows of 2-fold diluted concentration each. The slides were incubated with either C. thermocellum XynDocS or C. cellulolyticum XynDocA and visualized using anti-xylanase antibody labeled with Cy5.

Results
Computational Alanine Scanning Identifies Interface Residue Hot Spots in Two Distinct Patches at the C. thermocellum Cohesin-Dockerin Interface-The structure of the C. thermocellum cohesin-dockerin interface contains a hydrophobic patch that is conserved in both C. thermocellum and C. cellulolyticum cohesin-dockerin interactions (Figs. 1C and 2A) as well as an extensive network of hydrogen bonds at the center of the interface (Fig. 1D and Fig. 3A). The C. thermocellum dockerin residues Ser45 and Thr46 located in the conserved binding motif of the second F-hand constitute a central part of this network (Fig. 3B). In the corresponding C. cellulolyticum dockerin, this binding motif is replaced by hydrophobic residues Ala47 and Phe48 (Fig. 1D and Ref. 16).
To locate interface residues that play a critical role in binding, we first identified putative interface hot spot residues in the C. thermocellum cohesin-dockerin interface with computational alanine scanning using Rosetta (version 2.3). Binding hot spot residues (calculated ΔΔG bind ≥1.0 kcal/mol) are located in 1) the hydrophobic patch (cohesin Leu83 and dockerin Leu22), 2) the network of hydrogen bonds (cohesin Asn37, Asp39, and Glu131 and dockerin Ser45 and Thr46), and 3) other regions of the interface (e.g. cohesin Tyr74 and a conserved intermolecular salt bridge between cohesin Glu86 and dockerin Arg53). Table 1 summarizes predicted and experimental ΔΔG values evaluated in this as well as previous studies.
FIGURE 1. Overview of the cellulosome and the cohesin-dockerin complex in C. thermocellum. A, scheme of the cellulosome organization. The primary scaffoldin (yellow) contains nine type I cohesin modules (CohI), a single cellulose-specific CBM, and a type II dockerin module (XDocII). Each type I cohesin binds a type I dockerin module (DocI). This module occurs in a variety of cellulases and other carbohydrate-active enzymes (shades of blue and green). The primary scaffoldin (yellow) is attached to one of several cell-anchoring scaffoldins via interaction between type II dockerin (XDocII) and type II cohesin (CohII). The anchoring scaffoldins (pale orange) contain type II cohesin modules and a surface layer homology (SLH) module that is connected to the cell surface. B, structure of the type I cohesin-dockerin interface (cohesin, red; dockerin, blue; Protein Data Bank code 1OHZ (12)). The hydrophobic patch at the interface is shown in spheres; the residues involved in the hydrogen bond network are depicted as sticks (atoms in this and subsequent figures are colored in Corey-Pauling-Koltun). C and D, sequence conservation at the cohesin-dockerin interface. The hydrophobic patch at the cohesin-dockerin interface is conserved across species (C). In contrast, the network of hydrogen bonds is conserved within C. thermocellum (Ct) but replaced by hydrophobic residues in C. cellulolyticum (Cc) cohesin-dockerin interfaces (D). Multiple sequence alignments are shown for different cohesin repeats (red) and dockerins from different enzymes from C. thermocellum and C. cellulolyticum (blue). The dual binding mode is shown in cartoon representation. The first dockerin segment is colored dark blue and the second is colored light blue; calcium ions are shown as green spheres.
The Hydrophobic Patch Plays an Important Role in C. thermocellum Cohesin-Dockerin Binding-The hydrophobic patch is centered around the conserved C. thermocellum cohesin residue Leu83 (Figs. 1C and 2A). The strong conservation of this hydrophobic and exposed residue suggests an important functional role in the interaction with dockerin partners. To investigate the contribution of this residue to binding affinity, we replaced Leu83 with a smaller residue that creates a void in the hydrophobic interface patch and is therefore predicted to significantly affect binding (mutations L83A and L83S; see Table 1). We used iELISA (26,27) to measure the effect of these mutations on the affinity of the C. thermocellum cohesin-dockerin interaction (see "Experimental Procedures"). The IC50 of the wild-type C. thermocellum cohesin-dockerin interaction was 1 nM, whereas the IC50 values of the L83A and L83S mutants were 123 and 257 nM, respectively, corresponding to a loss in binding free energy (ΔΔG) of 3.0 and 3.4 kcal/mol, respectively (Fig. 2B and Table 1). Both mutations indeed significantly impair binding. In agreement with this result, mutation of the corresponding residue in C. cellulolyticum cohesin, Leu87, to alanine was shown previously to significantly decrease binding (16). These experimental data validate the crucial role of Leu83 in the interaction of the C. thermocellum cohesin with its cognate dockerin and highlight the importance of the hydrophobic patch at this interface.
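The IC50 values and ΔΔG values reported for L83A and L83S can be related by a short calculation. The Python sketch below assumes the commonly used relation ΔΔG = RT ln(IC50 mutant / IC50 wild type) evaluated at the assay temperature of 37°C; this form of the equation is an assumption made for illustration, and the function name is hypothetical. Under this assumption the calculation reproduces the reported values to one decimal place.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_ic50(ic50_mutant, ic50_wild_type, temp_celsius=37.0):
    """Assumed relation: ddG = R * T * ln(IC50_mutant / IC50_wild_type), in kcal/mol."""
    rt = R_KCAL * (temp_celsius + 273.15)
    return rt * math.log(ic50_mutant / ic50_wild_type)


# IC50 values in nM as reported in the text (wild type: 1 nM).
ddg_l83a = ddg_from_ic50(123.0, 1.0)
ddg_l83s = ddg_from_ic50(257.0, 1.0)
print(round(ddg_l83a, 1), round(ddg_l83s, 1))  # 3.0 3.4
```

Because only the ratio of IC50 values enters the logarithm, the result does not depend on the concentration unit as long as both values use the same one.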
Contributions of the Polar Patch to Cohesin-Dockerin Binding Affinity-The network of hydrogen bonds at the interface of the C. thermocellum cohesin-dockerin complex is centered on the specificity-determining dockerin residues Ser45 and Thr46 and on cohesin residues Asn37, Asp39, and Glu131 (Figs. 1D and 3, A and B). Our structure-based analysis and energy calculations suggest that mutations to alanine at these positions will reduce binding affinity significantly (ΔΔG > 1.0 kcal/mol) by disrupting the network of hydrogen bonds and by modifying the electrostatic attraction between the overall negatively charged cohesin and positively charged dockerin modules (Table 1).
Using iELISA experiments to measure the binding of the cohesin single mutants N37A, D39A, and E131A to wild-type C. thermocellum dockerin, we confirmed that D39A and E131A mutations indeed significantly impair binding to the C. thermocellum dockerin (ΔΔG > 4.0 kcal/mol for D39A and 1.8 kcal/mol for E131A). Surprisingly, we observed no such effect for N37A (Fig. 3C and Table 1). In addition, the N37A mutation was found to reduce the effect of the E131A mutation in the polar patch: the double mutant N37A/E131A (ΔΔG = 1.0 kcal/mol) had less impact on binding than the single mutant E131A (ΔΔG = 1.8 kcal/mol) (Fig. 3C and Table 1).
We next investigated two additional mutations at the polar patch, N37D and D39N. These mutations involve only minor changes of one side chain atom but result in a change of the overall charge and affect the network of hydrogen bonds (Fig. 3B). Our calculations predict that N37D will be detrimental to binding, whereas D39N will not affect binding affinity (Table 1). iELISA experiments confirmed the predicted detrimental effect of the N37D mutant on binding (ΔΔG = 2.5 kcal/mol). Contrary to our predictions, however, the iELISA experiment showed an even stronger reduction of binding for the D39N mutant (ΔΔG > 4 kcal/mol) (Fig. 3D and Table 1). Together, these two mutations, N37D and D39N, highlight the sensitivity of this interface to both addition and removal of negative charges.

Structure-based ΔΔG Prediction Protocols Partially Succeed in the Identification of Interface Hot Spots at the Cohesin-Dockerin Interface-For the mutations tested by iELISA experiments for their effect on binding in this study, we found an overall good agreement of predicted and experimentally measured effects except for the mutations N37A and D39N (Table 1). Why then does our model fail to accurately capture the effect of these two mutations? Are other protocols more successful? We recalculated predicted effects on binding using a range of other published and available approaches, namely FoldX (35), Hunter (37), two different versions of the Orbit function optimized for interface prediction EPPI-2 (36) (termed here Orbit1 and Orbit2), and CC/PBSA (38) (see "Experimental Procedures"). In addition, we repeated the prediction with a more recent Rosetta protocol calibrated specifically for the prediction of the effect of mutations on protein stability (termed here Rosetta 3.0*; protocol16 in Kellogg et al. (34)).
The results of these different approaches are summarized in Fig. 4. In concordance with our results described above, all approaches identify the strong effect on binding upon Leu83 mutation to serine or alanine. However, for mutations of polar residues Asp39 and Asn37, the agreement among the protocols on the effect on binding is lower, and none can correctly describe the whole set. Hunter does not predict a significant effect on binding for any of these mutations (including N37A). The lack of destabilization by N37A is missed by all other methods but Rosetta 3.0*. The latter predicts that the energy contributed by the hydrogen bonds formed by Asn37 is small due to non-optimal geometry (compared e.g. with the hydrogen bonds formed by Asp39) and therefore cannot compensate for the solvation penalty of burying Asn37. Therefore, mutation to alanine at this position will not significantly affect binding affinity. The strong effect of D39N is also missed by most of the other approaches (including CC/PBSA that is aimed at accurately modeling electrostatic effects) except for Orbit1. This protocol, however, predicts a similar effect for all mutations, and slight modification of the threshold used to define a hot spot (1 kcal/mol) will strongly affect the results.

FIGURE 3. The network of hydrogen bonds in the cohesin-dockerin interface of C. thermocellum contains cohesin residues responsible for both affinity (Asp39 and Glu131) and specificity (Asn37). A, detailed structural view of the polar network of hydrogen bonds that is conserved in C. thermocellum (Ct) cohesin-dockerin interactions. Hydrogen bonds (green dotted lines) mediated by interface residues (sticks; colored in Corey-Pauling-Koltun) and water molecules (cyan) are shown (see Fig. 1D for conservation of hydrogen bond network). B, schematic view of the network highlighting the central cohesin residues involved in the network, Asp39, Asn37, and Glu131. These residues contact the known specificity-determining C. thermocellum dockerin residues Ser45 and Thr46. C, iELISA measurements of the effect on binding of C. thermocellum cohesin mutants at positions Asn37, Asp39, and Glu131 to C. thermocellum dockerin. Removal of charge in E131A and even more so in D39A impairs binding to C. thermocellum dockerin. In contrast, N37A does not significantly influence binding but rather dampens the effect observed for E131A (and to a lesser extent also for D39A) as manifested by the corresponding double mutants N37A/E131A and N37A/D39A. D, iELISA measurements of the effect on binding of C. thermocellum cohesin mutants N37D and D39N to C. thermocellum dockerin. Changes in charge and in the directionality of the network of hydrogen bonds impair the interaction significantly. Cc, C. cellulolyticum. See Fig. 2 legend for more details about the iELISA measurements.
The N37A Mutation of the C. thermocellum Cohesin Establishes a New Interaction with the Non-cognate C. cellulolyticum Dockerin-We reconfirmed that the C. thermocellum cohesin mutation N37A does not affect binding to its cognate dockerin by independent evaluation of the binding ability of both wild-type C. thermocellum cohesin and C. thermocellum cohesin N37A to C. thermocellum and C. cellulolyticum dockerin using a cellulose-binding microarray assay (17). Surprisingly, the assay revealed that the N37A mutant, unlike the wild-type C. thermocellum cohesin, also binds to non-cognate C. cellulolyticum dockerin (Fig. 5A). Thus, the cross-species binding barrier observed for wild-type C. thermocellum and C. cellulolyticum cohesin-dockerin pairs is partially overcome in the C. thermocellum cohesin mutant N37A, which binds promiscuously to both C. thermocellum and C. cellulolyticum dockerins.
We further confirmed this non-cognate binding by the corresponding iELISA assay. We compared the affinity of C. cellulolyticum dockerin to the promiscuous C. thermocellum cohesin mutant N37A with its affinity to both its cognate (C. cellulolyticum) and non-cognate (C. thermocellum) wild-type cohesin modules (Fig. 5B). The results indicate that in strong contrast to wild-type C. thermocellum cohesin, which does not bind the C. cellulolyticum dockerin, C. thermocellum N37A mutant cohesin indeed binds to C. cellulolyticum dockerin. Still, C. thermocellum N37A mutant cohesin bound with lower affinity to C. cellulolyticum dockerin than the cognate wild-type C. cellulolyticum cohesin (ΔΔG = +1.3 kcal/mol; Fig. 5B and Table 2). We conclude from these two experiments that mutation of a single amino acid residue in the C. thermocellum cohesin creates a promiscuous cohesin module that can overcome the specificity barrier and bind also to C. cellulolyticum dockerin.

Effect of C. thermocellum cohesin mutations on type I cohesin-dockerin binding affinity
Mutations in both the hydrophobic and hydrophilic interface patches (Figs. 2 and 3) can significantly reduce binding. Predicted ΔΔG values were calculated based on the Protein Data Bank structure 1OHZ (12) using the Rosetta interface protocol (33). Predicted hot spots (mutations to alanine with ΔΔG > 1.0 kcal/mol) within the interface patches were experimentally validated by iELISA. Fig. 4 summarizes the predicted effect on binding of mutations as calculated by additional computational protocols.
We note that in this iELISA experiment, to measure the amount of non-bound dockerin, the preformed complexes were incubated with C. thermocellum N37A cohesin-coated plates, rather than with C. cellulolyticum wild-type cohesin. This optimized our ability to distinguish between the affinities of the three different C. thermocellum cohesin molecules to C. cellulolyticum dockerin (see "Experimental Procedures" and "Discussion").
A structural model of the C. thermocellum cohesin N37A mutant bound to its non-cognate C. cellulolyticum dockerin (Fig. 5C) suggests how the N37A C. thermocellum cohesin mutant could accommodate this hydrophobic motif region in C. cellulolyticum dockerin. In the cognate C. cellulolyticum interaction, dockerin specificity motif residue Phe48 points into the interface to interact with the side chains of Ala129 and Lys137 and the backbone of Met135 of C. cellulolyticum cohesin (Fig. 1D). Conversely, in the non-cognate interaction, Phe48 points in a different direction toward a newly created hydrophobic patch formed by C. thermocellum cohesin residue Leu129 and C. cellulolyticum dockerin residues Ile52 and Asn40 (Fig. 5C). In the C. cellulolyticum cohesin-dockerin interaction, Ile52 is exposed and only partly covered by dockerin residue Lys55 (Fig. 5D and Ref. 16), whereas in the model of interaction between the C. thermocellum cohesin N37A mutant and the C. cellulolyticum dockerin, Lys55 moves away to the solvent to make room for C. thermocellum cohesin Leu129 (Fig. 5C).
FIGURE 5. Binding promiscuity of the C. thermocellum cohesin N37A mutant. The N37A mutation allows promiscuous binding of the C. thermocellum (Ct) cohesin both to its cognate C. thermocellum dockerin and non-cognate C. cellulolyticum dockerin. A and B, experimental validation of non-cognate binding. A, a protein cellulose microarray assay reveals that although as expected C. thermocellum dockerin binds only to C. thermocellum cohesins (upper panel) and C. cellulolyticum (Cc) dockerin binds only to C. cellulolyticum cohesins (lower panel), the promiscuous C. thermocellum cohesin N37A mutant binds both to its cognate C. thermocellum dockerin and to the non-cognate C. cellulolyticum dockerin (red boxes). Each row of dots represents a 2-fold dilution; the bottom row contains a CBM-Xyn location marker. B, confirmation of cellulose binding microarray results by iELISA. The C. thermocellum N37A cohesin mutant can indeed bind to both its cognate (C. thermocellum) and non-cognate (C. cellulolyticum) dockerins. C. thermocellum N37A cohesin mutant-covered plates were used in this assay. C and D, structural models of the non-cognate and cognate interactions with C. cellulolyticum dockerin. C, model of the non-cognate interaction of C. cellulolyticum dockerin with the C. thermocellum N37A cohesin mutant (generated using RosettaDock; see "Experimental Procedures"). D, C. cellulolyticum dockerin interacting with its cognate wild-type C. cellulolyticum cohesin (Protein Data Bank code 2VN6 (16)). Whereas in the C. cellulolyticum cohesin-dockerin interaction the C. cellulolyticum dockerin motif residue Phe48 points toward the cohesin (D), the model of non-cognate interaction suggests that Phe48 points into a hydrophobic pocket formed by C. cellulolyticum dockerin residue Ile52 and C. thermocellum cohesin residue Leu129, displacing dockerin residue Lys55 to the solvent (C). Phe48 is shown as spheres, and residues predicted to interact with Phe48 in the non-cognate interaction are depicted as sticks and labeled and colored in red and blue for cohesin and dockerin, respectively. See Fig. 2 legend for more details about the iELISA assay in B.

The C. thermocellum Cohesin N37L Mutation: Switching Binding Specificity-If a single mutation overcomes the species barrier and allows cohesin to bind to non-cognate dockerin, how difficult would it be to keep this new interaction and abolish the original cognate interaction, i.e. to create a switch in binding specificity? We designed two mutants, single mutant N37L and double mutant N37A/D39A, based on the solved structures of C. thermocellum and C. cellulolyticum cognate cohesin-dockerin complexes (Protein Data Bank codes 1OHZ (12) and 2VN6 (16), respectively). Although the large and polar asparagine side chain at position 37 is suited for its polar environment in the C. thermocellum cognate complex, it would not fit the corresponding hydrophobic environment in the C. cellulolyticum complex, which could be the reason for its failure to bind to non-cognate C. cellulolyticum dockerin. The mutant N37L in turn is predicted to prefer the hydrophobic non-cognate C. cellulolyticum dockerin over the polar cognate C. thermocellum dockerin (Fig. 6A). Experimental validation of this prediction with the iELISA assay confirmed that indeed the N37L C. thermocellum cohesin mutant barely binds to the cognate C. thermocellum dockerin anymore (ΔΔG > 4 kcal/mol compared with WT C. thermocellum cohesin; Fig. 6B and Table 1). Instead, it now binds to the non-cognate C. cellulolyticum dockerin (ΔΔG = 1.7 kcal/mol compared with WT C. cellulolyticum cohesin) with slightly lower affinity than that of N37A (ΔΔG = 1.3 kcal/mol; Fig. 6C and Table 2). The double mutant N37A/D39A shows a similar binding pattern to C. cellulolyticum dockerin as N37L: impaired binding to its cognate C. thermocellum dockerin (ΔΔG > 4 kcal/mol; Table 1) and new binding to its non-cognate C. cellulolyticum dockerin (ΔΔG = 1.5 kcal/mol; Fig. 6, D-F, and Table 2). In this case, a mutation that abolishes the original binding (D39A) was combined with a mutation that creates new, non-cognate binding (N37A), resulting in a mutant that lost its strong original binding ability but gained binding ability to a new, non-cognate partner. Note, however, that for both the N37L and the N37A/D39A mutants that change their binding specificity from C. thermocellum dockerin to C. cellulolyticum dockerin, preferences of the dockerins to cohesin have not changed (Fig. 6B): C. cellulolyticum dockerin still shows preferential binding to its cognate C. cellulolyticum cohesin rather than the non-cognate C. thermocellum cohesin mutants.
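To put the ΔΔG values above on a more intuitive scale, the following minimal Python sketch converts a binding free energy difference into a fold change in dissociation constant. It assumes the standard thermodynamic relation ΔΔG = RT ln(Kd,mutant/Kd,reference), with positive ΔΔG meaning weaker binding, and a temperature of 298.15 K. The temperature and the function name fold_change_in_kd are illustrative assumptions and are not taken from this study.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def fold_change_in_kd(ddg_kcal_per_mol, temperature_k=298.15):
    """Return Kd(mutant)/Kd(reference) implied by a ddG value.

    Assumes ddG = RT * ln(Kd_mut / Kd_ref), so a positive ddG means weaker
    binding. The default temperature is an assumption, not a value from the text.
    """
    return math.exp(ddg_kcal_per_mol / (R_KCAL * temperature_k))


# ddG values quoted in the text for binding to C. cellulolyticum dockerin,
# relative to wild-type C. cellulolyticum cohesin.
reported = {"N37A": 1.3, "N37A/D39A": 1.5, "N37L": 1.7}
for mutant, ddg in reported.items():
    fold = fold_change_in_kd(ddg)
    print(f"{mutant}: ddG = {ddg} kcal/mol, about {fold:.0f}-fold weaker binding")
```

Under these assumptions, 1.3 kcal/mol corresponds to roughly a 9-fold weaker affinity, whereas a ΔΔG above 4 kcal/mol corresponds to a loss of more than two orders of magnitude.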

FIGURE 6. Structural models of C. cellulolyticum (Cc) dockerin bound to C. thermocellum (Ct) cohesin mutants N37L (A) and N37A/D39A (D) and binding assays to C. thermocellum (B and E) and C. cellulolyticum (C and F) dockerin are shown. Both the N37L and N37A/D39A mutants exhibit reduced binding to the cognate C. thermocellum dockerin (B and E) and increased binding to non-cognate C. cellulolyticum dockerin (C and F). In contrast, the promiscuous C. thermocellum cohesin mutant N37A binds to both C. thermocellum and C. cellulolyticum dockerin, whereas the C. thermocellum cohesin mutant D39A does not bind to either of the dockerins. Binding to C. cellulolyticum dockerin was measured using a C. thermocellum cohesin N37A coating. See Fig. 2 legend for more details about the iELISA measurements.

Discussion
In this study, we used a structure-based approach to study binding specificity and promiscuity in the cohesin-dockerin interaction. Our main finding is that the balance between specificity and promiscuity can be controlled by a single residue. We discuss this easy disruption of specificity in light of other studies on protein binding specificity and affinity and its evolution and identify similarities and differences both at functional and structural levels. Finally, we address challenges and successes in current, state-of-the-art modeling protocols and experimental assays that are aimed at the accurate characterization of binding affinity and specificity.

Binding Affinity, Specificity, and Promiscuity Are Easily Modulated by Single Mutations even in High Affinity Interactions-The cohesin-dockerin interaction presents a special case of a high affinity interaction whose binding promiscuity and specificity can be strongly affected by a single mutation. Our computational results, confirmed by two independent experimental assays, demonstrate the crucial and central role that residue Asn37 in the C. thermocellum cohesin can play in binding specificity. Although wild-type C. thermocellum cohesin binds only to C. thermocellum dockerin, mutation N37A abolishes this specificity and extends binding to non-cognate C. cellulolyticum dockerins, and mutation N37L completes a specificity switch toward C. cellulolyticum dockerin. Residue Asn37 can thus be termed a binding specificity hot spot. Interestingly, mutation N37A also has stabilizing effects on D39A and E131A mutations in the polar patch of the interface (Fig. 3C and Table 1), indicating that specificity comes at a price of increased sensitivity to mutations (i.e. the interaction with dockerin is less sensitive to mutations for the promiscuous mutant N37A than for the specific wild-type cohesin).
The new non-cognate binding can be explained in light of the distinct features of C. thermocellum and C. cellulolyticum dockerins: the polar binding motif of C. thermocellum dockerin in the second F-hand contains Ser45 and Thr46, whereas in C. cellulolyticum dockerin, they are replaced by Ala47 and Phe48 at corresponding positions (Fig. 1D). The accommodation of the Phe48 side chain into an existing hydrophobic patch (Fig. 5C) without the need for further evolutionary changes of the interface highlights the preorganized plasticity of interfaces, similar to what has been observed in other experiments of evolution of new binding specificities (2).
Rather than affecting binding affinity on its own, the N37A mutation attenuates the effect on binding affinity of additional mutations. Consequently, rather than an interface binding hot spot, this residue functions as an interface specificity hot spot. We have observed a similar behavior of a cohesin residue in the type III cohesin-dockerin interaction that upon mutation increased binding and attenuated effects by other positions at the interface. 5 Thus, mutations that do not impair binding but rather extend the range of binding partners to dockerins from different species (i.e. binding to non-cognate C. cellulolyticum dockerin) can readily appear during evolution.
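One common way to quantify such attenuation, which is not necessarily the analysis used in this study, is the non-additivity (coupling energy) of a double mutant relative to its two single mutants. The minimal Python sketch below illustrates the idea. The numerical values are hypothetical placeholders rather than measurements from Table 1, and the function name coupling_energy is introduced here only for illustration.

```python
def coupling_energy(ddg_single_a, ddg_single_b, ddg_double):
    """Return ddG(double) - [ddG(A) + ddG(B)] in kcal/mol.

    A negative value means the double mutant is less destabilizing than the
    sum of the two single mutants, i.e. one mutation attenuates the other.
    """
    return ddg_double - (ddg_single_a + ddg_single_b)


# Hypothetical placeholder values (kcal/mol), NOT measurements from this study:
# mutation A is neutral on its own, mutation B impairs binding, and the
# double mutant is less impaired than B alone.
example = coupling_energy(ddg_single_a=0.0, ddg_single_b=2.0, ddg_double=1.0)
print(f"coupling energy: {example:+.1f} kcal/mol")
```

A coupling energy near zero would indicate independent, additive effects, whereas a clearly negative value is the signature of one mutation dampening the effect of another.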
The dramatic effect on binding specificity of one single mutation as reported for the cohesin-dockerin interaction studied here contrasts with the robustness that has been reported for other systems such as the colicin-immunity protein interaction (6). Reducing or switching specificity in the colicin-immunity protein interaction involves a number of mutations that were found only in an extended in vitro evolution experiment (2) but not identified by computation (44,45). In contrast, the cohesin-dockerin interaction shows a much simpler picture in which major effects on specificity may be achieved by a single residue as was demonstrated previously on the dockerin module (8) and is shown here on the cohesin module.
It therefore seems that functional rather than biophysical constraints have shaped the differential plasticity of these two high affinity associations. The interaction between cohesin and dockerin plays a very different biological role compared with the interaction between colicin and its immunity protein. The binding of the immunity protein to the bacterial endonuclease is crucial as failure to do so will result in active nuclease molecules within the cell and lead to bacterial death. Therefore, this interaction has evolved to be robust to strong effects on binding by single mutations. Consequently, crossing the strain-specific barriers would be difficult as this would involve mutations that reduce binding affinity at the price of promiscuity.
In contrast, we speculate that the cohesin-dockerin interaction within a species might have evolved toward promiscuity to facilitate extensive diversity in the hydrolase enzyme composition of cellulosome subpopulations (because the cohesin module repeats can each bind to a range of different dockerin modules, each connected to a distinct hydrolase). This platform for diversity can be fine-tuned on demand toward maximal degradation efficiency of specific cellulosic substrates. Expression of specific hydrolases is regulated at the transcriptional level by specific factors, which are released from carbohydrate sensor proteins upon their binding to specific plant cell wall polysaccharide substrates (46-48). Because of the promiscuous binding between dockerin modules of the different hydrolases to the different repeats of cohesin modules on the scaffoldin, this malleable platform can be fine-tuned based on the specifically induced hydrolases, resulting in cellulosomes with optimized composition for degradation of the specific substrate(s).
Binding Affinity and Specificity Are Determined by Two Distinct Regions-Despite the fundamental differences between the cohesin-dockerin and the colicin-immunity protein interaction described above, their underlying interface architecture shows similar organization. In-depth mutational analysis of the colicin-immunity protein interaction has suggested a so-called "dual recognition mechanism" (6, 49) that involves two distinct patches at the interface: one conserved patch is responsible for binding affinity (i.e. two conserved central tyrosine residues), and the second patch governs binding specificity (i.e. the network of hydrogen bonds). (Note that this dual recognition mechanism should not be confused with the "dual binding mode" of the cohesin-dockerin interaction, described in the Introduction, that allows binding of dockerin in two opposite orientations (13,16).) The interface of the C. thermocellum cohesin-dockerin interaction is also composed of a conserved hydrophobic patch and a specific polar patch. We have shown here that the hydrophobic patch plays an important role in binding affinity because the mutation of a central position, L83A, significantly impairs binding (Fig. 2). Despite its hydrophobic and exposed nature, the high degree of cross-species sequence conservation of this residue (Fig. 1C) indicates a more general role of this patch for cohesin-dockerin binding affinity. In contrast, binding specificity is obtained by the polar patch that is conserved only within C. thermocellum and replaced by a hydrophobic patch in C. cellulolyticum. This patch in C. thermocellum contains at least one residue, Asn37, that upon mutation to alanine can overcome the species barrier and create a promiscuous cohesin that binds both its cognate (C. thermocellum) and non-cognate (C. cellulolyticum) dockerin (Fig. 6).

Polar Interactions across the Interface and Charges Play a Major Role in the C. thermocellum Cohesin-Dockerin Interaction but Are Challenging to Model-The large number of charged residues at the C. thermocellum cohesin-dockerin interface suggests that this interaction is governed by electrostatics. The detrimental effect on binding of the C. thermocellum cohesin D39N mutation (first reported in Ref. 29 and reconfirmed in this study by an iELISA assay; Fig. 3D) suggests that the loss of binding is due to removal of a charge. The importance of this charge at position 39 could not be foreseen by molecular modeling attempts (see Fig. 4) even after including a Coulomb electrostatic term in the Rosetta scoring function, the use of a generalized Born-based treatment of electrostatic solvation energy implemented into the Rosetta modeling suite (50), and a detailed calculation of the pKa of Asp39 using the APBS modeling suite (51).
Molecular dynamics simulations of the wild-type cohesin and the D39N mutant suggested that removal of the charge at position Asp39 leads to a significant destabilization of a nearby region, which is not observed in the wild type (52). However, these simulations originated from the crystal structure solved by Carvalho et al. (12) (Protein Data Bank code 1OHZ) in which the critical Asn37 side chain has been incorrectly fitted. In the wild-type simulation, Asn37 is indeed flipped at the very beginning of the simulation, whereas in the simulation of the D39N mutant, it is not. Consequently, the pronounced effect observed in the Xu et al. (52) study likely reflects a non-equilibrated starting conformation.
The ability to identify interface residue hot spots is the basis for more complex redesign of interfaces and binding specificity (53,54). Considerable effort has therefore been put into the development of computational approaches to model the effect of point mutations on protein stability and binding affinity (33,34,36,55,56). Despite good overall accuracy, these approaches are still challenged by complex polar effects at the interface as also demonstrated in the present study. Nevertheless, together with structural analysis, they enabled us to pinpoint the crucial interactions in the interface that contribute to binding in different ways.
Outlook-In addition to our ability to address fundamental questions about specificity and promiscuity of the cohesin-dockerin interaction, our study bears biotechnological implications for the fabrication of artificial designer cellulosome complexes (19-21, 57-64). Specific cohesin-dockerin pairs could allow the generation of cellulosome complexes of predetermined content and spatial architecture. This study provides initial support for the feasibility of such an application. Future steps will involve the generation of additional specific cohesin-dockerin pairs.