Novel Clostridium thermocellum Type I Cohesin-Dockerin Complexes Reveal a Single Binding Mode*

Background: In general, dockerins present two homologous cohesin-binding interfaces, which confer increased flexibility into cellulosomes.
Results: The structure of two novel Coh-Doc complexes reveals a dockerin single-binding mode.
Conclusion: Single-binding mode dockerins bind, preferentially, to cell surface cohesins.
Significance: The dual binding mode is a property of cellulosomal dockerins.

Protein-protein interactions play a pivotal role in a large number of biological processes exemplified by the assembly of the cellulosome. Integration of cellulosomal components occurs through the binding of type I cohesin modules located in a non-catalytic molecular scaffold to type I dockerin modules located at the C terminus of cellulosomal enzymes. The majority of type I dockerins display internal symmetry reflected by the presence of two essentially identical cohesin-binding surfaces. Here we report the crystal structures of two novel Clostridium thermocellum type I cohesin-dockerin complexes (CohOlpC-Doc124A and CohOlpA-Doc918). The data revealed that the two dockerins, Doc918 and Doc124A, are unusual because they lack the structural symmetry required to support a dual binding mode. Thus, in both cases, cohesin recognition is dominated by residues located at positions 11, 12, and 19 of one of the dockerin binding surfaces. The alternative binding mode is not possible (Doc918) or highly limited (Doc124A) because residues that assume the critical interacting positions, when dockerins are reoriented by 180°, make steric clashes with the cohesin. In common with a third dockerin (Doc258) that also presents a single binding mode, Doc124A directs the appended cellulase, Cel124A, to the surface of C. thermocellum and not to cellulosomes because it binds preferentially to type I cohesins located at the cell envelope. Although there are a few exceptions, such as Doc918 described here, these data suggest that there is considerable selective pressure for the evolution of a dual binding mode in type I dockerins that direct enzymes into cellulosomes.


Biological nanomachines combining a range of complementary enzyme activities are critical to cellular function. Cellulosomes are among nature's most elaborate and highly efficient multienzyme complexes; they deconstruct cellulose and hemicellulose, two of the most abundant polymers on Earth (1-3). Thus, cellulosomes play a major role in carbon recycling and offer the bioenergy and bioprocessing sectors an opportunity to exploit the largely untapped energy stored in plant biomass. It is now well established that the complex physical and chemical structure of plant cell walls restricts their accessibility to hydrolytic enzymes. Aerobic microorganisms that utilize plant biomass as a significant nutrient express extensive repertoires of degradative enzymes, primarily glycoside hydrolases but also lyases and esterases, which attack the structural polysaccharides of the plant cell wall. In contrast, microbial anaerobes, due to environmental selective pressures, have a lower protein-producing capacity and organize enzymes into cellulosomes, which enhance enzyme synergy and substrate targeting (see Refs. 1 and 2 for review).
The cellulosome of the thermophilic bacterium Clostridium thermocellum has been extensively explored (4,5). It consists of a large non-catalytic multimodular protein, termed CipA, that contains nine tandemly repeated type I cohesins that recognize type I dockerins located in the cellulosomal enzymes (6,7). Type I cohesins of CipA display a very high level of sequence identity. It was thus suggested that dockerins discriminate little among the cohesin receptors presented by the cellulosome scaffold (8). Primary scaffoldins, such as CipA, may also contain a C-terminal divergent type II dockerin that specifically recognizes type II cohesins located on the bacterium's envelope, thereby providing a mechanism for the cell surface attachment of cellulosomes (9). Thus, different cohesin-dockerin (Coh-Doc) specificities (in C. thermocellum type I and type II) are responsible for the correct assembly of the multienzyme complex (type I) and its direct attachment to the organism (type II), respectively.
Structural studies on type I Coh-Doc complexes of C. thermocellum (10,11) and Clostridium cellulolyticum (12), a mesophilic bacterium that produces a cellulosome analogous to the former microorganism, provided insights into the molecular determinants of protein-protein recognition that mediate the assembly of these protein complexes. Dockerins fold into two α-helices and EF-hand calcium-binding loop motifs, each corresponding to one of the two duplicated segments (10,12). Thus, the structure of the N-terminal α-helix and EF-hand calcium-binding loop can be precisely superimposed over the equivalent structures at the C-terminal end, leading to an internal 2-fold symmetry in the dockerin molecule (11). The implications of this internal symmetry were realized when it was observed that type I dockerins present two cohesin binding surfaces because they can bind their cognate protein module either through the analogous N- or C-terminal α-helices (11). In C. thermocellum type I dockerins, residues that dominate the hydrogen bond network with cohesins are located at positions 11 and 12 of the calcium binding loop and are usually a Ser-Thr pair (10). When the dockerin is reoriented by 180°, the equivalent residues (Ser45 and Thr46) in the C-terminal dockerin helix participate in cohesin recognition (11). The Ser-Thr dyad symmetry observed in C. thermocellum dockerins is replaced, in C. cellulolyticum, by hydrophobic residues, which accounts for the lack of affinity between protein partners from different species. The dockerin dual binding mode may reduce the steric constraints that are likely to be imposed by assembling a large number of different catalytic modules into a single cellulosome. In addition, the switching of the binding mode between two conformations may also introduce quaternary flexibility into multienzyme complexes, thus enhancing substrate targeting and the synergistic interactions between some enzymes, particularly exo- and endo-acting cellulases.
Currently, it is unclear whether the dual-binding mode displayed by C. thermocellum and C. cellulolyticum dockerins is universal to all cellulosomal enzymes. The genome sequence of C. thermocellum ATCC 27405 encodes 72 polypeptides containing type I dockerin sequences. Alignment of the 72 dockerin sequences at the two ligand binding sites revealed a strong conservation of the amino acids that mediate cohesin recognition (particularly Ser11, Thr12, and a Lys-Arg motif at positions 18 and 19). Recently, we described the identification of four dockerins, of proteins Cthe_0435 (Cel124A), Cthe_0918, Cthe_0258, and Cthe_0624 (Cel9D-Cel44A), which deviate from the canonical C. thermocellum motifs in at least one of the cohesin binding interfaces (13). Here we describe the structure of two complexes in which two different type I cohesins are bound to these unusual dockerin modules. The data indicate that a cohort of C. thermocellum type I dockerins display a single binding mode. The possible biological significance of the single binding mode displayed by these dockerins is discussed.

Cloning and Expression
DNA encoding type I dockerins of Cthe_0435 (Cel124A, residues 31-112) and Cthe_0918 (residues 1146-1209) and type I cohesins of Cthe_0452 (OlpC, residues 108-258) and Cthe_3080 (OlpA, residues 30-177) were amplified by PCR from C. thermocellum genomic DNA using the thermostable DNA polymerase NZYDNAChange (NZYTech Ltd.) and primers described in Table 1. Genes encoding the type I dockerin modules of Cthe_0435 and Cthe_0918, here termed Doc124A and Doc918, respectively, were ligated into NdeI/BamHI-digested pET3a (Novagen). Genes encoding cohesin modules termed CohOlpC and CohOlpA, which derive from proteins Cthe_0452 and Cthe_3080, respectively, were ligated into NheI/XhoI-restricted pET21a (Novagen). Recombinant cohesins contained a C-terminal His6 tag. To express the dockerin and the cohesin genes in the same plasmid, the recombinant pET3a derivative was digested with BglII and BamHI, to excise the dockerin gene under the control of the T7 promoter, which was subcloned into the BglII site of recombinant pET21a so that both genes were organized in tandem. Through this approach, it was possible to express both Doc124A and CohOlpC (Coh452) and also Doc918 and CohOlpA (Coh3080) in the same cells. Doc124A and Doc918 were also subcloned into pET32a vector (Merck) restricted with EcoRI and XhoI. Recombinant dockerins were expressed in fusion with thioredoxin to improve dockerin solubility and stability. OlpA and OlpC cohesins were also cloned into BglII- and EcoRI-digested pRSETa (Invitrogen). Mutant derivatives of both dockerins were synthesized (NZYTech Ltd.) with codon usage optimized for expression in Escherichia coli (Table 2). The synthesized genes contained engineered EcoRI and XhoI recognition sequences at the 5′- and 3′-ends, respectively, which were used for subsequent subcloning into pET32a (Merck), as described above.

Protein Purification
Cohesin-Dockerin Complexes. The Coh-Doc complexes CohOlpC-Doc124A and CohOlpA-Doc918 were expressed in E. coli Tuner cells, grown at 37 °C to A600 of 0.5. Recombinant protein expression was induced by adding isopropyl-β-D-thiogalactopyranoside to a final concentration of 0.2 mM and incubation for 16 h at 19 °C. The recombinant proteins were purified by immobilized metal ion affinity chromatography using Sepharose columns charged with nickel (HisTrap™). Fractions containing the purified Coh-Doc complexes were buffer-exchanged, using PD-10 Sephadex G-25 M gel filtration col-

Unbound Cohesins and Dockerins. Dockerins Doc124A, Doc918, and the respective mutant derivatives cloned in pET32a were expressed in E. coli Origami cells. CohOlpC and CohOlpA cloned in pRSETa vector were expressed in E. coli Tuner cells. Growth was performed at 37 °C to midexponential phase (A600 = 0.5) in Luria broth. Recombinant protein expression was induced with 1 mM (Origami) or 0.2 mM (Tuner) isopropyl-β-D-thiogalactopyranoside and incubation for 16 h at 19 °C. The recombinant proteins were purified by immobilized metal ion affinity chromatography as described above and buffer-exchanged into 50 mM Na-HEPES buffer, pH 7.5, containing 2 mM CaCl2 and then subjected to gel filtration using a HiLoad 16/60 Superdex 75 column (GE Healthcare) at a flow rate of 1 ml/min.

Isothermal Titration Calorimetry (ITC)
ITC experiments were carried out essentially as described previously (11,12), except that the titrations were at 55 °C, and proteins were in 50 mM Na-HEPES buffer, pH 7.5, containing 2 mM CaCl2. During titration, the dockerin (40 µM) was stirred at 300 rpm in the reaction cell, which was injected with 28 successive 10-µl aliquots of ligand comprising cohesin (180 µM) at 200-s intervals. Integrated heat effects, after correction for heats of dilution, were analyzed by non-linear regression using a single site-binding model (Microcal ORIGIN, version 5.0, Microcal Software). The fitted data yielded the association constant (Ka) and the enthalpy of binding (ΔH). Other thermodynamic parameters were calculated by using the standard thermodynamic equation, $-RT \ln K_a = \Delta G = \Delta H - T\Delta S$.
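As a minimal sketch of how the remaining thermodynamic parameters follow from the fitted Ka and ΔH via the relation above, the Python snippet below computes ΔG and TΔS at the titration temperature of 55 °C. The function name and the numerical inputs are hypothetical illustrations, not values or software from this study.

```python
import math

# Gas constant in kcal/(mol*K); the unit choice is an assumption of this sketch.
R_KCAL = 1.98720425864083e-3


def itc_thermodynamics(ka_per_molar, delta_h_kcal, temp_celsius=55.0):
    """Return (delta_g, t_delta_s) in kcal/mol from a fitted K_a and delta H.

    Uses -RT ln K_a = delta G = delta H - T delta S.
    """
    temp_kelvin = temp_celsius + 273.15
    delta_g = -R_KCAL * temp_kelvin * math.log(ka_per_molar)
    t_delta_s = delta_h_kcal - delta_g
    return delta_g, t_delta_s


# Hypothetical inputs chosen only to exercise the function.
example_dg, example_tds = itc_thermodynamics(1.0e8, -20.0)
print(f"dG = {example_dg:.2f} kcal/mol, TdS = {example_tds:.2f} kcal/mol")
```

Dividing the returned TΔS by the absolute temperature gives ΔS if the entropy itself is needed.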

Crystallization and Data Collection
Protein crystals were obtained using the hanging drop, vapor diffusion method. CohOlpC-Doc124A complex crystals grew in 2 M ammonium sulfate, pH 4.6 (condition 32 of Crystal Screen HR2-110 from Hampton Research) in drops with 7 g/liter protein and were harvested after 5-7 weeks at 19 °C. CohOlpA-Doc918 complex crystals grew in 0.2 M lithium sulfate, 10% (w/v) PEG 8000 + 10% (w/v) PEG 1000, pH 7.5 (condition 14 of Clear Strategy Screen I MD1-14 from Molecular Dimensions) in drops with 16 g/liter protein and were harvested after 3-5 weeks at 19 °C. Crystals were cryocooled with paratone in liquid nitrogen prior to data collection at beamline ID14-2 at the European Synchrotron Radiation Facility (ESRF, Grenoble, France) at 100 K using an ADSC QUANTUM 4R CCD detector and at a wavelength of 0.9330 Å.

Structure Determination and Refinement
Structure determination of CohOlpC-Doc124A was based on two data sets that were processed in MOSFLM (14), merged and combined with SORTMTZ from the CCP4 suite (16), and scaled in SCALA (15). Phasing was performed by molecular replacement with the program BALBES (18) using a search model based on the Protein Data Bank structures 2ccl, 1aoh, 2vn6, 1nv8, and 1ixh, mainly related to cohesin and dockerin modules from C. thermocellum and C. cellulolyticum (11,12). Density modification, together with non-crystallographic symmetry averaging, was done with the DM program from the CCP4 suite (16). ARP/wARP (19) was used to automatically build the protein model. Model completion, editing, and initial validation were carried out in COOT (21). Initial restrained refinement of the molecular model was done using REFMAC version 5.5 (22), and water molecules were added/validated according to the following criteria: a compatible water-shaped density peak, whose center was within acceptable hydrogen bond distance to the closest protein atoms or other waters (~2.4-3.2 Å), and B-factors similar to neighboring atoms but less than 80 Å². The final cycles of refinement were done with the program PHENIX.REFINE from the PHENIX suite (23). The two molecules in the asymmetric unit are arranged as a dimer of heterodimers, the latter composed of chains A/B or C/D. All atoms in the protein could be properly assigned and refined, apart from a few initial (first 6 and 7 residues from chains A and C, respectively, and the first 6 residues from chains B and D) and final residues (last 6 and 7 residues from chains A and C, respectively, and the last 6 and 8 residues from chains B and D, respectively) in the polypeptide chains. The final model also includes 461 water molecules and four calcium ions. R-work and R-free converged to 18.1 and 21.5%, respectively. Model assessment and validation were carried out by PHENIX.POLYGON (24) and MOLPROBITY (25) from the PHENIX suite and PROCHECK (26). According to these programs, the final model contains 99.5% of the residues in the most favored and allowed regions of the Ramachandran plot and 0.5% of the residues in generously allowed regions of the plot.

Structure determination of CohOlpA-Doc918 was similarly done by molecular replacement with BALBES using structures with Protein Data Bank entries 2ccl and 2vn6 as models (11,12). Density modification with non-crystallographic symmetry was done with the PARROT (27) program from the CCP4 suite. ARP/wARP was used to automatically build the protein model. Model completion, editing, and initial validation were also carried out in COOT. The initial dockerin model had to be manually rebuilt due to a mistracing error by ARP/wARP, originating from the presence of the dockerin's two duplicated segments, which share a striking sequence similarity and strong structural conservation. Refinement procedures were done as described for CohOlpC-Doc124A. R-work and R-free converged to 17.5 and 20.6%, respectively. Two chains were found in the asymmetric unit, arranged as a dimer (A/B). Protein residues could be properly assigned and refined, apart from the first 5 and last 10 residues from chain A and the first 2 residues from chain B. The final model includes 224 water molecules and two calcium ions. Model assessment and validation using the above-mentioned tools produced a final model with 100% of the residues in the most favored and allowed regions of the Ramachandran plot. Data collection and refinement details for the two complete structures are summarized in Table 3.
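The geometric part of the water validation criteria described above (nearest neighbor within about 2.4-3.2 Å and a B-factor below 80 Å²) can be sketched as a simple filter. This is an illustrative Python sketch only: the function name and inputs are hypothetical, it is not part of REFMAC, COOT, or PHENIX, and it does not model the density-shape check or the comparison with neighboring B-factors.

```python
import math


def passes_water_criteria(water_xyz, water_b, neighbor_xyz_list,
                          d_min=2.4, d_max=3.2, b_max=80.0):
    """Check the distance and B-factor criteria for one candidate water.

    water_xyz: (x, y, z) of the water oxygen, in angstroms.
    water_b: B-factor of the water, in square angstroms.
    neighbor_xyz_list: coordinates of nearby protein atoms or other waters.
    """
    if not neighbor_xyz_list:
        return False
    # Distance to the closest neighboring atom.
    nearest = min(math.dist(water_xyz, xyz) for xyz in neighbor_xyz_list)
    return d_min <= nearest <= d_max and water_b < b_max


# Hypothetical coordinates: one neighbor 2.8 angstroms away, B-factor of 30.
print(passes_water_criteria((0.0, 0.0, 0.0), 30.0, [(2.8, 0.0, 0.0)]))
```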

Expression and Crystallization of Novel Coh-Doc Complexes
In a previous study (13), C. thermocellum type I dockerins were shown to bind to the nine type I cohesins of CipA and the single cohesin modules of the cell surface protein OlpA or OlpC (Fig. 1). The majority of C. thermocellum dockerins (68 of 72), exemplified by the well characterized dockerin of Xyn10B (10, 11), display a distinctive internal symmetry that is compatible with a dual binding mode. These dockerins display preferential recognition for OlpA and CipA cohesins. In contrast, two C. thermocellum dockerins, from the protein of unknown function, Cthe_0258, and the recently described cellulase, Cel124A (28), display a 2- and 10-fold preferential binding, respectively, to the cell envelope cohesin of OlpC (13). A third dockerin, of the bifunctional cellulase Cel9D-Cel44A, displays two cohesin-binding interfaces with different specificities; the dockerin can interact with C. cellulolyticum cohesins through the N-terminal interface and with C. thermocellum counterparts through the C-terminal binding site (13). A fourth dockerin, from the protein of unknown function Cthe_0918, binds equally well to CipA and OlpA and displays a lower affinity for OlpC. The primary sequences of these four dockerins lack the distinctive symmetry at the binding interfaces, which may explain, at least for dockerins of Cthe_0258, Cel124A and Cel9D-Cel44A, the observed differences in ligand specificity (13). The dockerin dual binding mode does not favor crystallization of protein complexes, and the usual approach used to study the Coh-Doc interaction involves the inactivation of one of the dockerin cohesin-binding interfaces through site-directed mutagenesis (11,12). Because the four unusual dockerins described above seem to present a single binding interface, wild-type proteins were used for these structural studies. Here, we have used established strategies for the production and purification of Coh-Doc complexes, which involve the co-expression of both proteins in E. coli cells (29).

Novel Type I Coh-Doc Complexes from C. thermocellum DECEMBER 28, 2012 • VOLUME 287 • NUMBER 53

Structure of Type I Coh-Doc Complexes
The structures of OlpA type I cohesin bound to the dockerin of the protein Cthe_0918 (CohOlpA-Doc918) and of the OlpC type I cohesin in complex with the dockerin of Cel124A (CohOlpC-Doc124A) were solved to 1.95 and 1.75 Å resolution, respectively (Fig. 2). In C. thermocellum, OlpA and OlpC cohesins are the only two type I cohesins that do not belong to CipA and show significant deviations in the putative residues that participate in dockerin recognition, when compared with the nine highly homologous cohesins of CipA (13) (for details, see supplemental Fig. 1S).
Both type I cohesin structures in complex with their respective protein partners reveal a striking similarity (supplemental Fig. 2SA). In comparison with the second cohesin of CipA (CohCipA2), a structure superposition with CohOlpA has an r.m.s. deviation of 1.04 Å (between 136 Cα pairs with a sequence identity of 35.3%) and 1.18 Å with CohOlpC (133 Cα pairs, 38.3% sequence identity). The two novel cohesins superpose with each other with an r.m.s. deviation of 1.14 Å (139 Cα pairs, 34.5% sequence identity). Noteworthy structural divergences occur between β-strands 4 and 5 (which include a small α-helix), where both CohOlpA and CohOlpC have a shorter loop than CohCipA2, and on the loop between β-strands 7 and 8 that, compared with CohCipA2, is slightly longer in CohOlpA and considerably longer in CohOlpC, increasing the main longitudinal axis length of these two proteins by around 2 and 10 Å, respectively (supplemental Fig. 2SA). The β-sheet B interface area, evaluated on the basis of its solvent-accessible area when in complex with its cognate dockerin (PDBePISA) (20), was 686 Å² for CohCipA2, 803 Å² for CohOlpA, and 729 Å² for CohOlpC.
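For readers unfamiliar with the metric, the r.m.s. deviations quoted above are root mean square distances over paired Cα atoms. The short Python sketch below computes that quantity for two coordinate lists that are assumed to be already superposed and paired; it does not perform the superposition itself, and the function name and example coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two paired coordinate lists.

    Each list holds (x, y, z) tuples in angstroms, in matching order.
    The structures are assumed to be superposed beforehand.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Hypothetical example: every atom displaced by 1 angstrom along z.
set_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
set_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(set_a, set_b))
```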
Structure of Type I Dockerins. The structures of dockerins of Cthe_0918 and Cel124A, here termed Doc918 and Doc124A, respectively, are organized in two α-helices, arranged in an antiparallel orientation (N-terminal or helix-1 and C-terminal or helix-3) connected through an extended loop displaying a small helix (helix-2) (Fig. 3). In Doc918, helix-1 is composed of residues 15-27, helix-2 extends between residues 35 and 39, and helix-3 extends from residue 48 to 60. In Doc124A, the respective residues are as follows: helix-1 (residues 17-29), helix-2 (residues 39-45), and helix-3 (residues 53-65). Helix-2 connects the other two helices, both of which provide the two putative cohesin binding interfaces. In fact, this region, limited by the distal end of helix-1 and the C terminus of helix-2, contains much of the structural variability found among the core Cα trace of these dockerins. In Doc918, the region connecting the two helices is less structured than in DocXyn10B, presenting a single turn on its α-helix, similar to a type I C. cellulolyticum dockerin (12). The internal sequence duplication and nearly perfect 2-fold symmetry was quantified by an internal superposition between helix-1 and -3 within each structure. Doc918 shows an r.m.s. deviation of 0.57 Å for 23 Cα pairs, and in Doc124A, both segments overlap almost as well, with an r.m.s. deviation of 0.66 Å for 26 Cα pairs. Lack of conservation in the key contacting residues when the two putative binding surfaces are compared should prevent a dual binding mode, which is explored below. Both dockerins contain two Ca2+ ions coordinated by several residues in the canonical EF-hand calcium-binding loop. The coordination of the two calcium ions is similar to the metal ions observed in the type I dockerins of C. thermocellum and C. cellulolyticum in complex with their cognate protein partners (supplemental Fig. 3SA).

JOURNAL OF BIOLOGICAL CHEMISTRY 44399

Novel Type I Coh-Doc Complex Interfaces
In contrast with what was previously observed for other type I complexes, the dockerins described here in complex with their protein partners seem to present a single binding mode. Thus, in the CohOlpA-Doc918 complex, binding is dominated by the Doc918 C-terminal helix. In contrast, in the CohOlpC-Doc124A complex, binding is orchestrated by the dockerin N-terminal helix (Fig. 2). In these two novel Coh-Doc structures, the complex interface has a significant hydrophobic nature. Using the solvation free energy gain at complexation, calculated by PDBePISA (ΔiG in kcal/mol (20)), the CohOlpA-Doc918 interaction is more hydrophobic (−10.6 kcal/mol) than that of CohOlpC-Doc124A (−7.7 kcal/mol), which in turn exceeds the CohCipA-DocXyn10B value of −6.4 kcal/mol. However, these values are less negative than that of the highly hydrophobic C. cellulolyticum type I complex (Protein Data Bank code 2vn6), −14.9 kcal/mol. These differences reflect the numerous hydrophobic residues involved in the Coh-Doc complex interface, enumerated in detail in supplemental Table 1S and highlighted in supplemental Figs. 1S and 3S. Thus, the numbers of cohesin and dockerin hydrophobic residues implicated in the interface of CohOlpA-Doc918 are greater than in the CohCipA2-DocXyn10B complex. Although the hydrophobic contact network of CohOlpC-Doc124A is also extensive, the hydrophobic residues at the heterodimer interface are provided primarily by Doc124A.
The major hydrophobic contact residues located at the surface of cohesins CipA2, OlpA, and OlpC include a completely conserved leucine (Leu83, Leu83, and Leu92, respectively), which is assisted by upstream hydrophobic residues Val81, Ala81, and Val90 and downstream by Ala85, Leu85, and a divergent Asp94 in OlpC, respectively. Other important contributors correspond to Leu129, Met132, and Leu146, respectively. With respect to the dockerins Xyn10B-α3/Xyn10B-α1, 918, and 124A, the major hydrophobic contact residues are Leu22/Leu56, Leu27, and Leu65, respectively, at position 22 of the less interacting binding interface. In addition, in position 15 of the dominating interface, residues Leu49/Thr15, Leu53, and Val22 make a significant contribution to cohesin recognition. The above-mentioned conserved leucine located at the surface of the three cohesins is part of an important hydrophobic pocket formed in CohCipA2 by Ala72, Tyr74, Val81, and Leu83, which is occupied by Leu22 or Leu56 from DocXyn10B in the two possible binding modes, respectively. Using the same relative structural positioning order, for CohOlpA, we find Asn72, Ala81, and Leu83, which accommodate Leu27 from Doc918. As for CohOlpC, residues Asn81, Val90, and Leu92 form a hydrophobic pocket that is occupied by the equivalent Doc124A residue, Leu65, found in the opposite C-terminal interface.
The heterodimer interfaces are assisted by a network of direct and bridged hydrogen bonds and salt bridge interactions (described in detail in Table 4). Compared with DocXyn10B(α3) in a similar C-terminal binding conformation (10), Doc918 reveals a more imbalanced distribution of polar bonds, favoring helix-3 residues. Although the Ser/Thr dyads of both complexes share an equivalent contribution, the main difference occurs at the Lys56/Lys57 pair of Doc918, which contributes one salt bridge and two direct H-bonds, whereas in DocXyn10B(α3), the equivalent Ser52 makes no polar bonds, and Arg53 establishes a single salt bridge. Again, in comparison with the N-terminal bound DocXyn10B(α1), Doc124A reveals some striking differences with respect to the relevant Ser-Thr pair, which is replaced by a divergent Ile18-Ser19 motif. In Doc124A the N-terminal binding face interacts with CohOlpC through significant hydrophobic contacts. The only direct polar interactions mediated by helix-1 occur via positions 18 and 19 (Lys25-Arg26), through six direct bonds (two salt bridges from Lys25 and four H-bonds from Arg26) and a couple of water-bridged H-bonds involving Ile18. In contrast, the N-terminal bound configuration of DocXyn10B reveals a hydrogen bond network around, and dominated by, the conserved Ser-Thr pair and also some involvement of residues 18 and 19. Also in contrast to DocXyn10B(α1), Doc124A presents in the opposite interface (α3) a stronger polar contribution involving four residues, Lys61 (one salt bridge), Leu64 (one H-bond), Leu65 (one H-bond), and His66 (one salt bridge), whereas in DocXyn10B, Leu56 and mainly Arg57 make polar contacts with the CipA cohesin (Fig. 3, A and C; detailed contacts in supplemental Figs. 1S and 3S). Extending the comparison of Doc124A to the C. cellulolyticum type I complex (2vn5/2vn6) in an analogous binding conformation, the major difference consists of a much more subdued polar interaction network found in the latter, especially at the α3 interface, where only positions 22 and 23 reveal direct contacts (12).

Table 4. Network of polar interactions in novel type I Coh-Doc complex interfaces
The cohesin-interacting residues can be grouped into three regions corresponding to β-strand β3, β-strands β5/β6, and the loop between β-strands β8 and β9 (Fig. 3B; detailed contacts in supplemental Fig. 1S). Around the β3 region, the important interactions are quite similar between CohCipA2 and CohOlpA, because equivalent residues Asn37/Ser39 and Asp39/Asp41, respectively, establish relevant polar contacts with the dockerin Ser/Thr pair. Conversely, the equivalent CohOlpC residue Ser48 does not display any polar contacts, and Asn50, equivalent to CohCipA2 Asp39, establishes a single H-bond with the dockerin. In the β5/β6 cohesin region, notable differences between CohCipA2, CohOlpA, and CohOlpC occur, respectively, at Arg77, Asp77, and Asp86 residues; Arg77 makes an H-bond with its target dockerin, whereas the equivalent acidic residues of the other two cohesins are not implicated in the interface. In the β8-loop-β9 region, the corresponding residues Asn127, Asn130, and Phe144 in CohCipA2, OlpA, and OlpC, respectively, reveal some differences in their capacity to recognize the dockerin protein partner. In a helix-3-dominated binding, CohCipA2-Asn127 does not exhibit any contacts with its dockerin, whereas CohOlpA-Asn130 makes two bridged H-bonds. However, in a helix-1-dominated binding, CohCipA2 uses its Asn127 to make two H-bonds with Doc-Arg19, whereas in CohOlpC, the backbone of Phe144 establishes two H-bonds with Doc-Arg26. In addition, in the Glu131/Glu134/Pro148 position of the cohesins, both acidic residues from CohCipA2 and CohOlpA form an H-bond with the critical threonine found at position 12 of the dockerin, whereas CohOlpC-Pro148 does not contribute to dockerin recognition.
Further analysis of the differences between the canonical type I cohesin and the cell-bound cohesins studied here was based on the predicted negative hydrogen bond-accepting regions in an electrostatic surface potential evaluation using the Poisson-Boltzmann electrostatics calculation on the PDB2PQR server (30) and visualization of the results in UCSF Chimera (31) (Fig. 4). As reported previously (32), cohesins are strikingly negatively charged in the binding interface plateau, whereas dockerins present a suitable complementary positive-to-neutral surface. Compared with CohOlpC and CohCipA2, CohOlpA shows an elongated polar region that extends beyond the binding interface. As described for the type II cohesin of SdbA (32), the opposite cohesin surfaces in CohOlpC and CohOlpA are more positively/neutrally charged, which was suggested to be important to promote a tighter interaction of cell surface cohesins with the negatively charged peptidoglycan layer.
Analysis of the type I Coh-Doc interfaces provides significant insights into the previously described tight binding of Doc124A to CohOlpC, in comparison with the lower affinity displayed by this dockerin toward the cohesins of CipA and OlpA (13). The hydrophobic Ile18 at the critical position 11 of the Doc124A interface establishes a strong network of apolar contacts with CohOlpC, namely with Asn50, Asn140, and Cys142. These CohOlpC pivotal residues are replaced by an aspartate (position 50) and by small residues, namely glycine or alanine, at the other two positions in CohCipA and CohOlpA cohesins.
The aspartate residue, equivalent to CohOlpC-Asn 50, found in both OlpA and CipA cohesins is also highly relevant for the recognition of typical type I dockerins because, together with Asn 37, it establishes conserved hydrogen bonds with the canonical serine residues usually found at position 11. In CohOlpC, Asn 37 and the aspartate are replaced by Ser 48 and Asn 50, respectively, whose side chains were found more than 4 Å apart and thus presumably unavailable for H-bond formation. In addition, dockerin position 12 of the binding interface makes some relevant polar contacts with the aforementioned CohCipA2-Asn 37 and also with Glu 131. Again, in CohOlpC, the orientation of the equivalent Ser 48 side chain is unsuitable for those contacts, and Pro 148, which replaces Glu 131, is manifestly non-reactive.
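The distance argument used above (side chains more than 4 Å apart being presumed unavailable for H-bond formation) can be illustrated with a minimal sketch. The coordinates below are hypothetical placeholders, not values taken from the deposited structures, and the function name is an assumption made for illustration. A distance cutoff alone is only a necessary condition for a hydrogen bond; a full analysis would also consider donor-acceptor geometry.

```python
import math

# Cutoff taken from the 4 Å figure quoted in the text; treating it as a
# simple upper bound on donor-acceptor distance is an assumption.
HBOND_CUTOFF_ANGSTROM = 4.0


def within_hbond_distance(donor_xyz, acceptor_xyz, cutoff=HBOND_CUTOFF_ANGSTROM):
    """Return True if two atoms are close enough to be H-bond candidates.

    Only the distance is tested; angular criteria are deliberately ignored.
    """
    return math.dist(donor_xyz, acceptor_xyz) <= cutoff


# Hypothetical coordinates (in Å) for two polar side-chain atoms.
close_pair = ((0.0, 0.0, 0.0), (2.9, 0.0, 0.0))   # 2.9 Å apart
distant_pair = ((0.0, 0.0, 0.0), (3.0, 4.0, 0.0))  # 5.0 Å apart

print(within_hbond_distance(*close_pair))    # candidate contact
print(within_hbond_distance(*distant_pair))  # too far for an H-bond
```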

Probing the Importance of Contact Residues in Dockerins
To identify the dockerin residues that are involved in cohesin recognition, a mutagenesis study informed by the previously described type I complex structures was implemented. Previous data suggest that the effects of single substitutions on dockerin activity may be relatively modest, so the strategy used here involved changing particular groups of residues that are believed to play a cooperative role in cohesin recognition (10-13).
CohOlpA-Doc918 Complex. Site-directed mutagenesis and ITC data for the CohOlpA-Doc918 complex (Fig. 5 and Table 5) show that replicating the relevant residue environment of dockerin helix-1 in the C-terminal helix-3, which dominates ligand recognition (mutant Doc918_m1: S49D, T50E, and K57N), precluded any binding. This reinforces a vital role for the Ser-Thr motif, similar to the canonical type I Coh-Doc interaction (10). The drastic decrease in affinity obtained with mutant Doc918_m3 (S49Q and T50Q) also supports a major role for the C-terminal dockerin Ser-Thr motif. The K57N mutation also emphasizes the relative importance of a basic residue, such as lysine or arginine (equivalent to Arg 53 in DocXyn10B), at this position for efficient binding. The Doc918_m2 mutant design provided additional insights into the pivotal residues mediating cohesin recognition. Essentially, starting from the inactive dockerin Doc918_m1, an attempt was made to force the alternate helix-1 binding mode. Thus, the non-functional N-terminal helix of Doc918_m1 was engineered to restore an N-terminal cohesin-binding interface by introducing the three pivotal residues identified in helix-3 at the corresponding positions in helix-1 (mutant Doc918_m2: D16S, E17T, and N24K). ITC data showed that this strategy indeed allowed binding through helix-1, although with a 10-fold reduction in affinity. In CohOlpC-Doc124A and CohCipA2-DocXyn10B (i.e. the S45A/T46A mutant of DocXyn10B) (11), where helix-1 dominates the binding interface, there is a bulky positively charged residue in helix-3 (His 66 and Arg 57, respectively), which contributes polar and hydrophobic interactions to the interface. In Doc918, the equivalent position is occupied by Gly 61 when binding is engineered at the N-terminal face (Fig. 3). This divergent substitution could thus contribute to the reduced affinity displayed by the Doc918_m2 mutant.
Overall, the data presented here confirm that Doc918 presents a single protein-binding interface that is dominated by the C-terminal helix and in which Ser 49, Thr 50, and Lys 57 dominate cohesin recognition.
CohOlpC-Doc124A Complex. As described above, the Doc124A Lys 25-Arg 26 pair dominates the polar binding network with the OlpC cohesin, whereas Lys 61 makes an important salt bridge with Asp 79 present at the surface of the cohesin. Thus, Doc124A mutants m1 and m2 were used to explore the importance of the helix-1 Lys 25-Arg 26 and helix-3 Lys 61-Arg 62 pairs, by mutating them separately (m1) or simultaneously (m2) to Ala (Fig. 6 and Table 5). As expected, based on these multiple polar contacts, the lesion in helix-1 (m1) caused an ~400-fold decrease in affinity. In addition, the additive effect of mutating the two basic pairs at helix-1 and helix-3 simultaneously (m2) led to a complete loss of cohesin recognition, confirming the importance of Lys 61 in heterodimer formation. Thus, the basic pair at helix-1 plays a key role in cohesin recognition, and the massive reduction in affinity suggests a single binding mode for Doc124A. However, because the helix-3 Lys 61-Arg 62 pair is in a position symmetry-related to that of Lys 25-Arg 26 in helix-1, it is also possible that, following a 180° rotation of the dockerin, these latter residues could participate in a lower affinity cohesin recognition mediated by helix-3. Under these circumstances, the lower affinity of m1 would result from the substitution of the critical Ile by an Asp at position 11 and from the loss of a putative Lys 25-mediated salt bridge at the other helix.
The data presented above suggest that Doc124A could possibly present two cohesin binding interfaces with different affinities. To explore this possibility, Doc124A Ile 18, Val 22, and Leu 23, which are part of the hydrophobic platform of the helix-1 binding interface, were mutated to replicate their symmetry-related counterparts in helix-3 (m3) (Fig. 6 and Table 5). The data revealed that these mutations lead to a reduction in the capacity of Doc124A to bind its cohesin partner. The Doc124A_m4 mutant introduces into the m3 background, in which helix-1 binding is reduced, the mutations D54I, N58V, and Y59L, with the intention of promoting a reversal in binding through the C-terminal helix (Fig. 6 and Table 5). ITC results show an 8-fold increase in affinity over the m3 mutant, to a level similar to that of the wild type dockerin, suggesting that although a dual binding mode is not feasible in the native form of Doc124A, in the m4 mutant, binding is probably dominated by the C-terminal interface. Thus, overall, the data suggest that Doc124A presents a single binding mode driven by helix-1.
The importance of the hydrophobic network established between Doc124A and OlpC was further explored in the mutant m5, which investigated the role of a second residue pair, Leu 64-Leu 65, in the interactions established with the cohesin (Fig. 6 and Table 5). As described above, the Doc124A dockerin presents a symmetry-related pair at helix-1, Ile 28-Leu 29, which could be involved in a similar interaction if binding were mediated by the lower affinity helix-3 interface. The importance of this pair was explored in m6. The knock-out of the helix-3 Leu 64-Leu 65 pair (m5) induced a 10-fold decrease in affinity, confirming the relevance of these residues in binding the cohesin when helix-1 is the dominant binding face. Indeed, it is reasonable to assume that the loss of Leu 64 and Leu 65 in m5 reorients the major binding face to helix-3. Consistent with this view is the further reduction in affinity caused by the concurrent mutation of the proximal helix-1 residue pair (m6).
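The fold changes in affinity reported from the ITC data can be placed on an energy scale with the standard thermodynamic relation ΔΔG = RT ln(K_A,wt / K_A,mut). The following sketch is illustrative only: the temperature of 298.15 K is an assumption and is not taken from this section, the function name is hypothetical, and the labels simply restate the fold changes quoted in the text.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL_PER_MOL_K = 1.987204e-3


def ddg_from_fold_decrease(fold_decrease, temperature_k=298.15):
    """Return the binding free energy penalty (kcal/mol) for a given
    fold decrease in association constant: ddG = R * T * ln(fold).

    temperature_k defaults to 298.15 K, which is an assumed value.
    """
    if fold_decrease <= 0:
        raise ValueError("fold_decrease must be positive")
    return R_KCAL_PER_MOL_K * temperature_k * math.log(fold_decrease)


# Fold changes quoted in the text for selected mutants.
reported = {
    "Doc124A_m1 (~400-fold decrease)": 400.0,
    "Doc124A_m5 (10-fold decrease)": 10.0,
    "Doc918_m2 (10-fold decrease)": 10.0,
}

for label, fold in reported.items():
    print(f"{label}: ddG = {ddg_from_fold_decrease(fold):.2f} kcal/mol")
```

Under the assumed temperature, a ~400-fold loss corresponds to roughly 3.5 kcal/mol and a 10-fold loss to roughly 1.4 kcal/mol, which gives a sense of the difference in scale between disrupting the helix-1 basic pair and the helix-3 hydrophobic pair.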

CONCLUSIONS
The structures of the two type I Coh-Doc complexes presented here revealed that, unlike the large majority of C. thermocellum dockerins, the dockerins of cellulase Cel124A and of the Cthe_0918 protein, presently of unknown function, display a single cohesin-binding surface. The structures of the two dockerins were solved in complex with the two unique cell surface type I cohesins of C. thermocellum, OlpA and OlpC, which direct plant cell wall hydrolytic enzymes directly to the cell surface. A recent study (33) revealed that cellulosomes act in synergy with enzymes located at the bacterial cell envelope, which include the abundant Cel124A endocellulase that targets cellulose crystalline-amorphous junctions. The fact that high quality crystals for both complexes were obtained using wild type dockerins was a good initial indication that these dockerins present essentially a single interacting surface. The structures of the two complexes revealed that the critical positions 11 and 12 of the dockerin non-interacting interface are occupied predominantly by acidic residues (Glu and Asp). Acidic residues are not suitable for interacting with the highly negatively charged cohesin platform. Site-directed mutagenesis data demonstrate the importance of the Ser/Ile-Thr motif at positions 11 and 12 and the Lys/Lys-Arg pair at positions 18 and 19 in cohesin recognition. Inspection of the primary sequences of the dockerins of Cthe_0258, which recognizes OlpC with higher affinity, and cellulase Cel9D-Cel44A, which binds both C. thermocellum and C. cellulolyticum cohesins, also revealed unsuitable substitutions at one of the dockerin binding faces, which should result in only one binding face capable of recognizing C. thermocellum cohesins. It is presently unclear why a subset of four dockerins, the two described here and those from Cthe_0258 and Cel9D-Cel44A, have not evolved the dual binding mode characteristic of the other 68 C. thermocellum cellulosomal enzymes and extensively described for the Xyn10B dockerin. Whereas the Cel124A dockerin directs the appended enzyme to the cell surface, because it binds predominantly to the OlpC cohesin, the dockerin of Cel9D-Cel44A is believed to present two cohesin binding interfaces with different cohesin specificities (the N-terminal face binds C. cellulolyticum-like cohesins, and the C-terminal interface binds their C. thermocellum counterparts). Thus, together, these data suggest that a dual binding mode is of primary importance for enzymes binding CipA, the multimodular cohesin scaffolding responsible for cellulosome assembly in C. thermocellum. An exception to this general rule is the dockerin of Cthe_0918, which recognizes CipA cohesins with higher affinity. The elucidation of the functional role of the protein domain appended to the Cthe_0918 dockerin would help to clarify this issue. Nevertheless, the presence of two cohesin binding interfaces in dockerins integrated in multienzyme complexes may contribute to the capacity of the cellulosome to adjust its catalytic machinery to a highly insoluble and recalcitrant substrate.