Structure–function analyses generate novel specificities to assemble the components of multienzyme bacterial cellulosome complexes

The cellulosome is a remarkably intricate multienzyme nanomachine produced by anaerobic bacteria to degrade plant cell wall polysaccharides. Cellulosome assembly is mediated through binding of enzyme-borne dockerin modules to cohesin modules of the primary scaffoldin subunit. The anaerobic bacterium Acetivibrio cellulolyticus produces a highly intricate cellulosome comprising an adaptor scaffoldin, ScaB, whose cohesins interact with the dockerin of the primary scaffoldin (ScaA) that integrates the cellulosomal enzymes. The ScaB dockerin selectively binds to cohesin modules in ScaC that anchors the cellulosome onto the cell surface. Correct cellulosome assembly requires distinct specificities displayed by structurally related type-I cohesin–dockerin pairs that mediate ScaC–ScaB and ScaA–enzyme assemblies. To explore the mechanism by which these two critical protein interactions display their required specificities, we determined the crystal structure of the dockerin of a cellulosomal enzyme in complex with a ScaA cohesin. The data revealed that the enzyme-borne dockerin binds to the ScaA cohesin in two orientations, indicating two identical cohesin-binding sites. Combined mutagenesis experiments served to identify amino acid residues that modulate type-I cohesin–dockerin specificity in A. cellulolyticus. Rational design was used to test the hypothesis that the ligand-binding surfaces of ScaA- and ScaB-associated dockerins mediate cohesin recognition, independent of the structural scaffold. Novel specificities could thus be engineered into one, but not both, of the ligand-binding sites of ScaB, whereas attempts at manipulating the specificity of the enzyme-associated dockerin were unsuccessful. These data indicate that dockerin specificity requires critical interplay between the ligand-binding surface and the structural scaffold of these modules.

Plant cell wall polysaccharides, primarily cellulose and hemicelluloses, are a major reservoir of carbon and energy (1), and the recycling of these complex carbohydrates by microorganisms is integral to the carbon cycle. Furthermore, as the demand for renewable sources of energy and novel molecules for the chemical industry increases, so does the environmental and industrial significance of these abundant structural macromolecules. The deconstruction of the plant cell wall requires, however, an extensive array of hydrolytic enzymes to attack this heterogeneous, predominantly insoluble and highly recalcitrant substrate (2). Specialized anaerobic bacteria have adopted an elaborate strategy to degrade structural plant carbohydrates, through the organization of enzymes into multiprotein complexes, termed cellulosomes (3). Typically, the molecular integration of microbial biocatalysts into these extremely elaborate nanomachines results from the binding of enzyme-borne type-I dockerin (Doc) modules to reiterated type-I cohesin (Coh) modules located in a large non-catalytic protein, termed scaffoldin, thus promoting enzyme synergism and protein stability. In addition, recruitment of cellulosomes to the bacterial cell surface via divergent type-II Coh-Doc interactions allows the immediate uptake of released sugars, which are used by microbes as an energy source (1,4). The protein-protein interaction established between the Coh and Doc modules exhibits one of the strongest binding affinities found in nature, close to that of a covalent bond, and plays a crucial role in both cellulosome assembly and cell-surface attachment (3,5,6). In addition, the organization and structural architecture of cellulosomes are defined by the specificity of the different Coh and Doc modules (7).
The mesophilic anaerobic bacterium Acetivibrio cellulolyticus produces a highly efficient cellulosome capable of hydrolyzing a range of cellulosic materials (8,9). Within the genome of A. cellulolyticus there is a cluster of four tandem scaffoldin genes (scaA, scaB, scaC, and scaD) (10,11). The primary scaffoldin ScaA (where the enzymes of the cellulosome are recruited) shares the main traits found in the primary ScaA scaffoldin of the canonical cellulosome of Clostridium thermocellum. Thus, A. cellulolyticus ScaA contains an internal family-3 carbohydrate-binding module (CBM3), flanked by seven type-I Coh modules and a divergent C-terminal type-II Doc (12). Downstream of scaA are genes encoding an adaptor and an anchoring scaffoldin, ScaB and ScaC, respectively. ScaB contains four type-II Coh modules, which interact with the type-II Doc of ScaA, and a divergent C-terminal type-I Doc that binds to the type-I Coh modules of the ScaC scaffoldin. ScaB essentially plays the role of an adaptor protein that mediates the interaction between ScaA (and its incorporated enzymes) and ScaC, the bacterial cell-surface anchoring protein. ScaB is the first example of a cellulosomal adaptor protein (10) (Fig. 1). The genome sequence of A. cellulolyticus CD2 revealed numerous additional cellulosomal components, gene-regulatory elements, and cell-anchoring modules (identified by the presence of signature Doc or Coh sequences), suggestive of a much more elaborate and sophisticated cellulosome system than originally observed (13). In total, 143 Doc-containing A. cellulolyticus proteins were identified.
There is no evidence of cross-specificity between type-I and type-II Coh-Doc partners (14-16), explaining why ScaB, through its Coh and Doc modules, interacts with two distinct proteins (ScaA and ScaC). Structural studies on type-I complexes, primarily from C. thermocellum (5,17,18) and Clostridium cellulolyticum (19), revealed that the primary sequence duplication displayed by type-I Docs supports a dual-binding mode, based on the interaction of two 180° symmetry-related binding interfaces. It was recently shown that the sequence and structural symmetry within the A. cellulolyticus ScaB type-I Doc allows it to bind ScaC Cohs in two different orientations (20). This sequence symmetry is also evident in the enzyme-borne Docs of A. cellulolyticus that interact with ScaA, suggesting a putative dual-binding mode for these interactions as well.
Although very closely related, the enzyme-borne and ScaB type-I Docs do not display cross-specificity. Thus, Coh-contacting residues at positions 11 and 12 (numbering established considering the first Gly of each calcium-binding loop as residue 1), which are thought to be the major specificity determinants of all type-I Docs (3), are different in the ScaB and enzyme type-I Docs. Differences at these key residues may explain why there is a lack of cross-specificity between the type-I Doc interactions that modulate the binding of ScaB onto ScaC or the cellulosomal enzymes onto ScaA (10,21). It is possible, however, that other elements of the two type-I Doc species confer their observed distinct specificities. Furthermore, although sequence symmetry of the duplicated Doc segments of A. cellulolyticus cellulosomal enzymes supports a potential dual-binding mode, this hypothesis remains to be tested.
Here, we have explored the mechanism by which the cellulosome assembles and is anchored onto the bacterial surface. We report the structure of a type-I complex in which a cellulosomal enzyme Doc (AcDocCel5) is bound to the sixth Coh of ScaA (AcCohScaA6). Biochemical analysis guided by the crystal structure demonstrated that the enzyme-borne Doc modules of A. cellulolyticus interact with the Cohs of ScaA through a dual-binding mode. Residues that determine the different specificities between the type-I Coh-Doc complexes of A. cellulolyticus were identified. The data informed the use of rational design to explore whether the binding surface alone confers ligand specificity. The data show that whereas the nature of the residues in the ligand-binding surface plays a major role in Coh recognition, the topology of the Doc modules also influences specificity.

Results and discussion
Previous studies have shown that the type-I Doc of A. cellulolyticus ScaB binds specifically to the type-I Cohs of ScaC, but not to those of ScaA (10-12). Similarly, enzyme-borne type-I Docs specifically bind to the seven type-I Cohs of ScaA (and to one in ScaD), but not to those of ScaC (21). In adherence to the canonical cellulosomal organizational framework, there are two distinct specificities within type-I Coh-Doc complexes of the A. cellulolyticus cellulosomal system, one responsible for recruiting enzymes to ScaA and a second one responsible for the anchoring of cellulosomes to the cell wall surface (Fig. 1). A recent study explored the structural and biochemical nature of one of these specificities by studying the interaction between the Doc of adaptor scaffoldin ScaB (AcDocScaB) and the third Coh of anchoring scaffoldin ScaC (AcCohScaC3) (20). To dissect the mechanism by which enzyme Docs recognize ScaA Cohs, and to determine whether the Doc scaffold contributes to ligand specificity, the Coh-Doc complex that recruits cellulosomal enzymes to ScaA was investigated by solving the X-ray crystal structure of the sixth ScaA Coh (AcCohScaA6) in complex with the Doc of a family 5 glycoside hydrolase (AcDocCel5). Established co-expression strategies for the production and purification of Coh-Doc complexes (5) had previously generated sufficient amounts of highly pure protein complexes that gave good quality crystals. This strategy, detailed below, was used to seek crystals of the ScaA-enzyme Coh-Doc complexes, to investigate the interface residues that govern the specificity and mode of interaction.

Expression and crystallization of A. cellulolyticus Coh-Doc complexes
Analysis of the AcDocCel5 sequence revealed a high degree of internal symmetry, which suggested that this Doc contained two identical Coh-binding interfaces. Because a dual-binding mode implies that two different complex conformations will be present in solution, this would probably compromise protein crystallization. It is well established that in type-I Docs, residues at positions 11 and 12 of each of the two duplicated segments play a key role in Coh recognition and act as specificity determinants (3). Thus, to force a single binding mode and therefore promote homogeneity in the final product, two Doc mutants were created. The AcDocCel5 mutations used for the crystallization experiments were designed to replace the putative recognition residues in relative positions 11 and 12 (Ser-15/Ile-16 and Ser-51/Leu-52) with those of the ScaB Doc (Ile-Asn), rather than the commonly applied alanine substitution. These amino acid changes were based on previous data that revealed a lack of cross-reaction between these two type-I Coh-Doc complexes (21). The sequences of the resulting Docs are displayed in Table 1. Recombinant plasmids encoding Doc-tagged AcCohScaA6-DocCel5 complexes were selected to produce highly pure Coh-Doc complexes for crystallization. Both AcCohScaA6-DocCel5 variants, containing the S15I/I16N (M1) or S51I/L52N (M2) amino acid substitutions, resulted in high-quality crystals.
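The substitution notation used for M1 and M2 (for example S15I, meaning Ser-15 replaced by Ile) can be applied mechanically to a sequence. The following is a minimal sketch only: `TOY_DOC` is a hypothetical poly-Ala placeholder carrying just the four recognition residues named above, not the AcDocCel5 sequence of Table 1, and the helper names `parse_mutation` and `apply_mutations` are introduced here for illustration.

```python
import re

def parse_mutation(code):
    """Split a substitution such as 'S15I' into (wild_type, position, mutant)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", code)
    if match is None:
        raise ValueError(f"not a point substitution: {code!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant

def apply_mutations(sequence, codes):
    """Return a copy of sequence (1-based numbering) carrying the substitutions."""
    residues = list(sequence)
    for code in codes:
        wild_type, position, mutant = parse_mutation(code)
        found = residues[position - 1]
        if found != wild_type:
            raise ValueError(f"{code}: expected {wild_type} at {position}, found {found}")
        residues[position - 1] = mutant
    return "".join(residues)

# Hypothetical placeholder: poly-Ala with only the four recognition residues set.
toy = ["A"] * 60
for position, residue in {15: "S", 16: "I", 51: "S", 52: "L"}.items():
    toy[position - 1] = residue
TOY_DOC = "".join(toy)

M1 = apply_mutations(TOY_DOC, ["S15I", "I16N"])  # N-terminal site carries Ile-Asn
M2 = apply_mutations(TOY_DOC, ["S51I", "L52N"])  # C-terminal site carries Ile-Asn
```

Checking the wild-type residue before substituting guards against numbering mistakes, which matter here because the two sites differ only by a fixed offset along the sequence.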

Structure of a novel A. cellulolyticus Coh-Doc complex
AcCohScaA6-DocCel5 M1 and AcCohScaA6-DocCel5 M2 structures were solved by molecular replacement, as described under "Experimental procedures" (Fig. 2). Data collection and refinement statistics are given in Table 2.

Structure of AcCohScaA6 in complex with AcDocCel5
AcCohScaA6 type-I Coh in complex with its cognate Doc presents an elliptical structure comprising two β-sheets aligned in an elongated β-sandwich in a classic jellyroll fold. The two sheets are composed of β-strands 9, 1, 2, 7, and 4 on one face and β-strands 8, 3, 6, and 5 on the other face. Strands 1 and 9 align parallel to each other, thus completing the jellyroll topology, whereas the other β-strands are antiparallel (Fig. 2). β-Strand 8 is interrupted by a small β-hairpin, which spans residues Gly-118 to Pro-120, and there is a small α-helix N-terminal to β-strand 5. The two closest functionally relevant structural homologues to AcCohScaA6 were type-I Coh modules from Clostridia (Table S1).

Structure of AcDocCel5 in complex with AcCohScaA6
Figure 1. Architecture of A. cellulolyticus cellulosome. The scheme is color-coded to highlight the different Coh-Doc specificities. A, Doc-containing enzymes are incorporated into the ScaA scaffoldin through interaction with the seven ScaA Cohs (light green). ScaB plays the role of an adaptor protein that mediates between the ScaA Doc (yellow) and the Cohs of the anchoring scaffoldin (red) ScaC. The entire complex is attached to the cell surface via the SLH module of ScaC (orange). ScaA also contains a CBM (blue) and a GH9 (light brown) catalytic module. B, an additional mechanism of cellulosome attachment; ScaA is bound to the type-II Cohs of ScaD (yellow), which can also accept a single enzyme via its third type-I Coh (light green). The SLH module of ScaD serves to anchor the alternative complex to the cell surface.

In both complexes, the AcDocCel5 Doc displays an identical structure that comprises two α-helices arranged in an antiparallel orientation ranging from residue Ile-15 to Leu-25 (helix 1) and from Ser-51 to Leu-61 (helix 3), respectively. These two helices comprise portions of the two classic Doc repeating segments, each containing a bound calcium ion in loops located at opposite ends of the module. The loop connecting these secondary structures contains a six-residue α-helix extending from Asp-37 to Gly-41 (helix 2). The overall structure of A. cellulolyticus AcDocCel5 is very similar to other type-I Docs (Table S1). The Ca²⁺ ion located at the Doc N terminus is coordinated by the side chains of residues Asp-6, Asp-8, Asn-10, and Asp-17 (both the Oδ1 and Oδ2), the latter belonging to the N-terminal α-helix (helix 1) of this module. The octahedral geometry of the coordination of this Ca²⁺ ion is fulfilled by the main-chain carbonyl of Ser-12 and by a water molecule. The second Ca²⁺ site stabilizes the loop connecting the internal and C-terminal α-helix (helices 2 and 3) of the Doc module. This Ca²⁺ ion is coordinated by the side chains of residues Asp-42, Asp-44, Asn-46, and Asp-53 (both the Oδ1 and Oδ2) as well as by the carbonyl of Ser-48, with the octahedral geometry also completed by a water molecule. Thus, both Ca²⁺ sites comprise n, n + 2, n + 4, n + 6 (main-chain oxygen atom), n + 11, and a water molecule completing the coordination pattern. There is a third calcium atom bound to AcDocCel5, which is coordinated by a loop between helix 1 and helix 2. This calcium is distant from the ligand-binding surface and thus probably plays a stabilizing role in protein structure. This third Ca²⁺, not previously observed in Coh-Doc complexes, presents the typical octahedral geometry coordination through the side chains of Asp-31 and Asp-37, the main-chain carbonyl oxygen atoms of Phe-32 and Ala-34, and by two water molecules (Fig. S1).
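The n, n + 2, n + 4, n + 6, n + 11 coordination pattern stated for the two canonical calcium sites can be checked directly from the residue numbers given in the text. This is an illustrative sketch; the names `PATTERN`, `SITES`, and `follows_pattern` are introduced here and are not part of any published analysis.

```python
# Offsets of the protein ligands relative to the first coordinating Asp (n).
PATTERN = (0, 2, 4, 6, 11)

# Residue numbers as listed in the text for each canonical Ca2+ site.
SITES = {
    "N-terminal": [6, 8, 10, 12, 17],    # Asp-6, Asp-8, Asn-10, Ser-12 (C=O), Asp-17
    "C-terminal": [42, 44, 46, 48, 53],  # Asp-42, Asp-44, Asn-46, Ser-48 (C=O), Asp-53
}

def follows_pattern(residue_numbers, pattern=PATTERN):
    """True if the residues sit at n, n+2, n+4, n+6 and n+11, with n the first one."""
    n = residue_numbers[0]
    return tuple(r - n for r in residue_numbers) == pattern

results = {name: follows_pattern(numbers) for name, numbers in SITES.items()}

# Sequence offset between the duplicated segments, taken from the first ligand of each site.
repeat_offset = SITES["C-terminal"][0] - SITES["N-terminal"][0]

# The third site (Asp-31, Phe-32, Ala-34, Asp-37) does not follow the canonical spacing.
third_site_is_canonical = follows_pattern([31, 32, 34, 37, 37])
```

Both canonical sites satisfy the pattern, and the 36-residue offset between them equals the spacing between the recognition pairs Ser-15/Ile-16 and Ser-51/Leu-52.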

A. cellulolyticus type-I CohScaA6-DocCel5 M1 and CohScaA6-DocCel5 M2 interfaces
Table 1. Recombinant protein sequences of AcDocCel5, AcDocScaB, AcCohScaA6, AcCohScaC3, and mutants of both dockerins produced for the interaction studies. The mutated residues are highlighted in black.

Figure 2. Structures of the A. cellulolyticus cohesin-dockerin complexes. A, structure of AcCohScaA6-DocCel5 M1 with the Doc color-ramped from N terminus (blue) to C terminus (red) and the Coh in gold. Ser-51 and Leu-52, which dominate Coh recognition, and engineered residues Ile-15 and Asn-16, to force a single binding mode, are labeled and shown in a stick configuration. Ca²⁺ ions are depicted as purple spheres. B, structure of AcCohScaA6-DocCel5 M2 with the Doc color-ramped from N terminus (blue) to C terminus (red) and the Coh in burgundy. Ser-15 and Ile-16, which dominate Coh recognition, and engineered residues Ile-51 and Asn-52, to force a single binding mode, are again labeled and shown as stick representations. C, overlay of the two binding modes showing the high degree of overall similarity reflecting the internal 2-fold symmetry of the Doc module. The transparent gray disk marks the plane defined by the 8-3-6-5 β-sheet, where the β-strands form a distinctive Doc-interacting plateau. A also depicts a representation of the molecular surface contour of the Coh and Doc, respectively. Ca²⁺ ions are depicted as green spheres.

In the two AcCohScaA6-DocCel5 complexes, AcDocCel5 interacts with the 8-3-5-6 β-sheet of the AcCohScaA6 β-sandwich via helices 1 and 3. The Doc-contacting surface of AcCohScaA6 presents a predominantly flat rectangular shape, whose angles are slightly elevated toward the Doc and correspond to the loops between β-strands 4 and 5, 5 and 6, and 8 and 9 and the β-hairpin that interrupts β-strand 8. In the AcCohScaA6-DocCel5 M1 structure, helix 3 dominates the Doc's interaction with the Coh. Contacts are established by the entire length of helix 3, whereas only the C-terminal portion of helix 1 interacts with the Coh. In the AcCohScaA6-DocCel5 M2 structure, the exact opposite happens; Coh contacts are established by the entire length of helix 1 and the C-terminal portion of helix 3 in a helix 1-dominated interaction. The structures of AcCohScaA6-DocCel5 M1 and M2 were found to be very similar to each other, with a backbone root mean square deviation of 0.5 Å (Fig. 2). Furthermore, helix 1 and helix 3 of AcDocCel5 M1 overlapped almost perfectly with helix 3 and helix 1 of AcDocCel5 M2, respectively, as a result of a 180° rotation in relation to the Coh, imposed by the symmetrically related opposite mutations (Fig. 2). In contrast, helix 2, which bridges the duplicated segments, occupies two distinct spatial positions when both structures are overlaid. This suggests that the Doc internal structural symmetry supports Coh recognition through two highly similar binding interfaces. This dual-binding mode, resulting from a nearly perfect 2-fold internal structural symmetry, is typical of type-I Coh-Doc complexes (5,17,20).
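The geometric argument above, that a 180° rotation swaps helix 1 and helix 3 while leaving the interface essentially unchanged, can be illustrated with an idealized model. The sketch below assumes a perfect 2-fold axis along z and uses made-up coordinates; it does not use the deposited structures, it computes the deviation without any least-squares refitting, and the function names are hypothetical.

```python
import math

def rotate_180_about_z(point):
    """Apply a 2-fold (180 degree) rotation about the z axis."""
    x, y, z = point
    return (-x, -y, z)

def rmsd(coords_a, coords_b):
    """Root mean square deviation between paired coordinates, without refitting."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be the same length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Hypothetical, idealized positions (in Å) standing in for helix 1 atoms.
helix1 = [(3.0, 1.0, 0.0), (3.2, 1.5, 1.5), (3.1, 0.8, 3.0)]
helix3 = [rotate_180_about_z(p) for p in helix1]  # exact 2-fold mate of helix 1

mode_a = helix1 + helix3                          # one binding orientation
mode_b = [rotate_180_about_z(p) for p in mode_a]  # Doc flipped by 180 degrees

# After the flip, helix 3 occupies the positions of helix 1 and vice versa.
swapped = mode_b[len(helix1):] + mode_b[:len(helix1)]
deviation_after_swap = rmsd(mode_a, swapped)
deviation_without_swap = rmsd(mode_a, mode_b)
```

In this idealized case the deviation after swapping the helices is zero; the 0.5 Å reported for the real complexes, and the displaced helix 2, reflect the departure of the actual module from perfect symmetry.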
A large network of hydrophobic interactions plays a key role in AcCohScaA6-DocCel5 M1 and M2 complex assembly (Table S2 and Fig. 3 (C and D)). The intermolecular interfaces also include several hydrogen bonds (Table 3 and Fig. 3 (A and B)). The DocCel5 residues at the complex interface located in helices 1 and 3 remain largely unchanged upon the 180° rotation of the Doc module over the CohScaA6 surface, reflecting the internal symmetry of the ScaB Doc (Figs. 3 and 4). Therefore, the interactions between the dominant Doc helix and the

Thermodynamics of the dual-binding mode
... D). In all panels, the most important residues involved in Coh-Doc recognition are depicted in stick configuration, with a dark background label for the Doc residues and a light background label for the Coh residues, using the AcDocCel5 and AcCohScaA6 numbering. Solid black lines, hydrogen-bond interactions. The Docs are shown color-ramped from N terminus (blue) to C terminus (red). Ca²⁺ ions are depicted as green spheres. In all panels, the transparent gray disk marks the plane defined by the 8-3-6-5 β-sheet, where the β-strands form a distinctive Doc-interacting plateau.

Table 3. Main polar contacts between AcCohScaA6 and both AcDocCel5 mutants. The table was made using the PDBePISA server. Dockerin residues are marked as belonging either to helix 1 (H1) or to helix 3 (H3) interfaces.

Previous studies revealed that type-I Coh-Doc complexes of other cellulosome systems that display a dual-binding mode, such as those of C. thermocellum or C. cellulolyticum, have no preference for a particular binding orientation (17,19). Thus, affinity between Cohs and Docs is similar whether the Doc module binds its protein partner via the N-terminal or the C-terminal helix. To establish whether a similar mechanism operates during AcCohScaA6-DocCel5 recognition, the binding thermodynamics between AcCohScaA6 and the wild-type, M1, M2, and M1 + M2 variants of AcDocCel5 were determined using isothermal titration calorimetry (ITC). The data, presented in Table 4 and exemplified in Fig. S2, revealed a macromolecular association with a 1:1 stoichiometry and a Ka of ~10⁸ M⁻¹, an affinity similar to other type-I Coh-Doc interactions (19). This stoichiometry indicates that one Coh can only bind one Doc. As expected, the AcDocCel5 M1 + M2 mutant, in which both N-terminal and C-terminal residues at positions 11 and 12 were substituted, did not bind AcCohScaA6. Interestingly, both M1 and M2 mutations resulted in a decreased affinity for AcCohScaA6. Whereas the M1 mutations caused a very modest change in affinity, the M2 amino acid substitutions resulted in a 160-fold reduction in Ka. Although the binding interface of both M1 and M2 mutants is virtually identical, the subtle differences in the regions observed in the two protein complexes may result in a relatively weaker contribution by van der Waals contacts when helix 1, as opposed to helix 3, dominates the interaction (87 non-bonded contacts in total versus 99 for the AcDocCel5 M1 mutant). Alternatively, the fact that AcDocCel5 is fused to an unrelated protein module to provide stability in Escherichia coli may lead to steric effects (not observed in the crystal structure, as they only comprise Doc-Coh heterodimers) when M2 binds to the Coh. It is rather unlikely that there is a preferential binding orientation for the AcCohScaA6-DocCel5 interaction, favoring the conformation in which the N-terminal α-helix of the Doc dominates Coh recognition.
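To put the association constants above on an energy scale, the standard relation ΔG° = -RT ln Ka can be applied. The sketch below is illustrative and not part of the reported ITC analysis; in particular, the temperature of 308 K is an assumption for this calculation, and the function names are introduced here.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1

def delta_g_kj(ka, temperature_k):
    """Standard binding free energy (kJ/mol) from an association constant in M^-1."""
    return -R * temperature_k * math.log(ka) / 1000.0

def ddg_from_fold_reduction(fold, temperature_k):
    """Free-energy penalty (kJ/mol) corresponding to a fold reduction in Ka."""
    return R * temperature_k * math.log(fold) / 1000.0

T = 308.0  # K; assumed for illustration

wild_type_dg = delta_g_kj(1e8, T)             # Ka of ~10^8 M^-1
m2_penalty = ddg_from_fold_reduction(160, T)  # 160-fold lower Ka for M2
```

Under this assumption a Ka of ~10⁸ M⁻¹ corresponds to roughly -47 kJ/mol, and a 160-fold reduction to a penalty of about 13 kJ/mol, a sizeable but not abolishing loss of binding energy.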

Developing a specificity hybrid A. cellulolyticus type-I Doc
Overall, the structure and mode of interaction of the AcCohScaA6-DocCel5 complex are very similar to those of the previously characterized AcCohScaC3-DocScaB heterodimer that displays a different specificity (20). Both AcDocCel5 and AcDocScaB possess the ability to bind their Coh partners in two different orientations, resulting in Coh-Doc complex configurations that are superimposable (root mean square deviation of 1.1 and 1.0 Å for the helix 1-dominated interaction and the helix 3-dominated interactions, respectively) (Fig. 4). Crucial interacting residues are generally located at the same relative positions, and, especially at the N-terminal Doc repeats, there is a high degree of sequence conservation between AcDocCel5 and AcDocScaB, exemplified by the key interacting residues Arg-22, Leu-26, and Arg-58. Despite these similarities, AcDocScaB displays a distinct Coh specificity when compared with AcDocCel5, whose binding properties should represent those of the remaining A. cellulolyticus type-I Doc modules that recruit enzymes to the cellulosome. Overall, these observations suggest that the two different type-I Coh-Doc specificities identified in A. cellulolyticus are modulated by the nature of the contacting residues located at the protein interfaces, although differences in the topography of the two protein partners may also contribute to specificity.
To probe the hypothesis that differences in Coh-Doc specificity in A. cellulolyticus are modulated by the nature of the residues identified at the surface of the two protein modules, we attempted to alter the specificity of type-I Docs based on the structure of the two available protein-protein complexes. The aim was to create an AcDocScaB mutant capable of recognizing ScaA Cohs, an AcDocCel5 variant capable of binding to ScaC Cohs, and a hybrid type-I Doc that exhibited both specificities via two distinct Coh-binding interfaces. Structural alignment between AcDocCel5 and AcDocScaB (Fig. 5)

generating the AcDocCel5 RINAVID mutant. For the AcDocScaB mutants, instead of directly replacing the key residues with those of AcDocCel5, a consensus sequence based on 137 A. cellulolyticus Doc sequences (that also bind to AcCohScaA6) was used (Fig. S3) (13). Residues Arg-14, Ile-15, Asn-16, Ala-18, Leu-20, and Asp-23 were thus replaced in AcDocScaB by Asn-14, Ser-15, Ile-16, Phe-18, Tyr-20, and Gln-23 of the type-I Doc consensus. This generated AcDocScaB M7, whose sequence is displayed in Table 1. Mutant M8 was generated by duplicating these substitutions in the second repeat of AcDocScaB, introducing the mutations K50N, I51S, N52L, A54F, L56Y, and D59Q into M7 (Table 1). It was predicted that AcDocCel5 RINAVID and AcDocScaB M7 would be able to recognize both AcCohScaA6 and AcCohScaC3 and that AcDocScaB M8 would completely switch specificity and only be able to bind AcCohScaA6. The ability of these Doc derivatives to bind the two different Coh counterparts was initially probed by non-denaturing gel electrophoresis (NGE) (data not shown). The data suggested that AcDocCel5 RINAVID could still recognize AcCohScaA6 but did not acquire the ability to bind AcCohScaC3. AcDocScaB M7, however, could indeed recognize both Cohs. Interestingly, based on the NGE analysis, the double mutant, AcDocScaB M8, did not seem to be able to bind either of the Cohs (data not shown). To confirm these results and further explore the thermodynamics of these interactions, ITC was carried out at 308 K. The data, presented in Table 4 and exemplified in Fig. S2, confirm the results suggested by the NGE analysis. The AcDocCel5 RINAVID mutant still bound to AcCohScaA6 with a Ka value of 2 × 10⁶ M⁻¹, whereas it failed to bind AcCohScaC3. On the other hand, AcDocScaB M7 bound to AcCohScaA6 and AcCohScaC3 with Ka values of 4.2 × 10⁶ and 3.6 × 10⁶ M⁻¹, respectively.
These data demonstrate that it is possible to engineer a type-I Doc such that each binding site displays a different specificity. In contrast, AcDocScaB M8 did not show affinity for AcCohScaA6. Inspection of the AcDocScaB and AcDocCel5 structures shows that the gap between the N- and C-terminal helix backbones is narrower in AcDocScaB (~4.7 Å at its narrowest point) than in AcDocCel5 (6.5 Å) (Fig. 6). The close proximity between helices 1 and 3 in AcDocScaB might not allow enough space to accommodate the mutations on both Doc repeats without steric clashes, especially between Phe-18 and Phe-54 (Fig. 6). Therefore, AcDocScaB M8 specificity is probably compromised through steric hindrance, which would explain why this Doc derivative was unable to bind to AcCohScaA6. The inability of AcDocCel5 RINAVID to bind to the ScaC Coh may also reflect steric constraints imposed by the topography of the enzyme Doc. Thus, whereas the data show that the nature of the residues at the interface of Coh-Doc complexes is an important modulator of specificity, the topology of the Docs also contributes to ligand specificity through steric effects.

Conclusions
It is now well established that type-I Coh-Doc interactions are essential to recruit cellulosomal enzymes onto primary scaffoldins, which in turn are attached to the cell surface via a type-II Coh-Doc pair. In A. cellulolyticus, a second type-I Coh-Doc specificity is responsible for the attachment of an unusual adaptor scaffoldin, ScaB, to the bacterial cell surface. Previous work revealed that the type-I Coh-Doc complexes that recruit ScaB to the cell envelope present a dual-binding mode resulting from the presence of two identical Coh-binding faces, characteristic of the majority of cellulosomal type-I Coh-Doc complexes. Here, we reveal that the type-I Coh-Doc complexes that recruit enzymes to the cellulosome of A. cellulolyticus also present a dual-binding mode, suggesting that flexibility in the orientation of Coh recognition is a general feature of type-I Doc modules, including those that recruit cellulosomes onto the cell surface. The structure of AcDocCel5 revealed an internal symmetry that supports the presence of two virtually identical Coh-binding faces. Given the high degree of homology shared by the two different type-I Coh-Doc specificities discovered in A. cellulolyticus, an engineered Doc was designed with each ligand-binding site recognizing a different Coh. The data showed that although the residues in the two ligand-binding sites made a major contribution to Coh recognition, the topology of the Doc modules also influenced Coh recognition through steric effects. Thus, the evolution of Coh recognition by Doc modules requires modulation of both the ligand-binding surface and the topology of the complete protein module.

Gene synthesis and DNA cloning
Docs are inherently unstable when produced in E. coli. To promote Doc stability, A. cellulolyticus DocCel5 of protein WP_010249057 (residues 502-573) was co-expressed in vivo with the sixth Coh of ScaA, AcCohScaA6 (AAF06064; residues 1472-1611). The immediate binding of AcDocCel5 to AcCohScaA6 is believed to confer the necessary Doc stabilization. The genes encoding the two proteins were designed with a codon usage optimized to maximize expression in E. coli, synthesized in vitro (NZYTech Ltd., Lisbon, Portugal) and cloned into pET28a (Merck Millipore, Darmstadt, Germany) under the control of separate T7 promoters. The AcDocCel5-encoding gene was positioned at the 5′-end and the AcCohScaA6-encoding gene at the 3′-end of the artificial DNA. A T7 terminator sequence (to terminate transcription of the Doc gene) and a T7 promoter sequence (to control transcription of the Coh gene) were incorporated between the sequences of the two genes. This construct contained specifically tailored NheI and NcoI recognition sites at the 5′-end and XhoI and SalI at the 3′-end to allow subcloning of the nucleic acid into pET28a (Merck Millipore), such that the sequence encoding a six-residue His tag could be introduced either at the N terminus of the Doc (through digestion with NheI and SalI, incorporating the additional sequence MGSSHHHHHHSSGLVPRGSHMAS at the N terminus of the AcDocCel5) or at the C terminus of the AcCohScaA6 (by cutting with NcoI and XhoI, which incorporates the additional sequence LEHHHHHH at the C terminus of the Coh). To block the dual-binding mode and promote the structural homogeneity required for protein crystallization, two different genes were synthesized, each with a distinct Doc mutant: mutant M1 with the S15I and I16N amino acid changes and mutant M2 with the S51I and L52N replacements. These substitutions represent residue changes to amino acids present in type-I Docs of A. cellulolyticus that do not bind to ScaA but rather to the cell-surface anchoring scaffoldin ScaC. In addition, these residues are located, respectively, at the N-terminal and C-terminal Coh recognition sites. Thus, as a result of this strategy, four pET28a plasmid derivatives were produced: pET28DtC M1 and M2 with the engineered tag in the Doc and pET28DCt M1 and M2 where the engineered tag is attached to the Coh. The four plasmids were used to express AcCohScaA6-DocCel5 M1 and M2 complexes in E. coli. Recombinant AcDocCel5 and AcCohScaA6 primary sequences are presented in Table 1.
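The M1 and M2 substitutions described above can be sketched in code as a small bookkeeping aid. The following is a minimal illustration, not the procedure used in this work: the helper `apply_mutations` and the placeholder sequence `toy_doc` are hypothetical, the placeholder is not the AcDocCel5 sequence (which is given in Table 1), and the numbering is assumed to be 1-indexed relative to the start of the sequence supplied.

```python
import re


def apply_mutations(sequence, mutations):
    """Apply point mutations such as "S15I" (wild type, 1-indexed position, new residue).

    Raises ValueError if a mutation is malformed, out of range, or if the
    wild-type residue named in the mutation does not match the sequence.
    """
    residues = list(sequence)
    for mutation in mutations:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
        if match is None:
            raise ValueError(f"Unrecognized mutation format: {mutation}")
        wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"Position {position} is outside the sequence")
        if sequence[position - 1] != wild_type:
            raise ValueError(
                f"{mutation}: expected {wild_type} at position {position}, "
                f"found {sequence[position - 1]}"
            )
        residues[position - 1] = replacement
    return "".join(residues)


# HYPOTHETICAL placeholder, NOT the real AcDocCel5 sequence: alanines with
# S15/I16 and S51/L52 placed so that the mutation names from the text apply.
toy_doc = "A" * 14 + "SI" + "A" * 34 + "SL" + "A" * 10

m1 = apply_mutations(toy_doc, ["S15I", "I16N"])  # mutant M1 substitutions
m2 = apply_mutations(toy_doc, ["S51I", "L52N"])  # mutant M2 substitutions
print(m1)
print(m2)
```

Checking the wild-type residue before substituting guards against numbering mismatches, for example when a tag such as MGSSHHHHHHSSGLVPRGSHMAS has been prepended to the module.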
To produce recombinant AcCohScaA6 and AcDocCel5 individually, an ELISA-based system designed to probe Coh-Doc affinities that requires fusion with xylanase or carbohydrate-binding modules (CBMs) was selected, as it allows production of highly stable and functional Coh and Doc derivatives (23). Thus, sequences encoding each of the two modules were amplified from A. cellulolyticus genomic DNA by PCR, using NZYProof polymerase (NZYTech Ltd.) and the primers shown in Table S3. The M1 and M2 Doc mutants were amplified from the synthesized DNA constructs described above. Following gel purification, the AcDocCel5-encoding amplicon was inserted into a xylanase-Doc cassette in the pET9d plasmid after digestion with KpnI and BamHI and ligation with T4 ligase. The resulting expressed products consist of His-tagged AcDocCel5 fused to the xylanase T-6 from Geobacillus stearothermophilus at the N terminus of the polyhistidine tag (XynAcDocCel5). The AcCohScaA6-encoding gene was cloned into CBM-Coh cassettes in pET28a after digestion with BamHI and XhoI restriction enzymes. This resulted in a His-tagged AcCohScaA6 recombinant derivative fused to a CBM3a from the C. thermocellum scaffoldin CipA (CBMAcCohScaA6) (24). XynAcDocScaB and CBMAcCohScaC3 were produced for a previous study, following the same approach (20).
For the specificity switch experiments, several XynAcDocCel5 protein derivatives were produced using site-directed mutagenesis (Table S3). Each of the newly generated gene sequences was fully sequenced to verify that only the desired mutations had been introduced. The AcDocScaB mutants (M7 and M8) were produced for a previous study using previously published primers (20).

Expression and purification of recombinant proteins
Preliminary expression screens revealed that when the polyhistidine tag was located at the Doc N-terminal end of the AcCohScaA6-DocCel5 complex, the expression levels of both Coh and Doc were higher. Tagging the Coh resulted in the accumulation of high levels of unbound Coh in the purification product, suggesting that the Coh was expressed at higher levels than the Docs. Consequently, the plasmid pET28DtC was used to transform E. coli BL21 (DE3) cells to produce the AcCohScaA6-DocCel5 complex in large quantities. Transformed E. coli were grown at 37°C to an A600 of 0.5. Recombinant protein expression was induced by the addition of 1 mM isopropyl β-D-1-thiogalactopyranoside followed by incubation at 19°C for 16 h. Cells were harvested by 15-min centrifugation at 5,000 × g and resuspended in 20 ml of IMAC binding buffer (50 mM HEPES, pH 7.5, 10 mM imidazole, 1 M NaCl, 5 mM CaCl2). Cells were then disrupted by sonication, and the cell-free supernatant was recovered by a 30-min centrifugation at 15,000 × g. After loading the soluble fraction into a HisTrap™ nickel-charged Sepharose column (GE Healthcare), initial purification was carried out by IMAC in an FPLC system (GE Healthcare) using conventional protocols with a 35 mM imidazole wash and a 35-300 mM imidazole gradient. The buffer of all recovered fractions containing the purified Coh-Doc complex was exchanged into 50 mM HEPES, pH 7.5, containing 200 mM NaCl, 5 mM CaCl2 using a PD-10 Sephadex G-25 M gel-filtration column (Amersham Pharmacia Biosciences). A further purification step by gel-filtration chromatography was performed by loading the samples onto a HiLoad 16/60 Superdex 75 column (GE Healthcare) at a flow rate of 1 ml min⁻¹. Fractions containing the purified complex were then concentrated with Amicon Ultra-15 centrifugal devices with a 10-kDa cutoff membrane (Millipore) and washed three times with molecular biology grade water (Sigma) containing 0.5 mM CaCl2.
The protein concentration was estimated in a NanoDrop 2000c spectrophotometer (Thermo Scientific) using a molar extinction coefficient (ε) of 8,940 M⁻¹ cm⁻¹. The final protein concentration was adjusted to 12 mg/ml for XynAcDocCel5 M2 and 15 mg/ml for XynAcDocCel5 M1, in molecular biology grade water containing 0.5 mM CaCl2. The purity and molecular mass of the recombinant complex were confirmed by 14% (w/v) SDS-PAGE.
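The concentration estimate above follows the Beer-Lambert law, A = ε·c·l, which can be sketched as follows. This is a minimal illustration under stated assumptions: the absorbance is assumed to be measured at 280 nm and normalized to a 1-cm path length, and the absorbance reading and molar mass in the example are hypothetical placeholders (the mass of the complex is not stated in this section). Only ε = 8,940 M⁻¹ cm⁻¹ is taken from the text.

```python
# Molar extinction coefficient quoted in the text, in M^-1 cm^-1.
EPSILON_COMPLEX = 8940.0


def concentration_from_absorbance(absorbance, molar_mass_da,
                                  epsilon=EPSILON_COMPLEX, path_cm=1.0):
    """Return (molar concentration in M, mass concentration in mg/ml).

    Uses the Beer-Lambert law, c = A / (epsilon * l). A concentration in
    mol/l multiplied by a molar mass in g/mol gives g/l, which equals mg/ml.
    """
    if absorbance < 0 or molar_mass_da <= 0 or epsilon <= 0 or path_cm <= 0:
        raise ValueError("absorbance must be >= 0; other inputs must be > 0")
    molar = absorbance / (epsilon * path_cm)
    return molar, molar * molar_mass_da


# HYPOTHETICAL reading and molar mass, for illustration only.
molar, mg_per_ml = concentration_from_absorbance(absorbance=0.447,
                                                 molar_mass_da=25000.0)
print(f"{molar * 1e6:.1f} uM, {mg_per_ml:.2f} mg/ml")
```

Samples at the concentrations used for crystallization would normally be diluted before measurement so that the absorbance stays within the linear range of the instrument.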
CBMCohs, XynDocs, and respective protein derivatives used in ITC and native PAGE experiments were expressed as described above and purified by IMAC using nickel-charged Sepharose His GraviTrap gravity-flow columns (GE Healthcare). After IMAC, the recombinant Cohs and Docs were buffer-exchanged into 50 mM HEPES, pH 7.5, 0.5 mM CaCl2, and 0.5 mM tris(2-carboxyethyl)phosphine using PD-10 Sephadex G-25 M gel-filtration columns (GE Healthcare).

NGE
For the NGE experiments, each of the XynAcDocCel5 and XynAcDocScaB variants, at a concentration of 30 μM, was incubated in the presence and absence of 30 μM CBMAcCohScaA6 or CBMAcCohScaC3 for 30 min at room temperature and separated on a 10% native polyacrylamide gel. Electrophoresis was carried out at room temperature. The gels were stained with Coomassie Blue. Complex formation was detected by the presence of an additional band displaying a lower electrophoretic mobility than the individual modules.

Isothermal titration calorimetry
All ITC experiments were carried out at 308 K. The purified XynAcDocCel5, XynAcDocScaB, CBMAcCohScaA6, or CBMAcCohScaC3 variants were diluted to the required concentrations and filtered using a 0.45-μm syringe filter (PALL). During titrations, the Doc constructs were stirred at 307 rpm in the reaction cell and titrated with 28 successive 10-μl injections of Coh at 220-s intervals. Integrated heat effects, after correction for heats of dilution, were analyzed by nonlinear regression using a single-site model (Microcal ORIGIN version 7.0, Microcal Software). The fitted data yielded the association constant (Ka) and the enthalpy of binding (ΔH). Other thermodynamic parameters were calculated using the standard thermodynamic equation, −RT ln Ka = ΔG = ΔH − TΔS.
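The derivation of the remaining thermodynamic parameters from the fitted Ka and ΔH can be sketched as below. This is a minimal illustration of the standard relation −RT ln Ka = ΔG = ΔH − TΔS, not the Microcal ORIGIN analysis itself. The function name and the example Ka and ΔH values are hypothetical, energies are assumed to be expressed in kcal mol⁻¹, and only the temperature of 308 K comes from the text.

```python
import math

# Molar gas constant in kcal mol^-1 K^-1
# (8.314462618 J mol^-1 K^-1 divided by 4184 J per kcal).
R_KCAL = 8.314462618 / 4184.0


def itc_thermodynamics(k_a, delta_h, temperature=308.0):
    """Derive (delta_g, delta_s, minus_t_delta_s) from fitted ITC parameters.

    k_a         association constant in M^-1
    delta_h     binding enthalpy in kcal/mol
    temperature absolute temperature in K (308 K in the text)

    delta_g and minus_t_delta_s are in kcal/mol; delta_s is in kcal/(mol K).
    """
    if k_a <= 0 or temperature <= 0:
        raise ValueError("k_a and temperature must be positive")
    delta_g = -R_KCAL * temperature * math.log(k_a)  # dG = -RT ln Ka
    delta_s = (delta_h - delta_g) / temperature      # from dG = dH - T dS
    return delta_g, delta_s, -temperature * delta_s


# HYPOTHETICAL fitted values, for illustration only.
delta_g, delta_s, minus_t_delta_s = itc_thermodynamics(k_a=1.0e8, delta_h=-10.0)
print(f"dG = {delta_g:.2f} kcal/mol, -TdS = {minus_t_delta_s:.2f} kcal/mol")
```

Reporting −TΔS alongside ΔH makes it easy to see whether binding is driven by enthalpy or by entropy, because the two terms sum to ΔG.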

X-ray crystallography, structure determination, and refinement
The crystallization conditions were set up using the sitting-drop vapor-diffusion method with an Oryx8 robotic nanodrop-dispensing system (Douglas Instruments) (25). The commercial kits Crystal Screen, Crystal Screen 2, PEG/Ion, and PEG/Ion 2 (Hampton Research), JCSG+ HT96 (Molecular Dimensions), and an in-house screen (80 factorial) were used for the screening. Drops of 0.7 μl of AcCohScaA6-DocCel5 M1 and M2, at 15 and 12 mg/ml, respectively, were mixed with 0.7 μl of reservoir solution at room temperature, with each well containing 50 μl of the crystallization solution. The resulting plates were then stored at 292 K. Crystal formation was observed in four different conditions for AcCohScaA6-DocCel5 M1 and in one condition for AcCohScaA6-DocCel5 M2, within ~15 days (maximum dimension ~120 × 100 × 30 μm). All of the crystals were obtained from the initial screens. These crystals were cryoprotected with mother solution containing 20–30% glycerol or with 100% Paratone-N (Hampton Research) and flash-cooled in liquid nitrogen. Data were collected on beamline ID29 at the European Synchrotron Radiation Facility (Grenoble, France), using a PILATUS 6M detector (Dectris Ltd.) from crystals cooled to 100 K using a Cryostream (Oxford Cryosystems Ltd). iMOSFLM (26) was used for strategy calculation during data collection. All data sets were processed using iMOSFLM (26) and AIMLESS (27) from the CCP4 suite (Collaborative Computational Project, Number 4 (28)). Data collection statistics are given in Table 2.
The best-diffracting AcCohScaA6-DocCel5 M1 crystals were those formed in the condition composed of 0.2 M sodium thiocyanate, 20% (w/v) PEG 3350, pH 6.9, and diffracted to a resolution of 1.57 Å. The crystals from the other three conditions did not diffract at all. The crystal belongs to the orthorhombic space group P2₁2₁2₁. The best-diffracting AcCohScaA6-DocCel5 M2 crystals were those formed in the condition composed of 0.2 M CaCl2, 0.1 M HEPES, pH 7.5, and 28% PEG 400. The crystal belongs to the monoclinic space group P2₁. BALBES was used to carry out molecular replacement (29). The best solution for AcCohScaA6-DocCel5 M1 was found using the type-I Coh-Doc complex from C. cellulolyticum (PDB entries 2VN5 and 2VN6 with sequence identity of 36.9% with the Coh and 32.8% with the Doc (19)), producing at the end of the BALBES run an R factor and R free of 35.7 and 40.6%, respectively, and a Q-factor of 0.719 after REFMAC5 refinement (30). An ARP/wARP (31) run after BALBES gave a model of 400 residues in six chains, with an estimated correctness of 99.9%. Two copies of the heterodimer AcCohScaA6-DocCel5 M1 complex are present in the asymmetric unit. This model was adjusted and refined using REFMAC5 and PDB_REDO (32) interspersed with model adjustment in COOT to give the final structure (Protein Data Bank code 5NRK; Table 2). The final round of refinement was performed using the TLS/restrained refinement procedure, using each module as a single group. The root mean square deviation of bond lengths, bond angles, torsion angles, and other indicators were continuously monitored using validation tools in COOT and MOLPROBITY (33). A summary of the refinement statistics is shown in Table 2. The best solution for AcCohScaA6-DocCel5 M2 was found using the AcCohScaA6-DocCel5 M1 refined model. The refinement process was as described above for AcCohScaA6-DocCel5 M1 (Protein Data Bank code 5NRM; Table 2).